docs: readme
JasperHG90 committed Oct 20, 2024
1 parent ad38a5d commit 0ec03cf
Showing 1 changed file with 12 additions and 4 deletions.
16 changes: 12 additions & 4 deletions README.md
@@ -10,11 +10,16 @@ See 'examples' directory.

## Limitations

### When using partitioned assets
The table below shows which PyIceberg features are currently available.

There is currently no internal retry mechanism for concurrent operations that attempt to write to the commit log at the same time (PyIceberg does not support this). See e.g. [this issue](https://github.com/apache/iceberg-python/issues/269) and [this issue](https://github.com/apache/iceberg-python/issues/1084).

This means that, when using partitioned assets, some partitions can fail because another process had not finished writing to the commit log. A workaround has not yet been implemented for this.
| Feature | Supported | Link | Comment |
|--------------------------|-----------|-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Adding files || https://py.iceberg.apache.org/api/#add-files | Useful for existing partitions that users don't want to re-materialize/re-compute. |
| Schema evolution         || https://py.iceberg.apache.org/api/#schema-evolution                                                   | More complicated than e.g. Delta Lake, since updates require diffing the input table against the existing Iceberg table. Approach should be similar to partition evolution. |
| PyIceberg commit retries || https://github.com/apache/iceberg-python/pull/330 https://github.com/apache/iceberg-python/issues/269 | PR to add this to PyIceberg is open. Will probably be merged for an upcoming release. Added a custom retry function using Tenacity for the time being. |
| Partition evolution || https://py.iceberg.apache.org/api/#partition-evolution | Create, Update, Delete |
| Table properties || https://py.iceberg.apache.org/api/#table-properties | Can add this through metadata on the asset. |
| Snapshot properties || https://py.iceberg.apache.org/api/#snapshot-properties | Useful for correlating Dagster runs to snapshots by adding tags to snapshot. |
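
The "PyIceberg commit retries" row mentions a custom retry function built with Tenacity. The project's actual implementation is not shown here, so the following is only a minimal sketch of the general idea, written as a plain standard-library loop instead of Tenacity to keep it dependency-free. The names `CommitConflictError`, `retry_commit`, and `flaky_append` are hypothetical; in real code the caught exception would presumably be PyIceberg's `CommitFailedException`, and the table metadata would likely need to be reloaded before each new attempt.

```python
import random
import time


class CommitConflictError(Exception):
    """Hypothetical stand-in for a commit conflict raised by the table format."""


def retry_commit(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                 sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    Only CommitConflictError is retried; any other exception propagates
    immediately. Waits use exponential backoff with random jitter so that
    concurrent writers are less likely to collide again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except CommitConflictError:
            if attempt == max_attempts:
                raise  # give up and surface the last conflict
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))


# Usage: a fake write that loses the race twice before succeeding.
attempts = {"count": 0}


def flaky_append():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise CommitConflictError("another writer committed first")
    return "committed"


# `sleep` is replaced by a no-op so the example finishes instantly.
result = retry_commit(flaky_append, sleep=lambda _: None)
print(result, "after", attempts["count"], "attempts")
```

Retrying only on the conflict error keeps genuine failures (schema mismatches, I/O errors) visible instead of masking them behind repeated attempts.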

### Implemented catalog backends

@@ -28,6 +33,9 @@ The following catalog backends are currently implemented.
The following engines are currently implemented.

- arrow
- pandas



## To do
