From 0ec03cfa320f577dc0f95bd20005601ae3ecc4cb Mon Sep 17 00:00:00 2001
From: Jasper Ginn
Date: Sun, 20 Oct 2024 21:17:02 +0200
Subject: [PATCH] docs: readme

---
 README.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 1860dfb..2455c88 100644
--- a/README.md
+++ b/README.md
@@ -10,11 +10,16 @@ See 'examples' directory.
 
 ## Limitations
 
-### When using partitioned assets
+The table below shows which PyIceberg features are currently available.
 
-There currently is no internal retry mechanism when concurrent operations attempt to write to the commit log at the same time (this is not supported by PyIceberg). See e.g. [this issue](https://github.com/apache/iceberg-python/issues/269) and [this issue](https://github.com/apache/iceberg-python/issues/1084).
-
-This means that, when using partitioned assets, you can see some partitions failing because another process was not finished writing to the commit log. A workaround has not yet been implemented for this.
+| Feature | Supported | Link | Comment |
+|--------------------------|-----------|-------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Adding files | ❌ | https://py.iceberg.apache.org/api/#add-files | Useful for existing partitions that users don't want to re-materialize/re-compute. |
+| Schema evolution | ❌ | https://py.iceberg.apache.org/api/#schema-evolution | More complicated than e.g. delta lake since updates require diffing input table with existing Iceberg table. Approach should be similar to partition evolution. |
+| PyIceberg commit retries | ✅ | https://github.com/apache/iceberg-python/pull/330 https://github.com/apache/iceberg-python/issues/269 | PR to add this to PyIceberg is open. Will probably be merged for an upcoming release. Added a custom retry function using Tenacity for the time being. |
+| Partition evolution | ✅ | https://py.iceberg.apache.org/api/#partition-evolution | Create, Update, Delete |
+| Table properties | ❌ | https://py.iceberg.apache.org/api/#table-properties | Can add this through metadata on the asset. |
+| Snapshot properties | ❌ | https://py.iceberg.apache.org/api/#snapshot-properties | Useful for correlating Dagster runs to snapshots by adding tags to snapshot. |
 
 ### Implemented catalog backends
 
@@ -28,6 +33,9 @@ The following catalog backends are currently implemented.
 
 The following engines are currently implemented.
 
 - arrow
+- pandas
+
+
 
 ## To do