Releases: apache/iceberg-python
PyIceberg 0.9.0
Full Changelog: pyiceberg-0.8.0...pyiceberg-0.9.0
There have been 243 new commits since the last minor release, 0.8.0, including 148 commits from various contributors and 95 from Dependabot. This release features contributions from 63 unique contributors, including 33 first-time contributors.
What's Changed
New Features
- Introduced the capability to perform `UPSERT` operations on a table directly within PyIceberg.
- Added support for dynamic overwrites as an optimization when an entire partition is replaced.
- Implemented `namespace_exists` functionality for the REST catalog.
- Extended the table updates to include the new `remove-snapshot-ref` and `remove-snapshots` actions.
- Added `view_exists` method to the REST catalog as a part of the effort to add view support to the REST Catalog.
- Implemented support for Alibaba OSS protocol in PyArrowFileIO.
- Introduced read support for the Iceberg V3 spec.
- Added support for Location Providers for tables, which includes the `ObjectStoreLocationProvider` and also enables custom write paths for both data and metadata.
- Extended `S3FileIO` operations to allow for cross-region read support.
- Introduced support to convert an Iceberg table scan to a `polars` DataFrame and LazyFrame.
- Added support for the `all_manifests` metadata table.
- Implemented support for writes to bucket partitioned tables.
- Added automatic metadata cleanup to Iceberg tables via `write.metadata.delete-after-commit.enabled`.
- Introduced syntactic sugar for `and` and `or` operations in filters.
- Implemented configurable S3 request timeout settings for better performance tuning.
- Add support to use `apache/iceberg-rest-fixture` image for integration tests
- Introduced support to update table statistics
- Add support for Bucket and Truncate transforms utilizing `pyiceberg_core` (`iceberg-rust`)
- Add support for column projections from partition metadata
- Add support for `ResidualEvaluator`
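To make the `UPSERT` semantics from the list above concrete, here is a minimal pure-Python sketch over lists of dicts. It only illustrates the general update-or-insert idea and is not PyIceberg's implementation; the function name `upsert_rows`, its parameters, and the keys of the returned dict are hypothetical.

```python
from typing import Any


def upsert_rows(
    target: list[dict[str, Any]],
    source: list[dict[str, Any]],
    join_cols: list[str],
) -> dict[str, int]:
    """Update rows of `target` that match `source` on `join_cols`; insert the rest.

    Hypothetical helper for illustration only; mutates `target` in place.
    """

    def key(row: dict[str, Any]) -> tuple:
        return tuple(row[col] for col in join_cols)

    # Map each join key to the position of its row in the target.
    index = {key(row): pos for pos, row in enumerate(target)}
    updated = inserted = 0
    for row in source:
        k = key(row)
        if k in index:
            # Matched: overwrite only when the row actually differs.
            if target[index[k]] != row:
                target[index[k]] = dict(row)
                updated += 1
        else:
            # Not matched: append as a new row.
            index[k] = len(target)
            target.append(dict(row))
            inserted += 1
    return {"rows_updated": updated, "rows_inserted": inserted}
```

Rows that match on the join columns but are otherwise identical are left untouched in this sketch, so they count as neither updated nor inserted.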
Deprecations
Catalog & Table Identifiers
- Parsing catalog-level identifiers in Catalog references is deprecated
  - Please refer to tables using only their namespace and table name
- `Table.identifier` property is deprecated
  - Use `Table.name()` instead
Expression Parsing
- Parsing expressions with table names is deprecated
  - Only provide field names in `row_filter`
Configuration Properties
- `rest.authorization-url` property is deprecated
  - Use `oauth2-server-uri` instead
- `gcs.endpoint` property is deprecated
  - Use `gcs.service.host` instead
- Properties starting with `adlfs.` are deprecated
  - Use properties that start with `adls.`
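The renames above can be applied mechanically to an existing properties dictionary. The following is a small sketch of such a migration; the helper `migrate_properties` is hypothetical and not part of PyIceberg, and the property values in the usage example are made up.

```python
# Renames copied from the deprecation notes above.
RENAMED_KEYS = {
    "rest.authorization-url": "oauth2-server-uri",
    "gcs.endpoint": "gcs.service.host",
}
RENAMED_PREFIXES = {"adlfs.": "adls."}


def migrate_properties(props: dict[str, str]) -> dict[str, str]:
    """Return a copy of `props` with deprecated keys replaced (hypothetical helper)."""
    migrated: dict[str, str] = {}
    for key, value in props.items():
        new_key = RENAMED_KEYS.get(key, key)
        for old_prefix, new_prefix in RENAMED_PREFIXES.items():
            if new_key.startswith(old_prefix):
                new_key = new_prefix + new_key[len(old_prefix):]
        # If the new key is already set explicitly, it wins over the old one.
        if new_key != key and new_key in props:
            continue
        migrated[new_key] = value
    return migrated


old = {
    "rest.authorization-url": "https://auth.example/token",  # made-up value
    "adlfs.account-name": "acct",                             # made-up value
    "s3.region": "us-east-1",
}
new = migrate_properties(old)
```

Keys that are not deprecated pass through unchanged, so the helper can be run over a whole configuration.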
Table API Changes
- `project_table` is deprecated
  - Use `ArrowScan.to_table()` instead
  - Use `ArrowScan.to_record_batches()` instead
Name Mapping
- `NameMapping.find` is deprecated
  - Use `apply_name_mapping` instead
Table Update Field Removal
- The `initial_change` field has been removed from table updates, affecting:
  - `AddSchemaUpdate`
  - `AddPartitionSpecUpdate`
  - `AddSortOrderUpdate`
Table Class Refactoring
Several table classes have been moved to private classes:
- `pyiceberg.table.Move` → `pyiceberg.table.update.schema._Move`
- `pyiceberg.table.MoveOperation` → `pyiceberg.table.update.schema._MoveOperation`
- `pyiceberg.table.DeleteFiles` → `pyiceberg.table.update.snapshot._DeleteFiles`
- `pyiceberg.table.FastAppendFiles` → `pyiceberg.table.update.snapshot._FastAppendFiles`
- `pyiceberg.table.MergeAppendFiles` → `pyiceberg.table.update.snapshot._MergeAppendFiles`
- `pyiceberg.table.OverwriteFiles` → `pyiceberg.table.update.snapshot._OverwriteFiles`
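One way to check whether a codebase is affected by these moves is to scan its source for imports of the old names. The sketch below does this with the standard-library `ast` module, so it does not need PyIceberg installed; the helper `find_moved_imports` is hypothetical, and it only detects the `from pyiceberg.table import ...` form.

```python
import ast

# Old class name (formerly importable from pyiceberg.table) -> new location,
# copied from the list above.
MOVED_CLASSES = {
    "Move": "pyiceberg.table.update.schema._Move",
    "MoveOperation": "pyiceberg.table.update.schema._MoveOperation",
    "DeleteFiles": "pyiceberg.table.update.snapshot._DeleteFiles",
    "FastAppendFiles": "pyiceberg.table.update.snapshot._FastAppendFiles",
    "MergeAppendFiles": "pyiceberg.table.update.snapshot._MergeAppendFiles",
    "OverwriteFiles": "pyiceberg.table.update.snapshot._OverwriteFiles",
}


def find_moved_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, old name, new location) for each affected import."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == "pyiceberg.table":
            for alias in node.names:
                if alias.name in MOVED_CLASSES:
                    hits.append((node.lineno, alias.name, MOVED_CLASSES[alias.name]))
    return sorted(hits)
```

Because the new names are private (leading underscore), a hit is probably better treated as a prompt to stop depending on the class than as a path to rewrite.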
Table Properties Refactoring
Several constants have been moved to `TableProperties`:
- `DEFAULT_MAX_SNAPSHOT_AGE_MS` → `TableProperties.MAX_SNAPSHOT_AGE_MS_DEFAULT`
- `DEFAULT_MIN_SNAPSHOTS_TO_KEEP` → `TableProperties.MIN_SNAPSHOTS_TO_KEEP_DEFAULT`
Documentation Updates
- Added documentation for the new `UPSERT` operation support.
- Added documentation of the new `LocationProvider` feature.
- Improve the `"How to Release"` documentation.
- Add documentation linking to community contributing guidelines
- Add documentation on nightly build
Bug Fixes
- Fixed `KeyError` in `add_files` for Parquet files missing column stats.
- Fixed `Table.scan` case sensitivity handling.
- Resolved `TypeError` in `create_match_filter` for composite keys.
- Allowed leading underscore in column name used in row filter.
- Ensured correct statistics updates by removing redundant `snapshot_id` in `SetStatisticsUpdate`.
- Fixed namespace existence check for multi-level namespaces in `SqlCatalog`.
- Improved handling of S3 request timeouts.
- Fixed `TypeError` in composite key joins.
Dependencies
- Remove `python` 3.13 upper bound restriction
- Remove `fsspec` upper bound restriction
- Bump `PyArrow` to 19.0.0
Infra
- Improve and automate release process using github workflow
- Add support for testpypi nightly build
- Add `codespell` to pre-commit
- Replace `pycln` with `ruff`
Commits
Features
- Set default for `SortField`'s `transform` by @kevinjqliu in #1347
- Boto Glue standard retry policy with configuration by @mark-major in #1307
- TEST: adopt new rest catalog image and enable tableExists tests by @sungwy in #1389
- Added force virtual addressing configuration for S3, Alibaba OSS protocol to use PyArrowFileIO by @helmiazizm in #1392
- Deserialize NestedField initial-default and write-default Attributes by @paulcichonski in #1432
- Snapshot: Make manifest-list required by @Fokko in #1385
- Implementing namespace_exists function on the REST Catalog by @AhmedNader42 in #1434
- Add Support for Dynamic Overwrite by @jqin61 in #931
- URL-encode partition field names in file locations by @smaheshwar-pltr in #1457
- Fix read from multiple s3 regions by @jiakai-li in #1453
- Nit fixes to URL-encoding of partition field names by @smaheshwar-pltr in #1499
- Support Location Providers by @smaheshwar-pltr in #1452
- Add `all_manifests` metadata table with tests by @soumya-ghosh in #1241
- Use `ObjectStoreLocationProvider` by default by @smaheshwar-pltr in #1509
- Improve `LocationProvider` unit tests by @smaheshwar-pltr in #1511
- Add table statistics by @ndrluis in #1285
- feat: Support Bucket and Truncate transforms on write by @sungwy in #1345
- ADLS: Support Vended Credentials by @Fokko in #1520
- Add V3 read support by @Fokko in #1554
- feat: support datetime objects in literal instantiation by @jayceslesar in #1542
- Make s3.request_timeout configurable by @metadaddy in #1568
- Update annotation with V3 by @Fokko in #1587
- Add `view_exists` method to REST Catalog by @shiv-io in #1242
- Hive metastore register table by @JoniKet in #1580
- Add support for `write.data.path` by @Fokko in #1611
- Accept date in literal by @TennyZhuang in #1618
- Implement column projection by @gabeiglio in #1443
- feat: implement InMemoryCatalog as a subclass of SqlCatalog by @hussein-awala in #1140
- Add base headers in properties to signer_headers by @tom-s-powell in #1610
- Add ResidualVisitor to compute residuals by @tusharchou in #1388
- Implement `write.metadata.delete-after-commit.enabled` to clean up old metadata files by @kaushiksrini in #1607
- feat: search current working directory for config file by @IndexSeek in #1464
- Feat/add support kerberize hivemetastore by @Fokko in #1634
- Added support for Polars DataFrame and LazyFrame by @yigal-rozenberg in #1614
- Filter rows directly from pa.RecordBatch by @gabeiglio in #1621
- Add table upsert support by @kevinjqliu in #1660
- Add support for `write.metadata.path` by @geruh in #1642
- Implement update for `remove-snapshot-ref` action by @grihabor in #1598
- Implement update for `remove-snapshots` action by @grihabor in #1561
- Add syntactic sugar for `and` and `or` operation by @Fokko in #1697
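As background for the last item, syntactic sugar for `and` and `or` is typically implemented in Python by overloading the `&` and `|` operators on the expression classes. The sketch below shows the idea on a toy expression tree; the classes here are stand-ins written for this example and are not PyIceberg's own expression classes.

```python
from dataclasses import dataclass


class Expr:
    """Base class of a toy boolean expression tree (not PyIceberg's classes)."""

    def __and__(self, other: "Expr") -> "Expr":
        # `a & b` builds an And node instead of evaluating anything.
        return And(self, other)

    def __or__(self, other: "Expr") -> "Expr":
        # `a | b` builds an Or node.
        return Or(self, other)


@dataclass(frozen=True)
class EqualTo(Expr):
    field: str
    value: object

    def evaluate(self, row: dict) -> bool:
        return row[self.field] == self.value


@dataclass(frozen=True)
class And(Expr):
    left: Expr
    right: Expr

    def evaluate(self, row: dict) -> bool:
        return self.left.evaluate(row) and self.right.evaluate(row)


@dataclass(frozen=True)
class Or(Expr):
    left: Expr
    right: Expr

    def evaluate(self, row: dict) -> bool:
        return self.left.evaluate(row) or self.right.evaluate(row)


# The operators compose into the same tree as the explicit constructors.
row_filter = (EqualTo("city", "Paris") & EqualTo("year", 2024)) | EqualTo("city", "Rome")
```

Note that `&` and `|` bind more tightly than comparison operators in Python, so parentheses around each operand are usually needed when operands are themselves built with operators.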
Documentations
- Improve documentation for "how to release" by @kevinjqliu in https://github.com/apache/...
pyiceberg-0.8.1
Full Changelog: pyiceberg-0.8.0...pyiceberg-0.8.1
Patch Release PR: #1384
What's Changed
The behavior of `Table.name` has changed: it now returns the table name without the catalog name. This is part of a broader effort to remove references to the catalog name in pyiceberg.
- Replace usage of `Table.identifier` with `Table.name`, which returns the table name without the catalog name
- Replace the use of a deprecated function (`identifier_to_tuple_without_catalog`) in pyiceberg; remove unnecessary warnings
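To illustrate what "without the catalog name" means for an identifier, here is a small sketch that drops a leading catalog name from an identifier tuple. The function `strip_catalog_name` and the rule it applies (only strip when the identifier has at least three parts and starts with the catalog name) are assumptions made for this example, not a description of PyIceberg's internals.

```python
def strip_catalog_name(identifier: tuple[str, ...], catalog_name: str) -> tuple[str, ...]:
    """Drop a leading catalog name from an identifier tuple (hypothetical helper).

    Assumption: an identifier is (catalog, namespace..., table) or
    (namespace..., table); only the first form is shortened.
    """
    if len(identifier) >= 3 and identifier[0] == catalog_name:
        return identifier[1:]
    return identifier


full = ("prod", "sales", "orders")
short = strip_catalog_name(full, "prod")
```

An identifier that already lacks the catalog name is returned unchanged, so the helper is safe to apply more than once.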
Documentation updates are included to reflect the updated process in https://py.iceberg.apache.org/
- Update "how to release" documentation
- 0.8.0 post-release steps
Bug fixes
- Fix `add_files` for parquet files without column stats
- Allow leading underscore in column name used in row filter
- Ignore tables without table_type property from Glue and Hive
- Write `null` in manifest list metadata when there is no parent-snapshot-id
Remove upper bound restrictions for dependency libraries; allow early testing of new versions
- Remove Python library version upper bound restriction; allow Python 3.13
- Remove fsspec library version upper bound restriction
Commits
36 new commits since the `0.8.0` release.
12 new commits will be included in 0.8.1
- 11 commits cherry-picked as bug fixes (listed below)
- 1 commit to bump version to `0.8.1`
11 bug fixes (cherry-picked)
acbd071 Write `null` when there is no parent-snapshot-id (#1383)
bb078cf Add instruction for patch release (#1373)
ab43c6c fix `KeyError` raised by `add_files` when parquet file doe not have column stats (#1354)
cc1ab2c Improve documentation for "how to release" (#1359)
64dc6fe Remove Python 3.13 upper bound restriction (#1355)
d86ab6e Allow leading underscore in column name used in row filter (#1358)
7a4734e Replace reference of `Table.identifier` with `Table.name` (#1346)
a66ddc0 Ignore tables without `table_type` from Glue and Hive (#1332)
2cbc77d Drop upper bounds for fsspec and it's implementations (#1341)
7660a5b 0.8.0 post release steps (#1334)
b2f0a9e use the non-deprecated func (#1326)
New Contributors
- @sumanth-manchala made their first contribution in #1341
- @gitzwz made their first contribution in #1332
- @vincenzon made their first contribution in #1358
- @bigluck made their first contribution in #1355
- @binayakd made their first contribution in #1354
pyiceberg-0.8.0
What's Changed
PR
- Update PyIceberg Verify Release doc by @chinmay-bhat in #976
- DOCS: Add Github Actions Screenshots to Release Notes by @sungwy in #975
- Bump up version in dev Dockerfile and Issue Template by @ndrluis in #981
- Fix pydantic warning in the commit process by @ndrluis in #972
- Bump up Iceberg version to 1.6.0 by @ndrluis in #982
- Bug Fix: use appropriate partition spec for delete by @sungwy in #984
- [Bug Fix]Use `self.table_metadata` when in transaction by @HonahX in #985
- DOCS: Add more post release notes by @sungwy in #983
- Treat warning as error in CI/Dev by @ndrluis in #973
- Use 'strtobool' instead of comparing with a string. by @ndrluis in #988
- Fix: accept empty arrays in struct field lookup by @grobgl in #997
- Add ndrluis as collaborator by @sungwy in #1009
- Fix list namespace response in rest catalog by @ndrluis in #995
- Pyarrow IO property for configuring large v small types on read by @sungwy in #986
- Update metadata-log for non-rest catalogs by @soumya-ghosh in #977
- Exclude Python 3.9.7 due to import error in catalog module by @ndrluis in #526
- Deprecate rest.authorization-url in favor of oauth2-server-uri by @ndrluis in #962
- Allow setting `write.parquet.row-group-limit` by @Fokko in #1016
- Deprecate Redundant Identifier Support in TableIdentifier, and row_filter by @sungwy in #994
- Fix: Handle Empty RecordBatch within `_task_to_record_batches`, fix correctness issue with positional deletes by @sungwy in #1026
- Fix overwrite when filtering all the data by @ndrluis in #1023
- Allow setting `write.parquet.page-row-limit` by @Fokko in #1017
- DOCS: Remove older row for `write.parquet.row-group-limit` by @sungwy in #1030
- Improve test_version_format() error message for version mismatches by @laksh-krishna-sharma in #1015
- Bump version to 0.7.1 by @sungwy in #1034
- Support s3.signer.endpoint for nessie by @guitcastro in #1029
- [bug] fix reading with `to_arrow_batch_reader` and `limit` by @kevinjqliu in #1042
- Use `VisitorWithPartner` for name-mapping by @Fokko in #1014
- Fix tracing existing entries when there are deletes by @Fokko in #1046
- Coverage Run unit tests first before docker containers are set up by @Minfante377 in #1055
- Update "verify release" instruction by @kevinjqliu in #1064
- Fix Install Issues with `docutils = 0.21.post1` and exclude 3.12 from supported python dependencies by @sungwy in #1067
- Post Release 0.7.1 version updates by @sungwy in #1073
- Update create table doc to clarify ID re-assignment by @paulcichonski in #1072
- Refactor PyArrow DataFiles Projection functions by @sungwy in #1043
- DOCS: Exclude signature files from twine upload by @sungwy in #1071
- Increase the minimal required pyarrow version to 14.0.0 by @ndrluis in #1090
- Fix `table_exists` behavior in REST catalog by @ndrluis in #1096
- fix: improve makefile by @TiansuYu in #1091
- fix (issue-1079): allow update_column to set doc as '' by @TiansuYu in #1083
- prevent adding duplicate files by @amitgilad3 in #1036
- Add list_views to rest catalog by @ndrluis in #817
- Emit warnings instead of failing when seeing unsupported configuration by @Fokko in #1111
- Use `markdownlint` instead of `mdformat` by @kevinjqliu in #1118
- Add drop_view to the rest catalog by @ndrluis in #820
- Support python 3.12 by @kevinjqliu in #1068
- Make `commit_table` public by @Fokko in #1112
- Refactoring: Break down very large `table/__init__.py` module by @sungwy in #1144
- fix: Invert `case_sensitive` logic in StructType by @AnthonyLam in #1147
- Bump `duckdb` to version `1.1.0` by @kevinjqliu in #1149
- Deprecate ADLFS prefix in favor of ADLS by @ndrluis in #961
- Cache Manifest files by @chinmay-bhat in #787
- Use the correct spec when rewiting existing manifests by @Fokko in #1157
- Bug Fix: Use historical partition field name by @sungwy in #1161
- fix: remove old, incorrect docstring by @dataders in #1166
- Preserve Backward compatibility in 0.8.0 for #1144 by @sungwy in #1151
- follow up for more cleanup by @dataders in #1168
- [bug] [REST] Dont remove identifier root by @kevinjqliu in #1172
- fix: support MonthTransform for partitioning by @felixscherz in #1176
- Add metadata tables for `data_files` and `delete_files` by @soumya-ghosh in #1066
- Use ArrowScan.to_table to replace project_table by @JE-Chen in #1180
- Add Docstrings to `pyiceberg/table/__init__.py` by @sungwy in #1189
- Support python 3.12 in poetry by @kevinjqliu in #1192
- Use `cachetools's LRUCache` to cache manifest list by @kevinjqliu in #1187
- HA HMS support by @awdavidson in #752
- Bug Fix: Position Deletes + row_filter yields less data when the DataFile is large by @sungwy in #1141
- Remove dead loom link by @kevinjqliu in #1213
- Drop support for Python 3.8 by @raulcd in #1221
- Add clarifying docs to transform result types by @kevinzwang in #1211
- Add flag to allow disabling creation of catalog tables by @isc-patrick in #1155
- Bug Fix: Glue and Hive catalog return only Iceberg tables by @mark-major in #1145
- Move snapshot history expire table properties to constants by @ndrluis in #1217
- abort the whole table transaction if any updates in the transaction has failed by @stevie9868 in #1246
- PyArrow: Pass in null-mask by @Fokko in #1264
- Bump PyArrow to 18.0.0 by @Fokko in #1256
- Remove numpy as a hard dependency by @Fokko in #1270
- Allow for missing operation by @Fokko in #1263
- fix: list_tables method in glue catalog now only return tables. by @omkenge in #1258
- Replace `numpy` usage and remove from `pyproject.toml` by @kevinjqliu in #1272
- Bump version to 0.8.0 by @Fokko in #1276
- Remove `initial_change` when CreateTableTransaction apply table updates on an empty metadata by @HonahX in #1219
- Deprecate for 0.8.0 release by @kevinjqliu in #1269
- Pass table-token to commit endpoint by @Fokko in #1278
- Updating configuration docs by @Samreay in #1292
- Allow union of `{int,long}`, `{float,double}`, etc by @Fokko in #1283
- Allow passing in ARN Role and Session name to the `PyArrowFileIO` by @Fokko in #1...
pyiceberg-0.8.0rc2
pyiceberg-0.8.0-rc1
pyiceberg-0.7.1
What's Changed
- Fix `delete` to trace existing manifests when a data file is partially rewritten by @Fokko in #1046
- Fix 'to_arrow_batch_reader' to respect the limit input arg by @kevinjqliu in #1042
- Fix correctness of applying positional deletes on Merge-On-Read tables by @sungwy in #1026
- Fix overwrite when filtering data by @ndrluis in #1023
- Bug fix for deletes across multiple partition specs on partition evolution by @sungwy in #984
- Fix evolving the table and writing in the same transaction by @HonahX in #985
- Fix scans when result is empty by @grobgl in #997
- Fix ListNamespace response in REST Catalog by @ndrluis in #995
- Exclude Python 3.9.7 from list of supported versions by @ndrluis in #526
- Allow setting write.parquet.row-group-limit by @Fokko in #1016
- Allow setting write.parquet.page-row-limit by @Fokko in #1017
- Fix pydantic warning during commit by @ndrluis in #972
Full Changelog: pyiceberg-0.7.0...pyiceberg-0.7.1
pyiceberg-0.7.0
What's Changed
- Build: Bump getdaft from 0.2.14 to 0.2.15 by @dependabot in #434
- Build: Bump cryptography from 42.0.0 to 42.0.2 by @dependabot in #440
- docs: Add missing release steps by @Fokko in #443
- Build: Bump moto from 5.0.1 to 5.0.2 by @dependabot in #447
- Build: Bump mkdocs-material from 9.5.9 to 9.5.10 by @dependabot in #448
- Make the snapshot creation part of the `Transaction` by @Fokko in #446
- Send X-Iceberg-Access-Delegation header to signal support for vended credentials/remote signing by @nastra in #436
- Retry with new Access Token on 419 response by @anupam-saini in #340
- Reuse commit-uuid as the write-uuid by @Fokko in #437
- Update NameMapping on update_schema() by @sungwy in #441
- Feat: Implement `create_table_if_not_exists` by @hussein-awala in #415
- Build: Bump coverage from 7.4.1 to 7.4.2 by @dependabot in #457
- Build: Bump getdaft from 0.2.15 to 0.2.16 by @dependabot in #456
- Accept pyarrow LargeListType and FixedSizeListType by @hussein-awala in #458
- Bump pre-commit and such by @Fokko in #442
- docstring: Fix missing commit by @Fokko in #432
- Improve error message in case of a mismatch by @Fokko in #352
- Cleanup conftest, remove LocalOutputFile by @kevinjqliu in #468
- Fix `InMemoryCatalog` Catalog commit operation by @anupam-saini in #470
- enable set hadoop ugi for hive catalog by @j7nhai in #472
- Raise exception if namespace does not exist in load_namespace_properties for Sql Catalog by @rushilshah1 in #477
- Add Support for Custom Header Configurations in RESTCatalog by @geruh in #467
- rest: Set OAuth Content-Type header explicitly by @Fokko in #478
- Partition Evolution by @amogh-jahagirdar in #245
- Fix retrying logic by @Fokko in #480
- Remove unused catalog from integration test by @kevinjqliu in #481
- add github add to check md link by @kevinjqliu in #324
- Sort Order update by @anupam-saini in #476
- Make issued_token_type optional to support OAuth2 Client Credential Flow by @flyrain in #466
- Update table metadata throughout transaction by @Fokko in #471
- Allow non-string typed values in table properties by @kevinjqliu in #469
- Construction of filenames for partitioned writes by @jqin61 in #453
- Remove extraneous import by @Fokko in #485
- Default spark session timezone to UTC in test by @kevinjqliu in #494
- Fix dead links in docs by @kevinjqliu in #493
- Update bug isse template release list by @ndrluis in #496
- add rest scope in the config documentation by @himadripal in #495
- Make scope configurable by @himadripal in #484
- Tests should explicitly check for `schema_id` by @kevinjqliu in #487
- add support for glue.id by @jrouly in #490
- [Bug Fix] cast None `current-snapshot-id` as -1 for Backwards Compatibility by @sungwy in #473
- Make optional oauth configurable by @himadripal in #486
- Disable Spark Catalog caching for integration tests by @kevinjqliu in #501
- Set table properties with dictionary by @kevinjqliu in #503
- Imports decouple by @ndrluis in #505
- Allow setting non-string typed values in `set_properties` by @kevinjqliu in #504
- [Bug fix] update name mapping in Transaction.update_schema by @sungwy in #508
- [Bug Fix] Allow Partition data to be nullable in ManifestEntry by @sungwy in #509
- Allow fsspec up to 2025.1 by @bolkedebruin in #510
- Build: Bump pypa/cibuildwheel from 2.16.5 to 2.17.0 by @dependabot in #517
- Decouple imports reported by mypy linter by @ndrluis in #519
- build: Move back to the mmh3 by @Fokko in #460
- Improve the InMemory Catalog Implementation by @kevinjqliu in #289
- Add `table_exists` method to Catalog by @anupam-saini in #512
- Add StrictMetricsEvaluator by @Fokko in #518
- Add Data Files from Parquet Files to UnPartitioned Table by @sungwy in #506
- Fix CommitTableRequest serialisation by @kdbhiggins in #525
- Add partition stats in snapshot summary by @jqin61 in #521
- UUID literal to binary and fixed by @sebpretzer in #529
- Adding a new dev dep, `deptry` by @kevinjqliu in #528
- Add as_arrow() to Schema class by @ndrluis in #532
- Change Append/Overwrite API to accept snapshot properties by @Gowthami03B in #419
- Fix Glue Integration test by @HonahX in #536
- Add Snapshots table metadata by @Fokko in #524
- `add_files` support partitioned tables by @sungwy in #531
- [Bug Fix] Fix TableMetadataV1 Validators by @HonahX in #544
- Fix race condition on `Table.scan` with `limit` by @kevinjqliu in #545
- Add Strict projection by @Fokko in #539
- Fix the Avro tests by @Fokko in #552
- On write operation, cast data to Iceberg Table's pyarrow schema by @kevinjqliu in #523
- Bin-pack Writes Operation into multiple parquet files, and parallelize writing `WriteTask`s by @kevinjqliu in #444
- Bump version to 0.6.1 by @HonahX in #561
- Minor fixes, #523 followup by @kevinjqliu in #563
- Call as_arrow() call in `overwrite` by @kevinjqliu in #565
- Remove `as visitors` import by @Fokko in #567
- Tests: Make Spark optional for testing by @Fokko in #568
- [CI Fix] Use Docker Compose V2 by @HonahX in #575
- typealias for table version by @MehulBatra in #566
- Disallow default header to be overwritten by @whynick1 in #577
- [Doc] Update how-to-release.md by @HonahX in #576
- Support CreateTableTransaction in Glue and Rest by @HonahX in #498
- Move writes to Transaction by @sungwy in #571
- Add entries metadata table by @Fokko in #551
- Partitioned Append on Identity Transform by @jqin61 in #555
- Implement `__getstate__` and `__setstate__` on PyArrowFileIO and FsSpecFileIO so that they can be pickled by @amogh-jahagirdar in #543
- [Bug Fix] Allow HiveCatalog to create table with TimestamptzType by @HonahX in #585
- Change DataScan to accept Metadata and io by @Fokko in #581
- Read: fetch file_schema directly from pyarrow_to_schema by @HonahX in #597
- Support Time Travel in InspectTable.entries by @sungwy in #599
- [Bug Fix] HiveCatalog's _commit_table need to refresh and update the metadata in a ...
PyIceberg 0.6.1
Patch release:
- Fail to create version 1 table with non-empty partition-spec and sort-order
- Hive Catalog cannot create table with TimestamptzType field
- Fail to read parquet file with special characters in column names
- Hive Catalog commit consistency issue
- docutils=0.21 installation issue
Full Changelog: https://github.com/apache/iceberg-python/commits/pyiceberg-0.6.1
PyIceberg 0.6.0
What's Changed
- Python: Migrate from `iceberg` to `iceberg-python` by @Fokko in #3
- Build: Bump duckdb from 0.8.1 to 0.9.0 by @dependabot in #4
- Build: Bump mkdocs-section-index from 0.3.7 to 0.3.8 by @dependabot in #5
- Build: Bump mkdocstrings-python from 1.7.0 to 1.7.1 by @dependabot in #6
- Build: Bump pydantic from 2.3.0 to 2.4.2 by @dependabot in #7
- Build: Bump psycopg2-binary from 2.9.7 to 2.9.8 by @dependabot in #8
- Build: Bump moto from 4.2.4 to 4.2.5 by @dependabot in #9
- Build: Bump mkdocs-material from 9.4.1 to 9.4.2 by @dependabot in #10
- Build: Bump rich from 13.5.3 to 13.6.0 by @dependabot in #11
- Build: Bump typing-extensions from 4.7.1 to 4.8.0 by @dependabot in #12
- Build: Bump griffe from 0.36.2 to 0.36.4 by @dependabot in #13
- Build: Bump urllib3 from 1.26.16 to 1.26.17 by @dependabot in #36
- Update how to release by @Fokko in #34
- pydantic exclude 2.4.0, 2.4.1 by @syun64 in #38
- Add logic to generate a new snapshot-id by @Fokko in #37
- Fix the TableIdentifier by @Fokko in #44
- Convert the Logical to Physical map to a visitor by @Fokko in #43
- Build: Bump mkdocstrings-python from 1.7.1 to 1.7.2 by @dependabot in #52
- Build: Bump fastavro from 1.8.3 to 1.8.4 by @dependabot in #51
- Build: Bump pypa/cibuildwheel from 2.16.0 to 2.16.2 by @dependabot in #47
- Build: Bump psycopg2-binary from 2.9.8 to 2.9.9 by @dependabot in #49
- Build: Bump coverage from 7.3.1 to 7.3.2 by @dependabot in #50
- Build: Bump cython from 3.0.2 to 3.0.3 by @dependabot in #48
- Docs: Fix repo name and url by @manuzhang in #54
- Run integration tests with Iceberg 1.4.0 by @Fokko in #56
- Add logic for table format-version updates by @Fokko in #55
- Disable merge-commit and enforce linear history by @Fokko in #57
- Construct a writer tree by @Fokko in #40
- Add method and property around sequence-numbers by @Fokko in #60
- Fix column rename doc example to reflect correct API by @cabhishek in #59
- Expression: Part of the expression is ignored when multiple and/or expressions are specified by @amogh-jahagirdar in #65
- Fix Iceberg to Avro Schema Conversion: Fixed, Decimal, UUID by @HonahX in #53
- allow override env-variables in load_catalog by @bdilday in #45
- Make `next_sequence_number` private by @Fokko in #62
- Check for empty responses by @Fokko in #69
- Fix Arrow fixed type by @Fokko in #70
- Bump version to 0.5.1 by @Fokko in #68
- Add `spec_id` back to data file by @puchengy in #63
- Build: Bump ray from 2.7.0 to 2.7.1 by @dependabot in #77
- Build: Bump griffe from 0.36.4 to 0.36.5 by @dependabot in #76
- Build: Bump mypy-boto3-glue from 1.28.36 to 1.28.63 by @dependabot in #75
- Build: Bump mkdocstrings-python from 1.7.2 to 1.7.3 by @dependabot in #74
- Build: Bump moto from 4.2.5 to 4.2.6 by @dependabot in #73
- Remove python working directory by @Fokko in #71
- Don't fail on warning when releasing by @Fokko in #80
- Remove `example` since it is deprecated by @Fokko in #79
- Build: Bump urllib3 from 1.26.17 to 1.26.18 by @dependabot in #84
- Doc: Fix "Verifying Checksums" script in verify-release.md by @HonahX in #82
- Make to_arrow function capable of handling parquet files with sanitized name due to Avro restriction by @puchengy in #83
- Require full expression parse match by @danielcweeks in #88
- Fix NotStartsWith negation by @danielcweeks in #92
- Fix some broken commands and URLs in the docs by @hussein-awala in #89
- Update like statements to reflect sql behaviors by @danielcweeks in #91
- Fix equality of bound expressions by @Fokko in #95
- Build: Bump mkdocs-material from 9.4.2 to 9.4.6 by @dependabot in #100
- Build: Bump pytest-mock from 3.11.1 to 3.12.0 by @dependabot in #99
- Build: Bump sqlalchemy from 2.0.21 to 2.0.22 by @dependabot in #98
- Build: Bump griffe from 0.36.5 to 0.36.7 by @dependabot in #97
- Build: Bump adlfs from 2023.9.0 to 2023.10.0 by @dependabot in #96
- Replace old `%-formatted` by `f-strings` by @hussein-awala in #93
- Fix literal predicate equality check by @danielcweeks in #94
- Fix the nullability of `snapshot-id` on `AssertRefSnapshotId` by @Fokko in #103
- Build: Bump werkzeug from 2.3.7 to 3.0.1 by @dependabot in #105
- Api docs refactor by @mobley-trent in #106
- Fixed typos by @whisk in #108
- Build: Bump duckdb from 0.9.0 to 0.9.1 by @dependabot in #114
- Build: Bump pre-commit from 3.4.0 to 3.5.0 by @dependabot in #113
- Build: Bump mkdocs-material from 9.4.6 to 9.4.7 by @dependabot in #111
- Build: Bump pytest from 7.4.2 to 7.4.3 by @dependabot in #112
- Build: Bump moto from 4.2.6 to 4.2.7 by @dependabot in #110
- fix: partition evaluator thread safety by @skellys in #115
- Run dependabot daily by @Fokko in #66
- Build: Bump griffe from 0.36.7 to 0.36.9 by @dependabot in #118
- Build: Bump cython from 3.0.3 to 3.0.5 by @dependabot in #122
- Build: Bump sqlalchemy from 2.0.22 to 2.0.23 by @dependabot in #125
- Build: Bump zstandard from 0.21.0 to 0.22.0 by @dependabot in #120
- Build: Bump fastavro from 1.8.4 to 1.9.0 by @dependabot in #119
- Refactor Arrow schema conversion by @Fokko in #117
- Build: Bump pyarrow from 13.0.0 to 14.0.0 by @Fokko in #126
- Build: Bump mkdocs-material-extensions from 1.2 to 1.3 by @dependabot in #128
- Add flake8-pie to ruff by @Fokko in #86
- Update pre-commit by @Fokko in #85
- Bump version to 0.6.0 by @Fokko in #72
- Build: Bump mypy-boto3-glue from 1.28.63 to 1.28.77 by @dependabot in #130
- Catch warning in PyLint tests by @Fokko in #33
- Build: Bump mkdocs-material from 9.4.7 to 9.4.8 by @dependabot in #131
- Fix Github Pages path by @Fokko in #133
- Build: Bump pyarrow from 14.0.0 to 14.0.1 by @dependabot in #136
- Add list-refs cli command by @amogh-jahagirdar in #137
- Docs: Add section on pandas by @Fokko in #138
- Build: Bump mkdocstrings-python from 1.7.3 to 1.7.4 by @dependabot in #142
...