Releases: aws/aws-sdk-pandas
AWS Data Wrangler 2.13.0
Caveats
⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Breaking changes
- Fix sanitize methods to align with Glue/Hive naming conventions #579
New Functionalities
- AWS Lake Formation Governed Tables 🚀 #570
- Support for Python 3.10 🔥 #973
- Add partitioning to JSON datasets #962
- Add ability to use an unbuffered cursor for large MySQL datasets #928 (see the sketch after this list)
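A minimal sketch of the unbuffered-cursor option from #928, assuming `wr.mysql.connect` forwards a pymysql cursor class through its `cursorclass` argument (the Glue Connection name below is hypothetical):

```python
import pymysql.cursors

import awswrangler as wr

# SSCursor streams rows from the server instead of buffering the full result set
con = wr.mysql.connect(
    connection="aws-sdk-pandas-mysql",  # hypothetical Glue Connection name
    cursorclass=pymysql.cursors.SSCursor,
)

# Combine with chunked reads to keep memory usage flat on large tables
for chunk in wr.mysql.read_sql_query("SELECT * FROM big_table", con=con, chunksize=10_000):
    print(len(chunk))

con.close()
```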
Enhancements
- Add `awswrangler.s3.list_buckets` #997 (see the example after this list)
- Add `partitions_parameters` to catalog partitions methods #1035
- Refactor pagination config in list objects #955
- Add error message to `EmptyDataFrame` exception #991
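For illustration, the new bucket-listing helper takes no required arguments and returns the bucket names visible to the current credentials:

```python
import awswrangler as wr

# Returns a list of bucket names, e.g. ["my-data-lake", "my-logs"]
buckets = wr.s3.list_buckets()
print(buckets)
```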
Documentation
- Clarify docs & add tutorial on schema evolution for CSV datasets #964
Bug Fix
- `catalog.add_column()` without `column_comment` triggers exception #1017
- `catalog.create_parquet_table`: key in dictionary does not always exist #998
- Fix Catalog StorageDescriptor get #969
Thanks
We thank the following contributors/users for their work on this release:
@csabz09, @Falydoor, @moritzkoerber, @maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.12.1
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Patch
- Remove unnecessary dev dependencies from main #961
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.12.0
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Add support for OpenSearch #891 🔥 Check out the tutorial. Many thanks to @AssafMentzer and @mureddy19 for this contribution (see the sketch below)
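A minimal sketch of the new OpenSearch support (the domain endpoint and index name below are hypothetical):

```python
import pandas as pd

import awswrangler as wr

# Connect to an OpenSearch domain (hypothetical endpoint)
client = wr.opensearch.connect(host="my-domain.us-east-1.es.amazonaws.com")

# Index a DataFrame as documents
df = pd.DataFrame({"title": ["Ender's Game"], "year": [1985]})
wr.opensearch.index_df(client, df=df, index="books")

# Search the index back into a DataFrame
results = wr.opensearch.search(
    client, index="books", search_body={"query": {"match_all": {}}}
)
print(results)
```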
Enhancements
- `redshift.read_sql_query` - handle empty table corner case #874
- Refactor read parquet table to reduce file list scan based on available partitions #878
- Shrink Lambda layer with the strip command #884
- Enable DynamoDB endpoint URL #887 (see the config sketch after this list)
- EMR jobs concurrency #889
- Add feature to allow custom AMI for EMR #907
- `wr.redshift.unload_to_files` now empties the S3 folder instead of overwriting existing files #914
- Add `catalog_id` arg to `wr.catalog.does_table_exist` #920
- Add `endpoint_url` for AWS Secrets Manager #929
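A short sketch of the endpoint overrides via the global config, assuming the new DynamoDB and Secrets Manager properties follow the library's existing `wr.config.*_endpoint_url` pattern (URLs below are hypothetical):

```python
import awswrangler as wr

# Point DynamoDB calls at a custom endpoint, e.g. DynamoDB Local
wr.config.dynamodb_endpoint_url = "http://localhost:8000"

# Point Secrets Manager calls at a custom/VPC endpoint
wr.config.secretsmanager_endpoint_url = "https://secretsmanager.us-east-1.amazonaws.com"
```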
Documentation
- Update docs for `awswrangler.s3.to_csv` #868
Bug Fix
- Fix `wr.mysql.to_sql` with `use_column_names=True` when column names are reserved words #918
Thanks
We thank the following contributors/users for their work on this release:
@AssafMentzer, @mureddy19, @isichei, @DonnaArt, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.11.0
Caveats
⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
New Functionalities
- Redshift and RDS Data API support #828 🚀 Check out the tutorial. Many thanks to @pwithams for this contribution (see the sketch below)
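A minimal sketch of the Redshift Data API path, assuming a cluster reachable through the Data API (identifiers below are hypothetical):

```python
import awswrangler as wr

# Connect through the Data API -- no direct network path to the cluster is needed
con = wr.data_api.redshift.connect(
    cluster_id="my-redshift-cluster",  # hypothetical
    database="dev",
    db_user="awsuser",
)

df = wr.data_api.redshift.read_sql_query("SELECT 1 AS col", con=con)
print(df)
```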
Documentation
- Clarify structure of SSM secrets in `connect` methods #871
Bug Fix
- Use botocore's Loader and ServiceModel to extract accepted kwargs #832
Thanks
We thank the following contributors/users for their work on this release:
@pwithams, @maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.10.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Enhancements
- Add upsert support for PostgreSQL #807 (see the sketch after this list)
- Add schema evolution parameter to `wr.s3.to_csv` #787
- Enable order by in CTAS Athena queries #785
- Add header to `wr.s3.to_csv` when `dataset=True` #765
- Add `CSV` as unload format to `wr.redshift.unload_to_files` #761
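A minimal sketch of the new PostgreSQL upsert, assuming `mode="upsert"` with `upsert_conflict_columns` as introduced in #807 (connection and table names are hypothetical):

```python
import pandas as pd

import awswrangler as wr

con = wr.postgresql.connect("aws-sdk-pandas-postgresql")  # hypothetical Glue Connection

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Translates to INSERT ... ON CONFLICT (id) DO UPDATE under the hood
wr.postgresql.to_sql(
    df=df,
    con=con,
    schema="public",
    table="my_table",
    mode="upsert",
    upsert_conflict_columns=["id"],
)
con.close()
```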
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido, @mohdaliiqbal
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.9.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Added S3 Select tutorial #748
- Clarified `wr.s3.to_csv` docs #730
Enhancements
- Enable server-side predicate filtering using S3 Select 🚀 #678 (see the sketch after this list)
- Support `VersionId` parameter for S3 read operations #721
- Enable prefix in output S3 files for `wr.redshift.unload_to_files` #729
- Add option to skip commit on `wr.redshift.to_sql` #705
- Move integration test infrastructure to CDK 🎉 #706
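A minimal sketch of server-side filtering with `wr.s3.select_query` (the object path is hypothetical):

```python
import awswrangler as wr

# Only rows matching the predicate are returned -- filtering happens inside S3
df = wr.s3.select_query(
    sql='SELECT * FROM s3object s WHERE s."star_rating" >= 5',
    path="s3://my-bucket/reviews.snappy.parquet",  # hypothetical object
    input_serialization="Parquet",
    input_serialization_params={},
)
```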
Bug Fix
- Wait until Athena query results bucket is created #735
- Remove explicit Excel engine configuration #742
- Fix bucketing types #719
- Change `end_time` to UTC #720
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @jaidisido
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.8.0
Caveats
⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
- Clarified docs around potential in-place mutation of dataframe when using `to_parquet` #669
Enhancements
- Enable parallel S3 downloads (~20% speedup) 🚀 #644
- Apache Arrow 4.0.0 support (enables ARM instance support as well) #557
- Enable `LOCK` before concurrent `COPY` calls in Redshift #665
- Make use of PyArrow `iter_batches` (>= 3.0.0 only) #660
- Enable additional options when overwriting a Redshift table (`drop`, `truncate`, `cascade`) #671 (see the sketch after this list)
- Reuse S3 client across threads for S3 range requests #684
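A short sketch of the new overwrite options, assuming the `overwrite_method` parameter added to `wr.redshift.to_sql` in #671 (connection and table names are hypothetical):

```python
import pandas as pd

import awswrangler as wr

con = wr.redshift.connect("aws-sdk-pandas-redshift")  # hypothetical Glue Connection

df = pd.DataFrame({"id": [1], "value": ["a"]})

# "truncate" keeps the table and its grants; "drop" and "cascade" are also accepted
wr.redshift.to_sql(
    df=df,
    con=con,
    schema="public",
    table="my_table",
    mode="overwrite",
    overwrite_method="truncate",
)
con.close()
```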
Bug Fix
- Add `dtypes` for empty CTAS Athena queries #659
- Add Serde properties when creating a CSV table #672
- Pass SSL properties from Glue Connection to MySQL #554
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run, or use them from our public S3 bucket!
AWS Data Wrangler 2.7.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- Updated documentation to clarify use of the `params` argument in `wr.athena.read_sql_query` #609
New Functionalities
- Support MySQL upserts #608 (see the sketch after this list)
- Enable prepending S3 Parquet files with a prefix in `wr.s3.write.to_parquet` #617
- Add `exist_ok` flag to safely create a Glue database #642
- Add "Unsupported Pyarrow type" exception #639
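A minimal sketch of a MySQL upsert, assuming the `mode="upsert_duplicate_key"` value added in #608 (connection and table names are hypothetical):

```python
import pandas as pd

import awswrangler as wr

con = wr.mysql.connect("aws-sdk-pandas-mysql")  # hypothetical Glue Connection

df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Translates to INSERT ... ON DUPLICATE KEY UPDATE under the hood
wr.mysql.to_sql(
    df=df,
    con=con,
    schema="test",
    table="my_table",
    mode="upsert_duplicate_key",
)
con.close()
```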
Bug Fix
- Fix `chunked` mode in `wr.s3.read_parquet_table` #627
- Fix missing `\` character from `wr.s3.read_parquet_table` method #638
- Support `postgres` as an engine value #630
- Add default workgroup result configuration #633
- Raise exception when `merge_upsert_table` fails or `data_quality` is insufficient #601
- Fix nested structure bug in `athena2pyarrow` method #612
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!
AWS Data Wrangler 2.6.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Enhancements
- Added a `chunksize` parameter to the `to_sql` function (default 200), decreasing insertion time from 120 seconds to 1 second #599 (see the sketch after this list)
- The `path` argument is now optional in the `s3.to_parquet` and `s3.to_csv` functions #586
- Added a `map_types` boolean (True by default) to convert PyArrow DataTypes to pandas ExtensionDtypes #580
- Added an optional `ctas_database_name` argument to store `ctas_temporary_table` in an alternative database #576
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @igorborgest, @ilyanoskov, @VashMKS, @jmahlik, @dimapod, @Reeska
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!
AWS Data Wrangler 2.5.0
Caveats
⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):
➡️ `pip install pyarrow==2 awswrangler`
Documentation
- New HTML tutorials #551
- Use bump2version for changing version numbers #573
- Mishandling of wildcard characters in `read_parquet` #564
Enhancements
- Support for `ExpectedBucketOwner` #562 (see the sketch below)
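A short sketch of `ExpectedBucketOwner`, assuming it is passed through `s3_additional_kwargs` like other per-request S3 options (bucket, path, and account ID below are hypothetical):

```python
import pandas as pd

import awswrangler as wr

df = pd.DataFrame({"id": [1]})

# The write fails if the bucket is not owned by the expected AWS account
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/data.parquet",  # hypothetical path
    s3_additional_kwargs={"ExpectedBucketOwner": "111122223333"},
)
```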
Thanks
We thank the following contributors/users for their work on this release:
@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.
P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload them and run!