Releases: aws/aws-sdk-pandas

AWS Data Wrangler 2.13.0

03 Dec 20:09

Caveats

⚠️ For platforms without PyArrow 6 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Breaking changes

  • Fix sanitize methods to align with Glue/Hive naming conventions #579
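
To give a sense of what name sanitization toward Glue/Hive-style naming can look like, here is a minimal sketch in plain Python. The function `sanitize_name` and its rules (strip accents, replace characters outside letters, digits and underscores, lowercase) are assumptions made for illustration; they are not the library's implementation, so check the changes in #579 for the actual behavior.

```python
import re
import unicodedata


def sanitize_name(name: str) -> str:
    """Hypothetical helper: normalize a column or table name.

    Illustrative rules only, not awswrangler's actual logic.
    """
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace anything outside [A-Za-z0-9_] with an underscore, then lowercase.
    return re.sub(r"[^A-Za-z0-9_]", "_", without_accents).lower()


print(sanitize_name("Order ID"))  # order_id
```

Because sanitized names can differ from the names produced by earlier versions, existing tables and downstream queries may need to be checked after upgrading.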

New Functionalities

  • AWS Lake Formation Governed Tables 🚀 #570
  • Support for Python 3.10 🔥 #973
  • Add partitioning to JSON datasets #962
  • Add ability to use unbuffered cursor for large MySQL datasets #928
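
As a rough illustration of what partitioning a JSON dataset means, the sketch below writes JSON Lines files under Hive-style `key=value` directories using only the standard library. The helper `write_partitioned_json`, the file name `part-0.json`, and the local temporary directory (standing in for an S3 prefix) are all assumptions for this example; the library call added in #962 is not shown here.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned_json(rows, root, partition_cols):
    """Hypothetical helper: write rows as JSON Lines under key=value folders."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple((col, row[col]) for col in partition_cols)
        # Partition values are encoded in the path, so drop them from the body.
        body = {k: v for k, v in row.items() if k not in partition_cols}
        groups[key].append(body)

    written = []
    for key, records in groups.items():
        directory = Path(root).joinpath(*(f"{col}={value}" for col, value in key))
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / "part-0.json"
        with target.open("w", encoding="utf-8") as handle:
            for record in records:
                handle.write(json.dumps(record) + "\n")
        written.append(target)
    return written


rows = [
    {"id": 1, "year": 2021, "value": "a"},
    {"id": 2, "year": 2021, "value": "b"},
    {"id": 3, "year": 2020, "value": "c"},
]
with tempfile.TemporaryDirectory() as tmp:
    paths = write_partitioned_json(rows, tmp, ["year"])
    relative_paths = sorted(p.relative_to(tmp).as_posix() for p in paths)

print(relative_paths)
```

A layout like this lets query engines that understand Hive-style partitions skip whole directories when a filter on the partition column is applied.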

Enhancements

  • Add awswrangler.s3.list_buckets #997
  • Add partitions_parameters to catalog partitions methods #1035
  • Refactor pagination config in list objects #955
  • Add error message to EmptyDataframe exception #991

Documentation

  • Clarify docs & add tutorial on schema evolution for CSV datasets #964

Bug Fix

  • catalog.add_column() without column_comment triggers exception #1017
  • catalog.create_parquet_table Key in dictionary does not always exist #998
  • Fix Catalog StorageDescriptor get #969

Thanks

We thank the following contributors/users for their work on this release:

@csabz09, @Falydoor, @moritzkoerber, @maxispeicher, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.12.1

18 Oct 12:02
829c306

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Patch

  • Removing unnecessary dev dependencies from main #961

P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.12.0

13 Oct 16:32
f82b7e1

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • redshift.read_sql_query - handle empty table corner case #874
  • Refactor read parquet table to reduce file list scan based on available partitions #878
  • Shrink lambda layer with strip command #884
  • Enabling DynamoDB endpoint URL #887
  • EMR jobs concurrency #889
  • Add feature to allow custom AMI for EMR #907
  • wr.redshift.unload_to_files empty the S3 folder instead of overwriting existing files #914
  • Add catalog_id arg to wr.catalog.does_table_exist #920
  • Add endpoint_url for AWS Secrets Manager #929

Documentation

  • Update docs for awswrangler.s3.to_csv #868

Bug Fix

  • wr.mysql.to_sql with use_column_names=True when column names are reserved words #918

Thanks

We thank the following contributors/users for their work on this release:

@AssafMentzer, @mureddy19, @isichei, @DonnaArt, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.11.0

01 Sep 16:49
e216d53

Caveats

⚠️ For platforms without PyArrow 5 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

New Functionalities

  • Redshift and RDS Data API support #828 🚀 Check out the tutorial. Many thanks to @pwithams for this contribution.

Enhancements

  • Upgrade to PyArrow 5 #861
  • Add Pagination for TimestreamDB #838

Documentation

  • Clarifying structure of SSM secrets in connect methods #871

Bug Fix

  • Use botocore's Loader and ServiceModel to extract accepted kwargs #832

Thanks

We thank the following contributors/users for their work on this release:

@pwithams, @maxispeicher, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.10.0

21 Jul 11:35
db1e3ef

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Add upsert support for Postgresql #807
  • Add schema evolution parameter to wr.s3.to_csv #787
  • Enable order by in CTAS Athena queries #785
  • Add header to wr.s3.to_csv when dataset = True #765
  • Add CSV as unload format to wr.redshift.unload_to_files #761
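
For readers unfamiliar with upserts in PostgreSQL, the sketch below builds the kind of `INSERT ... ON CONFLICT ... DO UPDATE` statement that an upsert typically relies on. The helpers `quote_ident` and `build_postgres_upsert`, and the `users` table, are hypothetical names for illustration; this shows the general SQL pattern and is not a description of how #807 implements it.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_postgres_upsert(table, columns, conflict_columns):
    """Hypothetical helper: return a parameterized upsert statement."""
    cols = ", ".join(quote_ident(c) for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict = ", ".join(quote_ident(c) for c in conflict_columns)
    # Only non-key columns are overwritten with the incoming (EXCLUDED) values.
    updates = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}"
        for c in columns
        if c not in conflict_columns
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {quote_ident(table)} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) {action}"
    )


statement = build_postgres_upsert("users", ["id", "name", "email"], ["id"])
print(statement)
```

The conflict columns must be covered by a unique or primary key constraint on the target table, otherwise PostgreSQL rejects the statement.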

Bug Fix

  • Fix deleting CTAS temporary Glue tables #782
  • Ensure safe get of Glue table parameters #779 and #783

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @jaidisido, @mohdaliiqbal


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.9.0

18 Jun 13:15
89b459d

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Enable server-side predicate filtering using S3 Select 🚀 #678
  • Support VersionId parameter for S3 read operations #721
  • Enable prefix in output S3 files for wr.redshift.unload_to_files #729
  • Add option to skip commit on wr.redshift.to_sql #705
  • Move integration test infrastructure to CDK 🎉 #706

Bug Fix

  • Wait until athena query results bucket is created #735
  • Remove explicit Excel engine configuration #742
  • Fix bucketing types #719
  • Change end_time to UTC #720

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @jaidisido


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.8.0

19 May 13:40
b13fcd8

Caveats

⚠️ For platforms without PyArrow 4 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Install Lambda Layers and Python wheels from public S3 bucket 🎉 #666
  • Clarified docs around potential in-place mutation of dataframe when using to_parquet #669

Enhancements

  • Enable parallel s3 downloads (~20% speedup) 🚀 #644
  • Apache Arrow 4.0.0 support (enables ARM instances support as well) #557
  • Enable LOCK before concurrent COPY calls in Redshift #665
  • Make use of Pyarrow iter_batches (>= 3.0.0 only) #660
  • Enable additional options when overwriting Redshift table (drop, truncate, cascade) #671
  • Reuse s3 client across threads for s3 range requests #684

Bug Fix

  • Add dtypes for empty ctas athena queries #659
  • Add Serde properties when creating CSV table #672
  • Pass SSL properties from Glue Connection to MySQL #554

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @kukushking, @igorborgest, @gballardin, @eferm, @jaklan, @Falydoor, @chariottrider, @chriscugliotta, @konradsemsch, @gvermillion, @russellbrooks, @mshober.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run or use them from our S3 public bucket!

AWS Data Wrangler 2.7.0

15 Apr 17:17
fd1b62f

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Documentation

  • Updated documentation to clarify wr.athena.read_sql_query params argument use #609

New Functionalities

  • Supporting MySQL upserts #608
  • Enable prepending S3 parquet files with a prefix in wr.s3.to_parquet #617
  • Add exist_ok flag to safely create a Glue database #642
  • Add "Unsupported Pyarrow type" exception #639
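
The MySQL counterpart of an upsert is usually expressed with `INSERT ... ON DUPLICATE KEY UPDATE`. The sketch below builds such a statement with backtick-quoted identifiers, which also keeps reserved words such as `order` usable as column names. The helper names and the `orders` table are assumptions for illustration, and the code does not claim to mirror the implementation in #608.

```python
def quote_mysql_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_mysql_upsert(table, columns):
    """Hypothetical helper: return a parameterized MySQL upsert statement."""
    quoted = [quote_mysql_ident(c) for c in columns]
    cols = ", ".join(quoted)
    placeholders = ", ".join(["%s"] * len(columns))
    # VALUES(col) refers to the value that would have been inserted.
    # Newer MySQL 8.0 releases deprecate it in favor of row aliases,
    # but it remains widely supported.
    updates = ", ".join(f"{q} = VALUES({q})" for q in quoted)
    return (
        f"INSERT INTO {quote_mysql_ident(table)} ({cols}) "
        f"VALUES ({placeholders}) ON DUPLICATE KEY UPDATE {updates}"
    )


statement = build_mysql_upsert("orders", ["id", "order"])
print(statement)
```

The update branch only fires when the inserted row collides with an existing primary key or unique index.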

Bug Fix

  • Fix chunked mode in wr.s3.read_parquet_table #627
  • Fix missing \ character from wr.s3.read_parquet_table method #638
  • Support postgres as an engine value #630
  • Add default workgroup result configuration #633
  • Raise exception when merge_upsert_table fails or data_quality is insufficient #601
  • Fixing nested structure bug in athena2pyarrow method #612

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @igorborgest, @mattboyd-aws, @vlieven, @bentkibler, @adarsh-chauhan, @impredicative, @nmduarteus, @JoshCrosby, @TakumiHaruta, @zdk123, @tuannguyen0901, @jiteshsoni, @luminita.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!

AWS Data Wrangler 2.6.0

16 Mar 18:50

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Added a chunksize parameter to the to_sql function, set to 200 by default. Decreased insertion time from 120 seconds to 1 second #599
  • path argument is now optional in s3.to_parquet and s3.to_csv functions #586
  • Added a map_types boolean (set to True by default) to convert pyarrow DataTypes to pandas ExtensionDtypes #580
  • Added optional ctas_database_name argument to store ctas_temporary_table in an alternative database #576
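
To illustrate why a chunk size can speed up inserts, the sketch below batches rows into multi-row `INSERT` statements of up to 200 rows each, using an in-memory SQLite database so it runs anywhere. The helpers `chunked` and `insert_in_chunks` and the `items` table are hypothetical; the example shows the general batching idea and is not the code behind #599.

```python
import sqlite3
from itertools import islice


def chunked(rows, chunksize):
    """Yield successive lists of at most `chunksize` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def insert_in_chunks(connection, rows, chunksize=200):
    """Insert two-column rows using one multi-row INSERT per chunk."""
    statements = 0
    for chunk in chunked(rows, chunksize):
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        flat_values = [value for row in chunk for value in row]
        connection.execute(
            f"INSERT INTO items (id, name) VALUES {placeholders}", flat_values
        )
        statements += 1
    connection.commit()
    return statements


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (id INTEGER, name TEXT)")
rows = [(i, f"item-{i}") for i in range(450)]
statement_count = insert_in_chunks(connection, rows)
row_count = connection.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(statement_count, row_count)
```

With 450 rows and a chunk size of 200, the database receives three statements instead of 450, which is where most of the round-trip savings come from.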

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @igorborgest, @ilyanoskov, @VashMKS, @jmahlik, @dimapod, @Reeska


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!

AWS Data Wrangler 2.5.0

03 Mar 16:59

Caveats

⚠️ For platforms without PyArrow 3 support (e.g. MWAA, EMR, Glue PySpark Job):

➡️ pip install pyarrow==2 awswrangler

Enhancements

  • Support for ExpectedBucketOwner #562

Thanks

We thank the following contributors/users for their work on this release:

@maxispeicher, @impredicative, @adarsh-chauhan, @Malkard.


P.S. The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. Just upload it and run!