Releases: aws/aws-sdk-pandas
AWS Data Wrangler 1.6.2
Enhancements
- Now casting columns before append on an existing table only if necessary (wr.s3.to_parquet()).
- Add retry mechanism for InternalError on S3 object deletion.
- Add handling of immutable numpy arrays (flags.writeable == False).
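The retry on InternalError during S3 object deletion can be pictured as a bounded retry loop with backoff. The following is a minimal sketch of that idea, not the library's implementation: `InternalError`, `delete_with_retry` and `flaky_delete` are hypothetical names, and the attempt count and delays are assumptions.

```python
import time


class InternalError(Exception):
    """Hypothetical stand-in for a transient S3 InternalError response."""


def delete_with_retry(delete_fn, keys, max_attempts=5, base_delay=0.01):
    """Call delete_fn(keys), retrying with exponential backoff on InternalError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return delete_fn(keys)
        except InternalError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical delete function that fails twice before succeeding.
calls = {"count": 0}


def flaky_delete(keys):
    calls["count"] += 1
    if calls["count"] < 3:
        raise InternalError("We encountered an internal error. Please try again.")
    return list(keys)


deleted = delete_with_retry(flaky_delete, ["a.parquet", "b.parquet"])
```

Only the transient error type is retried; any other exception propagates immediately, so real failures are not hidden.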
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so Glue PySpark is not supported for now (only Glue Python Shell).
AWS Data Wrangler 1.6.1
Enhancements
- Casting support for any column type to string using the dtype argument on wr.s3.to_parquet()
Bug Fix
- General bugs related to Athena Cache. 🐞
Docs
- General small updates.
AWS Data Wrangler 1.6.0
New Functionalities
- Amazon Athena CACHE 🚀 #285
- Initial AWS STS module
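The general idea behind a query cache is to reuse the stored result of a recent identical query instead of running it again. The sketch below illustrates that idea under stated assumptions: the `history` records, the `find_cached_result` helper and the `max_age_seconds` parameter are all hypothetical and are not taken from the library.

```python
from datetime import datetime, timedelta, timezone


def find_cached_result(sql, history, max_age_seconds, now=None):
    """Return the result location of the most recent matching query, or None."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(seconds=max_age_seconds)
    # Newest first, so the freshest matching result wins.
    for past in sorted(history, key=lambda q: q["completed"], reverse=True):
        if past["completed"] < cutoff:
            break  # every remaining entry is older than the allowed age
        if past["sql"].strip() == sql.strip():
            return past["output"]
    return None


now = datetime(2020, 6, 30, 12, 0, tzinfo=timezone.utc)
history = [
    {"sql": "SELECT 1", "completed": now - timedelta(seconds=30),
     "output": "s3://bucket/results/a.csv"},
    {"sql": "SELECT 2", "completed": now - timedelta(hours=2),
     "output": "s3://bucket/results/b.csv"},
]

hit = find_cached_result("SELECT 1", history, max_age_seconds=60, now=now)
miss = find_cached_result("SELECT 2", history, max_age_seconds=60, now=now)
```

Here `hit` points at the stored result, while `miss` is `None` because the only matching query is older than the allowed age and would have to be executed again.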
Enhancements
- Numpy 1.19.0
- Add auto_create and db_groups arguments to get_redshift_temp_engine #288
- Add validate_schema argument to wr.s3.read_parquet_table
- Add safe argument to read_parquet #296
- Refactor naming of pandas kwargs #291
- Allow providing suffix to s3.store_parquet_metadata #295
- Add last_modified_begin and last_modified_end to list_objects, read_csv, read_json, read_fwf and read_parquet
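Filtering by last-modified time amounts to keeping only the objects whose timestamp falls inside a window. A minimal sketch of that filter follows; `filter_by_last_modified`, its `begin` and `end` parameters and the sample paths are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from datetime import datetime, timezone


def filter_by_last_modified(objects, begin=None, end=None):
    """Keep paths whose last-modified timestamp lies within [begin, end]."""
    selected = []
    for path, last_modified in objects:
        if begin is not None and last_modified < begin:
            continue  # modified before the window opens
        if end is not None and last_modified > end:
            continue  # modified after the window closes
        selected.append(path)
    return selected


objects = [
    ("s3://bucket/data/jan.csv", datetime(2020, 1, 15, tzinfo=timezone.utc)),
    ("s3://bucket/data/mar.csv", datetime(2020, 3, 15, tzinfo=timezone.utc)),
    ("s3://bucket/data/jun.csv", datetime(2020, 6, 15, tzinfo=timezone.utc)),
]

recent = filter_by_last_modified(
    objects,
    begin=datetime(2020, 2, 1, tzinfo=timezone.utc),
    end=datetime(2020, 4, 1, tzinfo=timezone.utc),
)
```

Using timezone-aware datetimes on both sides avoids the `TypeError` Python raises when comparing naive and aware values.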
Bug Fix
- Fix bug on get_table_description on tables without description #294
Docs
- Add Athena cache tutorial.
Thanks
We thank the following contributors/users for their work on this release:
@koiker, @patrick-muller, @flaviomax, @acere, @jarretg, @bryanyang0528, @schrobot, @kinghuang, @igorborgest.
AWS Data Wrangler 1.5.0
New Functionalities
- Amazon QuickSight support! 🎉
- Add create/delete database on wr.glue
Enhancements
- General improvements in the tutorials
- New Amazon S3 path check
- Add sanitize_columns arg for s3.to_parquet and s3.to_csv #278 #279
- Remove memory copy of DataFrame for to_parquet and to_csv
Bug Fix
- Force index=False for wr.db.to_sql() with redshift
Thanks
We thank the following contributors/users for their work on this release:
@ywang103, @patrick-muller, @tuliocasagrande, @sarojdongol, @sdknij, @ilyanoskov, @igorborgest.
AWS Data Wrangler 1.4.0
New Functionalities
- Add support for reading CSV, JSON and FWF partitions. #265
Enhancements
- General improvement of moto tests
Bug Fix
- Fix encoding arg support for reading CSV, JSON and FWF. #271
Thanks
We thank the following contributors/users for their work on this release:
@bryanyang0528, @dwbelliston, @patrick-muller, @sdknij, @igorborgest.
AWS Data Wrangler 1.3.0
New Functionalities
- Support for Athena Partition Projection [TUTORIAL]
Enhancements
Bug Fix
- Fix dtype (cast) on wr.s3.to_parquet with nested types #263
- Fix EMR utilities for regions other than us-east-1 #252
- Fix wr.s3.to_parquet for partitions in reverse order #264
Thanks
We thank the following contributors/users for their work on this release:
@bryanyang0528, @zachmoshe, @buseynehannes, @jiajie999, @igorborgest.
AWS Data Wrangler 1.2.0
New Functionalities
- Infer mixed Parquet schemas on wr.s3.read_parquet_metadata and wr.s3.store_parquet_metadata #195
- Support to add new columns on wr.s3.to_parquet and wr.s3.store_parquet_metadata [TUTORIAL] #232
Enhancements
- wr.s3.delete_objects now raises an exception for objects that were not deleted #237
- User-friendly exceptions on wr.athena.read_sql_query and wr.athena.read_sql_table #239
Bug Fix
- Fix issue when using wr.s3.store_parquet_metadata on non-partitioned datasets #231
- Fix bug on wr.s3.read_json using chunksize #235
- s3fs version bumped #236
- Fix wr.s3.to_parquet not sanitizing column names when writing a single file #240
Thanks
We thank the following contributors/users for their work on this release:
@mrshu, @bryanyang0528, @JPFrancoia, @jaidisido, @qemtek, @dwbelliston, @mbiemann, @parasml, @BrainMonkey, @hyperloglog, @igorborgest.
AWS Data Wrangler 1.1.2
New Functionalities
- Add support for uint8, uint16, uint32 and uint64 on Parquet. #76
- Add get_table_parameters, upsert_table_parameters and overwrite_table_parameters on wr.catalog. #224
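The "upsert" in the table-parameter functions above suggests merge semantics: existing keys are updated, new keys are inserted, and all other keys are left alone. The sketch below shows that behavior on a plain dictionary only; `upsert_parameters` is a hypothetical helper, it does not call the Glue catalog, and the parameter names are made up for illustration.

```python
def upsert_parameters(current, updates):
    """Return a new dict: existing keys are updated, new keys are inserted."""
    merged = dict(current)  # copy so the caller's dict is not mutated
    merged.update(updates)
    return merged


current = {"classification": "parquet", "owner": "team-a"}
merged = upsert_parameters(current, {"owner": "team-b", "source": "s3"})
```

After the call, `merged` keeps `classification`, carries the new `owner` value and gains `source`, while `current` is unchanged.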
Enhancements
- Add readahead cache for s3fs.
Bug Fix
- Fixing type hints for sortkey. #226
- Fix s3.to_parquet overwriting with different partition schema.
Thanks
We thank the following contributors/users for their work on this release:
@robertaves, @jar-no1, @JPFrancoia, @igorborgest.
AWS Data Wrangler 1.1.1
Bug Fix
- Removing objects ending with "/" from wr.s3.list_objects()
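Keys ending with "/" are typically zero-byte "folder" placeholders rather than data files, which is why dropping them from a listing is useful. A minimal sketch of such a filter, assuming a plain list of paths; `drop_folder_markers` is a hypothetical helper, not a library function.

```python
def drop_folder_markers(paths):
    """Return only the paths that do not end with a trailing slash."""
    return [path for path in paths if not path.endswith("/")]


listing = [
    "s3://bucket/dataset/",
    "s3://bucket/dataset/part-0.parquet",
    "s3://bucket/dataset/year=2020/",
    "s3://bucket/dataset/year=2020/part-1.parquet",
]

files = drop_folder_markers(listing)
```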
AWS Data Wrangler 1.1.0
New Functionalities
- Support for nested arrays and structs on wr.s3.to_parquet() #206
- Support for Read Parquet/Athena/Redshift chunked by number of rows #192
- Add custom_classifications to wr.emr.create_cluster() #193
- Support for Docker on EMR #193
- Add kms_key_id, max_file_size, region arguments to wr.db.unload_redshift() #197
- Add catalog_versioning argument to wr.s3.to_csv() and wr.s3.to_parquet() #198
- Add keep_files and ctas_temp_table_name arguments to wr.athena.read_sql_*() #203
- Add replace_filenames argument to wr.s3.copy_objects() #215
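Reading "chunked by number of rows" means the caller receives successive batches of at most N rows instead of one large result. The generator below is a minimal, library-independent sketch of that pattern; `iter_chunks` and `chunk_size` are hypothetical names, and rows are plain Python values rather than DataFrames.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from any iterable."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # source exhausted
        yield chunk


chunks = list(iter_chunks(range(7), chunk_size=3))
```

Because the source is consumed lazily, only one chunk needs to be held in memory at a time; the final chunk may be shorter than `chunk_size`.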
Enhancements
- wr.s3.to_csv() and wr.s3.to_parquet() no longer need delete table permission to overwrite catalog table #198
- Added support for UUID on wr.db.read_sql_query() (PostgreSQL) #200
- Refactoring of Athena encryption and workgroup support #212
Bug Fix
- Support for reading fully NULL columns from PostgreSQL, MySQL, and Redshift #218
Thanks
We thank the following contributors/users for their work on this release:
@robkano, @luigift, @parasml, @OElesin, @jar-no1, @keatmin, @pmleveque, @sapientderek, @jadayn, @igorborgest.