Releases: aws/aws-sdk-pandas
AWS Data Wrangler 0.2.1
Enhancements
- Support for empty DataFrames in Pandas.read_sql_athena(ctas_approach=True)
- Cleaning temp S3 files for Pandas.read_sql_athena(ctas_approach=True)
- Inverting the order of the file format and file compression extensions in the key suffix (Hadoop/Spark/Hive compatibility)
- Aurora ingestion revisited
- Bumping dependency versions
- Add Pandas.read_csv_prefix()
- Improve Athena._normalize_name() rules
- Improving autocomplete support
- Simplifying everything on SageMaker
- Adding Glue.get_connection()
- Adapt read_sql_athena(ctas_approach=True) for eventual consistency caveats.
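
The eventual-consistency caveat in the last item is commonly handled by polling until the expected objects become visible. The following is a minimal sketch of that idea, not the library's actual implementation: `wait_until_visible`, the `list_keys` callable, and `FakeListing` are hypothetical names, and the fake lister stands in for a real S3 listing call.

```python
import time
from typing import Callable, Iterable, List, Set


def wait_until_visible(
    list_keys: Callable[[], Iterable[str]],  # hypothetical: returns the keys currently listed
    expected: Set[str],
    retries: int = 5,
    base_delay: float = 0.01,
) -> bool:
    """Poll list_keys() with exponential backoff until every expected key appears."""
    for attempt in range(retries):
        if expected.issubset(set(list_keys())):
            return True
        time.sleep(base_delay * (2 ** attempt))
    return False


class FakeListing:
    """Simulates a listing that only shows every key from the third call onward."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> List[str]:
        self.calls += 1
        if self.calls >= 3:
            return ["a.parquet", "b.parquet"]
        return ["a.parquet"]


listing = FakeListing()
all_visible = wait_until_visible(listing, {"a.parquet", "b.parquet"})
```

The retry count and delays are arbitrary here; in practice they would be tuned to how long the storage layer typically takes to converge.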
Bugfixes
- Fixing bug when fetching Glue table comments
- Fixing Spark for default Session
Docs
- Add athena_nested.ipynb tutorial
- Add catalog_and_metadata.ipynb tutorial
P.S. Lambda Layer's bundle and Glue's wheel/egg are available below. Just upload it and run!
P.P.S. Never used Layers before? Check the step-by-step guide.
P.P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so Glue PySpark is not supported for now (only Glue Python Shell).
AWS Data Wrangler 0.2.0
Enhancements
- Add description, parameters, and column comments as arguments to all methods that create Glue tables (METADATA).
- Add several methods to explore the Glue Catalog.
AWS Data Wrangler 0.1.4
Enhancements
- Pandas -> Aurora (MySQL/PostgreSQL) (Append/Overwrite) (Via S3)
- Aurora -> Pandas (MySQL) (Via S3)
- Aurora -> CSV (S3) (MySQL)
- Smaller lambda layers
AWS Data Wrangler 0.1.3
Bugfixes
- Fix Default Session bug for environments without credentials
AWS Data Wrangler 0.1.1
Enhancements
- Pandas to Redshift with upsert mode
- Load SageMaker Job outputs
- Default Session
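
Upsert here means that rows whose key already exists in the target are replaced and all other rows are inserted. The following plain-Python sketch shows only those semantics; `upsert_rows` and the `id` key column are illustrative names, and the library's actual Redshift load path is not shown.

```python
from typing import Dict, List


def upsert_rows(target: List[Dict], incoming: List[Dict], key: str) -> List[Dict]:
    """Merge incoming rows into target: matching keys are replaced, new keys appended."""
    merged = {row[key]: row for row in target}  # dicts preserve insertion order
    for row in incoming:
        merged[row[key]] = row  # overwrite on key match, insert otherwise
    return list(merged.values())


existing = [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]
new_rows = [{"id": 2, "name": "baz"}, {"id": 3, "name": "qux"}]
result = upsert_rows(existing, new_rows, key="id")
```

The row with `id` 2 is replaced in place and the row with `id` 3 is appended, which is the behavior that distinguishes upsert from a plain append.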
AWS Data Wrangler 0.1.0
Enhancements
- Read Parquet tables from Glue Catalog directly to Pandas DataFrame
- Read Athena's results to Pandas DataFrame via CTAS (Blazing fast 🚀)
- Redshift's results to S3 as Parquet
- Read Redshift's results to Pandas DataFrame via Parquet export (Blazing fast 🚀)
AWS Data Wrangler 0.0.25
Enhancements
- Read Parquet data from S3 directly to Pandas DataFrame #73
Bugfixes
- Fix Pandas.read_sql_athena() usage with the Session() default s3_output
AWS Data Wrangler 0.0.24
Enhancements
- Add support for Decimal data type #58
- Add more Athena settings in Session() (defaults)
- Add PyArrow's toggle option for EMR.create_cluster()
Bugfixes
- Fix Pandas.read_sql_athena() issues with array data types #72
AWS Data Wrangler 0.0.23
Enhancements
- Improving cast for date columns
AWS Data Wrangler 0.0.22
Bugfixes
- Setting null date values as None for pandas.read_sql_athena() #69
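
From the caller's side, this fix means missing date values come back as None instead of a placeholder. A small sketch of that behavior follows; `parse_athena_date` is a hypothetical helper, and treating an empty string as the null marker is an assumption made for illustration only.

```python
from datetime import date, datetime
from typing import Optional


def parse_athena_date(value: Optional[str]) -> Optional[date]:
    """Parse a YYYY-MM-DD string; map missing or empty values to None."""
    if value is None or value.strip() == "":
        return None
    return datetime.strptime(value, "%Y-%m-%d").date()


raw_column = ["2019-11-01", "", None, "2019-12-25"]
parsed = [parse_athena_date(v) for v in raw_column]
```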