AWS Data Wrangler 0.2.1
Enhancements
- Support empty DataFrames in Pandas.read_sql_athena(ctas_approach=True)
- Clean up temporary S3 files created by Pandas.read_sql_athena(ctas_approach=True)
- Invert the order of the file format and compression extensions in the key suffix (Hadoop/Spark/Hive compatibility)
- Rework Aurora ingestion
- Bump dependency versions
- Add Pandas.read_csv_prefix()
- Improve Athena._normalize_name() rules
- Improve autocomplete support
- Simplify the SageMaker module
- Add Glue.get_connection()
- Adapt Pandas.read_sql_athena(ctas_approach=True) to handle Amazon S3 eventual consistency caveats
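To illustrate the key-suffix change listed under Enhancements, here is a minimal sketch of the naming convention for Parquet files. The helper `parquet_key_suffix` is hypothetical and not part of the AWS Data Wrangler API. It assumes the Spark/Hive convention of putting the compression extension before the format extension (for example `part-00000.snappy.parquet` rather than `part-00000.parquet.snappy`), so tools that detect the format from the final extension still see `.parquet`.

```python
from typing import Optional


def parquet_key_suffix(compression: Optional[str] = None) -> str:
    """Hypothetical helper: build a Parquet object-key suffix, compression first."""
    # Spark/Hive style naming keeps ".parquet" as the last extension,
    # e.g. ".snappy.parquet" instead of ".parquet.snappy".
    if compression:
        return f".{compression.lower()}.parquet"
    # Uncompressed files carry only the format extension.
    return ".parquet"


print("part-00000" + parquet_key_suffix("snappy"))  # part-00000.snappy.parquet
```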
Bugfixes
- Fix fetching of Glue table comments
- Fix Spark when using the default Session
Docs
- Add athena_nested.ipynb tutorial
- Add catalog_and_metadata.ipynb tutorial
P.S. The Lambda Layer bundle and the Glue wheel/egg are available below. Just upload them and run!
P.P.S. Never used Layers? Check out the step-by-step guide.
P.P.P.S. AWS Data Wrangler relies on compiled (C/C++) dependencies, so Glue PySpark is not supported for now (only Glue Python Shell).