Releases: aws-samples/amazon-textract-textractor
Version 1.7.1
What's Changed
- Fix issue where a table within a container layout could be duplicated in the .get_text() output.
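One way to avoid this kind of duplication can be illustrated with a minimal, self-contained sketch. It assumes a simplified document model in which a layout container lists child elements and a table can be reached both through a container and from the top level. Element and linearize are hypothetical names, not Textractor's API. The approach shown tracks which element ids have already been emitted, so each table is rendered at most once.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical simplified layout element (not Textractor's Layout class)."""
    id: str
    text: str = ""
    children: list = field(default_factory=list)  # nested Element objects


def linearize(elements):
    """Depth-first linearization that emits each element id at most once."""
    seen = set()
    lines = []

    def visit(element):
        if element.id in seen:
            return  # already emitted via another path, e.g. inside a container
        seen.add(element.id)
        if element.text:
            lines.append(element.text)
        for child in element.children:
            visit(child)

    for element in elements:
        visit(element)
    return "\n".join(lines)


table = Element("t1", text="| a | b |")
container = Element("c1", children=[Element("l1", text="Intro"), table])
# The table is reachable both inside the container and at the top level.
print(linearize([container, table]))
```

The sketch prints "Intro" followed by the table line once, even though the table is listed twice.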
Full Changelog: v1.7.0...v1.7.1
Version 1.7.0
What's Changed
- Loosen XlsxWriter version constraints by @mdscruggs in #292
- Rework the linearization heuristic to ensure that no words are missing or duplicated
- Fix KeyValues being assigned twice on overlapping table cells; going forward, KVs inside tables are ignored (table structure takes precedence)
- Harden parser code against missing children in layouts or KeyValues with missing keys
- Fix markdown tables not having header rows when one of the cells is empty
- Add support for Python 3.11 and 3.12 in the GitHub action workflows
- Add textractor.__version__ to allow easier identification of the installed Textractor version in code
- Add hide_table_layout
- Remove amazon-textract-response-parser as a dependency, since using it to validate the input schema could add more than 200 ms of latency in some cases. Textractor-only parsing takes less than 30 ms.
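The precedence rule for KeyValues inside tables can be sketched as follows. This is a minimal illustration that assumes axis-aligned bounding boxes in normalized page coordinates and treats a key-value pair as "inside" a table when the center of its bounding box falls within the table's box. Textractor's actual overlap test may differ, and BBox and drop_kvs_in_tables are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Hypothetical axis-aligned box in normalized page coordinates (0..1)."""
    x: float
    y: float
    width: float
    height: float

    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)

    def contains_point(self, px, py):
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def drop_kvs_in_tables(key_values, tables):
    """Keep only key-value pairs whose box center lies outside every table.

    key_values: list of (key, value, BBox) tuples
    tables: list of BBox objects
    Table structure takes precedence, so KVs inside a table are discarded.
    """
    kept = []
    for key, value, box in key_values:
        cx, cy = box.center()
        if not any(table.contains_point(cx, cy) for table in tables):
            kept.append((key, value, box))
    return kept


tables = [BBox(0.1, 0.5, 0.8, 0.3)]
kvs = [
    ("Name:", "Jane", BBox(0.1, 0.1, 0.3, 0.05)),  # above the table
    ("Total:", "42", BBox(0.5, 0.6, 0.2, 0.05)),   # inside the table
]
print([key for key, _, _ in drop_kvs_in_tables(kvs, tables)])  # ['Name:']
```

Using the box center rather than any overlap is a simplifying choice for the sketch: it avoids discarding a key-value pair that merely touches a table edge.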
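For the markdown table fix, the following sketch shows one way to guarantee that a header row and separator line are always emitted, even when some header cells are empty. It is a hypothetical helper (to_markdown_table), not Textractor's exporter: it always treats the first row as the header and renders empty cells as blank columns instead of dropping the header.

```python
def to_markdown_table(rows):
    """Render a list of rows (lists of strings) as a markdown pipe table.

    The first row is always used as the header, even if some of its cells are
    empty, so the separator line is never omitted.
    """
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [list(row) + [""] * (width - len(row)) for row in rows]

    def render(row):
        return "| " + " | ".join(cell.strip() for cell in row) + " |"

    header, body = padded[0], padded[1:]
    lines = [render(header), "|" + "|".join(["---"] * width) + "|"]
    lines.extend(render(row) for row in body)
    return "\n".join(lines)


print(to_markdown_table([["Item", ""], ["Apples", "3"]]))
```

Here the second header cell is empty, yet the output still starts with a header row followed by the |---|---| separator.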
Breaking changes
- Remove linearize_table and linearize_key_value from TextLinearizationConfig, as both were not used
- Remove the s3_output_path parameter from analyze_expense, as the API does not support outputting to S3
New Contributors
- @mdscruggs made their first contribution in #292
Full Changelog: v1.6.1...v1.7.0
Version 1.6.1
Version 1.6.0
Version 1.5.0
What's Changed
- Add GetResult from S3 in LazyDocument
- Add more linearization formatting options
- Fix exception thrown when a CHILD relationship maps to a non-existent LINE
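Textract responses link blocks through Relationships entries whose Type is CHILD and whose Ids reference other blocks. The sketch below shows a defensive way to resolve those references by skipping ids that do not match any block instead of raising a KeyError. It works on the raw JSON block dictionaries. The function name resolve_children and the short readable ids are illustrative assumptions (real Textract ids are UUIDs), and this is not necessarily how Textractor implements the fix.

```python
def resolve_children(block, blocks_by_id):
    """Return the child blocks referenced by a block's CHILD relationships.

    Ids that do not match any block in the response are skipped instead of
    raising KeyError.
    """
    children = []
    for relationship in block.get("Relationships", []):
        if relationship.get("Type") != "CHILD":
            continue
        for child_id in relationship.get("Ids", []):
            child = blocks_by_id.get(child_id)
            if child is not None:
                children.append(child)
    return children


response = {
    "Blocks": [
        {"Id": "page-1", "BlockType": "PAGE",
         "Relationships": [{"Type": "CHILD", "Ids": ["line-1", "line-missing"]}]},
        {"Id": "line-1", "BlockType": "LINE", "Text": "Hello"},
    ]
}
blocks_by_id = {block["Id"]: block for block in response["Blocks"]}
page = blocks_by_id["page-1"]
print([child["Text"] for child in resolve_children(page, blocks_by_id)])  # ['Hello']
```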
Full Changelog: v1.4.5...v1.5.0
Version 1.4.5
What's Changed
Full Changelog: v1.4.4...v1.4.5
Version 1.4.4
What's Changed
Full Changelog: v1.4.3...v1.4.4
Version 1.4.3
Version 1.4.2
What's Changed
Full Changelog: v1.4.1...v.1.4.2
Version 1.4.1
What's Changed
- Fix signature token not being added to the linearized text
- Fix bug where empty pages raise an exception when linearized
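Both fixes concern page linearization, and a small sketch can show the intended behavior. It assumes a simplified page given as (kind, text, top, left) tuples in normalized coordinates, where kind is "word" or "signature". Signatures carry no text, so a configurable placeholder is emitted in their place; the default "[SIGNATURE]" and the line_tolerance parameter are assumptions for illustration. An empty page returns an empty string instead of failing on the line-grouping step. linearize_page is a hypothetical helper, not Textractor's API.

```python
def linearize_page(items, signature_token="[SIGNATURE]", line_tolerance=0.01):
    """Linearize a page of (kind, text, top, left) tuples into text lines.

    Items whose vertical positions are within line_tolerance of the previous
    item are grouped into the same line, then ordered left to right.
    """
    if not items:
        return ""  # guard: the grouping below indexes the first item
    ordered = sorted(items, key=lambda item: (item[2], item[3]))
    lines = [[ordered[0]]]
    for item in ordered[1:]:
        # Start a new line when the vertical gap to the previous item is large.
        if item[2] - lines[-1][-1][2] > line_tolerance:
            lines.append([item])
        else:
            lines[-1].append(item)

    def render(item):
        kind, text, _, _ = item
        # Signatures have no text, so emit the placeholder token instead.
        return signature_token if kind == "signature" else text

    return "\n".join(
        " ".join(render(item) for item in sorted(line, key=lambda item: item[3]))
        for line in lines
    )


page = [
    ("word", "Invoice", 0.05, 0.10),
    ("word", "Signed:", 0.90, 0.10),
    ("signature", "", 0.905, 0.30),
]
print(linearize_page(page))
print(repr(linearize_page([])))
```

With these inputs the first call prints "Invoice" on one line and "Signed: [SIGNATURE]" on the next, and the empty page yields an empty string rather than an exception.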
Full Changelog: v1.4.0...v1.4.1