Releases: aws-samples/amazon-textract-textractor

Version 1.7.1

31 Jan 21:33

What's Changed

  • Fix issue where a table within a container layout could be duplicated in the .get_text() output.

Full Changelog: v1.7.0...v1.7.1

Version 1.7.0

31 Jan 00:13

What's Changed

  • Loosen XlsxWriter version constraints by @mdscruggs in #292
  • Rework the linearization heuristic to ensure that no words are missing or duplicated
  • Fix KeyValues being assigned twice on overlapping table cells. Going forward, KeyValues inside a table are ignored (table structure takes precedence)
  • Harden parser code against missing children in layouts or KeyValues with missing keys
  • Fix markdown tables not having header rows when one of the cells is empty
  • Add support for Python 3.11 and 3.12 in the GitHub action workflows
  • Add textractor.__version__ to allow easier identification of the installed Textractor version in code
  • Add hide_table_layout
  • Remove amazon-textract-response-parser as a dependency, as using it to validate the input schema could add +200 ms of latency in some cases. Textractor-only parsing takes <30 ms.
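
The textractor.__version__ attribute mentioned above makes it possible to guard version-dependent code paths. The following is a minimal sketch, not part of the library: the helper names version_tuple and at_least are hypothetical, and the only Textractor detail assumed is the __version__ attribute named in these notes. If the package is missing or predates 1.7.0, the sketch falls back to None.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a string such as '1.7.1' into (1, 7, 1), padded to three parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def at_least(installed: str, required: str) -> bool:
    """Return True when the installed version is >= the required one."""
    return version_tuple(installed) >= version_tuple(required)


try:
    import textractor

    # __version__ only exists from 1.7.0 onward, per the release notes.
    installed_version = getattr(textractor, "__version__", None)
except ImportError:
    installed_version = None

if installed_version is None:
    print("Textractor is not installed, or it predates 1.7.0")
elif at_least(installed_version, "1.7.1"):
    print("Installed version includes the 1.7.1 get_text() fix")
else:
    print(f"Installed version: {installed_version}")
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "1.10.0" sorting before "1.7.0".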

Breaking changes

  • Remove linearize_table and linearize_key_value from TextLinearizationConfig as neither was used
  • Remove the s3_output_path parameter from analyze_expense as the API does not support outputting to S3
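
Code that still passes the removed TextLinearizationConfig options will presumably fail once upgraded. One way to migrate, sketched below under assumptions, is to strip those keys from any stored option dictionary before building the config. The helper drop_removed_options and the key some_other_option are hypothetical and used for illustration only. Only the two removed names come from these notes.

```python
# Options removed from TextLinearizationConfig in 1.7.0, per the notes above.
REMOVED_IN_1_7_0 = {"linearize_table", "linearize_key_value"}


def drop_removed_options(options: dict) -> dict:
    """Return a copy of `options` without the keys removed in 1.7.0."""
    return {key: value for key, value in options.items()
            if key not in REMOVED_IN_1_7_0}


# Hypothetical legacy settings; "some_other_option" is a placeholder name.
legacy_options = {
    "linearize_table": True,
    "linearize_key_value": True,
    "some_other_option": "kept",
}

cleaned = drop_removed_options(legacy_options)
print(cleaned)
# The cleaned dict could then be unpacked into TextLinearizationConfig(**cleaned),
# assuming every remaining key is a valid field of that class.
```

Since the notes describe both options as unused, dropping them should not change the linearized output.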

Full Changelog: v1.6.1...v1.7.0

Version 1.6.1

19 Dec 21:03

What's new

  • Fix bug in table to markdown

Full Changelog: v1.6.0...v1.6.1

Version 1.6.0

19 Dec 20:51

What's Changed

Full Changelog: v1.5.0...v1.6.0

Version 1.5.0

12 Dec 16:57

What's Changed

  • Add GetResult from S3 in LazyDocument
  • Add more linearization formatting options
  • Fix exception thrown when a CHILD relationship maps to a non-existent LINE

Full Changelog: v1.4.5...v1.5.0

Version 1.4.5

02 Nov 14:48
9fc92f8

What's Changed

  • Fix missing words in get_text_and_words by @Belval in #270

Full Changelog: v1.4.4...v1.4.5

Version 1.4.4

01 Nov 15:36
8c06481

What's Changed

  • Add page_layout property to Page object by @Belval in #268

Full Changelog: v1.4.3...v1.4.4

Version 1.4.3

30 Oct 20:46
c559e1a

What's Changed

  • Raise exception on export-to-markdown without pandas by @Belval in #261
  • Add page number prefix and suffix to linearization config by @Belval in #266
  • Fix table post-processing by @Belval in #267

Full Changelog: v.1.4.2...v1.4.3

Version 1.4.2

23 Oct 18:27
fd911f2

What's Changed

Full Changelog: v1.4.1...v.1.4.2

Version 1.4.1

19 Oct 22:16

What's Changed

  • Fix signature token not being added to the linearized text
  • Fix bug where empty pages raise an exception when linearized

Full Changelog: v1.4.0...v1.4.1