Releases · scribeocr/scribe.js · GitHub

29 Aug 07:56

Balearica

v0.2.4

Improved compatibility with build tools such as Webpack
Fixed bug where PDF resources were loaded when not needed (dd99124)
Fixed Tesseract bug causing incorrect metrics for single-word recognition (Recognize Word) in the Scribe OCR UI (f6be561)

Full Changelog: v0.2.3...v0.2.4

22 Aug 00:54

Balearica

v0.2.3

Added extractPDFTextImage option to importFiles
- When extractPDFTextNative, extractPDFTextOCR, and extractPDFTextImage are all set to true, text is always extracted from the input PDF and set as the "active" version, even if the PDF contains no text.
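
The interaction of the three options can be sketched as a small decision function. This is an illustrative model only, not Scribe.js source code: the function name usesEmbeddedText and the PDF kind labels ('native', 'ocr', 'image') are hypothetical, and the mapping of one flag to one kind of PDF is an assumption inferred from the option names.

```javascript
// Illustrative model only; this is not Scribe.js source code.
// Assumption: each flag gates one kind of input PDF, inferred from the option names.
// The kind labels 'native', 'ocr', and 'image' are hypothetical.
function usesEmbeddedText(pdfKind, options) {
  const flagForKind = {
    native: options.extractPDFTextNative, // text-native PDF
    ocr: options.extractPDFTextOCR, // PDF carrying an existing OCR text layer
    image: options.extractPDFTextImage, // image-based PDF
  };
  if (!Object.prototype.hasOwnProperty.call(flagForKind, pdfKind)) {
    throw new Error(`Unknown PDF kind: ${pdfKind}`);
  }
  // Missing flags are treated as false in this sketch.
  return Boolean(flagForKind[pdfKind]);
}

// With all three flags enabled, every kind of PDF has its text extracted.
const allEnabled = {
  extractPDFTextNative: true,
  extractPDFTextOCR: true,
  extractPDFTextImage: true,
};
const alwaysExtracted = ['native', 'ocr', 'image'].every(
  (kind) => usesEmbeddedText(kind, allEnabled),
);
```

Under these assumptions, turning off any single flag makes extraction conditional on the kind of PDF, which is why the "always extracted" behavior requires all three to be true.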

Full Changelog: v0.2.2...v0.2.3

21 Aug 05:36

Balearica

v0.2.2

Added support for importing HOCR generated by Tesseract.js

Full Changelog: v0.2.1...v0.2.2

20 Aug 03:14

Balearica

v0.2.1

Fixed bug where comparing OCR data required providing input images
Switched to scoped packages for dependencies (@scribe.js/tesseract.js and @scribe.js/tesseract.js-core) to fix name conflicts
Other minor changes

Full Changelog: v0.2.0...v0.2.1

17 Aug 21:59

Balearica

v0.2.0

Added extractInternalPDFText function for extracting existing text from PDFs.
Replaced recognizeFiles with extractText function.
- This function now skips recognition by default for text-native PDF inputs, which should not require OCR.
- The new name is intended to communicate that recognition is not run for all inputs.
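
For code that has to run against both v0.1.x and v0.2.0 or later, the rename can be handled with a small wrapper. This is a minimal sketch, not part of Scribe.js: getTextExtractor is a hypothetical helper, only the names extractText and recognizeFiles come from the notes above, and passing a single array of files to either function is an assumption.

```javascript
// Hypothetical helper (not part of Scribe.js): return whichever entry point
// the provided module exposes. Only the two function names come from the
// release notes; the shared call shape (an array of files) is an assumption.
function getTextExtractor(scribe) {
  if (typeof scribe.extractText === 'function') {
    // v0.2.0 and later
    return (files) => scribe.extractText(files);
  }
  if (typeof scribe.recognizeFiles === 'function') {
    // v0.1.x
    return (files) => scribe.recognizeFiles(files);
  }
  throw new Error('No text extraction function found on the provided module');
}
```

A caller would pass in the imported module, for example getTextExtractor(scribe)(['document.pdf']), where the file name is a placeholder. The two paths are not equivalent: per the notes above, extractText skips recognition by default for text-native PDFs.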

Full Changelog: v0.1.1...v0.2.0

16 Aug 07:14

Balearica

v0.1.1

Initial public version of Scribe.js package (scribe.js-ocr on npm).
