Docker Image with latest Tesseract OCR Version 5.x.x built from sources.
The sources are pulled from the latest main branch and latest releases of the Tesseract OCR project.
Docker Hub: https://hub.docker.com/r/franky1/tesseract
Pull the docker image from Docker Hub:
docker pull franky1/tesseract
Mount your image data to the /tmp directory and run the Tesseract OCR container with the required command line options. For example, run the Tesseract OCR container with a test image:
docker run -it -v ${PWD}/testdata:/tmp --rm franky1/tesseract \
tesseract english.png output --oem 1 -l eng
For the Tesseract command line options, please refer to the Tesseract Manual.
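To process several images in one go, the single-image command above can be wrapped in a small loop. The following is a minimal sketch and not part of this repository: the function name `ocr_all` and the `DRY_RUN` variable are hypothetical, and it assumes that your images are PNG files in one directory that you pass as an absolute path. The `-it` flags are left out here because the loop does not need an interactive terminal.

```shell
#!/usr/bin/env bash
# Hypothetical helper: OCR every PNG in a directory with the franky1/tesseract image.
# Usage: ocr_all <absolute-image-dir> [lang]
# Set DRY_RUN=1 to print the docker commands instead of running them.
ocr_all() {
  local dir="$1" lang="${2:-eng}" img name
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue            # skip when the glob matches nothing
    name="$(basename "$img")"
    # The output base is the image name without extension; Tesseract appends .txt
    set -- docker run --rm -v "$dir":/tmp franky1/tesseract \
      tesseract "$name" "${name%.*}" --oem 1 -l "$lang"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$*"                          # show the command only
    else
      "$@"                               # run the container
    fi
  done
}
```

For example, `DRY_RUN=1 ocr_all "${PWD}/testdata" eng` lists the commands that would be executed, and `ocr_all "${PWD}/testdata" eng` runs them. The text files end up next to the images because the image directory is mounted as the working directory /tmp.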
Test whether the languages mounted from your local subfolder tessdata are available in the Docker container.
Be aware that the mounted local languages replace the languages installed in the Docker image. Example with French:
docker run -it -v ${PWD}/testdata:/tmp \
-v ${PWD}/tessdata:/usr/local/share/tessdata/ \
--rm franky1/tesseract
Test the mounted languages in the Docker container with a sample image. Example with French:
docker run -it -v ${PWD}/testdata:/tmp \
-v ${PWD}/tessdata:/usr/local/share/tessdata/ \
--rm franky1/tesseract \
tesseract french.jpg output --oem 1 -l fra
Alternatively, you can build a new Docker image if you want other languages, see next section.
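Because the mounted folder takes the place of the languages shipped in the image, it can help to check that the trained data file is really present locally before starting the container. The snippet below is a small sketch under that assumption; the function name `has_lang` is hypothetical, and it relies only on the Tesseract convention that a language is stored as `<lang>.traineddata`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if <tessdata-dir>/<lang>.traineddata exists.
# Usage: has_lang <tessdata-dir> <lang>
has_lang() {
  [ -f "$1/$2.traineddata" ]
}

# Example: warn before running OCR with a language that is not in the local folder.
if ! has_lang "${PWD}/tessdata" fra; then
  echo "fra.traineddata not found in ${PWD}/tessdata" >&2
fi
```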
For details have a look into the Dockerfile.
- Git clone this repo.
- Add your required languages to the languages.txt file.
- Build the docker image.
  - To build with the main branch of Tesseract:
    docker build --progress=plain --tag tesseract .
  - To build with a specific release version of Tesseract:
    docker build --progress=plain --tag tesseract --build-arg TESSERACT_VERSION=5.0.0 .
- Run Tesseract OCR container with test image:
docker run -it --name tesseract -v ${PWD}/testdata:/tmp --rm \
tesseract tesseract english.png output --oem 1 -l eng
- Only supported target for this docker image currently is linux/amd64.
- Working directory for ocr images is /tmp inside the container. See example above.
- Directory for trained data is /usr/local/share/tessdata/ inside the container. See example above.
- This image was built without the Tesseract training tools.
- This image currently includes only the following languages:
  - English: tessdata_best > eng.traineddata
  - German: tessdata_best > deu.traineddata
  - If you need other languages, you have to build your own image or mount trained data to the /usr/local/share/tessdata/ directory. See example above.
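If you go the mounting route, the trained data file has to be downloaded into your local tessdata folder first. The following is a rough sketch, not part of this repository: the function names `traineddata_url` and `fetch_lang` are hypothetical, and the URL layout assumes that the tessdata_best repository serves its files from the `main` branch. Please verify the URL before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the assumed download URL for a tessdata_best language file.
# Assumption: the files live on the "main" branch of tesseract-ocr/tessdata_best.
traineddata_url() {
  echo "https://github.com/tesseract-ocr/tessdata_best/raw/main/$1.traineddata"
}

# Hypothetical helper: download one language into a local folder.
# Usage: fetch_lang <lang> [target-dir]
fetch_lang() {
  local lang="$1" dir="${2:-${PWD}/tessdata}"
  mkdir -p "$dir"
  # -f: fail on HTTP errors, -L: follow redirects, -o: write to the target file
  curl -fL -o "$dir/$lang.traineddata" "$(traineddata_url "$lang")"
}
```

Calling `fetch_lang fra` would place fra.traineddata in the tessdata folder of the current directory, which can then be mounted to /usr/local/share/tessdata/ as in the example above.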
- Overview of supported languages https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html
- Trained models with support for legacy and LSTM OCR engine https://github.com/tesseract-ocr/tessdata
- Fast integer versions of trained LSTM models https://github.com/tesseract-ocr/tessdata_fast
- Best (most accurate) trained LSTM models https://github.com/tesseract-ocr/tessdata_best
- Docker Hub: https://hub.docker.com/repository/docker/franky1/tesseract
- Original Tesseract Github Repository: https://github.com/tesseract-ocr/tesseract
- Original Tesseract Documentation: https://tesseract-ocr.github.io/
- Original Tesseract Manual: https://tesseract-ocr.github.io/tessdoc/
- More tessdata_best languages: https://github.com/tesseract-ocr/tessdata_best
- Update README.md to latest Dockerfile and Usage
- Add dependabot on Github
- Add vulnerability scanning in Github Actions with Snyk
- Use multi-stage build in Dockerfile and clone from git
- Add GitHub Action to check container efficiency with Dive https://github.com/MartinHeinz/dive-action
- Add documentation for GitHub Actions Workflow
- Add more inline comments in GitHub Actions related files
- Build image for more targets
- Building Tesseract with TensorFlow?
- Building Tesseract with Training tools?
If you have any bugs or requests regarding this Docker image, please post an issue in this Github Repository.
11.09.2025: The Docker image is ready for use. Some minor improvements are still possible, and build issues occur occasionally.