I use this project to learn new technologies related to Spring Boot web development. I change it constantly, improving it and adding new plugins.
I needed to convert an image of a bill number code into text so that other software could use it. Yes, Google has/had a service for that, but you have to pay :(
This project converts images containing text into plain text using Tesseract and exposes the conversion as a web service using Spring Boot.
Basically, the caller sends a POST request with an image that contains text; the project tries to find the text in the image, returns it as text, and stores the conversion in a database.
Basic overall architecture:
For more information, please check the project development site.
- Git
- Java
- Maven
- Spring Frameworks (boot, data, security, Open API, etc.)
- Tesseract
- Docker
These are the requirements:
- Git
# check the git version
git --version
- Java version >= 17
# check the Java version
java --version
- Ant version >= 1.10 (optional)
# check the Ant version
ant -version
- Maven version >= 3.8.8
# check the Maven version
mvn --version
- Docker
# check the Docker version
docker --version
- Newman (for tests)
# check the Newman version
newman --version
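The individual checks above can be combined into a single pass. This is only a sketch: `missing_tools` is a hypothetical helper, not part of the project, and it only checks that each command is on the PATH, not that its version meets the minimum.

```shell
# missing_tools is a hypothetical helper, not part of the project.
# It prints each given command name that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Required tools from the list above; Ant is optional, so it is left out.
missing=$(missing_tools git java mvn docker newman)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing >&2
else
  echo "All required tools are installed."
fi
```

Version requirements still have to be verified by hand with the commands listed above.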
To execute, please follow these steps:
This project uses the allset-java parent Maven project for its plugins and configurations.
Please get that project and install it in your local Maven repository before continuing.
This project also uses default-extensions for its plugin configurations (checkstyle-checks.xml, pmd-ruleset.xml, spotbugs-excludes.xml, etc.).
Please get that project and install it in your local Maven repository as well before continuing.
To start, clone it:
git clone https://github.com/fernando-romulo-silva/image-converter-service
You have to be in the project's root directory:
cd image-converter-service
Build the application:
mvn package -DskipTests
Using Docker
This is the recommended approach, because with Docker you don't need to install and configure Tesseract on your machine:
docker build --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') \
--file src/main/docker/Dockerfile \
--tag image-converter-service .
To run the project:
docker run --publish 8080:8080 \
--publish 8000:8000 \
--detach \
--memory 1Gb \
--memory-reservation 256Mb \
--name image-converter-service-1 \
--env-file src/main/docker/AlpineVersion.env \
--log-opt mode=non-blocking \
--log-opt max-buffer-size=16m \
image-converter-service
Using Java Locally
Tesseract needs a dictionary, and the application uses the English dictionary, 'eng.traineddata'. For example, on Ubuntu Linux (22.04.2 LTS) with Tesseract 4, the default dictionary is installed in /usr/share/tesseract-ocr/4.00/tessdata/, and on Alpine Linux (3.15.0) with Tesseract 4, it is installed in /usr/share/tessdata/.
You have to check where the dictionary is installed on your OS.
First, define where the dictionary folder was installed:
export TESSERACT_FOLDER=/usr/share/tesseract-ocr/4.00/tessdata/
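If you are not sure where the dictionary lives, a file search can find it. The following is a minimal sketch, not the project's own tooling: `find_tessdata` is a hypothetical helper, and searching under /usr/share by default is an assumption based on the two example paths above.

```shell
# find_tessdata is a hypothetical helper, not part of the project.
# It prints the directory (with a trailing slash) that holds eng.traineddata.
# $1: directory to search; defaults to /usr/share (an assumption).
find_tessdata() {
  dict=$(find "${1:-/usr/share}" -type f -name 'eng.traineddata' 2>/dev/null | head -n 1)
  [ -n "$dict" ] || return 1
  printf '%s/\n' "$(dirname "$dict")"
}

# Export the variable only when the dictionary was actually found.
if TESSERACT_FOLDER=$(find_tessdata); then
  export TESSERACT_FOLDER
  echo "TESSERACT_FOLDER=$TESSERACT_FOLDER"
else
  echo "eng.traineddata not found under /usr/share" >&2
fi
```

If the search finds nothing, pass another root directory as the first argument, or install the English language data for Tesseract first.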
Next, check that Tesseract is working:
tesseract --version
Then execute:
mvn spring-boot:run -Dspring.profiles.active=local
There is a Postman collection that you can use to test the service, and you can also run it from the command line with Newman. To do that, execute the following command inside the project folder:
newman run src/test/resources/postman/image-converter-service.postman_collection.json \
-e src/test/resources/postman/image-converter-service-local.postman_environment.json
To access the API's documentation:
http://localhost:8080/swagger-ui/index.html?configUrl=/v3/api-docs/swagger-config
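The service may need a few seconds after startup before it answers requests, so it can help to wait for the documentation URL to respond before opening it or running Newman. This is a sketch under assumptions: `wait_for_url` is a hypothetical helper, not part of the project; curl is not in the requirements list and is assumed to be installed; and the URL is assumed to be reachable without authentication.

```shell
# wait_for_url is a hypothetical helper, not part of the project.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as the URL answers successfully, 1 after ATTEMPTS failures.
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while :; do
    # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard the body
    curl -fs -o /dev/null "$url" && return 0
    # Stop once the attempt budget is used up (at least one attempt is always made).
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example call (commented out so that defining the helper has no side effects):
# wait_for_url http://localhost:8080/swagger-ui/index.html && echo "service is up"
```

With the defaults, the helper gives up after 30 attempts spaced 2 seconds apart; both values are arbitrary and can be changed through the second and third arguments.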