Commit
Showing 9 changed files with 511 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,120 @@ | ||
# OCR | ||
|
||
|
||
## EasyOCR | ||
|
||
Check the supported language codes at https://www.jaided.ai/easyocr/ | ||
|
||
<div class="load_as_code_session" data-url="easy_ocr.py"> | ||
Loading content... | ||
</div> | ||
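The `easy_ocr.py` script loaded above loops over a fixed range of page numbers (1 to 274). A minimal sketch of an alternative, assuming the images sit in one folder and follow the `output-<number>.png` naming, is to discover the files instead of hard-coding the range. The helper names `find_page_images` and `text_path_for` are hypothetical and not part of the script.

```python
from pathlib import Path


def find_page_images(folder, pattern="output-*.png"):
    """Return matching image paths sorted by the page number in the file name."""
    # Keep only files whose name ends in a number, e.g. "output-012.png".
    pages = [
        path for path in Path(folder).glob(pattern)
        if path.stem.rsplit("-", 1)[-1].isdecimal()
    ]
    # Sort numerically so that page 10 does not come before page 2.
    return sorted(pages, key=lambda path: int(path.stem.rsplit("-", 1)[-1]))


def text_path_for(image_path):
    """Return the .txt path that sits next to the given image."""
    return Path(image_path).with_suffix(".txt")


if __name__ == "__main__":
    for image in find_page_images("."):
        print(image, "->", text_path_for(image))
```

Each discovered path can be passed to `reader.readtext(str(image))` in place of the formatted file name, so the loop keeps working when the page count changes.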
|
||
|
||
|
||
## Other OCR options with GPU | ||
|
||
If you're looking for open-source AI-based OCR solutions that can leverage your NVIDIA GPU and process Traditional Chinese (zh-TW), here are some excellent options: | ||
|
||
--- | ||
|
||
### 1. **Tesseract OCR (GPU-accelerated pre-processing only)** | ||
- **Description**: Tesseract is a well-established open-source OCR engine that supports many languages, including Traditional Chinese. It does not natively support GPU acceleration, but you can pair it with GPU-accelerated pre-processing (for example, OpenCV built with CUDA) to speed up the overall pipeline. | ||
- **Key Features**: | ||
- High customization and language support (including Traditional Chinese). | ||
- Works well for clean, printed text. | ||
- **Limitations**: | ||
- Relatively slow compared to modern AI-based OCR solutions. | ||
- **Setup**: | ||
- Install `tesseract-ocr` and the Traditional Chinese language data (`chi_tra`; packaged as `tesseract-ocr-chi-tra` on Debian/Ubuntu). | ||
- Can be used with Python via the `pytesseract` library. | ||
- **GPU Option**: | ||
- Pre-process images using GPU-accelerated libraries like OpenCV with CUDA. | ||
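As a hedged illustration of the setup above, the sketch below calls the Tesseract command-line tool directly through `subprocess` rather than through `pytesseract`, so it needs only the standard library plus the `tesseract` binary and the `chi_tra` data. The helper names and the choice of page segmentation mode 6 (a single uniform block of text) are assumptions made for this example.

```python
import shutil
import subprocess


def build_tesseract_cmd(image_path, lang="chi_tra", psm=6):
    """Return the argv list for a Tesseract call that prints text to stdout."""
    # Passing "stdout" as the output base makes Tesseract print the text
    # instead of writing <outputbase>.txt.
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def run_tesseract(image_path, lang="chi_tra"):
    """Run Tesseract and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    completed = subprocess.run(
        build_tesseract_cmd(image_path, lang),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    # Only prints the command; call run_tesseract() once Tesseract is installed.
    print(build_tesseract_cmd("output-001.png"))
```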
|
||
--- | ||
|
||
### 2. **EasyOCR** | ||
- **Description**: EasyOCR is a modern, AI-powered OCR library built on PyTorch. It supports GPU acceleration out of the box and handles Traditional Chinese well. | ||
- **Key Features**: | ||
- Multilingual support, including Traditional Chinese (language code `ch_tra`, not `zh-tw`). | ||
- Lightweight and easy to set up. | ||
- Can leverage NVIDIA GPUs for faster processing. | ||
- **Setup**: | ||
1. Install via pip: `pip install easyocr`. | ||
2. Run the code: | ||
```python | ||
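import easyocr | ||
# EasyOCR's language code for Traditional Chinese is 'ch_tra'; it can be combined with 'en' | ||
reader = easyocr.Reader(['ch_tra', 'en'], gpu=True) | ||
result = reader.readtext('path_to_image') | ||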
``` | ||
- **Limitations**: | ||
- Struggles with very complex or heavily distorted handwriting. | ||
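By default, `readtext` returns one `(bounding_box, text, confidence)` tuple per detected region. The sketch below shows one way to drop low-confidence detections before saving the text. The `keep_confident` and `read_confident_text` helpers, the 0.5 threshold, and the sample data are assumptions for illustration, not part of EasyOCR.

```python
def keep_confident(detections, min_confidence=0.5):
    """Return the text of detections whose confidence reaches the threshold."""
    return [
        text
        for _bbox, text, confidence in detections
        if confidence >= min_confidence
    ]


def read_confident_text(image_path, min_confidence=0.5):
    """Run EasyOCR on one image and keep only confident detections."""
    import easyocr  # imported lazily so keep_confident works without EasyOCR

    reader = easyocr.Reader(["ch_tra", "en"], gpu=True)
    return keep_confident(reader.readtext(image_path), min_confidence)


if __name__ == "__main__":
    # Hand-written sample in the (bbox, text, confidence) layout, not real OCR output.
    sample = [
        ([[0, 0], [80, 0], [80, 20], [0, 20]], "繁體中文", 0.93),
        ([[0, 30], [80, 30], [80, 50], [0, 50]], "l1|I", 0.12),
    ]
    print(keep_confident(sample))
```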
|
||
--- | ||
|
||
### 3. **PaddleOCR** | ||
- **Description**: PaddleOCR is an OCR toolkit developed by Baidu on top of the PaddlePaddle framework. It supports NVIDIA GPU acceleration and provides excellent accuracy, especially for Chinese text. | ||
- **Key Features**: | ||
- Optimized for Chinese languages. | ||
- High accuracy for both printed and handwritten text. | ||
- Built-in tools for image pre-processing and text detection. | ||
- **Setup**: | ||
1. Install the PaddleOCR package: | ||
```bash | ||
pip install paddleocr | ||
pip install paddlepaddle-gpu # Ensure GPU support | ||
``` | ||
2. Use the library: | ||
```python | ||
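from paddleocr import PaddleOCR | ||
# Arguments follow the PaddleOCR 2.x API; 'chinese_cht' selects Traditional Chinese ('ch' is Simplified Chinese + English) | ||
ocr = PaddleOCR(use_angle_cls=True, use_gpu=True, lang='chinese_cht') | ||
result = ocr.ocr('path_to_image', cls=True) | ||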
``` | ||
- **Limitations**: | ||
- Requires installing PaddlePaddle, which can have specific system requirements. | ||
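The result returned by `ocr.ocr(...)` is nested. The sketch below assumes the PaddleOCR 2.x layout, in which the result holds one list per input image and each entry is `[box, (text, score)]`; other releases may return a different structure. The `flatten_paddle_result` helper and the sample data are hypothetical.

```python
def flatten_paddle_result(result):
    """Flatten a PaddleOCR 2.x style result into (text, score) pairs."""
    lines = []
    for page in result:
        if page is None:
            # Assumption: an image with no detected text may be reported as None.
            continue
        for _box, (text, score) in page:
            lines.append((text, score))
    return lines


if __name__ == "__main__":
    # Hand-written sample in the assumed layout, not real OCR output.
    sample = [
        [
            [[[0, 0], [90, 0], [90, 20], [0, 20]], ("繁體中文", 0.98)],
            [[[0, 30], [90, 30], [90, 50], [0, 50]], ("測試", 0.91)],
        ],
        None,
    ]
    for text, score in flatten_paddle_result(sample):
        print(f"{score:.2f} {text}")
```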
|
||
--- | ||
|
||
### 4. **OCR with OpenCV and Deep Learning Models** | ||
- **Description**: OpenCV allows integration with custom deep learning OCR models such as CRNN (Convolutional Recurrent Neural Network) or SAR (Show, Attend and Read, an attention-based recognizer). These models can be trained or fine-tuned on Traditional Chinese datasets. | ||
- **Key Features**: | ||
- Customizable for your specific needs. | ||
- Full GPU acceleration using NVIDIA CUDA. | ||
- **Setup**: | ||
- Use OpenCV with CUDA for pre-processing (e.g., noise removal, binarization). | ||
- Combine with a deep learning framework (e.g., PyTorch or TensorFlow) for OCR. | ||
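CRNN-style recognizers are commonly trained with CTC, so the per-timestep predictions have to be collapsed into a string afterwards. The sketch below shows greedy CTC decoding in plain Python, assuming class index 0 is the blank symbol and that the model's argmax output is already available as a list of class indices. The function name and the tiny character set are made up for this example.

```python
def ctc_greedy_decode(indices, charset, blank=0):
    """Collapse repeated indices and drop blanks, as in greedy CTC decoding."""
    chars = []
    previous = blank
    for index in indices:
        # Emit a character only when it is not blank and differs from the
        # previous timestep; a blank between repeats lets a character appear twice.
        if index != blank and index != previous:
            chars.append(charset[index])
        previous = index
    return "".join(chars)


if __name__ == "__main__":
    charset = ["", "中", "文"]  # index 0 is reserved for the CTC blank
    timesteps = [0, 1, 1, 0, 2, 2, 0, 2]
    print(ctc_greedy_decode(timesteps, charset))
```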
|
||
--- | ||
|
||
### 5. **TrOCR by Microsoft** | ||
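- **Description**: TrOCR is a transformer-based OCR model released by Microsoft. The published checkpoints, such as `microsoft/trocr-base-handwritten`, are trained on English text, so recognizing Chinese requires a checkpoint fine-tuned for it. The model works efficiently with GPU acceleration. | ||
- **Key Features**: | ||
- Strong accuracy on the printed and handwritten benchmarks reported by its authors. | ||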
- Uses transformers for improved contextual understanding. | ||
- **Setup**: | ||
1. Install the `transformers` library together with PyTorch and Pillow, which the example imports: | ||
```bash | ||
pip install transformers torch pillow | ||
``` | ||
2. Use the model: | ||
```python | ||
from transformers import TrOCRProcessor, VisionEncoderDecoderModel | ||
from PIL import Image | ||
import torch | ||
|
||
# Use the GPU when available, otherwise fall back to the CPU | ||
device = "cuda" if torch.cuda.is_available() else "cpu" | ||
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten") | ||
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten").to(device) | ||
|
||
image = Image.open('path_to_image').convert("RGB") | ||
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to(device) | ||
generated_ids = model.generate(pixel_values) | ||
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0] | ||
print(text) | ||
``` | ||
- **Limitations**: | ||
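- The released checkpoints are not trained on Chinese, so fine-tuning is required for Traditional Chinese. | ||
- Expects an image of a single text line, so full pages need text-line detection first. | ||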
|
||
--- | ||
|
||
|
||
|
||
<script src="https://posetmage.com/assets/js/LoadAsCodeSession.js"></script> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
import easyocr | ||
import torch | ||
|
||
# Report whether a CUDA GPU is available | ||
def check_gpu(): | ||
    if torch.cuda.is_available(): | ||
        print("GPU is available and will be used.") | ||
    else: | ||
        print("GPU is not available. Using CPU.") | ||
|
||
# Check GPU availability | ||
check_gpu() | ||
|
||
# Initialize the EasyOCR reader for Traditional Chinese ('ch_tra') plus English | ||
reader = easyocr.Reader(['ch_tra', 'en'], gpu=True) # gpu=True requests the GPU; EasyOCR falls back to the CPU if CUDA is unavailable | ||
|
||
# Loop through image files from 001 to 274 | ||
for i in range(1, 275): # Loop from 1 to 274 | ||
    # Format the image file name | ||
    image_file = f'output-{i:03}.png' # Zero-padded page number (e.g., 001, 002, ..., 274) | ||
|
||
    try: | ||
        # Perform OCR on the image | ||
        result = reader.readtext(image_file) | ||
|
||
        # Create corresponding .txt file name | ||
        output_file = image_file.replace('.png', '.txt') # Replace .png with .txt | ||
|
||
        # Save the recognized text to a .txt file | ||
        with open(output_file, 'w', encoding='utf-8') as f: | ||
            for detection in result: | ||
                text = detection[1] # The recognized text | ||
                f.write(text + '\n') # Write text to file, each on a new line | ||
|
||
        print(f'Text from {image_file} saved to {output_file}') | ||
|
||
    except Exception as e: | ||
        print(f"Error processing {image_file}: {e}") |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
# PDF2Image | ||
|
||
## PDF2Image with Python | ||
|
||
<div class="load_as_code_session" data-url="pdf2img.py"> | ||
Loading content... | ||
</div> | ||
|
||
|
||
## PDF2Image with the CLI | ||
|
||
To convert each page of a PDF into a separate image file from the command line, you can use **`pdftoppm`**, part of the `poppler-utils` package, or **`ImageMagick`**. Both are covered below: | ||
|
||
--- | ||
|
||
### **Option 1: Using `pdftoppm`** | ||
1. **Install `poppler-utils`** (if not installed): | ||
- On Debian/Ubuntu: | ||
```bash | ||
sudo apt update | ||
sudo apt install poppler-utils | ||
``` | ||
- On macOS (via Homebrew): | ||
```bash | ||
brew install poppler | ||
``` | ||
|
||
2. **Convert PDF to Images**: | ||
```bash | ||
pdftoppm -png input.pdf output | ||
``` | ||
- `-png`: Sets the output format to PNG (use `-jpeg` for JPEG). | ||
- `input.pdf`: The input PDF file. | ||
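- `output`: The prefix for output image files (e.g., `output-1.png`, `output-2.png`). Page numbers are zero-padded to the width of the last page number, so a 274-page PDF yields `output-001.png` to `output-274.png`. | ||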
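Because the padding depends on the page count, a script that consumes the images needs to build file names the same way. A minimal sketch, assuming the `pdftoppm` naming rule of padding to the number of digits in the total page count; the function name is hypothetical.

```python
def pdftoppm_output_name(prefix, page, total_pages, extension="png"):
    """Return the file name pdftoppm is expected to use for a given page."""
    # The page number is padded to as many digits as the total page count has.
    width = len(str(total_pages))
    return f"{prefix}-{page:0{width}d}.{extension}"


if __name__ == "__main__":
    print(pdftoppm_output_name("output", 1, 274))
    print(pdftoppm_output_name("output", 3, 9))
```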
|
||
--- | ||
|
||
### **Option 2: Using ImageMagick** | ||
1. **Install ImageMagick**: | ||
- On Debian/Ubuntu: | ||
```bash | ||
sudo apt update | ||
sudo apt install imagemagick | ||
``` | ||
- On macOS (via Homebrew): | ||
```bash | ||
brew install imagemagick | ||
``` | ||
|
||
2. **Convert PDF to Images**: | ||
```bash | ||
convert -density 300 input.pdf page-%03d.png | ||
``` | ||
- `-density 300`: Sets resolution to 300 DPI (higher values produce better quality images). | ||
- `input.pdf`: The input PDF file. | ||
- `page-%03d.png`: Output filenames with a three-digit index that starts at 0 by default (e.g., `page-000.png`, `page-001.png`); add `-scene 1` to start numbering at 1. | ||
|
||
--- | ||
|
||
### **Advanced Options** | ||
- To extract specific pages with `pdftoppm`, use the `-f` (first page) and `-l` (last page) flags: | ||
```bash | ||
pdftoppm -png -f 2 -l 5 input.pdf output | ||
``` | ||
This converts pages 2 to 5 only. | ||
|
||
- To customize image size or quality in `ImageMagick`: | ||
```bash | ||
convert -density 300 -quality 90 input.pdf page-%03d.png | ||
``` | ||
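- `-quality 90`: For JPEG output, sets the compression quality (1 to 100). For PNG output, as in this command, the value instead selects the zlib compression level and filter type, so it affects file size but not image fidelity. | ||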
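The `pdftoppm` options above can also be assembled from Python. This is a sketch under stated assumptions rather than the contents of `pdf2img.py`: it uses the `-png`, `-f`, and `-l` flags shown above plus `-r` for the resolution in DPI, and the helper names are hypothetical.

```python
import subprocess


def build_pdftoppm_cmd(pdf_path, prefix, first=None, last=None, dpi=300):
    """Return the argv list for converting a PDF (or a page range) to PNG files."""
    cmd = ["pdftoppm", "-png", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to convert
    if last is not None:
        cmd += ["-l", str(last)]  # last page to convert
    return cmd + [str(pdf_path), str(prefix)]


def convert_pdf(pdf_path, prefix, first=None, last=None, dpi=300):
    """Run pdftoppm; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_pdftoppm_cmd(pdf_path, prefix, first, last, dpi), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert_pdf() once poppler-utils is installed.
    print(" ".join(build_pdftoppm_cmd("input.pdf", "output", first=2, last=5)))
```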
|
||
Both tools are widely available on Linux, macOS, and Windows (via WSL or native binaries). | ||
|
||
|
||
|
||
<script src="https://posetmage.com/assets/js/LoadAsCodeSession.js"></script> |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# PDF2Text | ||
|
||
|
||
## PDF to Images | ||
|
||
See [PDF2Image](./PDF2Image/) | ||
|
||
|
||
## Images to Text | ||
|
||
See [OCR](./OCR/) | ||
|
||
|
||
## View in browser | ||
|
||
After converting the images to text, you can use this file to view each image on the left with its recognized text on the right. | ||
|
||
|
||
<div class="load_as_code_session" data-url="browse.html"> | ||
Loading content... | ||
</div> | ||
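The side-by-side layout described above can also be generated as a static page. The sketch below is not the `browse.html` file loaded above; it is a hypothetical Python generator that assumes each image has a matching `.txt` file and emits one row per page, with the image on the left and the escaped text on the right.

```python
import html


def build_viewer_html(pairs):
    """Build an HTML page from (image_file_name, recognized_text) pairs."""
    rows = []
    for image_name, text in pairs:
        rows.append(
            '<div style="display:flex;gap:1em;margin-bottom:1em">'
            f'<img src="{html.escape(image_name, quote=True)}" style="width:50%">'
            f'<pre style="width:50%;white-space:pre-wrap">{html.escape(text)}</pre>'
            "</div>"
        )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        "<title>OCR review</title></head><body>\n"
        + "\n".join(rows)
        + "\n</body></html>"
    )


if __name__ == "__main__":
    page = build_viewer_html([("output-001.png", "第一頁 <sample text>")])
    print(page)
```

Writing the returned string to a file with `encoding='utf-8'` next to the images gives a page that opens directly in a browser.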
|
||
|
||
|
||
<script src="https://posetmage.com/assets/js/LoadAsCodeSession.js"></script> |