
Commit 8c21766

Merge pull request #32 from Topdu/openocr_svtrv2
[Features] add Openocrv1 and svtrv2
2 parents 61716c2 + 5dcdf8f commit 8c21766

12 files changed

Lines changed: 2503 additions & 260 deletions


README.md

Lines changed: 103 additions & 11 deletions
@@ -1,15 +1,34 @@
 # OpenOCR
 
-OpenOCR aims to establish a unified training and evaluation benchmark for scene text detection and recognition algorithms, at the same time, serves as the official code repository for the OCR team from the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University.
-
-We are actively developing and refining it and expect to release the first version as soon as possible.
+We aim to establish a unified benchmark for training and evaluating scene text detection and recognition models. Based on this benchmark, we introduce OpenOCR, an accurate and efficient general OCR system. Additionally, this repository serves as the official codebase of the OCR team at the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University.
 
 We sincerely welcome researchers to recommend OCR or relevant algorithms and point out any potential factual errors or bugs. Upon receiving such suggestions, we will promptly evaluate and critically reproduce them. We look forward to collaborating with you to advance the development of OpenOCR and continuously contribute to the OCR community!
 
+## Features
+
+- 🔥**OpenOCR: A general OCR system for accuracy and efficiency**
+  - \[[Quick Start](#quick-start)\] \[[Demo](<>)(TODO)\]
+  - [Introduction](./docs/openocr.md)
+  - A practical model built on SVTRv2.
+  - Outperforms [PP-OCRv4](<>) released by [PaddleOCR](<>) by 4.5% on the [OCR competition leaderboard](<>).
+  - [x] Supports Chinese and English text detection and recognition.
+  - [x] Provides server and mobile models.
+  - [ ] Fine-tuning OpenOCR on a custom dataset.
+  - [ ] Export to ONNX engine.
+- 🔥**SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition**
+  - \[[Paper](./configs/rec/svtrv2/SVTRv2.pdf)\] \[[Model](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Config, Training and Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\]
+  - [Introduction](./docs/svtrv2.md)
+  - Develops a unified training and evaluation benchmark for scene text recognition.
+  - Supports 24 scene text recognition methods trained from scratch on large-scale real datasets, with the latest methods added continually.
+  - Improves accuracy by 20-30% compared with training on synthetic datasets.
+  - Moves towards arbitrary-shaped text recognition and language modeling with a single visual model.
+  - Surpasses attention-based decoder methods in both accuracy and speed across challenging scenarios.
+  - [Get Started](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch) with training a SoTA scene text recognition model from scratch.
+
 ## Our STR algorithms
 
 - [**DPTR**](<>) (*Shuai Zhao, Yongkun Du, Zhineng Chen\*, Yu-Gang Jiang. Decoder Pre-Training with only Text for Scene Text Recognition,* ACM MM 2024. [paper](https://arxiv.org/abs/2408.05706))
-- [**IGTR**](./configs/rec/igtr/) (*Yongkun Du, Zhineng Chen\*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition,* Under TPAMI minor revision 2024. [Doc](./configs/rec/igtr/readme.md), [paper](https://arxiv.org/abs/2401.17851))
+- [**IGTR**](./configs/rec/igtr/) (*Yongkun Du, Zhineng Chen\*, Yuchen Su, Caiyan Jia, Yu-Gang Jiang. Instruction-Guided Scene Text Recognition,* under TPAMI minor revision, 2024. [Doc](./configs/rec/igtr/readme.md), [paper](https://arxiv.org/abs/2401.17851))
 - [**SVTRv2**](./configs/rec/svtrv2) (*Yongkun Du, Zhineng Chen\*, Hongtao Xie, Caiyan Jia, Yu-Gang Jiang. SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition,* 2024. [paper](./configs/rec/svtrv2/SVTRv2.pdf))
 - [**SMTR&FocalSVTR**](./configs/rec/smtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xieping Gao, Yu-Gang Jiang. Out of Length Text Recognition with Sub-String Matching,* 2024. [paper](https://arxiv.org/abs/2407.12317))
 - [**CDistNet**](./configs/rec/cdistnet/) (*Tianlun Zheng, Zhineng Chen\*, Shancheng Fang, Hongtao Xie, Yu-Gang Jiang. CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition,* IJCV 2024. [paper](https://link.springer.com/article/10.1007/s11263-023-01880-0))
@@ -19,9 +38,78 @@ We sincerely welcome researchers to recommend OCR or relevant algorithms and
 - [**SVTR**](./configs/rec/svtr/) (*Yongkun Du, Zhineng Chen\*, Caiyan Jia, Xiaoting Yin, Tianlun Zheng, Chenxia Li, Yuning Du, Yu-Gang Jiang. SVTR: Scene Text Recognition with a Single Visual Model,* IJCAI 2022 (Long). [PaddleOCR Doc](https://github.com/Topdu/PaddleOCR/blob/main/doc/doc_ch/algorithm_rec_svtr.md), [paper](https://www.ijcai.org/proceedings/2022/124))
 - [**NRTR**](./configs/rec/nrtr/) (*Fenfen Sheng, Zhineng Chen\*, Bo Xu. NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition,* ICDAR 2019. [paper](https://arxiv.org/abs/1806.00926))
 
-## STR
+## Recent Updates
+
+- **🔥 2024.11.23 release notes**:
+  - **OpenOCR: A general OCR system for accuracy and efficiency**
+    - \[[Quick Start](#quick-start)\] \[[Demo](<>)(TODO)\]
+    - [Introduction](./docs/openocr.md)
+  - **SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition**
+    - \[[Paper](./configs/rec/svtrv2/SVTRv2.pdf)\] \[[Model](./configs/rec/svtrv2/readme.md#11-models-and-results)\] \[[Config, Training and Inference](./configs/rec/svtrv2/readme.md#3-model-training--evaluation)\]
+    - [Introduction](./docs/svtrv2.md)
+    - [Get Started](./docs/svtrv2.md#get-started-with-training-a-sota-scene-text-recognition-model-from-scratch) with training a SoTA scene text recognition model from scratch.
+
+## [Quick Start](./docs/openocr.md#quick-start)
+
+#### Dependencies:
+
+- [PyTorch](http://pytorch.org/) version >= 1.13.0
+- Python version >= 3.7
+
+```shell
+conda create -n openocr python==3.8
+conda activate openocr
+conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+```
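After creating the environment, a quick sanity check against the minimum versions listed above can be sketched as follows (illustrative only, not part of the repository; the `at_least` helper is hypothetical):

```python
import sys

def at_least(version_str, minimum):
    # Compare a dotted version string ('2.2.0', '1.13.0+cu117') against a
    # minimum (major, minor) tuple, ignoring any local build suffix.
    parts = tuple(int(p) for p in version_str.split('+')[0].split('.')[:2])
    return parts >= minimum

# Python >= 3.7, per the dependency list above
assert sys.version_info >= (3, 7), 'OpenOCR requires Python >= 3.7'

# PyTorch >= 1.13 -- uncomment after installing torch:
# import torch
# assert at_least(torch.__version__, (1, 13)), 'OpenOCR requires PyTorch >= 1.13'
```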
+
+After installing the dependencies, choose either of the following two installation methods.
+
+#### 1. Python Modules
+
+```shell
+pip install openocr-python
+```
+
+**Usage**:
+
+```python
+from openocr import OpenOCR
+
+engine = OpenOCR()
 
-Reproduction schedule:
+img_path = '/path/img_fold'  # a folder of images or a single image file
+result, elapse = engine(img_path)
+print(result)
+print(elapse)
+
+# Server mode
+engine = OpenOCR(mode='server')
+```
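The snippet above accepts either a folder or a single image path. A minimal sketch of driving the engine over a folder, using only the `engine(img_path)` call shown above (the `collect_images` helper and its extension list are illustrative, not the repository's API):

```python
from pathlib import Path

def collect_images(root, exts=('.jpg', '.jpeg', '.png')):
    # Gather image files: a single file as-is, or every image under a folder.
    root = Path(root)
    if root.is_file():
        return [root]
    return sorted(p for p in root.rglob('*') if p.suffix.lower() in exts)

# Assuming the OpenOCR engine from the snippet above is installed:
# from openocr import OpenOCR
# engine = OpenOCR()
# for img in collect_images('/path/img_fold'):
#     result, elapse = engine(str(img))
#     print(img.name, elapse)
```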
+
+#### 2. Clone this repository:
+
+```shell
+git clone https://github.com/Topdu/OpenOCR.git
+cd OpenOCR
+pip install -r requirements.txt
+```
+
+**Usage**:
+
+```shell
+# OpenOCR system: Det + Rec model
+python tools/infer_e2e.py --img_path=/path/img_fold  # or --img_path=/path/img_file
+
+# Det model
+python tools/infer_det.py --c ./configs/det/dbnet/repvit_db.yml --o Global.infer_img=/path/img_fold  # or =/path/img_file
+
+# Rec model
+python tools/infer_rec.py --c ./configs/rec/svtrv2/repsvtr_ch.yml --o Global.infer_img=/path/img_fold  # or =/path/img_file
+```
+
+## Reproduction Schedule
+
+### Scene Text Recognition
 
 | Method | Venue | Training | Evaluation | Contributor |
 | --------------------------------------------- | ---------------------------------------------------------------------------------------------- | -------- | ---------- | ------------------------------------------- |
@@ -56,21 +144,25 @@
 | [IGTR](./configs/rec/igtr/) | [2024](https://arxiv.org/abs/2401.17851) ||| |
 | [SMTR](./configs/rec/smtr/) | [2024](https://arxiv.org/abs/2407.12317) ||| |
 | [FocalSVTR-CTC](./configs/rec/svtrs/) | [2024](https://arxiv.org/abs/2407.12317) ||| |
-| [SVTRv2](./configs/rec/svtrv2/) | 2024 ||| |
+| [SVTRv2](./configs/rec/svtrv2/) | [2024](./configs/rec/svtrv2/SVTRv2.pdf) ||| |
 | [ResNet+Trans-CTC](./configs/rec/svtrs/) | ||| |
 | [ViT-CTC](./configs/rec/svtrs/) | ||| |
 
-### Contributors
+#### Contributors
 
 ______________________________________________________________________
 
 Yiming Lei ([pretto0](https://github.com/pretto0)) and Xingsong Ye ([YesianRohn](https://github.com/YesianRohn)) from the [FVL](https://fvl.fudan.edu.cn) Laboratory, Fudan University, under the guidance of Professor Zhineng Chen, completed the majority of the algorithm reproduction work. We are grateful for their outstanding contributions.
 
-______________________________________________________________________
+### Scene Text Detection (STD)
 
-## STD
+TODO
 
-## E2E
+### Text Spotting
+
+TODO
+
+______________________________________________________________________
 
 # Acknowledgement
 
configs/det/dbnet/repvit_db.yml

Lines changed: 5 additions & 4 deletions
@@ -10,7 +10,7 @@ Global:
   - 1000
   cal_metric_during_train: false
   checkpoints:
-  pretrained_model: paddle_to_openocr_det_repvit_ch.pth
+  pretrained_model: openocr_det_repvit_ch.pth
   save_inference_dir: null
   use_visualdl: false
   infer_img: ./testA
@@ -53,9 +53,10 @@ Architecture:
 PostProcess:
   name: DBPostProcess
   thresh: 0.3
-  box_thresh: 0.6
+  box_thresh: 0.4
   max_candidates: 1000
   unclip_ratio: 1.5
+  score_mode: 'slow'
 
 # Metric:
 #   name: DetMetric
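This hunk lowers `box_thresh` from 0.6 to 0.4 (keeping more candidate boxes) and sets `score_mode: 'slow'`, which scores each candidate over its exact polygon rather than its axis-aligned bounding rectangle. A minimal pure-Python sketch of the thresholding step in DB-style post-processing (function names are illustrative, not the repository's API; only the 'fast' rectangle scoring is shown):

```python
def box_score_fast(prob_map, box):
    # Mean probability over the axis-aligned bounding rectangle of the
    # candidate box -- the 'fast' scoring mode; 'slow' would average over
    # the exact polygon mask instead.
    xs = [p[0] for p in box]
    ys = [p[1] for p in box]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    vals = [prob_map[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(vals) / len(vals)

def filter_boxes(prob_map, boxes, box_thresh=0.4):
    # Drop candidates whose mean probability is below box_thresh.
    return [b for b in boxes if box_score_fast(prob_map, b) >= box_thresh]
```

Lowering `box_thresh` trades precision for recall: weaker text regions survive the filter, at the cost of more false positives.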
@@ -144,8 +145,8 @@ Eval:
       # image_shape: [1280, 1280]
       # keep_ratio: True
       # padding: True
-      # limit_side_len: 1280
-      # limit_type: max
+      limit_side_len: 960
+      limit_type: max
   - NormalizeImage:
       scale: 1./255.
       mean:
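The previously commented-out size limits are now enabled with a smaller cap: at evaluation time, images are shrunk so the longer side does not exceed 960 pixels. A rough sketch of what `limit_side_len: 960` with `limit_type: max` typically computes in DB-style eval pipelines, assuming sides are rounded to a multiple of 32 as the backbone expects (the exact rounding in this repository may differ):

```python
def resize_for_test(h, w, limit_side_len=960, limit_type='max', stride=32):
    # limit_type 'max': shrink so the longer side does not exceed
    # limit_side_len; 'min': enlarge so the shorter side reaches it.
    if limit_type == 'max':
        ratio = limit_side_len / max(h, w) if max(h, w) > limit_side_len else 1.0
    else:
        ratio = limit_side_len / min(h, w) if min(h, w) < limit_side_len else 1.0
    # Round each side to a multiple of `stride`, keeping at least one stride.
    new_h = max(int(round(h * ratio / stride)) * stride, stride)
    new_w = max(int(round(w * ratio / stride)) * stride, stride)
    return new_h, new_w
```

Dropping the cap from 1280 to 960 reduces evaluation cost roughly in proportion to the pixel count, at some risk of missing very small text.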
