[Open Source Internship] Interactive DEMO for DeepSeek-OCR text recognition and structured parsing based on MindSpore NLP #2064 #51
Open
lyyyym wants to merge 5 commits into mindspore-lab:dev from
Contributor
@moyu026, please verify the code; @DuangZ-GR, please check the code style.
Has the final Gradio interactive DEMO not been done yet?
Author
Hello, I have added it.
Change the Python version to 3.9, then change the dependency installation to the following.
It runs with MindSpore 2.7.0 and MindNLP 0.5.1.

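Task description: interactive DEMO for DeepSeek-OCR text recognition and structured parsing
1. Task overview
Built on MindSpore 2.7.0 + MindNLP 0.5.1 and modeled on https://huggingface.co/spaces/khang119966/DeepSeek-OCR-DEMO , this PR implements an interactive DEMO of DeepSeek-OCR multi-scenario text recognition and structured parsing on the Huawei Ascend 910B NPU, with streaming generation and performance optimization.
2. Implementation
Multi-scenario OCR interactive DEMO (app.py)
A web interface built with Gradio that supports the following features:
Multiple task types: Free OCR (free-form recognition), Markdown conversion (document structuring), chart parsing, and text localization (Grounding)
Multiple resolution modes: Tiny (512), Small (640), Base (1024), Large (1280), and Gundam (1024 + 640 crop, recommended)
Image upload and real-time recognition: after uploading an image, the user runs recognition with one click, and the results area shows the recognized text, the annotated image, and performance metrics
NPU adaptation: because the Ascend NPU does not support the scatter_add operator, F.one_hot plus matrix multiplication is used as a replacement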
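To illustrate why a one-hot matrix multiplication can stand in for scatter_add, here is a minimal NumPy sketch. It is not the code from this PR: the function names, shapes, and the choice of NumPy (instead of the framework's F.one_hot) are assumptions made only so the equivalence can be checked on any machine.

```python
import numpy as np

def scatter_add_reference(index, src, num_rows):
    """Reference scatter-add along axis 0: out[index[i]] += src[i]."""
    out = np.zeros((num_rows, src.shape[1]), dtype=src.dtype)
    np.add.at(out, index, src)  # unbuffered add, so repeated indices accumulate
    return out

def scatter_add_one_hot(index, src, num_rows):
    """Same result with a one-hot matrix and a single matmul (illustrative)."""
    one_hot = np.eye(num_rows, dtype=src.dtype)[index]  # shape (n, num_rows)
    # (num_rows, n) @ (n, d): each output row sums the src rows routed to it
    return one_hot.T @ src

index = np.array([0, 2, 0, 1])
src = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
expected = scatter_add_reference(index, src, num_rows=3)
actual = scatter_add_one_hot(index, src, num_rows=3)
```

The trade-off is memory: the one-hot matrix has n x num_rows entries, so this substitution is most practical when the number of target rows is moderate.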
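Streaming generation and token timing statistics
The model output is switched to streaming generation. Core implementation:
Extract the image preprocessing logic from the model.infer() method into a standalone prepare_inputs() function
Use TextIteratorStreamer in place of the original non-streaming output
Run model.generate() in a background thread while the main thread iterates over the streamer to obtain tokens
Collect performance metrics in real time: time to first token (TTFT), number of generated tokens, total elapsed time, generation speed (tokens/s), and decode speed (excluding the first token)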
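The threading pattern and the metrics above can be sketched with the standard library alone. In this sketch, `QueueStreamer` and `fake_generate` are hypothetical stand-ins for TextIteratorStreamer and model.generate(), so the example runs without a model. It counts streamed text pieces as tokens, which is only an approximation of what a real streamer yields.

```python
import queue
import threading
import time

class QueueStreamer:
    """Hypothetical stand-in for TextIteratorStreamer: producer calls put()/end(), consumer iterates."""
    _END = object()

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(self._END)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer emits something
        if item is self._END:
            raise StopIteration
        return item

def fake_generate(streamer, tokens, delay=0.01):
    """Hypothetical stand-in for model.generate(): emits pieces with a delay."""
    for tok in tokens:
        time.sleep(delay)
        streamer.put(tok)
    streamer.end()

def stream_with_metrics(tokens):
    streamer = QueueStreamer()
    worker = threading.Thread(target=fake_generate, args=(streamer, tokens))
    start = time.perf_counter()
    worker.start()  # generation runs in the background thread

    pieces, ttft = [], None
    for piece in streamer:  # main thread consumes pieces as they arrive
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        pieces.append(piece)
    worker.join()
    total = time.perf_counter() - start

    n = len(pieces)
    metrics = {
        "ttft_s": ttft,
        "tokens": n,
        "total_s": total,
        "tokens_per_s": n / total if total > 0 else 0.0,
        # decode speed excludes the first token and the time spent waiting for it
        "decode_tokens_per_s": (n - 1) / (total - ttft) if n > 1 and total > ttft else 0.0,
    }
    return "".join(pieces), metrics

text, metrics = stream_with_metrics(["Deep", "Seek", "-", "OCR"])
```

In a Gradio app the consuming loop would typically be a generator that yields the partial text after each piece, so the interface updates while generation is still running.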
Model inference performance optimization
Apply optimizations and provide measured before/after comparison data.