Commit aa905db

fix: change README.md to Chinese version
1 parent 948905e commit aa905db

1 file changed: 0 additions, 343 deletions

README.md

Lines changed: 0 additions & 343 deletions
@@ -1,343 +0,0 @@
1-
# 📄 ParseFlow
2-
3-
**Universal document parsing library for PDF, Word, and Excel files**
4-
5-
[![npm version](https://img.shields.io/npm/v/parseflow-core.svg)](https://www.npmjs.com/package/parseflow-core)
6-
[![MCP Server](https://img.shields.io/badge/MCP-Server-blue)](https://www.npmjs.com/package/parseflow-mcp-server)
7-
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
8-
9-
ParseFlow is a comprehensive document parsing solution that supports **PDF**, **Word (docx)**, and **Excel (xlsx/xls)** files. It provides both a standalone library and an MCP (Model Context Protocol) server for AI assistants.
10-
11-
[English](./README_EN.md) | [Examples](./OFFICE_EXAMPLES.md) | [GitHub](https://github.com/Libres-coder/ParseFlow)
12-
13-
---
14-
15-
## ✨ Features
16-
17-
### 📄 PDF Support
18-
- ✅ Text extraction with multiple strategies (raw, formatted, clean)
19-
- ✅ Page-specific and range-based extraction
20-
- ✅ Metadata retrieval (title, author, dates, page count)
21-
- ✅ Full-text search with context
22-
- ✅ Image extraction (placeholder)
23-
- ✅ Table of contents (TOC) extraction (placeholder)
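Page-specific and range-based extraction implies some way of turning a page specification into concrete page numbers. The sketch below shows one way that could work. `parsePageSpec` and its `"1-3,5"` syntax are hypothetical and are not part of the documented `parseflow-core` API.

```typescript
// Hypothetical helper, not part of parseflow-core's documented API.
// Parses a 1-based page specification such as "1-3,5" into sorted, unique page numbers.
function parsePageSpec(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page token: "${token}"`);
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new RangeError(`Page range out of bounds: "${token}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

console.log(parsePageSpec("1-3, 5", 10)); // [ 1, 2, 3, 5 ]
```

Rejecting out-of-bounds ranges up front keeps the error close to the user's input instead of surfacing later during parsing.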
24-
25-
### 📝 Word (docx) Support
26-
- ✅ Text extraction
27-
- ✅ HTML conversion
28-
- ✅ Metadata retrieval
29-
- ✅ Text search with context
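"Search with context" usually means returning each match together with a window of surrounding text. The following is a minimal sketch of that idea on a plain string. `searchWithContext`, `SearchHit`, and the `contextChars` parameter are illustrative names, and the actual result shape returned by ParseFlow may differ.

```typescript
// Illustrative only: the names and result shape here are assumptions,
// not the documented return type of parseflow-core's search methods.
interface SearchHit {
  index: number;   // character offset of the match in the source text
  context: string; // the match plus surrounding characters
}

function searchWithContext(text: string, query: string, contextChars = 20): SearchHit[] {
  const hits: SearchHit[] = [];
  if (query.length === 0) return hits;
  // Case-insensitive match; assumes lowercasing does not change string length,
  // which holds for ASCII but not for every Unicode character.
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let from = 0;
  while (true) {
    const index = haystack.indexOf(needle, from);
    if (index === -1) break;
    const start = Math.max(0, index - contextChars);
    const end = Math.min(text.length, index + needle.length + contextChars);
    hits.push({ index, context: text.slice(start, end) });
    from = index + needle.length; // continue after this match (non-overlapping)
  }
  return hits;
}

const sample = "The budget is set. Budget review follows.";
console.log(searchWithContext(sample, "budget", 4));
```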
30-
31-
### 📊 Excel (xlsx/xls) Support
32-
- ✅ Multi-sheet data extraction
33-
- ✅ Multiple output formats (JSON, CSV, Text)
34-
- ✅ Sheet-specific extraction
35-
- ✅ Cell-based search
36-
- ✅ Range extraction
37-
- ✅ Workbook metadata
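To make the difference between the JSON and CSV output formats concrete, here is a small sketch that turns row data into CSV text with RFC 4180 style quoting. `rowsToCsv` and the `Cell` type are hypothetical and only illustrate the conversion; ParseFlow's own CSV output may handle edge cases differently.

```typescript
// Hypothetical sketch of a rows-to-CSV conversion; not ParseFlow's implementation.
type Cell = string | number | boolean | null;

// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes.
function escapeCsvField(value: Cell): string {
  const s = value === null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Join fields with commas and records with CRLF.
function rowsToCsv(rows: Cell[][]): string {
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}

const csv = rowsToCsv([
  ["product", "note"],
  ["Widget, large", 'say "hi"'],
  [1, null],
]);
console.log(csv);
```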
38-
39-
### 🤖 MCP Server
40-
- ✅ 9 tools for AI assistants (5 PDF + 2 Word + 2 Excel)
41-
- ✅ Works with Claude Desktop and other MCP clients
42-
- ✅ Path security with allowlist support
43-
44-
---
45-
46-
## 📦 Installation
47-
48-
### Core Library
49-
50-
```bash
npm install parseflow-core
```
53-
54-
### MCP Server (Global)
55-
56-
```bash
npm install -g parseflow-mcp-server
```
59-
60-
Or use with npx:
61-
62-
```bash
npx parseflow-mcp-server
```
65-
66-
---
67-
68-
## 🚀 Quick Start
69-
70-
### PDF Parsing
71-
72-
```typescript
import { PDFParser } from 'parseflow-core';

const parser = new PDFParser();

// Extract all text
const text = await parser.extractText('document.pdf');

// Extract specific page
const page5 = await parser.extractPage('document.pdf', 5);

// Search
const results = await parser.search('document.pdf', 'keyword');

// Get metadata
const metadata = await parser.getMetadata('document.pdf');
```
89-
90-
### Word Parsing
91-
92-
```typescript
import { WordParser } from 'parseflow-core';

const parser = new WordParser();

// Extract text
const result = await parser.extractText('report.docx');
console.log(result.text);

// Convert to HTML
const html = await parser.extractHTML('report.docx');

// Search
const matches = await parser.searchText('report.docx', 'budget');
```
107-
108-
### Excel Parsing
109-
110-
```typescript
import { ExcelParser } from 'parseflow-core';

const parser = new ExcelParser();

// Extract all sheets (JSON format)
const data = await parser.extractData('spreadsheet.xlsx');

// Extract specific sheet
const sales = await parser.extractData('data.xlsx', {
  sheetName: 'Q4 Sales',
  format: 'json'
});

// Search in cells
const results = await parser.searchText('data.xlsx', 'revenue');
```
127-
128-
---
129-
130-
## 🛠️ MCP Server Usage
131-
132-
### Configuration for Claude Desktop
133-
134-
Add to `claude_desktop_config.json`:
135-
136-
```json
{
  "mcpServers": {
    "parseflow": {
      "command": "npx",
      "args": ["-y", "parseflow-mcp-server"],
      "env": {
        "PARSEFLOW_ALLOWED_PATHS": "C:\\Documents;D:\\Projects"
      }
    }
  }
}
```
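The `PARSEFLOW_ALLOWED_PATHS` value in the configuration is a semicolon-separated list of directories. As a rough illustration of how such an allowlist can be enforced, the sketch below parses the list and checks whether a requested file falls under one of the roots. All function names are hypothetical, the string-only normalisation assumes absolute Windows paths with drive letters, and denying everything when the list is empty is an assumption rather than documented server behaviour. A real implementation would more likely rely on the platform's path utilities and also resolve symlinks.

```typescript
// Hypothetical sketch of allowlist checking; not the server's actual code.

// Normalise a Windows-style path: unify separators, resolve "." and "..", lowercase.
function normalizeWindowsPath(p: string): string {
  const parts = p.replace(/\//g, "\\").split("\\");
  const out: string[] = [];
  for (const part of parts) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (out.length > 1) out.pop(); // never pop the drive segment, e.g. "c:"
      continue;
    }
    out.push(part.toLowerCase());
  }
  return out.join("\\");
}

// Split the semicolon-separated environment value into normalised roots.
function parseAllowedPaths(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map(normalizeWindowsPath);
}

// A target is allowed if it is a root itself or sits below one.
// Appending the separator stops "C:\DocumentsEvil" matching the root "C:\Documents".
function isPathAllowed(target: string, allowedRoots: string[]): boolean {
  const normalized = normalizeWindowsPath(target);
  return allowedRoots.some(
    (root) => normalized === root || normalized.startsWith(root + "\\")
  );
}

const roots = parseAllowedPaths("C:\\Documents;D:\\Projects");
console.log(isPathAllowed("C:\\Documents\\report.docx", roots));
console.log(isPathAllowed("C:\\Documents\\..\\Windows\\system.ini", roots));
```

Resolving `..` before the prefix comparison is the important step: without it, a path that starts inside an allowed directory could traverse out of it.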
149-
150-
### Available Tools
151-
152-
#### PDF Tools
153-
- `extract_text` - Extract text from PDF files
154-
- `search_pdf` - Search for keywords in PDF
155-
- `get_metadata` - Get PDF metadata
156-
- `extract_images` - Extract images from PDF
157-
- `get_toc` - Get table of contents
158-
159-
#### Word Tools
160-
- `extract_word` - Extract text/HTML from Word documents
161-
- `search_word` - Search in Word documents
162-
163-
#### Excel Tools
164-
- `extract_excel` - Extract data from Excel spreadsheets
165-
- `search_excel` - Search in Excel cells
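MCP clients invoke these tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for the tool names listed above. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification, but the argument keys `path` and `query` are assumptions made for illustration; check each tool's published input schema for the real parameter names.

```typescript
// Tool names are taken from the list above; the argument keys used below are assumptions.
type ParseFlowTool =
  | "extract_text" | "search_pdf" | "get_metadata" | "extract_images" | "get_toc"
  | "extract_word" | "search_word"
  | "extract_excel" | "search_excel";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ParseFlowTool; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a JSON-RPC request envelope for a single tool invocation.
function buildToolCall(name: ParseFlowTool, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// "path" and "query" are hypothetical argument names.
const request = buildToolCall("search_word", { path: "report.docx", query: "budget" });
console.log(JSON.stringify(request, null, 2));
```

In normal use the MCP client (for example Claude Desktop) constructs these requests itself; building one by hand is mainly useful for testing the server.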
166-
167-
### Example Usage in Claude
168-
169-
```
"Please read the contents of report.docx"
→ Uses extract_word tool

"Find 'Product A' in sales.xlsx"
→ Uses search_excel tool

"Extract the metadata from document.pdf"
→ Uses get_metadata tool
```
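In practice the AI assistant decides which tool to call based on the prompt. For scripts that call the server directly, the same routing can be approximated from the file extension, as in this sketch. `pickExtractTool` is a hypothetical helper and is not shipped with either package.

```typescript
// Hypothetical helper: maps a file extension to one of the extraction tools listed above.
// A Map is used instead of a plain object so that names such as "constructor"
// cannot accidentally match inherited properties.
const EXTRACT_TOOL_BY_EXTENSION = new Map<string, string>([
  ["pdf", "extract_text"],
  ["docx", "extract_word"],
  ["xlsx", "extract_excel"],
  ["xls", "extract_excel"],
]);

// Returns the tool name for a supported file, or undefined for anything else.
function pickExtractTool(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot === -1 || dot === fileName.length - 1) return undefined; // no extension
  const ext = fileName.slice(dot + 1).toLowerCase();
  return EXTRACT_TOOL_BY_EXTENSION.get(ext);
}

console.log(pickExtractTool("report.docx"));
console.log(pickExtractTool("notes.txt"));
```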
179-
180-
---
181-
182-
## 📚 Documentation
183-
184-
- **[Office Examples](./OFFICE_EXAMPLES.md)** - Word and Excel usage examples
185-
- **[Release Guide](./RELEASE_GUIDE.md)** - How to publish new versions
186-
- **[Contributing](./CONTRIBUTING.md)** - Contribution guidelines
187-
- **[Security Policy](./SECURITY.md)** - Security vulnerability reporting
188-
- **[Code of Conduct](./CODE_OF_CONDUCT.md)** - Community guidelines
189-
190-
---
191-
192-
## 🏗️ Project Structure
193-
194-
```
ParseFlow/
├── packages/
│   ├── pdf-parser-core/        # Core library (parseflow-core)
│   │   ├── src/
│   │   │   ├── parser.ts       # PDF parser
│   │   │   ├── WordParser.ts   # Word parser
│   │   │   └── ExcelParser.ts  # Excel parser
│   │   └── package.json
│   └── mcp-server/             # MCP server (parseflow-mcp-server)
│       ├── src/
│       │   ├── index.ts        # Server entry
│       │   └── tools/          # MCP tools
│       └── package.json
├── docs/                       # Documentation
├── examples/                   # Usage examples
├── tests/                      # Test files
└── scripts/                    # Build scripts
```
213-
214-
---
215-
216-
## 🧪 Testing
217-
218-
```bash
# Run all tests
pnpm test

# Test coverage
pnpm test:coverage

# Run specific test
pnpm test parser.test.ts
```
228-
229-
### Test Files
230-
- **Word测试文件.docx** - Word test document
231-
- **Excel测试文件.xlsx** - Excel test workbook (3 sheets)
232-
- **PDF测试文档.pdf** - PDF test document
233-
234-
---
235-
236-
## 🔧 Development
237-
238-
```bash
# Install dependencies
pnpm install

# Build all packages
pnpm build

# Watch mode
pnpm dev

# Lint
pnpm lint

# Type check
pnpm type-check
```
254-
255-
---
256-
257-
## 📈 Roadmap
258-
259-
### v1.1.0 (Current)
260-
- ✅ Word (docx) support
261-
- ✅ Excel (xlsx/xls) support
262-
- ✅ 9 MCP tools
263-
264-
### v1.2.0 (Planned)
265-
- [ ] Encrypted PDF support
266-
- [ ] OCR text recognition
267-
- [ ] PowerPoint (pptx) support
268-
- [ ] Batch processing optimization
269-
270-
### v2.0.0 (Future)
271-
- [ ] Plugin system
272-
- [ ] More document formats (CSV, TXT, RTF)
273-
- [ ] Advanced table extraction
274-
- [ ] Document conversion
275-
276-
---
277-
278-
## 🤝 Contributing
279-
280-
We welcome contributions! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for details.
281-
282-
### Ways to Contribute
283-
- 🐛 Report bugs
284-
- 💡 Suggest features
285-
- 📝 Improve documentation
286-
- 🔧 Submit pull requests
287-
288-
---
289-
290-
## 📦 Packages
291-
292-
| Package | Version | Description |
|---------|---------|-------------|
| [parseflow-core](https://www.npmjs.com/package/parseflow-core) | 1.0.1 | Core parsing library |
| [parseflow-mcp-server](https://www.npmjs.com/package/parseflow-mcp-server) | 1.0.2 | MCP server for AI |
296-
297-
---
298-
299-
## 🔗 Links
300-
301-
- **npm Core**: https://www.npmjs.com/package/parseflow-core
302-
- **npm MCP**: https://www.npmjs.com/package/parseflow-mcp-server
303-
- **GitHub**: https://github.com/Libres-coder/ParseFlow
304-
- **Issues**: https://github.com/Libres-coder/ParseFlow/issues
305-
- **MCP Registry**: https://registry.modelcontextprotocol.io/
306-
307-
---
308-
309-
## 📄 License
310-
311-
MIT License - see [LICENSE](./LICENSE) file for details.
312-
313-
---
314-
315-
## 🙏 Acknowledgments
316-
317-
- **pdf-parse** - PDF parsing
318-
- **pdf-lib** - PDF manipulation
319-
- **mammoth** - Word document parsing
320-
- **xlsx** - Excel spreadsheet parsing
321-
- **MCP SDK** - Model Context Protocol
322-
323-
---
324-
325-
## 📊 Stats
326-
327-
- **Test Coverage**: 83%+
328-
- **Supported Formats**: 3 (PDF, Word, Excel)
329-
- **MCP Tools**: 9
330-
- **Dependencies**: Minimal and well-maintained
331-
332-
---
333-
334-
## 💬 Community
335-
336-
- **Issues**: [GitHub Issues](https://github.com/Libres-coder/ParseFlow/issues)
337-
- **Discussions**: [GitHub Discussions](https://github.com/Libres-coder/ParseFlow/discussions)
338-
339-
---
340-
341-
**Made with ❤️ by Libres-coder**
342-
343-
**Status**: 🎉 Production Ready (v1.1.0)
