wengzhiwen/convert2img

convert2img

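A reinvented wheel for converting PDF documents into images. I wrote this script to split PDFs into images for fine-tuning models.

It is built on the pdf2image library; put simply, it wraps the library in a shell that runs directly from the CLI.

In particular, it adds simple multi-threaded processing around pdf2image, which improves performance on multi-page PDF files.

If you have a PDF file and want to convert it perfectly into Markdown, you can use another wheel I reinvented: img2md, which converts the image sequence generated here directly into Markdown.

Usage

  1. Install dependencies: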

    pip install -r requirements.txt
  2. Run the conversion command:

    python convert2img.py <pdf_file> [--dpi=DPI]

The generated image files (one PNG per page) are saved to an automatically created folder, which is about as lazy as it gets.

English

A tool for converting PDF documents to images. I wrote this script to split PDFs into images for fine-tuning models.

It is a CLI wrapper around the pdf2image library.

It adds simple multi-threading, which improves performance on large, multi-page PDF files.
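
The multi-threaded approach can be sketched roughly as follows. This is not the script's actual code, only a minimal illustration under stated assumptions: the page count is already known, pages are split into fixed-size ranges, and each range is rendered by a worker thread. The helper names (`page_ranges`, `render_range`, `convert_pdf`), the `page_0001.png` file naming, and the default `workers` and `chunk_size` values are all hypothetical. Only `convert_from_path` and its `dpi`, `first_page`, and `last_page` parameters come from pdf2image.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Split pages 1..total_pages into inclusive (first, last) ranges."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (first, min(first + chunk_size - 1, total_pages))
        for first in range(1, total_pages + 1, chunk_size)
    ]


def render_range(pdf_path, out_dir, first, last, dpi):
    """Render pages first..last to PNG files and return the written paths."""
    # Imported lazily so the helpers here can be used without pdf2image installed.
    from pdf2image import convert_from_path

    images = convert_from_path(
        str(pdf_path), dpi=dpi, first_page=first, last_page=last
    )
    written = []
    for page_no, image in enumerate(images, start=first):
        # Hypothetical naming scheme; the real script may name files differently.
        target = Path(out_dir) / f"page_{page_no:04d}.png"
        image.save(target, "PNG")
        written.append(target)
    return written


def convert_pdf(pdf_path, out_dir, total_pages, dpi=200, workers=4,
                chunk_size=8, render=render_range):
    """Render a PDF in parallel, one page range per task, in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(render, pdf_path, out_dir, first, last, dpi)
            for first, last in page_ranges(total_pages, chunk_size)
        ]
        # Collect results in submission order so the output stays sorted by page.
        chunks = [future.result() for future in futures]
    return [path for chunk in chunks for path in chunk]
```

Threads are a reasonable fit here because pdf2image does the rendering in an external Poppler process, so the Python threads spend most of their time waiting. The page count could be obtained with `pdf2image.pdfinfo_from_path(pdf_path)["Pages"]`. pdf2image also accepts a `thread_count` argument on `convert_from_path`, which may be enough on its own for simple cases.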

If you have a PDF file that you want to convert perfectly into Markdown, see another tool of mine, img2md, which turns the images created here into a Markdown file.

Quick Start

  1. Install dependencies:

    pip install -r requirements.txt
  2. Run the conversion command:

    python convert2img.py <pdf_file> [--dpi=DPI]

Image files (one PNG per page) will be saved in an automatically generated folder.
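
For readers who want to see how a command line of this shape might be parsed, here is a small sketch using `argparse`. It mirrors the `<pdf_file> [--dpi=DPI]` usage shown above, but everything else is an assumption: the default of 200 DPI (which happens to match pdf2image's own default), the `build_parser` and `output_dir_for` names, and the `<name>_images` folder rule are hypothetical, since the README does not say how the output folder is named.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser for: convert2img.py <pdf_file> [--dpi=DPI]."""
    parser = argparse.ArgumentParser(
        prog="convert2img.py",
        description="Convert a PDF into one PNG per page.",
    )
    parser.add_argument("pdf_file", type=Path, help="path to the input PDF")
    # Assumed default; the real script may use a different value.
    parser.add_argument("--dpi", type=int, default=200,
                        help="render resolution in dots per inch")
    return parser


def output_dir_for(pdf_file):
    """Hypothetical rule: a folder next to the PDF, named '<stem>_images'."""
    return pdf_file.parent / f"{pdf_file.stem}_images"
```

With this sketch, `build_parser().parse_args(["report.pdf", "--dpi=300"])` yields a namespace whose `pdf_file` is `Path("report.pdf")` and whose `dpi` is `300`.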
