PDF Parser

This is a simple command-line tool written in Go that extracts text from PDF files and outputs it as plain text on standard output, or in CSV, JSON, or Parquet format.

This tool relies on the pdftotext command-line utility, which is part of the poppler-utils package.
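
To illustrate this dependency, here is a minimal sketch of how a Go program might call pdftotext through the standard os/exec package. It is an assumption about the general approach, not the tool's actual source; the helper names pdftotextArgs and extractText are hypothetical. Passing - as the output file asks pdftotext to write the extracted text to standard output.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pdftotextArgs builds the argument list for pdftotext.
// The trailing "-" tells pdftotext to write to standard output.
// (Hypothetical helper; not taken from the project's source.)
func pdftotextArgs(pdfPath string) []string {
	return []string{pdfPath, "-"}
}

// extractText runs pdftotext on pdfPath and returns the extracted text.
func extractText(pdfPath string) (string, error) {
	// Fail early with a clear message if poppler-utils is not installed.
	bin, err := exec.LookPath("pdftotext")
	if err != nil {
		return "", fmt.Errorf("pdftotext not found in PATH (install poppler-utils): %w", err)
	}
	out, err := exec.Command(bin, pdftotextArgs(pdfPath)...).Output()
	if err != nil {
		return "", fmt.Errorf("pdftotext failed on %q: %w", pdfPath, err)
	}
	return string(out), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: extract <file.pdf>")
		os.Exit(1)
	}
	text, err := extractText(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(text)
}
```

Checking for the binary with exec.LookPath before running it gives a clearer error when poppler-utils is missing than a raw exec failure would.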

Installation

Install Go: Make sure you have Go installed on your system. You can download it from https://golang.org/.
Install poppler-utils: You need to install the poppler-utils package, which provides the pdftotext utility.
- On Debian/Ubuntu:
```
sudo apt-get update
sudo apt-get install poppler-utils
```
- On CentOS/RHEL:
```
sudo yum install poppler-utils
```
- On macOS (using Homebrew):
```
brew install poppler
```

Build the pdf-parser:

```
git clone <repository_url>
cd pdf-parser
go build -o pdf-parser main.go
```

Usage

To use the pdf-parser, run the following command:

```
./pdf-parser -input=<path_to_your_pdf_file> -output=<text|csv|json|parquet>
```

Command-line Flags

-input: (Required) The path to the input PDF file.
-output: (Optional) The output format. Can be text, csv, json, or parquet. Defaults to text.
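
The flag behavior described above can be sketched with Go's standard flag package. This is an illustrative assumption, not the project's main.go; the options type and the parseOptions function are hypothetical names. The sketch treats -input as required, defaults -output to text, and rejects any other format.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// options holds the parsed command-line settings (hypothetical type).
type options struct {
	input  string
	output string
}

// parseOptions parses args (without the program name) and validates them.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("pdf-parser", flag.ContinueOnError)
	fs.StringVar(&opts.input, "input", "", "path to the input PDF file (required)")
	fs.StringVar(&opts.output, "output", "text", "output format: text, csv, json, or parquet")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	// The flag package has no notion of required flags, so check by hand.
	if opts.input == "" {
		return opts, errors.New("-input is required")
	}
	switch opts.output {
	case "text", "csv", "json", "parquet":
		return opts, nil
	default:
		return opts, fmt.Errorf("unsupported -output %q", opts.output)
	}
}

func main() {
	opts, err := parseOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	fmt.Printf("input=%s output=%s\n", opts.input, opts.output)
}
```

Using a dedicated flag.FlagSet instead of the package-level flags keeps the parsing logic callable from tests with an arbitrary argument slice.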

Examples

Text Output (default):
```
./pdf-parser -input=my_document.pdf
```

CSV Output:

```
./pdf-parser -input=my_document.pdf -output=csv
```

JSON Output:

```
./pdf-parser -input=my_document.pdf -output=json
```

Parquet Output:

```
./pdf-parser -input=my_document.pdf -output=parquet
```
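
This README does not specify the layout of the structured outputs. As a rough sketch only, the following Go code assumes one record per line of extracted text, with hypothetical line and text fields, and serializes it with the standard encoding/csv and encoding/json packages. The record type and the toRecords, toCSV, and toJSON helpers are illustrative names, and the tool's real output may differ. Parquet is left out because it would need a third-party library.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// record is a hypothetical row layout: one entry per line of extracted text.
type record struct {
	Line int    `json:"line"`
	Text string `json:"text"`
}

// toRecords splits extracted text into numbered lines (1-based).
func toRecords(text string) []record {
	text = strings.TrimRight(text, "\n")
	if text == "" {
		return nil
	}
	var recs []record
	for i, l := range strings.Split(text, "\n") {
		recs = append(recs, record{Line: i + 1, Text: l})
	}
	return recs
}

// toCSV renders the records as CSV with a header row.
func toCSV(recs []record) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write([]string{"line", "text"}); err != nil {
		return "", err
	}
	for _, r := range recs {
		if err := w.Write([]string{strconv.Itoa(r.Line), r.Text}); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

// toJSON renders the records as a compact JSON array.
func toJSON(recs []record) (string, error) {
	b, err := json.Marshal(recs)
	return string(b), err
}

func main() {
	recs := toRecords("Hello, world\nSecond line\n")
	csvOut, err := toCSV(recs)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	jsonOut, err := toJSON(recs)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(csvOut)
	fmt.Println(jsonOut)
}
```

The CSV writer quotes fields that contain commas, so text such as "Hello, world" stays in a single column.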
