Feat/pdfinfo parser #626

Open. Wants to merge 1 commit into base: dev

Conversation

Luigi31415
Contributor

@Luigi31415 Luigi31415 commented Dec 29, 2024

This PR adds a new parser for handling the output of the pdfinfo command.
Closes #624.

@kellyjonbrazil
Owner

Thanks for the parser contribution! Could you fork this from the dev branch? Also, I notice other parser files in this PR. Could you ensure the PR only includes the parser.py file and tests/fixtures?

Thanks!

@Luigi31415 Luigi31415 changed the base branch from master to dev January 26, 2025 06:01
@Luigi31415
Contributor Author

Hey Kelly,
I just rebased that branch; sorry for the late reply. The only other file in this PR is universal, which I included because the pdfinfo output is so universal. I thought it would be logical to extend an existing parser.

Title:          Brochure
Producer:       Skia/PDF m111 Google Docs Renderer
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page size:      612 x 792 pts (letter) (rotated 0 degrees)
File size:      69988 bytes
Optimized:      no
JavaScript:     no
PDF version:    1.4

Thanks for maintaining a great library, man. I appreciate your work.

@kellyjonbrazil
Owner

kellyjonbrazil commented Jan 26, 2025

Hi @Luigi31415 - thanks for the updates. I wonder if there is a simpler way to do this since it looks like pdfinfo output is really just key/value pairs and we already have a key/value parser (--kv). Is there a need for this parser?

I can see that the keys might need to be renamed. If so, maybe we just alias to the existing --kv parser (which is itself an alias of --ini) and then run the lib.normalize_key function within _process.

Here is the jc output using the existing key/value parser:

% echo 'Title:          Brochure
Producer:       Skia/PDF m111 Google Docs Renderer
Tagged:         no
Form:           none
Pages:          2
Encrypted:      no
Page size:      612 x 792 pts (letter) (rotated 0 degrees)
File size:      69988 bytes
Optimized:      no
JavaScript:     no
PDF version:    1.4' | jc --kv -p
{
  "Title": "Brochure",
  "Producer": "Skia/PDF m111 Google Docs Renderer",
  "Tagged": "no",
  "Form": "none",
  "Pages": "2",
  "Encrypted": "no",
  "Page size": "612 x 792 pts (letter) (rotated 0 degrees)",
  "File size": "69988 bytes",
  "Optimized": "no",
  "JavaScript": "no",
  "PDF version": "1.4"
}
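
To make the key-renaming idea concrete, here is a minimal standalone sketch in plain Python. It does not call jc at all: parse_pdfinfo and the normalize_key defined here are hypothetical stand-ins written for illustration, and the real lib.normalize_key may behave differently (for example, in how it treats special characters).

```python
import re

# Shortened copy of the pdfinfo sample from this thread.
SAMPLE = """\
Title:          Brochure
Pages:          2
Page size:      612 x 792 pts (letter) (rotated 0 degrees)
File size:      69988 bytes
PDF version:    1.4
"""


def normalize_key(key: str) -> str:
    """Hypothetical stand-in for jc's key normalizer (an assumption).

    Lowercases the key and collapses every run of non-alphanumeric
    characters into a single underscore.
    """
    return re.sub(r"[^a-z0-9]+", "_", key.strip().lower()).strip("_")


def parse_pdfinfo(text: str) -> dict:
    """Split each line on the first colon and normalize the key."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        result[normalize_key(key)] = value.strip()
    return result


if __name__ == "__main__":
    for k, v in parse_pdfinfo(SAMPLE).items():
        print(f"{k}: {v}")
```

All values stay strings here, as in the --kv output above. Converting fields such as Pages to integers would be a separate step on top of this.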

Development

Successfully merging this pull request may close these issues.

New parser request: pdfinfo
2 participants