Feature request: table of contents generation without requiring to know in advance how many pages it will span #136

Closed
alallier opened this issue Apr 24, 2021 · 21 comments

Comments

@alallier

alallier commented Apr 24, 2021

With this exception

fpdf.errors.FPDFException: The rendering function passed to FPDF.insert_toc_placeholder triggered to many page breaks: page 9 was reached while it was expected to span only 1 pages

I checked the code here, though, and it doesn't seem like it should fail.

@alallier
Author

alallier commented Apr 24, 2021

I was looking at the code and I see that in the placeholder arguments you can specify how many pages it will span. This seems to work, although you need to know your page count ahead of time, which is impossible when running against dynamic datasets.

@Lucas-C
Member

Lucas-C commented Apr 24, 2021

Yeah, this is a current limitation of this implementation.

I think I see how I can make this fully dynamic now, I'm going to have a look at it.

@Lucas-C Lucas-C self-assigned this Apr 24, 2021
@Lucas-C
Member

Lucas-C commented Apr 24, 2021

Without knowing in advance how many pages the ToC will span, we will have to shift page numbers according to its size once rendered.

The challenge will then be to increment all references to page numbers:

  • in self.pages : dict containing pages and metadata
  • in self.annots : link & text annotations
  • in self.links : internal links inside the document
  • in self.struct_builder.doc_struct_elem.k (all page.id)
  • in self._outline

I fear this will introduce quite some code, and lower code readability...

As a workaround for now, you can always generate your document repeatedly with increasing values of pages passed to .insert_toc_placeholder, and stop as soon as no exception is raised.
It's an ugly approach, but I think it should work.
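A minimal sketch of that retry loop, written against a generic `build(toc_pages)` callable rather than fpdf2 itself. The helper name `build_with_growing_toc`, the `max_toc_pages` cap, and the fake builder are illustrative assumptions, not part of the library. With fpdf2, `build` would presumably create a fresh `FPDF` object on each call, pass `pages=toc_pages` to `insert_toc_placeholder`, and `error_type` would be `fpdf.errors.FPDFException`.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def build_with_growing_toc(
    build: Callable[[int], T],
    error_type: Type[Exception],
    max_toc_pages: int = 50,
) -> T:
    """Call build(toc_pages) with toc_pages = 1, 2, ... until it succeeds."""
    for toc_pages in range(1, max_toc_pages + 1):
        try:
            # build() is expected to start from a brand-new document each time,
            # so that the ToC placeholder is only defined once per document.
            return build(toc_pages)
        except error_type:
            continue  # the ToC did not fit: try again with one more page
    raise RuntimeError(f"ToC still does not fit in {max_toc_pages} pages")


# Stand-in builder for demonstration only: it pretends the ToC needs 3 pages.
class FakeTocError(Exception):
    pass


def fake_build(toc_pages: int) -> str:
    if toc_pages < 3:
        raise FakeTocError("too many page breaks")
    return f"document with a {toc_pages}-page ToC"


result = build_with_growing_toc(fake_build, FakeTocError)
print(result)
```

Rebuilding the whole document on every attempt is wasteful for large reports, which is why this is only a stopgap.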

@Lucas-C Lucas-C changed the title from "Table of contents generation fails on long table of contents (greater than a page)" to "Feature request: table of contents generation without requiring to know in advance how many page it will span" Apr 24, 2021
@Lucas-C Lucas-C changed the title from "Feature request: table of contents generation without requiring to know in advance how many page it will span" to "Feature request: table of contents generation without requiring to know in advance how many pages it will span" Apr 24, 2021
@Lucas-C Lucas-C removed their assignment Apr 30, 2021
@jwinkel13

A workaround would be to estimate the number of pages. If the estimate is off, one can use the page number the error returns to set the correct number. That allows for a maximum of 2 iterations.
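A rough sketch of reading the page count out of the error text quoted at the top of this thread. It assumes the reported page is the last page the ToC reached, so the caller must also know the page where the ToC starts; the helper name `toc_pages_from_error` and the `toc_start_page` parameter are hypothetical. The second attempt should run on a freshly created document.

```python
import re

# Pattern based on the exception message quoted in this thread.
_PATTERN = re.compile(
    r"page (\d+) was reached while it was expected to span only (\d+) pages"
)


def toc_pages_from_error(message: str, toc_start_page: int) -> int:
    """Estimate how many pages the ToC needs from the exception message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("unrecognized error message")
    reached_page = int(match.group(1))
    # Assumption: the ToC occupies toc_start_page..reached_page inclusive.
    return reached_page - toc_start_page + 1


message = (
    "The rendering function passed to FPDF.insert_toc_placeholder triggered "
    "to many page breaks: page 9 was reached while it was expected to span "
    "only 1 pages"
)
needed = toc_pages_from_error(message, toc_start_page=2)
print(needed)  # 8 under the assumption above
```

Parsing exception text is fragile, since the wording may change between releases.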

@yaminle

yaminle commented May 11, 2023

If there is still any interest in this topic: I did an implementation of that. It is a little bit hacky though:

  1. create the ToC at the end of the pdf
    -- use placeholders for the current page-numbers (like number of pages -> {nb})
  2. reorder the pages and put the ToC in place
  3. fix the links
  4. replace the placeholders
    -- with the same limitations as the replacement of {nb}

I think that this could well be included in the current implementation
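The four steps above can be illustrated on a toy model in which pages are plain strings and links map a name to a 0-based page index. This is only a sketch of the idea, not the code described in the comment: the function `move_toc_into_place` and the `{p:NAME}` placeholder syntax are invented for illustration.

```python
from typing import Dict, List, Tuple


def move_toc_into_place(
    pages: List[str], toc_len: int, insert_at: int, links: Dict[str, int]
) -> Tuple[List[str], Dict[str, int]]:
    """Move the last toc_len pages to index insert_at and fix page references."""
    body_len = len(pages) - toc_len
    body, toc = pages[:body_len], pages[body_len:]

    def remap(old: int) -> int:
        if old >= body_len:  # a ToC page that moved forward
            return insert_at + (old - body_len)
        if old >= insert_at:  # a body page pushed back by the ToC
            return old + toc_len
        return old  # pages before the ToC keep their position

    # Step 3: fix the links so they point at the shifted pages.
    new_links = {name: remap(target) for name, target in links.items()}

    # Step 4: replace the placeholders with final 1-based page numbers.
    for name, target in new_links.items():
        toc = [page.replace("{p:" + name + "}", str(target + 1)) for page in toc]

    # Step 2: put the ToC in place.
    return body[:insert_at] + toc + body[insert_at:], new_links


# Step 1: the ToC is rendered last, with placeholders for page numbers.
pages = ["cover", "intro", "chapter", "ToC: intro {p:intro}, chapter {p:chapter}"]
links = {"intro": 1, "chapter": 2}
new_pages, new_links = move_toc_into_place(pages, toc_len=1, insert_at=1, links=links)
print(new_pages)
```

In a real PDF the same shift would also have to be applied to annotations, the outline, and the structure tree listed earlier in the thread.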

@Lucas-C
Member

Lucas-C commented May 11, 2023

> If there is still any interest in this topic: Did an implementation of that. A little bit hacky though:
>
> I think that this could well be included in the current implementation

Hi @yaminle!
Thanks for the feedback 😊

Would you like to contribute a PR regarding this?

Else, if that is more investment than you wish, could you maybe share the code you used?
On GitHub or elsewhere

@yaminle

yaminle commented May 11, 2023

> Hi @yaminle! Thanks for the feedback 😊
>
> Would you like to contribute a PR regarding this?
>
> Else, if that is more investment than you wish, could you maybe share the code you used? On GitHub or elsewhere

Well - I would really like to. I'll do my best. I've got to study the guidelines first though ;)

@Lucas-C
Member

Lucas-C commented May 11, 2023

> Well - I would really like to. I'll do my best. I've got to study the guidelines first though ;)

Great! Take your time and please ask any question you may have 😊

@Benoite142

Hi, has this been implemented yet? I found a "fix" for now, but it is really not perfect 😅.

@alallier
Author

@Benoite142 what's your fix?

@Benoite142

> @Benoite142 what's your fix?

Hi! Pretty much what I do is this: I counted (by hand) how many lines there were on the first page of the TOC, and counted once again for the following pages (46 for the first page and 48 for the following ones).
-- I then check whether the size of my list is at most 46.
-- If not, I subtract the number of lines on the first page from the size, divide by the number of lines on the other pages (48), take the ceil of that, and add 1 (for the first page).

I then pass the value I get to insert_toc_placeholder.

So pretty much it is:

from math import ceil

if len(data) <= nbOfLinesPage1:
    pages = 1  # everything fits on the first ToC page
else:
    pages = ceil((len(data) - nbOfLinesPage1) / nbOfLinesElse) + 1

So it is really not perfect, since whenever I change some visual stuff I need to recount how many lines I get on each page for the calculation to work again.

@yaminle

yaminle commented Sep 20, 2024 via email

@Benoite142

> A workaround would be to estimate the # of pages if the estimate is off one can use the # of pages the error returns to set the correct number. It allows for a max of 2 iteration

Hi, how can you do that without the 'A placeholder for the table of contents has already been defined' message appearing? I was able to get the number of pages from the first error message, but I get this other error message because I have already called insert_toc_placeholder.

@andersonhc
Collaborator

Hey @Benoite142 @alallier

I am working on PR #1188 and I just created a reference implementation of Table of Contents that can handle adding extra pages. Would you be available to take a look at it, check whether it works for you, and share any suggestions for improvement before I move to merge this PR?

To install the version from my branch you can do:

pip uninstall fpdf2
pip install git+https://github.com/andersonhc/fpdf2.git@page-number

The documentation is here:
https://github.com/andersonhc/fpdf2/blob/page-number/docs/DocumentOutlineAndTableOfContents.md#reference-implementation

You can also check the test I created here:
https://github.com/andersonhc/fpdf2/blob/891f0c2cdbe32c2b347b097073b621e4cd51fc17/test/outline/test_outline.py#L427
https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_0.pdf
https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_1.pdf
https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_2.pdf

@Benoite142

Hey @andersonhc,

Wow! This looks promising!

I'll try to find time today to look at it and test the changes for the toc placeholder pages.

Thanks for reaching out!

@Benoite142

Benoite142 commented Nov 14, 2024

Hey @andersonhc ,

Sorry for taking so long to try it out,

But I just tried it and get very good results with it! I've used it on my big pdf generator and a small test aside and I didn't get any issue with it. Good job!

Only thing I am seeing is that, the page number that I have in my footer doesn't always correctly display the right pages for some reason.
On my 29 page pdf, I get the right page for page 1 and then I get 27, 28, 29 for the rest of the TOC pages, and then 2,3, ... for the content that is not in the TOC.
I also show the corresponding pages for the content of the TOC with a link and the page is also wrong since the other pages from the TOC (2,3,4 but noted as 27,28,29) are in the wrong position.

But overall, great fix, I never encountered the fpdf.errors.FPDFException: The rendering function passed to FPDF.insert_toc_placeholder triggered to many page breaks: page 9 was reached while it was expected to span only 1 pages exception and that was my big problem.

Just the page number issue, which can be tricky, but good job nonetheless. 😁

@Lucas-C
Member

Lucas-C commented Jan 8, 2025

The reference implementation of Table of Contents made by @andersonhc has been released in fpdf2 2.8.2:
https://py-pdf.github.io/fpdf2/DocumentOutlineAndTableOfContents.html#reference-implementation

> Only thing I am seeing is that, the page number that I have in my footer doesn't always correctly display the right pages for some reason.
> On my 29 page pdf, I get the right page for page 1 and then I get 27, 28, 29 for the rest of the TOC pages, and then 2,3, ... for the content that is not in the TOC.
> I also show the corresponding pages for the content of the TOC with a link and the page is also wrong since the other pages from the TOC (2,3,4 but noted as 27,28,29) are in the wrong position.

If you want, @Benoite142, you could give us a minimal reproducible example of this annoying case, and we would be happy to take a look at it! 🙂

I think that we could close this issue?

@Benoite142

Sure, I'll try doing that tomorrow. The issue might already be fixed, since I haven't tried the code since my last comment.

And yes, you can close the issue. 😃

@alallier
Author

alallier commented Jan 8, 2025

@Lucas-C thank you for all of your interaction in this thread over the past four years. I read the linked documentation you sent and it does in fact seem like you have solved what I originally opened this issue for. Unfortunately I no longer have my test case handy to test but based on the description of the changes it seems like it would work.

I might recommend linking the PR where the fix was implemented before closing, so future onlookers will have the full context. I agree with @Benoite142, I think it's safe to close the issue.

What @Benoite142 was discussing about footer numbers even seems to be covered by the note in the linked documentation. Regardless, as he stated, that's probably a new issue anyway.

Thanks to everyone who contributed over the past few years on this!

@mschoettle

What @Benoite142 is referring to is that as soon as the ToC spans more than 1 page the page numbers for the actual content are incorrect. The content always starts at 2.

Based on the documentation, it seems that page labels can help with that.

@mschoettle

I created #1343 for the page number issue.
