Backward replication of RECAP PDF uploads to subdockets #4864

Open
albertisfu opened this issue Dec 30, 2024 · 3 comments
Comments

@albertisfu
Contributor

This is a follow-up to #4826.

In #4857 we added support for replicating incoming PDF uploads to subdockets. The goal of this issue is to enable PDF replication in the opposite direction. Specifically, if a docket already contains multiple merged PDFs and we detect that it has subdockets, we should replicate the PDFs to all subdockets where the RECAPDocuments match. This ensures that each subdocket is as complete as possible.

To consider:

  • Replication at the RECAPDocument Level:

    PDFs should be copied at the RECAPDocument level, matched by pacer_doc_id, to avoid merging content that does not belong to a subdocket.

  • Triggering Backward Replication:

    What action should trigger this backward replication? Ideally, replication should occur as soon as we identify that a docket has subdockets. Currently, this can be detected during an attachment page upload or a PDF upload, where logic exists to find common RECAPDocuments in subdockets. Additional triggers may be required once we add this logic to other sources.

  • Handling content variations in subdockets:

    Some subdockets may have different PDFs than others. To ensure completeness, the replication logic should be applied from each subdocket to all other subdockets. By the end of the process, all subdockets will contain the same content.
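
A minimal sketch of the all-to-all replication described in the points above, using simplified in-memory stand-ins rather than the real models. `RD`, `filepath`, and `replicate_across_subdockets` are hypothetical names for illustration only; only `pacer_doc_id` comes from the discussion, and the sketch assumes all documents passed in belong to the same court.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RD:
    """Hypothetical stand-in for a RECAPDocument row."""

    docket_id: int
    pacer_doc_id: str
    filepath: Optional[str] = None  # None means the document has no PDF yet


def replicate_across_subdockets(rds: Iterable[RD]) -> int:
    """Copy PDFs between documents that share a pacer_doc_id.

    Returns the number of documents that received a PDF.
    """
    # Group at the document level so that only matching documents share content.
    groups = defaultdict(list)
    for rd in rds:
        if rd.pacer_doc_id:  # documents without an ID cannot be matched
            groups[rd.pacer_doc_id].append(rd)

    copied = 0
    for group in groups.values():
        # Any document in the group that already has a PDF can act as the source.
        source = next((rd for rd in group if rd.filepath), None)
        if source is None:
            continue
        for rd in group:
            if rd.filepath is None:
                rd.filepath = source.filepath
                copied += 1
    return copied
```

Because the grouping ignores which docket holds the PDF, one pass covers every direction: any subdocket can supply a PDF to any other.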

@mlissner
Member

I might not be thinking about this carefully enough, but I think the trigger is that whenever you add a new RECAPDocument without a PDF, you check if the PDF is available elsewhere and you replicate it to the new RECAPDocument if needed.

I think this happens when:

  • We get new docket entries from a docket upload, docket history upload, or recap.email email.
  • We get new docket entries from RSS.
  • We get new docket entries from an attachment page.

We could have an extra step to only do this if the docket number contains cr, to save on performance, but otherwise, seems straightforward to me?
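
A rough sketch of that trigger, assuming plain Python objects instead of the real models. `Doc`, `filepath`, `is_criminal`, and `fill_pdf_for_new_rd` are hypothetical names, and the `cr` check is just the substring test suggested above, not an existing helper. A real lookup would presumably also need to be scoped to the same court.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a RECAPDocument and its docket number."""

    docket_number: str
    pacer_doc_id: str
    filepath: Optional[str] = None  # None means no PDF yet


def is_criminal(docket_number: str) -> bool:
    """Cheap pre-filter: True if the docket number contains 'cr'."""
    return "cr" in docket_number.lower()


def fill_pdf_for_new_rd(
    new_rd: Doc, existing: Iterable[Doc], criminal_only: bool = True
) -> bool:
    """Give a newly created document a PDF that already exists elsewhere.

    Returns True if a PDF was replicated to new_rd.
    """
    if new_rd.filepath or not new_rd.pacer_doc_id:
        return False  # already has a PDF, or cannot be matched
    if criminal_only and not is_criminal(new_rd.docket_number):
        return False  # optional performance shortcut
    for rd in existing:
        if rd.pacer_doc_id == new_rd.pacer_doc_id and rd.filepath:
            new_rd.filepath = rd.filepath
            return True
    return False
```

The same function could be called from each of the ingestion paths listed above, right after a new RECAPDocument is created.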

@mlissner mlissner moved this to Backlog Jan 13 - Jan 24 in Sprint (Web Team) Dec 31, 2024
@albertisfu
Contributor Author

Got it! So the goal of this issue is simply to replicate existing PDFs to new RECAPDocuments that match, correct?

I initially thought we wanted to replicate existing PDFs to all RECAPDocuments that matched, even if the RDs were not new. However, if this should only apply to newly incoming RECAPDocuments, I agree that it’s straightforward.

@mlissner
Member

mlissner commented Jan 1, 2025

> Got it! So the goal of this issue is simply to replicate existing PDFs to new RECAPDocuments that match, correct?

Yes, exactly. We'll get to that bigger project eventually, but not now. :)

@mlissner mlissner moved this from Backlog Jan 13 - Jan 24 to PACER Data/Issues in Sprint (Web Team) Jan 10, 2025