Create a missing.jp2 thumbnail if thumbnail not able to be generated from replay #81

Open
ndushay opened this issue May 25, 2018 · 9 comments
Labels
hold for PO review PO needs to sign off before merge web archiving for June-July 2022 work cycle

Comments

@ndushay
Contributor

ndushay commented May 25, 2018

@nullhandle says: "the thumbnail generator that is part of the seed registration process can't complete successfully because it doesn't find a resource with that url indexed into swap

i believe it's because of an indexing issue for the accessioned crawl content in openwayback"

He asks the question here: https://groups.google.com/forum/#!topic/openwayback-dev/lN7fdSL68-c

But sort key is set correctly:

and

@edsu
Contributor

edsu commented May 17, 2022

In work cycle planning we discussed this problem and decided that we would try to catch these cases by inspecting the contents of the page, and write them out as a missing.jp2, which will:

  1. allow the workflow to proceed
  2. allow these to be noticed by a scheduled report
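
A minimal sketch of that fallback, assuming the replay page text is already in hand. The marker strings, the function names, and the `render` callable are all hypothetical; the thread does not say what the "not found" page actually contains or how the real generator is invoked.

```python
import shutil
from pathlib import Path

# Hypothetical marker strings: the actual text of the replay "not found"
# page is not given in this thread and would need to be confirmed.
NOT_FOUND_MARKERS = ("resource not in archive", "not found")


def looks_not_found(page_text: str) -> bool:
    """Return True if the replayed page body looks like a 'not found' page."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in NOT_FOUND_MARKERS)


def write_thumbnail(page_text, render, placeholder: Path, out_dir: Path, name: str) -> Path:
    """Write a real thumbnail, or copy the placeholder as missing.jp2.

    `render` is a hypothetical zero-argument callable that returns the
    thumbnail bytes; it stands in for the real thumbnail generator.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    if looks_not_found(page_text):
        # The workflow can proceed, and a scheduled report can later
        # look for files named missing.jp2.
        target = out_dir / "missing.jp2"
        shutil.copyfile(placeholder, target)
        return target
    target = out_dir / f"{name}.jp2"
    target.write_bytes(render())
    return target
```

Using a fixed file name for the placeholder is what would let a later report find these cases without any extra bookkeeping.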

@edsu changed the title from "crawls in SWAP are not being found by url sent to thumbnail generator" to "Detect not found errors during thumbnail generation" on May 17, 2022
@andrewjbtw andrewjbtw added the hold for PO review PO needs to sign off before merge label May 20, 2022
@lwrubel lwrubel added the web archiving for June-July 2022 work cycle label May 23, 2022
@mjgiarlo
Member

Step 2 in what @edsu writes above is separately ticketed in #431

@ndushay
Contributor Author

ndushay commented May 26, 2022

pywb will be doing thumbnails in a different way so maybe this isn't usefully done at this time.

@peterchanws

Here is the ticket showing the thumbnail issue:
#446

@edsu
Contributor

edsu commented Jun 3, 2022

Since it appears that the pywb page view returns a 200 OK for items that are not found, it ought to be possible to use the "raw" URL or the Memento API to determine whether something is there or not?
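
A rough sketch of the "raw" URL check, under two assumptions that are not confirmed in this thread: that the raw replay form uses pywb's `id_` modifier after the timestamp, and that it answers 404 when nothing is archived. The function names, base URL, and collection name are hypothetical.

```python
import urllib.error
import urllib.request


def raw_replay_url(base: str, collection: str, timestamp: str, url: str) -> str:
    """Build a raw replay URL (assumes the pywb `id_` modifier form)."""
    return f"{base.rstrip('/')}/{collection}/{timestamp}id_/{url}"


def capture_exists(replay_url: str, opener=urllib.request.urlopen) -> bool:
    """Return False on a 404, True on a 2xx response.

    Assumes the raw endpoint reports a missing capture with 404 instead
    of the 200 OK seen on the page view. Other HTTP errors are re-raised
    so they are not mistaken for a missing capture.
    """
    request = urllib.request.Request(replay_url)
    try:
        with opener(request, timeout=30) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

The `opener` parameter exists only so the check can be exercised without a live archive; a call such as `capture_exists(raw_replay_url("https://example.org", "was", "20220603000000", "http://example.com/"))` would hit the real endpoint.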

@lwrubel
Contributor

lwrubel commented Jun 17, 2022

We've created an issue to create a step to check for the site in the web archive here: #475

@lwrubel
Contributor

lwrubel commented Jun 21, 2022

Before proceeding with this, implement #475 and #446 and see how much need remains for a placeholder image. Cases that fail with "too many redirects" may be a remaining use case.

As @andrewjbtw mentioned, "A placeholder to get past the step could be an improvement here because the failure of the workflow step itself is a pain point, in some ways more of a pain point than the steps to manually supply the thumbnail."

@ndushay
Contributor Author

ndushay commented Jul 14, 2022

Peter is fine with a temporary thumbnail ... would like to be notified if this happens. See #431 re missing thumbnails reported.

@ndushay
Contributor Author

ndushay commented Jul 14, 2022

Decision for now: monitor seed robot workflow failures and see how often this happens, first.

@lwrubel changed the title from "Detect not found errors during thumbnail generation" to "Create a missing.jp2 thumbnail if thumbnail not able to be generated from replay" on Jul 14, 2022

6 participants