
thumbnail-generator : Internal Server Error: 500 #220

Closed
andrewjbtw opened this issue May 15, 2020 · 12 comments

@andrewjbtw
Collaborator

Describe the bug
A large number of WAS objects have this error:

thumbnail-generator : Internal Server Error: 500 (Response from dor-services-app did not contain a body. Check honeybadger for dor-services-app for backtraces, and look into adding a `rescue_from` in dor-services-app to provide more details to the client in the future)

User Impact
This error is affecting web archive items in the Stanford University Websites Collection, which we hope to resume accessioning in the near future.

https://argo.stanford.edu/catalog?f%5Bwf_wps_ssim%5D%5B%5D=wasSeedPreassemblyWF%3Athumbnail-generator%3Aerror

andrewjbtw added the bug label on May 15, 2020
@andrewjbtw
Collaborator Author

This may be related to #81.

jcoyne assigned and unassigned jcoyne on May 18, 2020
@jcoyne
Contributor

jcoyne commented May 19, 2020

These errors are now logged to Honeybadger, which shows that they are linked to https://app.honeybadger.io/projects/50568/faults/62286610, an invalid mapping between the existing object and the Cocina model.

@jcoyne
Contributor

jcoyne commented May 19, 2020

@andrewjbtw this raises an error because dor-services-app can't find a title:

```ruby
Cocina::TitleMapper.build(item)
# => nil
```

We had assumed that all objects have titles, but web archive seeds apparently do not have descriptive metadata.
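The failure mode described here, where a missing title comes back as `nil` and later surfaces as an opaque 500, can be illustrated with a minimal Ruby sketch. This is not dor-services-app's `Cocina::TitleMapper`; `build_title` and `MissingTitleError` are hypothetical names, and the sketch assumes descMetadata is MODS XML in the `http://www.loc.gov/mods/v3` namespace. It shows a lookup that raises a descriptive error instead of returning `nil`, so the client learns which druid is missing what.

```ruby
require 'rexml/document'

MODS_NS = 'http://www.loc.gov/mods/v3'

# Hypothetical error type, raised instead of silently returning nil.
class MissingTitleError < StandardError; end

# Hypothetical helper (not Cocina::TitleMapper): returns the first non-empty
# MODS titleInfo/title, or raises an error naming the druid and the problem.
def build_title(druid, mods_xml)
  if mods_xml.nil? || mods_xml.strip.empty?
    raise MissingTitleError, "#{druid} has no descMetadata"
  end

  doc = REXML::Document.new(mods_xml)
  nodes = REXML::XPath.match(doc, '//m:titleInfo/m:title', 'm' => MODS_NS)
  # Skip empty or whitespace-only <title> elements.
  title = nodes.map { |node| node.text.to_s.strip }.find { |text| !text.empty? }
  raise MissingTitleError, "#{druid} descMetadata has no title" if title.nil?

  title
end
```

For example, a seed whose descMetadata contains only `<titleInfo><title>https://example.org/</title></titleInfo>` would return the URL as its title, while a seed with no descMetadata would raise with a message naming its druid rather than producing a `nil` title.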

@andrewjbtw
Collaborator Author

Web archive seeds are supposed to be registered with their URLs as initial titles and then have the real titles filled in later. It looks like we need to batch add descMetadata to all of these. Tony Z. has a method for doing this, since it used to come up a lot with objects registered a long time ago.

Fully accessioned web archive seeds do have titles: https://argo.stanford.edu/view/druid:bb196dd3409. If they started without descMetadata datastreams, I don't know what process added the titles.
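A batch remediation along these lines could create placeholder descMetadata that uses the seed URL as the initial title. The sketch below is hypothetical and is not Tony Z.'s method: `placeholder_desc_metadata` is an illustrative name, it uses only Ruby's REXML library, and it assumes a bare MODS `titleInfo/title` is enough as a placeholder, which has not been checked against what dor-services-app or Cocina actually require.

```ruby
require 'rexml/document'

MODS_NS = 'http://www.loc.gov/mods/v3'

# Hypothetical helper: builds a minimal MODS document whose only title is the
# seed URL, as a placeholder until the real title is filled in later.
def placeholder_desc_metadata(seed_url)
  doc = REXML::Document.new
  mods = doc.add_element('mods', 'xmlns' => MODS_NS)
  mods.add_element('titleInfo').add_element('title').add_text(seed_url)

  xml = +''
  doc.write(xml) # REXML escapes characters such as & in the URL on output
  xml
end
```

Calling `placeholder_desc_metadata('https://example.org/')` returns a one-line MODS document with that URL as its title; how the result would be attached to each object as a descMetadata datastream is left out, since that depends on SDR tooling not described in this thread.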

@andrewjbtw
Collaborator Author

I've added descriptive metadata and a title to https://argo.stanford.edu/view/druid:bc770gm9177 and will see whether that leads to a thumbnail being generated. So far it is still sitting at queued.

@jcoyne
Contributor

jcoyne commented Jun 3, 2020

@andrewjbtw it looks like the server for was_robots_prod was upgraded, but the code was never deployed.

@jcoyne
Contributor

jcoyne commented Jun 3, 2020

I deployed was_robot_suite and that ran through and hit this error: https://app.honeybadger.io/projects/51141/faults/64372052

@andrewjbtw
Collaborator Author

I might have figured this out. I haven't fixed bc770gm9177 but I got another one to work. Still investigating bc770gm9177 but it's possible it's now in a bad state because of my previous attempt at fixing it.

@andrewjbtw
Collaborator Author

Here's what I've been able to determine with more trial and error.

  1. The descMetadata datastream needs to be created for items that don't have it. I think it wasn't required at this point in the workflow many years ago, but it is now. This is a one-time task that has already been ticketed for the SDR Operations team.
  2. The step before this one, desc-metadata-generator, may actually need to be re-run. It seems to be what creates the descMetadata XML that becomes the MODS when the seed is accessioned.
  3. After 1 and 2 are fixed, some items still end up failing thumbnail generation because SWAP isn't rendering a URL that can be made into a screenshot.
  4. I still don't understand exactly what's wrong with bc770gm9177. I successfully accessioned https://argo.stanford.edu/view/druid:bd508cy5924 today and it looks similar.

@jcoyne
Copy link
Contributor

jcoyne commented Jun 4, 2020

Regarding item 4, bc770gm9177 is hitting #234. PhantomJS is exiting in a way that neither produces a thumbnail nor reports an error. This is possibly due to Flash on the page in question or a JavaScript bug. Our current thought is to replace PhantomJS with headless Chrome.

@ndushay
Contributor

ndushay commented Jun 15, 2020

Waiting on Andrew to be able to register a seed in stage.

@andrewjbtw
Collaborator Author

I'm working on fixing these by providing descriptive metadata datastreams so they conform to Cocina. I'm closing this issue because I think the only remaining work is data and workflow remediation, and these items are accounted for in an SDRO ticket.
