On certain large purldb instances, the `match_to_purldb_resource` step of the scancode.io `map_deploy_to_develop` pipeline, which calls the `api/resources/filter_by_checksums` endpoint, is very slow and can take more than 30 hours to complete.
After debugging, we found that the two biggest reasons for the slowness are:
Ordering of Resources: a lot of CPU time is spent ordering the resources returned by a query.
Decoding large JSON fields: a lot of time is spent parsing JSON fields when they are very large, like the history field on Package.
Immediate solutions that come to mind:
Remove ordering for Resources
Create a proper History model for Package; the expedient fix would be to empty the history JSON field. Also look into using `.only()` on queries so that large fields are not loaded when they are not needed.
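The two ideas above can be sketched outside of Django with the standard-library `sqlite3` module. This is only an illustration under stated assumptions: the `package` and `resource` tables, their columns, and the `slow_lookup` / `fast_lookup` helpers are hypothetical and are not the actual purldb models or endpoint code. In Django terms, the faster variant corresponds roughly to calling `.order_by()` with no arguments to clear the ordering and `.only(...)` to load just the needed fields.

```python
import json
import sqlite3

# Hypothetical schema, for illustration only (not the real purldb models).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, history TEXT)")
conn.execute(
    "CREATE TABLE resource ("
    "id INTEGER PRIMARY KEY, package_id INTEGER, path TEXT, sha1 TEXT)"
)

# One package carrying a large JSON history, plus 50 resources that belong to it.
big_history = json.dumps([{"message": "x" * 100} for _ in range(1000)])
conn.execute("INSERT INTO package (id, name, history) VALUES (1, 'demo', ?)", (big_history,))
conn.executemany(
    "INSERT INTO resource (package_id, path, sha1) VALUES (1, ?, ?)",
    [(f"file{i}.py", f"{i:040x}") for i in range(50)],
)


def slow_lookup(checksums):
    """Join in the large history column, sort the rows, and decode the JSON."""
    marks = ", ".join("?" for _ in checksums)
    cur = conn.execute(
        "SELECT r.path, r.sha1, p.history FROM resource r "
        "JOIN package p ON p.id = r.package_id "
        f"WHERE r.sha1 IN ({marks}) ORDER BY r.path",
        checksums,
    )
    return [(path, sha1, json.loads(history)) for path, sha1, history in cur]


def fast_lookup(checksums):
    """Select only the columns needed for matching; no ordering, no JSON decoding."""
    marks = ", ".join("?" for _ in checksums)
    cur = conn.execute(
        f"SELECT path, sha1 FROM resource WHERE sha1 IN ({marks})",
        checksums,
    )
    return cur.fetchall()


wanted = [f"{i:040x}" for i in (3, 7, 11)]
slow = slow_lookup(wanted)
fast = fast_lookup(wanted)
```

Both lookups match the same resources; the second one simply avoids the sort and never reads or parses the history column. Whether this yields a comparable gain in purldb would need to be measured against the real queries.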