Open
Description
On certain large purldb instances, when using the api/resources/filter_by_checksums endpoint via the scancode.io map_deploy_to_develop pipeline, the match_to_purldb_resource step is very slow and can take more than 30 hours to complete.
After debugging, we found that the two biggest reasons for the slowness are:
- Ordering of Resources: a lot of CPU time is spent ordering the resources returned by a query.
- Decoding large JSON fields: a lot of time is spent parsing JSON fields when they are very large, such as the history field on Package.
Immediate solutions that come to mind:
- Remove ordering for Resources
- Create a proper History model for Package. The expedient alternative would be to empty the history JSON field. Also look into using .only() on queries so that large fields are not loaded when they are not needed.
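To illustrate the two suggestions above, here is a minimal sketch that uses only sqlite3 and json from the Python standard library rather than the actual purldb code. The table layout, the column names other than history, and the match_full and match_lean helpers are hypothetical. The point is only to contrast a query that sorts rows and decodes a large JSON column with one that skips both. Assuming the queries go through the Django ORM, as the mention of .only() suggests, the rough equivalents would be calling order_by() with no arguments to clear the ordering and only() (or defer()) to avoid loading the history field.

```python
import json
import sqlite3

# Hypothetical, simplified stand-in for the Package table; the real purldb
# schema is different and has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE package ("
    "id INTEGER PRIMARY KEY, sha1 TEXT, name TEXT, history TEXT)"
)

# Simulate packages whose history field has grown very large.
big_history = json.dumps([{"message": "scanned", "n": i} for i in range(50_000)])
conn.executemany(
    "INSERT INTO package (sha1, name, history) VALUES (?, ?, ?)",
    [("abc", "foo", big_history), ("def", "bar", big_history)],
)


def match_full(checksums):
    """Fetch every column, sort the rows, and decode the large JSON field."""
    placeholders = ",".join("?" * len(checksums))
    rows = conn.execute(
        "SELECT id, sha1, name, history FROM package "
        f"WHERE sha1 IN ({placeholders}) ORDER BY name",
        checksums,
    ).fetchall()
    return [(r[0], r[1], r[2], json.loads(r[3])) for r in rows]


def match_lean(checksums):
    """Fetch only the columns needed for matching, with no ORDER BY."""
    placeholders = ",".join("?" * len(checksums))
    return conn.execute(
        f"SELECT id, sha1, name FROM package WHERE sha1 IN ({placeholders})",
        checksums,
    ).fetchall()
```

Calling match_lean(["abc", "def"]) returns the same matches as match_full without transferring or parsing the history payload, and without asking the database to sort the result. Whether dropping the ordering is safe depends on whether any caller relies on it, which would need to be checked in the pipeline.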