-
Notifications
You must be signed in to change notification settings - Fork 40
SPDX Compare Time Out #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
There have been some performance issues in the past when deleting SPDX elements, but that shouldn't impact the compare. Compare is very compute intensive, but it shouldn't be timing out. Since dependencies can have cycles, I wonder if there isn't an infinite loop that not being handled properly. The unit tests run through a number of compare scenarios, so this issue may be related to the size of the SPDX files or something unique about the SPDX files. If you can point me to two SPDX files that reproduce the problem, I'll take a look. |
Attached for the SPDX files we are trying to compare |
I did some analysis and debugging. There are 44,595 elements which are in the The document describes should only point to a few elements - the "root of the tree", typically less than 10 elements, so this is unusually large. To make sure these match, the algorithm iterates through each element and makes sure it finds an equivalent element in the comparison We may be able to improve the algorithm, but I would not expect any of the collections to be larger than 5,000 or so in practice. |
I did some more analysis and experimentation to see if I could dramatically improve the performance of comparing very large SPDX element collections. TL;DR - no success yet, but some interesting findings. Following are some of the constraints when doing the compares:
Comparing all the property values can take some time, especially if a property value is also a collection of elements. We can get a recursive stack of comparing very large number of elements. One thing I noticed in my experimentation is we tend to compare the same elements over and over (e.g. the collections being compared may be a property of another element which is being compared). If we relax the stateless access (3 above) for this use case, we can keep track of the comparisons in a map. I created prototype using this approach (reference this draft pull request). Unfortunately, we now have a memory issue - I ran out of heap space running the comparison above. I'm not quite sure yet what is consuming the memory. The maximum size of the map will be total number of elements to the factor of the number of documents. Since it is a Boolean map, I wouldn't expect it to be large enough to run us out of heap space. More analysis needs to be done. If anyone is interested in looking into this, let me know. |
If we can guarantee the collection (of differences) is sorted, will it help the comparison? The sorting order is a desired enhancement from this spdx/Spdx-Java-Library#232 too. So looks like a good investment. |
Hello Team
We are working on comparing SPDX files using Java-Tool by generating one from zephyr and other one from previous release
Looks like compare never finishes and time out
Any one facing similar issue or following any other methods which succeeded the compare ?
The text was updated successfully, but these errors were encountered: