Reduce LRU image cache's memory footprint #46
To avoid copying data unnecessarily altogether
Will this also fix the issue with the offline/relval DQMGUI flavour going down from time to time due to high memory usage?
Probably not: the size of the cache, as configured in production, should not in principle exceed 8GB.
So what exactly is this fixing: the HGCAL issue when loading multiple TH2Poly objects in one page? I'm a little worried about this:
This is unlikely to be a big deal in offline/relval, but in online the shifters are constantly looking for new plots. Could you elaborate on this corner case?
Investigating #41, we found that the high memory usage stems from the lruimgcache kept by the server, which holds the last 1024 rendered images in memory. Among other things, this cache stores the databytes of the histogram used to render each image. On inspection, these do not appear to be strictly needed or reused, unlike the pngbytes, which hold the actual rendered image. This PR changes the logic to clear the databytes of cached images completely, in order to reduce overall memory usage.
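For illustration, the change described above can be sketched as a small LRU cache. This is not the DQMGUI code: CachedImage, LruImageCache, insert, lookup and the string key (assumed to be histogram path plus render options) are hypothetical names, and the real entry layout and hit check are only assumed to resemble this. Entries are evicted least-recently-used first, and on insert the databytes buffer is released so that only the pngbytes stay resident. Because the hit check can then compare only the key, the sketch also reproduces the stale-hit corner case described in the caveat that follows.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache entry. Field names follow the PR text
// (databytes, pngbytes); the real DQMGUI layout may differ.
struct CachedImage {
  std::string key;                       // assumed: histogram path + render options
  std::vector<unsigned char> databytes;  // serialized histogram used for rendering
  std::vector<unsigned char> pngbytes;   // rendered image sent to the client
};

// Minimal LRU cache: the front of the list is the most recently used entry.
class LruImageCache {
 public:
  explicit LruImageCache(std::size_t capacity) : capacity_(capacity) {}

  // Store a freshly rendered image, keeping only the PNG bytes.
  void insert(CachedImage img) {
    // Swapping with an empty vector releases the databytes buffer.
    std::vector<unsigned char>().swap(img.databytes);
    auto it = index_.find(img.key);
    if (it != index_.end()) {  // replace an existing entry for this key
      order_.erase(it->second);
      index_.erase(it);
    }
    order_.push_front(std::move(img));
    index_[order_.front().key] = order_.begin();
    if (order_.size() > capacity_) {  // evict the least recently used entry
      index_.erase(order_.back().key);
      order_.pop_back();
    }
  }

  // Hit check on the key only: the request's databytes cannot be compared
  // against the (now empty) cached ones, so a newer version of the same
  // histogram is still reported as a hit and the old PNG is returned.
  const CachedImage* lookup(const CachedImage& request) {
    auto it = index_.find(request.key);
    if (it == index_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second);  // mark as most recently used
    return &*it->second;
  }

  std::size_t size() const { return order_.size(); }

 private:
  std::size_t capacity_;
  std::list<CachedImage> order_;
  std::unordered_map<std::string, std::list<CachedImage>::iterator> index_;
};
```

With a capacity of 1024, as in production, each resident entry in this sketch holds only its key and PNG bytes; the price is that a lookup carrying different databytes under the same key returns the previously rendered image.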
One other thing to note is that the changes introduced by this PR also affect how the cache-hit check is done (here):
databytes are no longer considered during this check, as they will be empty. In general, this should not affect cache performance. However, it does affect correctness in the following corner case:

1. A histogram is rendered and cached in the LRU cache.
2. The same histogram is updated in the DQMGUI's index, e.g., by uploading a newer file containing the same histogram. At this point the cache still contains the previous version of the histogram.
3. The user requests a refresh of the page containing the histogram. The cache checks whether the histogram has already been rendered but does not consider the updated databytes from the new version of the histogram. This gives a false cache hit, and the cached, older version of the histogram is returned.

Tests were done with:
- DQMNet's MESSAGE_SIZE_LIMIT increased to 1GB to accommodate the above change.
- O0 optimization
- valgrind --tool=massif --num-callers=500 --error-limit=no --trace-children=yes --suppressions=/data/srv/root/v6-28-10/etc/valgrind-root.supp --suppressions=/data/srv/root/v6-28-10/etc/valgrind-root-python.supp --suppressions=/usr/libexec/valgrind/default.supp --vgdb=yes

Further optimizations (TODO?):
Performance
Baseline memory usage
With the PR