Reduce LRU image cache's memory footprint by nothingface0 · Pull Request #46 · cms-DQM/dqmgui_prod

nothingface0 · 2025-12-09T15:23:13Z

While investigating #41, we found that the high memory usage stems from the lruimg cache kept by the server, which holds the last 1024 rendered images in memory.

This cache stores, among other things, the databytes of the histogram used to render the image. Upon investigation, these do not appear to be strictly needed or reused, unlike the pngbytes, which store the actual rendered image. This PR modifies the logic to clear the databytes of the cached images completely, in order to reduce overall memory usage.
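The idea can be illustrated with a minimal Python sketch. This is not the server's actual code: `CachedImage`, `ImageLRU`, and `retained_bytes` are hypothetical names, and only the `databytes`/`pngbytes` fields and the 1024-entry limit are taken from the description above.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CachedImage:
    """Hypothetical cache entry; field names mirror the PR description."""
    pngbytes: bytes          # rendered image, returned on every cache hit
    databytes: bytes = b""   # histogram payload, only needed while rendering


class ImageLRU:
    """Toy LRU cache keeping at most `capacity` rendered images."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()

    def put(self, key, image):
        image.databytes = b""                 # drop the payload before caching
        self.entries[key] = image
        self.entries.move_to_end(key)         # mark as most recently used
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def get(self, key):
        image = self.entries.get(key)
        if image is not None:
            self.entries.move_to_end(key)     # a hit refreshes the entry
        return image

    def retained_bytes(self):
        """Total bytes held by all cached entries."""
        return sum(len(e.pngbytes) + len(e.databytes)
                   for e in self.entries.values())
```

With this layout, the memory retained per entry is bounded by the size of the PNG rather than by the size of the unpacked histogram.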

Warning

Note that the changes introduced by this PR also affect the way the cache hit check is done (here): databytes are no longer considered during this check, as they will be empty. In general, this should not affect cache performance. One corner case, however, will affect its correctness:
1. A histogram is rendered and cached in the LRU cache.
2. The same histogram is updated in the DQMGUI's index, e.g., by uploading a newer file containing the same histogram. The cache still contains the previous version of the histogram at this point.
3. The user requests a refresh of the page containing the histogram. The cache checks whether the histogram is already rendered without considering the updated databytes of the new version. This gives a false cache hit, and the cached, older version of the histogram is returned.
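The three steps above can be reproduced with a toy cache. This is a sketch under assumptions, not the server's logic: `PngCache` and its `use_digest` switch are hypothetical, the "render" is a placeholder, and the digest variant is only one conceivable way to detect changed data, not something this PR implements.

```python
import hashlib


class PngCache:
    """Toy stand-in for the LRU image cache (no eviction, for brevity)."""

    def __init__(self, use_digest=False):
        self.use_digest = use_digest
        self.entries = {}

    def _key(self, name, databytes):
        if self.use_digest:
            # Hypothetical alternative: a small fingerprint of the data
            # is part of the key, so updated data produces a new key.
            return (name, hashlib.sha256(databytes).digest())
        # Behaviour described in the PR: the data is not part of the check.
        return name

    def render(self, name, databytes):
        key = self._key(name, databytes)
        if key not in self.entries:
            self.entries[key] = b"PNG:" + databytes  # placeholder "render"
        return self.entries[key]


# Steps 1-3 with the data excluded from the hit check.
stale = PngCache()
first = stale.render("histogram", b"v1")    # rendered and cached
second = stale.render("histogram", b"v2")   # false hit, old image returned

# Same steps with a digest in the key.
fresh = PngCache(use_digest=True)
fresh.render("histogram", b"v1")
updated = fresh.render("histogram", b"v2")  # miss, new image rendered
```

Note that computing a digest requires the data to be loaded first, so such a variant would trade some of the memory savings for correctness in this corner case.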

Tests were done with:

- An exceptionally huge ROOT histogram which, when unpacked, reaches ~960 MB in memory.
- DQMNet's MESSAGE_SIZE_LIMIT increased to 1 GB to accommodate the above change.
- O0 optimization
- valgrind --tool=massif --num-callers=500 --error-limit=no --trace-children=yes --suppressions=/data/srv/root/v6-28-10/etc/valgrind-root.supp --suppressions=/data/srv/root/v6-28-10/etc/valgrind-root-python.supp --suppressions=/usr/libexec/valgrind/default.supp --vgdb=yes

Further optimizations (TODO?):

Prevent loading the data from the index unless the histogram actually needs to be rendered, avoiding unnecessary memory usage spikes when the histograms are already in the cache.
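A minimal sketch of this possible optimization, assuming the cache can be queried by key before any data is read. The function `get_png` and the `load_databytes` and `render` callables are hypothetical placeholders, not the server's API.

```python
def get_png(cache, key, load_databytes, render):
    """Return the PNG for `key`, loading histogram data only on a cache miss."""
    png = cache.get(key)
    if png is None:
        databytes = load_databytes(key)  # potentially large; miss path only
        png = render(databytes)
        cache[key] = png                 # only the rendered image is kept
    return png


# Usage: count how often the (expensive) loader is actually called.
loads = []


def load_from_index(key):
    loads.append(key)
    return b"histogram-data"


def fake_render(databytes):
    return b"PNG:" + databytes


cache = {}
first = get_png(cache, "histogram", load_from_index, fake_render)
second = get_png(cache, "histogram", load_from_index, fake_render)
```

In this sketch the second request is served from the cache and the loader runs only once, so a cache hit never brings the histogram data into memory.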

Performance

Baseline memory usage

With the PR


gabrielmscampos · 2025-12-11T15:19:00Z

Will this also fix the issue with the offline/relval DQMGUI flavour going down from time to time due to high memory usage?

nothingface0 · 2025-12-12T07:17:46Z

> Will this also fix the issue with the offline/relval DQMGUI flavour going down from time to time due to high memory usage?

Probably not. The size of the cache, as it is in production, does not, in principle, exceed 8GB.

gabrielmscampos · 2025-12-12T08:12:16Z

So what is this fixing exactly, the HGCAL issue when loading multiple TH2Poly objects in one page? I'm a little bit worried about this:

> One corner case scenario that will affect its correctness.... and returns the cached, older version of the histogram.

This is unlikely to be a big deal in offline/relval, but in Online the shifters are constantly looking for new plots. Could you elaborate more on this corner case?

Test removing databytes from cache

aced38c

nothingface0 mentioned this pull request Dec 9, 2025

Very high memory usage of monGui #41

Open

Try with custom copy method

36ecdc5

To avoid copying data unnecessarily altogether

nothingface0 wants to merge 2 commits into dev128 from refactor_image_lru_cache
