@GNiendorf (Member) commented Jan 24, 2025

Updated occupancies for all LST objects, and made small updates to the notebook for printing them.


@GNiendorf (Member Author)

/run all

GNiendorf requested a review from slava77 on January 24, 2025 21:20
@github-actions

There was a problem while building and running with CMSSW. The logs can be found here.

@GNiendorf (Member Author)

@ariostas Seems like an issue with the CMSSW tests? Something about alpaka math?

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.0    401.4    188.0    150.9    146.0    551.4    122.8    233.0    151.3      3.7    1994.5    1397.1+/- 387.4     528.0   explicit[s=4] (target branch)
   avg     43.0    393.4    188.3    161.6    149.4    549.3    123.6    231.1    150.6      3.2    1993.6    1401.3+/- 390.3     529.1   explicit[s=4] (this PR)

@GNiendorf
Copy link
Member Author

/run standalone lowpt

@ariostas (Member)

Oh, that's a package that was recently introduced by Manos. You could cherry-pick that commit, or see if the CMSSW PR finally gets merged by early next week so that you can rebase.

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     51.0    407.0    591.2    761.4   1052.4   1228.5    294.0   1152.9    360.3      5.0    5903.7    4624.2+/- 1634.9    1533.7   explicit[s=4] (target branch)
   avg     51.7    403.2    586.2    788.8   1215.9   1222.9    293.5   1161.9    367.7      6.4    6098.2    4823.6+/- 1758.5    1590.7   explicit[s=4] (this PR)

{668, 271, 105, 59}, // category 0
{738, 310, 0, 0}, // category 1
{0, 13, 5, 0}, // category 2
{0, 38, 46, 39} // category 3
Review comment:

Which kinematic regions are these 4 bins? (To follow up on the discussion during the meeting about why entries that were previously zero now are not.)

@GNiendorf (Member Author)

(screenshot attached)

Review comment:

This still requires parsing: what are the two middle elements (out of 4) in category 2? Is category 2 defined by a radius and z range, or by the disk and ring index in the endcap?

@GNiendorf (Member Author)

(screenshot attached)

Definition of Category 2: (module_layers >= 4) & (module_subdets == 5)
Definition of Eta ranges:
eta_numbers[module_eta < 0.75] = 0
eta_numbers[(module_eta >= 0.75) & (module_eta < 1.5)] = 1
eta_numbers[(module_eta >= 1.5) & (module_eta < 2.25)] = 2
eta_numbers[(module_eta >= 2.25) & (module_eta < 3)] = 3
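
A runnable sketch of this binning (assuming numpy arrays with the names above; everything beyond the quoted cuts is hypothetical):

```python
import numpy as np

# Hypothetical per-module arrays; only the cut values come from the notebook.
module_eta     = np.abs(np.array([0.3, 1.2, 1.9, 2.6]))  # assuming |eta| is used
module_layers  = np.array([4, 5, 6, 4])
module_subdets = np.array([5, 5, 5, 5])

# Category 2 selection, as quoted above.
is_cat2 = (module_layers >= 4) & (module_subdets == 5)

# Bin edges at 0.75, 1.5, 2.25 reproduce the four eta_numbers assignments.
eta_numbers = np.digitize(module_eta, bins=[0.75, 1.5, 2.25])

print(eta_numbers[is_cat2])  # [0 1 2 3]
```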

@GNiendorf (Member Author)

/run all

@github-actions

There was a problem while building and running with CMSSW. The logs can be found here.

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.6    395.5    187.1    152.4    147.6    549.4    124.5    235.1    151.6      3.6    1993.5    1397.5+/- 386.1     530.6   explicit[s=4] (target branch)
   avg     44.2    394.5    190.0    154.8    141.5    551.4    124.9    235.9    151.0      3.3    1991.6    1396.0+/- 388.1     529.6   explicit[s=4] (this PR)

@GNiendorf (Member Author)

/run standalone lowpt

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     50.1    402.7    580.2    758.5   1048.2   1222.7    290.8   1153.1    359.7      5.6    5871.5    4598.7+/- 1628.6    1527.6   explicit[s=4] (target branch)
   avg     50.5    404.7    578.5    721.4   1132.0   1228.3    293.1   1159.6    363.2      6.3    5937.6    4658.8+/- 1669.7    1536.4   explicit[s=4] (this PR)

GNiendorf changed the title from "Updated Occupancies" to "Updated Occupancies + Dynamic MDs Allocation" on Jan 26, 2025
@GNiendorf (Member Author)

/run standalone

@github-actions

There was a problem while building and running in standalone mode. The logs can be found here.

GNiendorf changed the title from "Updated Occupancies + Dynamic MDs Allocation" to "Updated Occupancies" on Jan 27, 2025
@GNiendorf (Member Author)

Moving the dynamic memory allocation to #148.

@GNiendorf (Member Author)

I think it makes sense to merge #148 first and then quickly reevaluate the occupancy thresholds. It's likely we can increase the caps now, without a huge increase in memory, in order to decrease truncation.

@GNiendorf (Member Author)

/run all

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     32.8    396.4    189.9    151.1    168.9    705.5    130.6    249.3    176.6      2.1    2203.2    1465.0+/- 405.5     577.7   explicit[s=4] (target branch)
   avg     32.1    398.8    188.2    166.8    194.3    700.1    131.4    251.7    177.2      1.8    2242.4    1510.3+/- 435.2     595.4   explicit[s=4] (this PR)

@GNiendorf (Member Author)

/run standalone lowpt

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     35.5    394.2    579.1    780.2   1283.5   1536.7    303.2   1175.7    393.4      3.1    6484.7    4912.5+/- 1769.1    1686.4   explicit[s=4] (target branch)
   avg     35.5    392.9    583.9    827.6   1537.0   1533.2    305.8   1183.5    396.4      4.4    6800.0    5231.4+/- 1998.1    1773.6   explicit[s=4] (this PR)

@slava77 left a comment

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root".
Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?

This should be documented in some way.

{740, 314, 230, 60}, // category 0
{1097, 693, 0, 0}, // category 1
{0, 107, 102, 0}, // category 2
{0, 64, 79, 85} // category 3
Review comment:

A factor of 5 increase here: was the target quantile incidentally lower before?

@GNiendorf (Member Author)

Yes, category 1 here was set to 99% before. I increased it to 99.99% to match the other categories.
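
For context, these caps are occupancy quantiles per (category, eta bin); a minimal sketch of how such a cap can be derived (hypothetical data, not the actual notebook code):

```python
import numpy as np

# Hypothetical per-module occupancy counts for one (category, eta bin) cell.
counts = np.random.poisson(lam=200, size=100_000)

cap_before = int(np.quantile(counts, 0.99))    # old target for this category
cap_after  = int(np.quantile(counts, 0.9999))  # new target, matching the others
```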

{1373, 702, 326, 83}, // category 0
{1323, 653, 0, 0}, // category 1
{0, 0, 0, 0}, // category 2
{0, 38, 46, 39} // category 3
Review comment:

Close to a x3 increase, similar to the T5s: was it a different quantile previously, or is this a result of an increase in the LS occupancy?

@GNiendorf (Member Author) commented Feb 25, 2025

I updated the percentiles here from 99.9% to 99.99% to match the other objects.

@github-actions

The PR was built and ran successfully with CMSSW. Here are some plots.

(Plots: OOTB all tracks; efficiency and fake rate vs pT, eta, and phi.)

The full set of validation and comparison plots can be found here.

@slava77 commented Feb 25, 2025

@GNiendorf
please add some memory use analysis similar to #148.

@slava77 commented Feb 25, 2025

Looking at the pt08 validation results, do I understand correctly that the pT5 efficiency goes up quite a bit (~5% at high pT)?

(plot attached)

@slava77 commented Feb 25, 2025

The full set of validation and comparison plots can be found here.

@ariostas
how long does it usually take for the output files to show up? It's still 404 after 8 minutes.

@GNiendorf (Member Author)

looking at pt08 validation results, do I understand correctly that pT5 efficiency goes up quite a bit (5% at high pt)


Yes, I saw that as well in the performance plots.

@GNiendorf (Member Author) commented Feb 26, 2025

The upper bound matrices don't have a huge effect on the total memory usage anymore with my latest PR merged in.

0p8 pT Threshold, 1 stream, 500 events:

New - 1615 MiB
Current (with dynamic occ PR) - 1423 MiB

0p8 pT Threshold, 8 streams, 500 events:

New - 6879 MiB
Current (with dynamic occ PR) - 5743 MiB

That's a 13.5% increase for 1 stream and a 20% total memory increase for 8 streams. I think that, with the increase in pT5 efficiency and the reduced truncation frequency, it is worth it. My dynamic occ PR gave a ~25% single-stream reduction, so we would still have a net decrease in total memory from where we started. Right now we are trying to reduce the number of fakes stored in memory by applying artificially low occupancy caps in certain regions and for certain objects. Maybe the cuts for those objects should be reevaluated and tightened instead? I think the T3 DNN will likely reduce the triplet occupancies as well.
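
A quick check of the quoted percentages (a sketch using only the MiB values above):

```python
def pct_increase(new, cur):
    return 100.0 * (new - cur) / cur

print(f"1 stream:  {pct_increase(1615, 1423):.1f}%")  # 13.5%
print(f"8 streams: {pct_increase(6879, 5743):.1f}%")  # 19.8%, i.e. ~20%
```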

Truncations (0p8 pT Threshold, 10 events):

New -

MDs - 107
Segments - 117
Triplets - 3,556
Quints - 0

Current -

MDs - 313
Segments - 16,488
Triplets - 8,400
Quints - 0

@slava77 commented Feb 26, 2025

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root". Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?

This should be documented in some way.

perhaps a note can be added to the notebook; although just having it in this PR description may be enough.

@GNiendorf (Member Author)

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root". Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?
This should be documented in some way.

perhaps a note can be added to the notebook; although just having it in this PR description may be enough.

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.
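
Schematically, that counting works like the following (a sketch only; the names are hypothetical, not the actual kernel code):

```python
# Per-module bookkeeping: the occupancy counter is incremented for every
# object passing selections, while the object is stored only below the cap,
# so the counter reflects the untruncated occupancy used for the quantiles.
def add_object(module, obj, n_passing, stored, cap):
    n_passing[module] += 1                 # always counted (what "-d" records)
    if len(stored[module]) < cap[module]:  # stored only if under the cap
        stored[module].append(obj)
```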

@slava77 commented Feb 26, 2025

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.

Isn't this limited to one step then?
What's stored still matters.
E.g. if the MDs were truncated in a given module (even though the total count was recorded), the LSs in this module will still be truncated.
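
A toy illustration of that cascade (made-up numbers, for the argument only):

```python
# If 120 MDs pass selections in a module but only 100 are stored, LSs that
# would start from the 20 dropped MDs are never even built, so the measured
# LS count in that module underestimates the true untruncated occupancy.
mds_passing, md_cap = 120, 100
mds_stored = min(mds_passing, md_cap)
ls_per_md = 3                                  # assumed average, illustration only
print("true LS candidates:", mds_passing * ls_per_md)   # 360
print("observable LS count:", mds_stored * ls_per_md)   # 300
```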

@GNiendorf (Member Author)

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.

Isn't this limited to one step then? What's stored still matters. E.g. if the MDs were truncated in a given module (even though the total count was recorded), the LSs in this module will still be truncated.

Oh yeah, good point... I guess that is why the T3s are still truncated so much? I was trying to figure that one out.

@slava77 commented Feb 26, 2025

Minimally, it would be nice to have a test that uses a ROOT file made with the latest occupancy limits in the analyzer, to check that these don't grow further.

@slava77 commented Feb 26, 2025

Oh yeah, good point... I guess that is why the T3s are still truncated so much? I was trying to figure that one out.

Could be. I wouldn't worry about limits that went up by 20-50%, but the components that went up by x3-6 may be significantly truncated downstream.

@GNiendorf (Member Author) commented Feb 26, 2025

I did a second pass on the occupancies, and the T3 ones go up quite a bit. I'm going to work on my T3 DNN PR for a bit and come back to this; I'm not sure what the optimal solution is here. The 1-stream value for 500 events stays constant at ~1615 MiB, but the 8-stream value goes up further, from 6879 MiB to ~7280 MiB. That would be a 27% increase in total memory at 8 streams over the current 5743 MiB. Triplet excesses go down to 1,178, so in total a ~94% decrease in truncations for all objects (25,201 -> 1,402 for 10 events).

@GNiendorf (Member Author)

Replaced by #180.

GNiendorf closed this on Jun 11, 2025