@GNiendorf (Member) commented Jan 24, 2025

Updated occupancies for all LST objects, and made small updates to the notebook for printing them.


@GNiendorf (Member Author)

/run all

GNiendorf requested a review from slava77 on January 24, 2025 21:20
@github-actions

There was a problem while building and running with CMSSW. The logs can be found here.

@GNiendorf (Member Author)

@ariostas Seems like an issue with the CMSSW tests? Something about alpaka math?

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.0    401.4    188.0    150.9    146.0    551.4    122.8    233.0    151.3      3.7    1994.5    1397.1+/- 387.4     528.0   explicit[s=4] (target branch)
   avg     43.0    393.4    188.3    161.6    149.4    549.3    123.6    231.1    150.6      3.2    1993.6    1401.3+/- 390.3     529.1   explicit[s=4] (this PR)

@GNiendorf
Copy link
Member Author

/run standalone lowpt

@ariostas (Member)

Oh, that's a package that was recently introduced by Manos. You could cherry-pick that commit, or see if the CMSSW PR finally gets merged by early next week so that you can rebase.

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     51.0    407.0    591.2    761.4   1052.4   1228.5    294.0   1152.9    360.3      5.0    5903.7    4624.2+/- 1634.9    1533.7   explicit[s=4] (target branch)
   avg     51.7    403.2    586.2    788.8   1215.9   1222.9    293.5   1161.9    367.7      6.4    6098.2    4823.6+/- 1758.5    1590.7   explicit[s=4] (this PR)

{668, 271, 105, 59}, // category 0
{738, 310, 0, 0}, // category 1
{0, 13, 5, 0}, // category 2
{0, 38, 46, 39} // category 3
Review comment:

Which kinematic regions are these 4 bins? (To follow up on the discussion during the meeting about why entries that were previously zero now are not.)

@GNiendorf (Member Author)

(screenshot attached)

Review comment:

This still requires parsing: what are the two middle elements (out of 4) in category 2? Is category 2 defined by a radius and z range, or by the disk and ring index in the endcap?

@GNiendorf (Member Author)

(screenshot attached)

Definition of Category 2: (module_layers >= 4) & (module_subdets == 5)
Definition of Eta ranges:
eta_numbers[module_eta < 0.75] = 0
eta_numbers[(module_eta >= 0.75) & (module_eta < 1.5)] = 1
eta_numbers[(module_eta >= 1.5) & (module_eta < 2.25)] = 2
eta_numbers[(module_eta >= 2.25) & (module_eta < 3)] = 3
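
A runnable sketch of this binning (assuming numpy arrays with the names above; everything beyond the quoted cuts is hypothetical):

```python
import numpy as np

# Hypothetical per-module arrays; only the cut values come from the notebook.
module_eta     = np.abs(np.array([0.3, 1.2, 1.9, 2.6]))  # assuming |eta| is used
module_layers  = np.array([4, 5, 6, 4])
module_subdets = np.array([5, 5, 5, 5])

# Category 2 selection, as quoted above.
is_cat2 = (module_layers >= 4) & (module_subdets == 5)

# Bin edges at 0.75, 1.5, 2.25 reproduce the four eta_numbers assignments.
eta_numbers = np.digitize(module_eta, bins=[0.75, 1.5, 2.25])

print(eta_numbers[is_cat2])  # [0 1 2 3]
```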

@GNiendorf (Member Author)

/run all

@github-actions

There was a problem while building and running with CMSSW. The logs can be found here.

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     46.6    395.5    187.1    152.4    147.6    549.4    124.5    235.1    151.6      3.6    1993.5    1397.5+/- 386.1     530.6   explicit[s=4] (target branch)
   avg     44.2    394.5    190.0    154.8    141.5    551.4    124.9    235.9    151.0      3.3    1991.6    1396.0+/- 388.1     529.6   explicit[s=4] (this PR)

@GNiendorf (Member Author)

/run standalone lowpt

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     50.1    402.7    580.2    758.5   1048.2   1222.7    290.8   1153.1    359.7      5.6    5871.5    4598.7+/- 1628.6    1527.6   explicit[s=4] (target branch)
   avg     50.5    404.7    578.5    721.4   1132.0   1228.3    293.1   1159.6    363.2      6.3    5937.6    4658.8+/- 1669.7    1536.4   explicit[s=4] (this PR)

GNiendorf changed the title from "Updated Occupancies" to "Updated Occupancies + Dynamic MDs Allocation" on Jan 26, 2025
@GNiendorf (Member Author)

/run standalone

@github-actions

There was a problem while building and running in standalone mode. The logs can be found here.

GNiendorf changed the title from "Updated Occupancies + Dynamic MDs Allocation" to "Updated Occupancies" on Jan 27, 2025
@GNiendorf (Member Author)

Moving the dynamic memory allocation to #148.

@GNiendorf (Member Author)

I think it makes sense to merge #148 first and then quickly reevaluate the occupancy thresholds. It's likely we can increase the caps now, without a huge increase in memory, in order to decrease truncation.

@GNiendorf (Member Author)

/run all

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     32.8    396.4    189.9    151.1    168.9    705.5    130.6    249.3    176.6      2.1    2203.2    1465.0+/- 405.5     577.7   explicit[s=4] (target branch)
   avg     32.1    398.8    188.2    166.8    194.3    700.1    131.4    251.7    177.2      1.8    2242.4    1510.3+/- 435.2     595.4   explicit[s=4] (this PR)

@GNiendorf (Member Author)

/run standalone lowpt

@github-actions

The PR was built and ran successfully in standalone mode (low pT setup). Here are some of the comparison plots.

(Comparison plots: efficiency, fake rate, and duplicate rate vs pT and vs eta.)

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     35.5    394.2    579.1    780.2   1283.5   1536.7    303.2   1175.7    393.4      3.1    6484.7    4912.5+/- 1769.1    1686.4   explicit[s=4] (target branch)
   avg     35.5    392.9    583.9    827.6   1537.0   1533.2    305.8   1183.5    396.4      4.4    6800.0    5231.4+/- 1998.1    1773.6   explicit[s=4] (this PR)

@slava77 left a comment

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root".
Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?

This should be documented in some way.

{740, 314, 230, 60}, // category 0
{1097, 693, 0, 0}, // category 1
{0, 107, 102, 0}, // category 2
{0, 64, 79, 85} // category 3
Review comment:

A factor of 5 increase here: was the target quantile incidentally lower before?

@GNiendorf (Member Author)

Yes, category 1 here was set to 99% before. I increased it to 99.99% to match the other categories.
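
For context, these caps are occupancy quantiles per (category, eta bin); a minimal sketch of how such a cap can be derived (hypothetical data, not the actual notebook code):

```python
import numpy as np

# Hypothetical per-module occupancy counts for one (category, eta bin) cell.
counts = np.random.poisson(lam=200, size=100_000)

cap_before = int(np.quantile(counts, 0.99))    # old target for this category
cap_after  = int(np.quantile(counts, 0.9999))  # new target, matching the others
```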

{1373, 702, 326, 83}, // category 0
{1323, 653, 0, 0}, // category 1
{0, 0, 0, 0}, // category 2
{0, 38, 46, 39} // category 3
Review comment:

Close to a x3 increase, similar to the T5s: was it a different quantile previously, or is this a result of an increase in the LS occupancy?

@GNiendorf (Member Author) commented Feb 25, 2025

I updated the percentiles here from 99.9% to 99.99% to match the other objects.

@github-actions

The PR was built and ran successfully with CMSSW. Here are some plots.

(Plots: OOTB all tracks; efficiency and fake rate vs pT, eta, and phi.)

The full set of validation and comparison plots can be found here.

@slava77 commented Feb 25, 2025

@GNiendorf
please add some memory use analysis similar to #148.

@slava77 commented Feb 25, 2025

Looking at the pt08 validation results, do I understand correctly that the pT5 efficiency goes up quite a bit (~5% at high pT)?

(plot attached)

@slava77 commented Feb 25, 2025

The full set of validation and comparison plots can be found here.

@ariostas
how long does it usually take for the output files to show up? It's still 404 after 8 minutes.

@GNiendorf (Member Author)

looking at pt08 validation results, do I understand correctly that pT5 efficiency goes up quite a bit (5% at high pt)


Yes, I saw that as well in the performance plots.

@GNiendorf (Member Author) commented Feb 26, 2025

The upper bound matrices don't have a huge effect on the total memory usage anymore with my latest PR merged in.

0p8 pT Threshold, 1 stream, 500 events:

New - 1615 MiB
Current (with dynamic occ PR) - 1423 MiB

0p8 pT Threshold, 8 streams, 500 events:

New - 6879 MiB
Current (with dynamic occ PR) - 5743 MiB

That's a 13.5% increase for 1 stream and a 20% total memory increase for 8 streams. I think that, with the increase in pT5 efficiency and the reduced truncation frequency, it is worth it. My dynamic occ PR gave a ~25% single-stream reduction, so we would still have a net decrease in total memory from where we started. Right now we are trying to reduce the number of fakes stored in memory by applying artificially low occupancy caps in certain regions and for certain objects. Maybe the cuts for those objects should be reevaluated and tightened instead? I think the T3 DNN will likely reduce the triplet occupancies as well.
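
A quick check of the quoted percentages (a sketch using only the MiB values above):

```python
def pct_increase(new, cur):
    return 100.0 * (new - cur) / cur

print(f"1 stream:  {pct_increase(1615, 1423):.1f}%")  # 13.5%
print(f"8 streams: {pct_increase(6879, 5743):.1f}%")  # 19.8%, i.e. ~20%
```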

Truncations (0p8 pT Threshold, 10 events):

New -

MDs - 107
Segments - 117
Triplets - 3,556
Quints - 0

Current -

MDs - 313
Segments - 16,488
Triplets - 8,400
Quints - 0

@slava77 commented Feb 26, 2025

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root". Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?

This should be documented in some way.

perhaps a note can be added to the notebook; although just having it in this PR description may be enough.

@GNiendorf (Member Author)

Please remind me of the logistics of making the input analysis file "500_new_occ_0p8.root". Was it made with some xNNN or more relaxed limits to get an (almost) unbiased input, or is there an option where the full n*m (inner times outer, untruncated) array is allocated?
This should be documented in some way.

perhaps a note can be added to the notebook; although just having it in this PR description may be enough.

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.
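
Schematically, that counting works like the following (a sketch only; the names are hypothetical, not the actual kernel code):

```python
# Per-module bookkeeping: the occupancy counter is incremented for every
# object passing selections, while the object is stored only below the cap,
# so the counter reflects the untruncated occupancy used for the quantiles.
def add_object(module, obj, n_passing, stored, cap):
    n_passing[module] += 1                 # always counted (what "-d" records)
    if len(stored[module]) < cap[module]:  # stored only if under the cap
        stored[module].append(obj)
```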

@slava77 commented Feb 26, 2025

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.

Isn't this limited to one step then?
What's stored still matters.
E.g. if the MDs were truncated in a given module (even though the total count was recorded), the LSs in this module will still be truncated.
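
A toy illustration of that cascade (made-up numbers, for the argument only):

```python
# If 120 MDs pass selections in a module but only 100 are stored, LSs that
# would start from the 20 dropped MDs are never even built, so the measured
# LS count in that module underestimates the true untruncated occupancy.
mds_passing, md_cap = 120, 100
mds_stored = min(mds_passing, md_cap)
ls_per_md = 3                                  # assumed average, illustration only
print("true LS candidates:", mds_passing * ls_per_md)   # 360
print("observable LS count:", mds_stored * ls_per_md)   # 300
```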

@GNiendorf (Member Author)

All you need to do is compile the code with the -d option. The variables used to determine the occupancies are incremented if an object passes all selections, regardless of whether it is stored or not.

Isn't this limited to one step then? What's stored still matters. E.g. if the MDs were truncated in a given module (even though the total count was recorded), the LSs in this module will still be truncated.

Oh yeah, good point... I guess that is why the T3s are still truncated so much? I was trying to figure that one out.

@slava77 commented Feb 26, 2025

Minimally, it would be nice to have a test that uses a ROOT file made with the latest occupancy limits in the analyzer, to check that these don't grow further.

@slava77 commented Feb 26, 2025

Oh yeah, good point... I guess that is why the T3s are still truncated so much? I was trying to figure that one out.

Could be. I wouldn't worry about limits that went up by 20-50%, but the components that went up by x3-6 may be significantly truncated downstream.

@GNiendorf (Member Author) commented Feb 26, 2025

I did a second pass on the occupancies, and the T3 ones go up quite a bit. I'm going to work on my T3 DNN PR for a bit and come back to this; I'm not sure what the optimal solution is here. The 1-stream value for 500 events stays constant at ~1615 MiB, but the 8-stream value goes up further, from 6879 MiB to ~7280 MiB. That would be a 27% increase in total memory at 8 streams over the current 5743 MiB. Triplet excesses go down to 1,178, so in total a ~94% decrease in truncations for all objects (25,201 -> 1,402 for 10 events).

@GNiendorf (Member Author)

Replaced by #180.

GNiendorf closed this on Jun 11, 2025