Conversation

@ariostas (Member)

This is a rebase of SegmentLinking/TrackLooper#412.

It's a very large diff, so @Hoobidoobidoo will have to make sure that it's right. It should be a good starting point to finish the work, so I'll let @Hoobidoobidoo take over. I'm happy to answer questions or help with something if needed.

@slava77 commented Feb 3, 2025

This branch has conflicts that must be resolved

apparently after recent LST PRs

@Hoobidoobidoo

I think everything is good on my end to push this PR.

@ariostas (Member Author)

I'll resolve the conflict and add an option to the build script to enable the extra outputs. I'll let you know so you can test that everything works well.

@ariostas force-pushed the ariostas/lstod-rebase branch from 417443d to 43311a5 on February 24, 2025 16:57
@ariostas (Member Author)

@Hoobidoobidoo instead of adding additional options and flags, I decided to use the existing CUT_VALUE_DEBUG flag. So to enable the extra outputs you can just use the -d flag. Let me know if this would be an issue for you.

I verified that it compiles (after fixing a couple of typos I made), but could you also verify that the output ntuple contains all the data you want and that it looks good?

@ariostas (Member Author)

/run all

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     31.7    389.0    185.8    150.0    169.1    701.9    129.9    245.7    175.5      1.8    2180.4    1446.7+/- 405.9     574.6   explicit[s=4] (target branch)
   avg     22.7    300.3    147.4    131.1    125.0    377.1     84.5    175.1    120.3      1.4    1484.8    1085.0+/- 327.2    3269.7   explicit[s=4] (this PR)

@slava77 commented Feb 24, 2025

Was there a feature change to the algorithm in this PR? (I thought this was just ntuple-related.)
Why is the efficiency changing?
https://raw.githubusercontent.com/SegmentLinking/TrackLooper-plots-archive/cmssw/PR149_43311a5_standalone/eff_pt_comp.png

@ariostas (Member Author) commented Feb 24, 2025

Was there a feature change to the algorithm in this PR? (I thought this was just ntuple-related.)
Why is the efficiency changing?

No, there was no change in the algorithm. There must be a bug somewhere (or some change in matching tolerance or something like that). The part that writes the ntuple was pretty much completely rewritten, so it would be better if @Hoobidoobidoo or @sgnoohc look into it since they would be more familiar with that. While rebasing it, nothing stood out to me that could cause this.

@github-actions

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas marked this pull request as ready for review on April 1, 2025 13:33
@ariostas force-pushed the ariostas/lstod-rebase branch from 43311a5 to 72eb0d5 on April 1, 2025 13:50
@ariostas (Member Author) commented Apr 1, 2025

/run all

@github-actions bot commented Apr 1, 2025

There was a problem while building and running in standalone mode. The logs can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from 72eb0d5 to 8c3792d on April 1, 2025 14:41
@ariostas (Member Author) commented Apr 1, 2025

/run all

@github-actions bot commented Apr 1, 2025

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     29.2    379.2    186.4    189.1     46.5    706.3    130.4    139.9    175.5      2.0    1984.5    1249.0+/- 303.9     527.2   explicit[s=4] (target branch)
   avg     20.8    294.1    147.8    148.7     35.9    383.6     83.6    102.0    120.7      1.2    1338.3     934.0+/- 253.2    2904.8   explicit[s=4] (this PR)

@ariostas (Member Author) commented Apr 1, 2025

The plots look fairly reasonable now, but the timing is pretty wacky. I'll look into that.

@github-actions bot commented Apr 1, 2025

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from 8c3792d to 357dc9f on April 1, 2025 17:47
@ariostas (Member Author) commented Apr 1, 2025

This is ready for review. I removed the unnecessary printout I mentioned above.

Here is a timing comparison on cgpu-1.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 553.534 ms
Average time for input loading = 7852.47 ms
Average time for lst::Event creation = 0.0034741 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      4.0      0.5      0.4      1.8      0.6      0.5      0.9      0.6      1.1      0.0      10.4       5.9+/-  1.4      12.4   explicit[s=1]
   avg      1.3      0.6      0.6      2.2      0.9      0.7      1.2      0.7      1.5      0.0       9.6       7.7+/-  1.7       5.9   explicit[s=2]
   avg      2.3      1.2      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.1      12.7+/-  3.0       4.7   explicit[s=4]
   avg      4.0      2.0      2.0      4.3      2.6      1.6      3.2      1.6      3.9      0.0      25.2      19.6+/-  5.5       4.6   explicit[s=6]
   avg      5.5      2.7      2.4      5.0      3.7      2.1      4.3      1.9      4.9      0.0      32.5      24.9+/-  6.8       4.5   explicit[s=8]

master (538ac87)
Total Timing Summary
Average time for map loading = 553.863 ms
Average time for input loading = 7808.41 ms
Average time for lst::Event creation = 0.00295185 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      6.1      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0      12.4       5.8+/-  1.1      14.4   explicit[s=1]
   avg      1.3      0.6      0.6      2.2      0.8      0.7      1.2      0.7      1.5      0.0       9.6       7.6+/-  1.5       5.9   explicit[s=2]
   avg      2.3      1.2      1.1      3.1      1.5      1.1      2.1      1.0      2.4      0.0      15.7      12.3+/-  3.0       4.6   explicit[s=4]
   avg      3.7      1.9      1.7      4.0      2.6      1.7      3.2      1.6      3.7      0.0      24.0      18.7+/-  4.7       4.5   explicit[s=6]
   avg      5.6      2.7      2.4      5.3      3.5      2.2      4.2      1.8      4.9      0.0      32.7      24.9+/-  6.9       4.5   explicit[s=8]
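As a quick cross-check of the two tables (the numbers below are transcribed by hand from the "Event" column above; this is just arithmetic on the posted averages, not part of the benchmark itself), the average event times agree to within about 5% for s ≥ 2, with the larger s=1 difference dominated by the warm-up Hits step:

```python
# Average "Event" times in ms, copied from the tables above.
pr     = {1: 10.4, 2: 9.6, 4: 16.1, 6: 25.2, 8: 32.5}  # this PR (357dc9f)
master = {1: 12.4, 2: 9.6, 4: 15.7, 6: 24.0, 8: 32.7}  # master (538ac87)

for s in sorted(pr):
    diff = (pr[s] - master[s]) / master[s]
    print(f"s={s}: {diff:+.1%}")  # e.g. s=6: +5.0%, s=8: -0.6%
```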

@ariostas (Member Author) commented Apr 1, 2025

The CPU timings on cgpu-1 also look good.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 349.674 ms
Average time for input loading = 8143.17 ms
Average time for lst::Event creation = 0.00399635 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     19.8    281.5    141.6    143.3     33.5    363.8     77.8     93.2    107.7      1.1    1263.3     879.7+/- 220.8    1264.3   explicit[s=1]
   avg     19.3    280.3    139.6    146.2     32.8    359.7     76.4     92.6    106.9      0.9    1254.8     875.7+/- 212.7     329.5   explicit[s=4]
   avg     20.8    291.4    145.0    147.8     34.9    370.2     80.0     95.7    109.6      1.1    1296.3     905.3+/- 217.0      94.9   explicit[s=16]
   avg     22.6    295.4    146.6    149.3     35.8    374.6     80.6     97.7    111.2      1.0    1314.9     917.7+/- 220.4      53.8   explicit[s=32]
   avg     32.3    312.6    151.6    158.8     39.1    398.0     84.0    101.8    115.8      1.2    1395.2     964.9+/- 228.6      32.7   explicit[s=64]

master (538ac87)
Total Timing Summary
Average time for map loading = 341.615 ms
Average time for input loading = 200098 ms
Average time for lst::Event creation = 0.00371252 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     18.4    261.5    133.1    132.6     30.5    333.6     70.6     85.6     98.6      0.8    1165.3     813.3+/- 194.7    1166.5   explicit[s=1]
   avg     18.1    260.5    131.9    130.6     30.8    332.8     72.4     85.7     99.1      0.8    1162.8     811.9+/- 194.4     296.0   explicit[s=4]
   avg     20.5    288.7    143.6    144.8     34.4    366.7     79.3     94.7    108.6      1.0    1282.2     895.0+/- 214.1      93.9   explicit[s=16]
   avg     22.6    295.2    147.8    150.2     35.9    374.4     80.7     97.7    111.3      1.0    1316.8     919.8+/- 217.2      53.5   explicit[s=32]
   avg     32.4    312.6    152.0    159.1     39.3    405.7     83.9    102.2    119.3      1.1    1407.5     969.5+/- 230.3      32.8   explicit[s=64]

@slava77 commented Apr 1, 2025

The CPU timings on cgpu-1 also look good.

was this tested with CUT_VALUE_DEBUG enabled?

@slava77 commented Apr 1, 2025

was this tested with CUT_VALUE_DEBUG enabled?

a related question: is there a way to ask for it from the CI /run ...?

@slava77 commented Apr 2, 2025

My review of the /standalone changes was rather cursory.
Perhaps a second look from others would help as well.

It would be good to understand what the impact is for the standard benchmark outputs.
Do I recall correctly that the timing benchmark is done without writing the ntuple?

How much slower is the case with an ntuple when the default validation plots are made (and what is the file size increase in this case)? Same for the case with CUT_VALUE_DEBUG enabled.

@ariostas (Member Author) commented Apr 2, 2025

Thanks for all the comments, Slava!

was this tested with CUT_VALUE_DEBUG enabled?

No, but I'll leave timings with it enabled below.

a related question: is there a way to ask for it from the CI /run ...?

No, but if it becomes a common need I can add that. In that case we should also add the option to lst_timing.

Perhaps a second look from others would help as well.

I'll also take a look. While rebasing I fixed some obvious things, but I didn't look at it very carefully.

It would be good to understand what the impact is for the standard benchmark outputs.
Do I recall correctly that the timing benchmark is done without writing the ntuple?

That's right. I'll also run the timing comparison with writing enabled.

I'll address some of the comments you left, but I'll refer the rest to @Hoobidoobidoo or @sgnoohc since they can better address them.

Here's a timing comparison on GPU with CUT_VALUE_DEBUG enabled.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 548.513 ms
Average time for input loading = 7668.05 ms
Average time for lst::Event creation = 0.00314485 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      1.3      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0       7.6       5.8+/-  1.1       9.6   explicit_cutvalue[s=1]
   avg      1.3      0.7      0.6      2.2      0.8      0.7      1.2      0.7      1.5      0.0       9.6       7.6+/-  1.6       5.9   explicit_cutvalue[s=2]
   avg      2.3      1.1      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.0      12.7+/-  2.8       4.6   explicit_cutvalue[s=4]
   avg      3.7      1.8      1.7      4.2      2.6      1.6      3.1      1.5      3.8      0.0      23.8      18.6+/-  4.0       4.5   explicit_cutvalue[s=6]
   avg      5.5      2.7      2.4      5.2      3.6      2.0      4.4      2.0      4.9      0.0      32.7      25.3+/-  7.0       4.5   explicit_cutvalue[s=8]

master (538ac87)
Total Timing Summary
Average time for map loading = 547.471 ms
Average time for input loading = 7653.65 ms
Average time for lst::Event creation = 0.00329245 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      1.3      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0       7.6       5.8+/-  1.1       9.6   explicit_cutvalue[s=1]
   avg      1.3      0.7      0.6      2.1      0.8      0.7      1.1      0.7      1.5      0.0       9.5       7.5+/-  1.6       5.9   explicit_cutvalue[s=2]
   avg      2.2      1.2      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.1      12.8+/-  2.8       4.7   explicit_cutvalue[s=4]
   avg      3.5      1.9      1.7      4.1      2.5      1.6      3.2      1.4      3.7      0.0      23.7      18.6+/-  4.7       4.5   explicit_cutvalue[s=6]
   avg      5.5      2.8      2.3      5.1      3.6      2.1      3.9      2.0      4.9      0.0      32.1      24.5+/-  5.7       4.4   explicit_cutvalue[s=8]

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     14.5      0.4      0.3      0.6      0.8      0.3      0.6      0.3      0.9      0.0      18.8       3.9+/-  0.7      18.8   explicit[s=1]
   avg      0.9      0.6      0.4      0.8      1.0      0.3      1.0      0.5      1.3      0.0       6.7       5.5+/-  1.0       3.3   explicit[s=2]
   avg      1.5      0.9      0.8      1.3      1.5      0.5      1.7      0.7      2.0      0.0      10.8       8.9+/-  1.8       2.8   explicit[s=4]
   avg      2.2      1.3      1.1      1.8      2.2      0.7      2.4      1.0      2.8      0.0      15.5      12.6+/-  3.0       2.6   explicit[s=6]
   avg      2.9      1.8      1.6      2.3      2.9      0.8      3.0      1.4      3.5      0.0      20.1      16.4+/-  3.6       2.6   explicit[s=8]
[this PR]
   avg     14.5      0.4      0.3      0.6      0.8      0.3      0.6      0.4      0.9      0.0      18.7       3.9+/-  0.7      18.7   explicit[s=1]
   avg      0.9      0.6      0.4      0.8      1.0      0.3      1.0      0.4      1.3      0.0       6.7       5.5+/-  1.0       6.7   explicit[s=2]
   avg      1.5      0.9      0.7      1.3      1.6      0.4      1.7      0.7      2.0      0.0      10.9       8.9+/-  1.7       2.7   explicit[s=4]
   avg      2.2      1.3      1.1      1.8      2.2      0.6      2.3      1.0      2.8      0.0      15.3      12.5+/-  2.6       5.2   explicit[s=6]
   avg      2.8      1.7      1.6      2.4      2.9      0.8      3.1      1.3      3.5      0.0      20.2      16.5+/-  3.8       5.1   explicit[s=8]

@slava77 commented Aug 27, 2025

Is the GPU more costly now at high stream count, or is there some stability issue on the CI side?
[image]

(bottom is with this PR)

@GNiendorf (Member) commented Aug 27, 2025

Timing looks good on lnx4555
Screenshot 2025-08-27 at 7 48 47 PM

Previous master (re-ran just now)
Screenshot 2025-08-27 at 8 01 19 PM

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from d94865f to fd11107 on August 28, 2025 14:11
@ariostas (Member Author)

I fixed the conflicts with master and squashed. I'll run the CI again and do some timing tests on cgpu-1 to make sure the earlier slowdown was a fluke.

/run gpu-all

@ariostas (Member Author)

I'll have to look into this further because it seems to get stuck on event 129 when all outputs are enabled.

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     15.4      0.4      0.4      0.8      1.0      0.3      0.6      0.3      0.9      0.0      20.2       4.5+/-  1.2      20.3   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.4      1.3      0.0       7.4       6.1+/-  1.5       3.7   explicit[s=2]
   avg      1.6      0.9      1.0      1.6      1.8      0.4      1.6      0.7      2.0      0.0      11.7       9.6+/-  2.2       3.0   explicit[s=4]
   avg      2.3      1.3      1.5      2.2      2.6      0.6      2.2      0.9      2.8      0.0      16.4      13.5+/-  2.7       2.8   explicit[s=6]
   avg      3.0      1.8      2.1      3.0      3.3      0.7      2.7      1.3      3.5      0.0      21.5      17.7+/-  3.3       2.7   explicit[s=8]
[this PR]
   avg     15.3      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.1       4.5+/-  1.2      20.1   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.4      1.3      0.0       7.3       6.2+/-  1.6       3.7   explicit[s=2]
   avg      1.5      0.9      1.0      1.6      1.9      0.4      1.7      0.7      2.0      0.0      11.7       9.7+/-  2.3       3.0   explicit[s=4]
   avg      2.2      1.3      1.5      2.2      2.5      0.6      2.3      0.9      2.7      0.0      16.3      13.5+/-  2.8       2.8   explicit[s=6]
   avg      2.9      1.9      2.1      3.0      3.2      0.8      2.7      1.3      3.5      0.0      21.4      17.7+/-  3.3       2.7   explicit[s=8]

@ariostas (Member Author)

I looked into it, and it turns out that filling the t3dnn branches hangs for a while on event 129 because one module has 2156 T3s. But the slowdown also happens on master, so it's not an issue introduced here.

Here's a timing comparison on cgpu-1 for the first 100 events of PU200.

This PR (all outputs enabled)
Total Timing Summary
Average time for map loading = 599.407 ms
Average time for input loading = 4548.81 ms
Average time for lst::Event creation = 0.000794729 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     25.8      0.5      0.5      1.0      1.2      0.7      0.9      0.4      1.1      0.1      32.2       5.7+/-  1.8    3820.6   explicit_cutvalue[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      1.0      0.5      1.1      0.1       8.2       5.9+/-  2.2    3822.1   explicit_cutvalue[s=2]
   avg      3.5      0.6      0.6      1.1      1.3      0.8      1.4      0.6      1.2      0.1      11.3       6.9+/-  5.1    3842.1   explicit_cutvalue[s=4]
   avg      4.6      0.7      0.7      1.3      1.5      0.7      1.5      0.9      1.3      0.1      13.4       8.0+/-  8.0    3912.7   explicit_cutvalue[s=6]
   avg      5.1      0.9      0.8      1.3      1.6      0.8      2.6      0.9      1.5      0.1      15.5       9.6+/- 12.0    3809.8   explicit_cutvalue[s=8]

master (all outputs enabled)
Total Timing Summary
Average time for map loading = 606.334 ms
Average time for input loading = 4652 ms
Average time for lst::Event creation = 0.00113533 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     28.7      0.5      0.5      1.0      1.2      0.7      0.9      0.4      1.1      0.1      35.1       5.7+/-  1.8    2147.5   explicit_cutvalue[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      1.0      0.5      1.2      0.1       8.4       6.0+/-  2.5    2060.4   explicit_cutvalue[s=2]
   avg      2.3      0.5      0.6      1.1      1.3      0.7      1.2      0.6      1.2      0.1       9.6       6.6+/-  4.9    2005.6   explicit_cutvalue[s=4]
   avg      4.3      0.7      0.7      1.2      1.4      0.8      1.9      0.7      1.3      0.1      13.2       8.0+/-  8.4    2086.6   explicit_cutvalue[s=6]
   avg      5.2      0.8      0.8      1.4      1.5      0.8      2.1      1.0      1.5      0.1      15.2       9.2+/- 11.0    2089.0   explicit_cutvalue[s=8]

This PR (standard outputs enabled)
Total Timing Summary
Average time for map loading = 593.198 ms
Average time for input loading = 4504.92 ms
Average time for lst::Event creation = 0.000749315 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     25.7      0.5      0.5      1.0      1.1      0.7      0.9      0.4      1.1      0.0      31.9       5.6+/-  1.8     180.3   explicit[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      0.9      0.5      1.1      0.0       8.1       5.8+/-  2.3     163.7   explicit[s=2]
   avg      2.4      0.5      0.6      1.0      1.2      0.7      1.1      0.6      1.2      0.0       9.4       6.4+/-  4.3     165.4   explicit[s=4]
   avg      3.4      0.7      0.7      1.2      1.4      0.8      1.5      0.7      1.3      0.0      11.6       7.4+/-  7.3     165.4   explicit[s=6]
   avg      4.7      0.9      0.7      1.4      1.5      0.8      2.1      0.9      1.5      0.0      14.5       9.0+/- 11.4     165.8   explicit[s=8]

master (standard outputs enabled)
Total Timing Summary
Average time for map loading = 609.423 ms
Average time for input loading = 4481.9 ms
Average time for lst::Event creation = 0.000703903 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     26.1      0.5      0.5      0.9      1.1      0.7      0.9      0.5      1.1      0.0      32.3       5.6+/-  1.8     120.3   explicit[s=1]
   avg      4.7      0.5      0.6      1.0      1.2      0.7      0.9      0.5      1.1      0.0      11.2       5.8+/-  2.2      92.4   explicit[s=2]
   avg      2.7      0.6      0.6      1.0      1.2      0.7      1.1      0.6      1.2      0.0       9.9       6.4+/-  4.3      92.1   explicit[s=4]
   avg      3.4      0.7      0.7      1.2      1.3      0.7      1.4      0.6      1.3      0.0      11.4       7.3+/-  6.8      93.4   explicit[s=6]
   avg      4.8      0.9      0.8      1.3      1.5      0.8      1.5      0.9      1.4      0.0      14.0       8.4+/-  9.4      92.0   explicit[s=8]

There is a noticeable slowdown even with the default outputs. Let me know if you think that more things should be toggled by a flag instead of being written by default.

@GNiendorf (Member) commented Aug 28, 2025

Where is the timing increase coming from? Are we storing more info in the standard ntuple, or is the code just slower? If it's the former, don't we have an -l flag that toggles saving low-level info in the ntuple? We could put the new branches for the low-level objects under that flag so we don't compute/store them by default. Edit: or just don't make --allobj the default, and save only the final objects.

@ariostas (Member Author)

The output ntuple is still more than 4 times larger, so that must be it. I'll see which branches could be put behind flags so that it doesn't save so much stuff by default.

@aashayarora

Hello, I was running this branch on the PU200RelVal sample on cgpu-1 using the CPU backend, and after around 1 hour the memory usage for the process was around 250 GB. I killed it so the system wouldn't run OOM. It seems there might be a memory leak somewhere.

@GNiendorf (Member)

Added some small comments; not sure if any are related to the memory issue above.

@ariostas (Member Author)

Thanks for catching those issues/typos, @GNiendorf! I'll go through the changes slowly to make sure there are no more surprises.

@aashayarora were you running with --allobj or just with the standard outputs?

@aashayarora

@aashayarora were you running with --allobj or just with the standard outputs?

I was running with the --md and --ls flags.
The exact command I ran is lst -i PU200RelVal -s 32 -o output_pu200.root --md --ls

@ariostas (Member Author)

Hmm, there does seem to be a memory leak. It also seems to be present already in master, but it is worse in this branch, probably because it's saving more stuff. I suspect the issue is with rooutil.

@ariostas (Member Author)

Actually, it seems like the issue is with #168. I'll look into it.

ariostas and others added 2 commits August 29, 2025 18:42
Co-authored-by: Philip Chang <[email protected]>
Co-authored-by: Hubert Pugzlys <[email protected]>
Co-authored-by: Gavin Niendorf <[email protected]>
@ariostas force-pushed the ariostas/lstod-rebase branch from fd11107 to d0b375b on August 29, 2025 18:44
@ariostas (Member Author)

Okay, hopefully everything is good now.

I fixed the issues Gavin pointed out, moved some extra branches behind flags, and fixed the memory leak.

Here's the new timing for the standard outputs on cgpu-1. The ntuple is now only about 10% larger.

This PR
Total Timing Summary
Average time for map loading = 582.945 ms
Average time for input loading = 18068.4 ms
Average time for lst::Event creation = 0.000579016 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     16.7      0.4      0.5      1.0      1.1      0.7      0.9      0.5      1.1      0.0      23.0       5.6+/-  1.6      23.0   explicit[s=1]
   avg      2.6      0.9      0.8      1.4      1.4      0.7      1.5      0.8      1.8      0.0      12.0       8.6+/-  2.1       6.0   explicit[s=2]
   avg      3.0      1.7      1.9      2.7      2.5      1.0      2.6      1.1      3.1      0.0      19.6      15.6+/-  2.8       5.0   explicit[s=4]
   avg      4.1      2.4      3.0      3.8      4.1      1.5      4.0      1.7      4.6      0.0      29.2      23.7+/-  5.0       5.0   explicit[s=6]
   avg      5.2      3.5      4.2      5.5      5.5      1.7      5.1      2.4      5.8      0.0      38.9      32.0+/-  6.8       5.0   explicit[s=8]

master
Total Timing Summary
Average time for map loading = 579.196 ms
Average time for input loading = 7751.67 ms
Average time for lst::Event creation = 0.00055631 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     15.9      0.5      0.5      1.0      1.1      0.7      0.9      0.5      1.1      0.0      22.1       5.6+/-  1.6      22.1   explicit[s=1]
   avg      2.6      0.9      0.9      1.4      1.5      0.7      1.5      0.8      1.7      0.0      12.0       8.7+/-  2.0       6.1   explicit[s=2]
   avg      2.9      1.6      1.9      2.6      2.6      1.0      2.6      1.2      3.1      0.0      19.5      15.6+/-  3.1       5.0   explicit[s=4]
   avg      4.0      2.6      2.9      3.9      4.1      1.4      4.0      1.7      4.6      0.0      29.2      23.8+/-  5.1       5.0   explicit[s=6]
   avg      5.1      3.4      4.1      5.5      5.6      1.7      4.7      2.4      5.8      0.0      38.3      31.6+/-  4.4       4.9   explicit[s=8]
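For completeness, here is a small sketch comparing the average "Event" times from the two tables above (values transcribed by hand from the tables; the ~10% file-size figure comes from the post itself, not from this snippet):

```python
# Average "Event" times in ms for the standard outputs, from the tables above.
pr     = {1: 23.0, 2: 12.0, 4: 19.6, 6: 29.2, 8: 38.9}  # this PR
master = {1: 22.1, 2: 12.0, 4: 19.5, 6: 29.2, 8: 38.3}  # master

worst = max(abs(pr[s] - master[s]) / master[s] for s in pr)
print(f"largest relative difference: {worst:.1%}")  # well under 5%
```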

/run gpu-all

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     15.2      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.0       4.5+/-  1.2      20.1   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.5      1.2      0.0       7.4       6.2+/-  1.5       3.7   explicit[s=2]
   avg      1.8      0.9      1.0      1.6      1.8      0.4      1.7      0.7      2.0      0.0      11.9       9.7+/-  2.2       3.1   explicit[s=4]
   avg      3.1      1.3      1.5      2.2      2.6      0.6      2.3      1.0      2.7      0.0      17.4      13.6+/-  3.4       3.0   explicit[s=6]
   avg      3.4      1.8      2.1      2.8      3.3      0.8      3.0      1.3      3.4      0.0      22.1      17.9+/-  4.1       5.6   explicit[s=8]
[this PR]
   avg     15.3      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.0       4.5+/-  1.2      20.1   explicit[s=1]
   avg      1.0      0.6      0.6      1.0      1.2      0.3      1.0      0.5      1.3      0.0       7.4       6.1+/-  1.5       3.7   explicit[s=2]
   avg      1.9      0.9      1.0      1.6      1.9      0.4      1.7      0.7      2.0      0.0      12.0       9.7+/-  2.3       3.1   explicit[s=4]
   avg      2.8      1.3      1.6      2.2      2.5      0.6      2.2      1.0      2.8      0.0      17.1      13.6+/-  3.1       2.9   explicit[s=6]
   avg      4.2      1.9      2.2      2.9      3.2      0.7      2.7      1.2      3.5      0.0      22.5      17.5+/-  3.5       2.9   explicit[s=8]

github-actions bot merged commit 27c51f0 into master on Sep 16, 2025
3 checks passed

9 participants