Conversation

@ariostas (Member)

This is a rebase of SegmentLinking/TrackLooper#412.

It's a very large diff, so @Hoobidoobidoo will have to make sure that it's right. It should be a good starting point to finish the work, so I'll let @Hoobidoobidoo take over. I'm happy to answer questions or help with something if needed.

@slava77 commented Feb 3, 2025

This branch has conflicts that must be resolved

apparently after recent LST PRs

@Hoobidoobidoo

I think everything is good on my end to push this PR.

@ariostas (Member Author)

I'll resolve the conflict and add an option to the build script to enable the extra outputs. I'll let you know so you can test that everything works well.

@ariostas force-pushed the ariostas/lstod-rebase branch from 417443d to 43311a5 on February 24, 2025 16:57
@ariostas (Member Author)

@Hoobidoobidoo instead of adding additional options and flags, I decided to use the existing CUT_VALUE_DEBUG flag. So to enable the extra outputs you can just use the -d flag. Let me know if this would be an issue for you.

I verified that it compiles (after fixing a couple of typos I made), but could you also verify that the output ntuple contains all the data you want and that it looks good?

@ariostas (Member Author)

/run all

@github-actions

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     31.7    389.0    185.8    150.0    169.1    701.9    129.9    245.7    175.5      1.8    2180.4    1446.7+/- 405.9     574.6   explicit[s=4] (target branch)
   avg     22.7    300.3    147.4    131.1    125.0    377.1     84.5    175.1    120.3      1.4    1484.8    1085.0+/- 327.2    3269.7   explicit[s=4] (this PR)

@slava77 commented Feb 24, 2025

Was there a feature change to the algorithm in this PR? (I thought this was just ntuple-related.)
Why is the efficiency changing?
https://raw.githubusercontent.com/SegmentLinking/TrackLooper-plots-archive/cmssw/PR149_43311a5_standalone/eff_pt_comp.png

@ariostas (Member Author) commented Feb 24, 2025

Was there a feature change to the algorithm in this PR? (I thought this was just ntuple-related.)
Why is the efficiency changing?

No, there was no change in the algorithm. There must be a bug somewhere (or some change in matching tolerance or something like that). The part that writes the ntuple was pretty much completely rewritten, so it would be better if @Hoobidoobidoo or @sgnoohc look into it since they would be more familiar with that. While rebasing it, nothing stood out to me that could cause this.

@github-actions

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas marked this pull request as ready for review on April 1, 2025 13:33
@ariostas force-pushed the ariostas/lstod-rebase branch from 43311a5 to 72eb0d5 on April 1, 2025 13:50
@ariostas (Member Author) commented Apr 1, 2025

/run all

@github-actions bot commented Apr 1, 2025

There was a problem while building and running in standalone mode. The logs can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from 72eb0d5 to 8c3792d on April 1, 2025 14:41
@ariostas (Member Author) commented Apr 1, 2025

/run all

@github-actions bot commented Apr 1, 2025

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     29.2    379.2    186.4    189.1     46.5    706.3    130.4    139.9    175.5      2.0    1984.5    1249.0+/- 303.9     527.2   explicit[s=4] (target branch)
   avg     20.8    294.1    147.8    148.7     35.9    383.6     83.6    102.0    120.7      1.2    1338.3     934.0+/- 253.2    2904.8   explicit[s=4] (this PR)

@ariostas (Member Author) commented Apr 1, 2025

The plots look fairly reasonable now, but the timing is pretty wacky. I'll look into that.

@github-actions bot commented Apr 1, 2025

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from 8c3792d to 357dc9f on April 1, 2025 17:47
@ariostas (Member Author) commented Apr 1, 2025

This is ready for review. I removed the unnecessary printout I mentioned above.

Here is a timing comparison on cgpu-1.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 553.534 ms
Average time for input loading = 7852.47 ms
Average time for lst::Event creation = 0.0034741 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      4.0      0.5      0.4      1.8      0.6      0.5      0.9      0.6      1.1      0.0      10.4       5.9+/-  1.4      12.4   explicit[s=1]
   avg      1.3      0.6      0.6      2.2      0.9      0.7      1.2      0.7      1.5      0.0       9.6       7.7+/-  1.7       5.9   explicit[s=2]
   avg      2.3      1.2      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.1      12.7+/-  3.0       4.7   explicit[s=4]
   avg      4.0      2.0      2.0      4.3      2.6      1.6      3.2      1.6      3.9      0.0      25.2      19.6+/-  5.5       4.6   explicit[s=6]
   avg      5.5      2.7      2.4      5.0      3.7      2.1      4.3      1.9      4.9      0.0      32.5      24.9+/-  6.8       4.5   explicit[s=8]

master (538ac87)
Total Timing Summary
Average time for map loading = 553.863 ms
Average time for input loading = 7808.41 ms
Average time for lst::Event creation = 0.00295185 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      6.1      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0      12.4       5.8+/-  1.1      14.4   explicit[s=1]
   avg      1.3      0.6      0.6      2.2      0.8      0.7      1.2      0.7      1.5      0.0       9.6       7.6+/-  1.5       5.9   explicit[s=2]
   avg      2.3      1.2      1.1      3.1      1.5      1.1      2.1      1.0      2.4      0.0      15.7      12.3+/-  3.0       4.6   explicit[s=4]
   avg      3.7      1.9      1.7      4.0      2.6      1.7      3.2      1.6      3.7      0.0      24.0      18.7+/-  4.7       4.5   explicit[s=6]
   avg      5.6      2.7      2.4      5.3      3.5      2.2      4.2      1.8      4.9      0.0      32.7      24.9+/-  6.9       4.5   explicit[s=8]
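As a quick cross-check of the two tables (the numbers below are transcribed by hand from the "Event" column above; this is just arithmetic on the posted averages, not part of the benchmark itself), the average event times agree to within about 5% for s ≥ 2, with the larger s=1 difference dominated by the warm-up Hits step:

```python
# Average "Event" times in ms, copied from the tables above.
pr     = {1: 10.4, 2: 9.6, 4: 16.1, 6: 25.2, 8: 32.5}  # this PR (357dc9f)
master = {1: 12.4, 2: 9.6, 4: 15.7, 6: 24.0, 8: 32.7}  # master (538ac87)

for s in sorted(pr):
    diff = (pr[s] - master[s]) / master[s]
    print(f"s={s}: {diff:+.1%}")  # e.g. s=6: +5.0%, s=8: -0.6%
```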

@ariostas (Member Author) commented Apr 1, 2025

The CPU timings on cgpu-1 also look good.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 349.674 ms
Average time for input loading = 8143.17 ms
Average time for lst::Event creation = 0.00399635 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     19.8    281.5    141.6    143.3     33.5    363.8     77.8     93.2    107.7      1.1    1263.3     879.7+/- 220.8    1264.3   explicit[s=1]
   avg     19.3    280.3    139.6    146.2     32.8    359.7     76.4     92.6    106.9      0.9    1254.8     875.7+/- 212.7     329.5   explicit[s=4]
   avg     20.8    291.4    145.0    147.8     34.9    370.2     80.0     95.7    109.6      1.1    1296.3     905.3+/- 217.0      94.9   explicit[s=16]
   avg     22.6    295.4    146.6    149.3     35.8    374.6     80.6     97.7    111.2      1.0    1314.9     917.7+/- 220.4      53.8   explicit[s=32]
   avg     32.3    312.6    151.6    158.8     39.1    398.0     84.0    101.8    115.8      1.2    1395.2     964.9+/- 228.6      32.7   explicit[s=64]

master (538ac87)
Total Timing Summary
Average time for map loading = 341.615 ms
Average time for input loading = 200098 ms
Average time for lst::Event creation = 0.00371252 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     18.4    261.5    133.1    132.6     30.5    333.6     70.6     85.6     98.6      0.8    1165.3     813.3+/- 194.7    1166.5   explicit[s=1]
   avg     18.1    260.5    131.9    130.6     30.8    332.8     72.4     85.7     99.1      0.8    1162.8     811.9+/- 194.4     296.0   explicit[s=4]
   avg     20.5    288.7    143.6    144.8     34.4    366.7     79.3     94.7    108.6      1.0    1282.2     895.0+/- 214.1      93.9   explicit[s=16]
   avg     22.6    295.2    147.8    150.2     35.9    374.4     80.7     97.7    111.3      1.0    1316.8     919.8+/- 217.2      53.5   explicit[s=32]
   avg     32.4    312.6    152.0    159.1     39.3    405.7     83.9    102.2    119.3      1.1    1407.5     969.5+/- 230.3      32.8   explicit[s=64]

@slava77 commented Apr 1, 2025

The CPU timings on cgpu-1 also look good.

was this tested with CUT_VALUE_DEBUG enabled?

@slava77 commented Apr 1, 2025

was this tested with CUT_VALUE_DEBUG enabled?

a related question: is there a way to ask for it from the CI /run ...?

@slava77 commented Apr 2, 2025

My review of the /standalone changes was rather cursory.
Perhaps a second look from others would help as well.

It would be good to understand what the impact is for the standard benchmark outputs.
Do I recall correctly that the timing benchmark is done without writing the ntuple?

How much slower is the case with an ntuple when the default validation plots are made (and what is the file size increase in this case)? Same for the case with CUT_VALUE_DEBUG enabled.

@ariostas (Member Author) commented Apr 2, 2025

Thanks for all the comments, Slava!

was this tested with CUT_VALUE_DEBUG enabled?

No, but I'll leave timings with it enabled below.

a related question: is there a way to ask for it from the CI /run ...?

No, but if it becomes a common need I can add that. In that case we should also add the option to lst_timing.

Perhaps a second look from others would help as well.

I'll also take a look. While rebasing I fixed some obvious things, but I didn't look at it very carefully.

It would be good to understand what the impact is for the standard benchmark outputs.
Do I recall correctly that the timing benchmark is done without writing the ntuple?

That's right. I'll also run the timing comparison with writing enabled.

I'll address some of the comments you left, but I'll refer the rest to @Hoobidoobidoo or @sgnoohc since they can better address them.

Here's a timing comparison on GPU with CUT_VALUE_DEBUG enabled.

This PR (357dc9f)
Total Timing Summary
Average time for map loading = 548.513 ms
Average time for input loading = 7668.05 ms
Average time for lst::Event creation = 0.00314485 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      1.3      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0       7.6       5.8+/-  1.1       9.6   explicit_cutvalue[s=1]
   avg      1.3      0.7      0.6      2.2      0.8      0.7      1.2      0.7      1.5      0.0       9.6       7.6+/-  1.6       5.9   explicit_cutvalue[s=2]
   avg      2.3      1.1      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.0      12.7+/-  2.8       4.6   explicit_cutvalue[s=4]
   avg      3.7      1.8      1.7      4.2      2.6      1.6      3.1      1.5      3.8      0.0      23.8      18.6+/-  4.0       4.5   explicit_cutvalue[s=6]
   avg      5.5      2.7      2.4      5.2      3.6      2.0      4.4      2.0      4.9      0.0      32.7      25.3+/-  7.0       4.5   explicit_cutvalue[s=8]

master (538ac87)
Total Timing Summary
Average time for map loading = 547.471 ms
Average time for input loading = 7653.65 ms
Average time for lst::Event creation = 0.00329245 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg      1.3      0.5      0.4      1.8      0.6      0.5      0.9      0.5      1.1      0.0       7.6       5.8+/-  1.1       9.6   explicit_cutvalue[s=1]
   avg      1.3      0.7      0.6      2.1      0.8      0.7      1.1      0.7      1.5      0.0       9.5       7.5+/-  1.6       5.9   explicit_cutvalue[s=2]
   avg      2.2      1.2      1.1      3.1      1.6      1.1      2.2      1.1      2.5      0.0      16.1      12.8+/-  2.8       4.7   explicit_cutvalue[s=4]
   avg      3.5      1.9      1.7      4.1      2.5      1.6      3.2      1.4      3.7      0.0      23.7      18.6+/-  4.7       4.5   explicit_cutvalue[s=6]
   avg      5.5      2.8      2.3      5.1      3.6      2.1      3.9      2.0      4.9      0.0      32.1      24.5+/-  5.7       4.4   explicit_cutvalue[s=8]

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     14.5      0.4      0.3      0.6      0.8      0.3      0.6      0.3      0.9      0.0      18.8       3.9+/-  0.7      18.8   explicit[s=1]
   avg      0.9      0.6      0.4      0.8      1.0      0.3      1.0      0.5      1.3      0.0       6.7       5.5+/-  1.0       3.3   explicit[s=2]
   avg      1.5      0.9      0.8      1.3      1.5      0.5      1.7      0.7      2.0      0.0      10.8       8.9+/-  1.8       2.8   explicit[s=4]
   avg      2.2      1.3      1.1      1.8      2.2      0.7      2.4      1.0      2.8      0.0      15.5      12.6+/-  3.0       2.6   explicit[s=6]
   avg      2.9      1.8      1.6      2.3      2.9      0.8      3.0      1.4      3.5      0.0      20.1      16.4+/-  3.6       2.6   explicit[s=8]
[this PR]
   avg     14.5      0.4      0.3      0.6      0.8      0.3      0.6      0.4      0.9      0.0      18.7       3.9+/-  0.7      18.7   explicit[s=1]
   avg      0.9      0.6      0.4      0.8      1.0      0.3      1.0      0.4      1.3      0.0       6.7       5.5+/-  1.0       6.7   explicit[s=2]
   avg      1.5      0.9      0.7      1.3      1.6      0.4      1.7      0.7      2.0      0.0      10.9       8.9+/-  1.7       2.7   explicit[s=4]
   avg      2.2      1.3      1.1      1.8      2.2      0.6      2.3      1.0      2.8      0.0      15.3      12.5+/-  2.6       5.2   explicit[s=6]
   avg      2.8      1.7      1.6      2.4      2.9      0.8      3.1      1.3      3.5      0.0      20.2      16.5+/-  3.8       5.1   explicit[s=8]

@slava77 commented Aug 27, 2025

Is the GPU more costly now at high stream count, or is there some stability issue on the CI side?
[image]

(bottom is with this PR)

@GNiendorf (Member) commented Aug 27, 2025

Timing looks good on lnx4555
Screenshot 2025-08-27 at 7 48 47 PM

Previous master (re-ran just now)
Screenshot 2025-08-27 at 8 01 19 PM

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@ariostas force-pushed the ariostas/lstod-rebase branch from d94865f to fd11107 on August 28, 2025 14:11
@ariostas (Member Author)

I fixed the conflicts with master and squashed. I'll run the CI again and do some timing tests on cgpu-1 to make sure the earlier slowdown was a fluke.

/run gpu-all

@ariostas (Member Author)

I'll have to look into this further because it seems to get stuck on event 129 when all outputs are enabled.

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     15.4      0.4      0.4      0.8      1.0      0.3      0.6      0.3      0.9      0.0      20.2       4.5+/-  1.2      20.3   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.4      1.3      0.0       7.4       6.1+/-  1.5       3.7   explicit[s=2]
   avg      1.6      0.9      1.0      1.6      1.8      0.4      1.6      0.7      2.0      0.0      11.7       9.6+/-  2.2       3.0   explicit[s=4]
   avg      2.3      1.3      1.5      2.2      2.6      0.6      2.2      0.9      2.8      0.0      16.4      13.5+/-  2.7       2.8   explicit[s=6]
   avg      3.0      1.8      2.1      3.0      3.3      0.7      2.7      1.3      3.5      0.0      21.5      17.7+/-  3.3       2.7   explicit[s=8]
[this PR]
   avg     15.3      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.1       4.5+/-  1.2      20.1   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.4      1.3      0.0       7.3       6.2+/-  1.6       3.7   explicit[s=2]
   avg      1.5      0.9      1.0      1.6      1.9      0.4      1.7      0.7      2.0      0.0      11.7       9.7+/-  2.3       3.0   explicit[s=4]
   avg      2.2      1.3      1.5      2.2      2.5      0.6      2.3      0.9      2.7      0.0      16.3      13.5+/-  2.8       2.8   explicit[s=6]
   avg      2.9      1.9      2.1      3.0      3.2      0.8      2.7      1.3      3.5      0.0      21.4      17.7+/-  3.3       2.7   explicit[s=8]

@ariostas (Member Author)

I looked into it, and it turns out that filling the t3dnn branches hangs for a while on event 129 because one module has 2156 T3s. But the slowdown also happens on master, so it's not an issue introduced here.

Here's a timing comparison on cgpu-1 for the first 100 events of PU200.

This PR (all outputs enabled)
Total Timing Summary
Average time for map loading = 599.407 ms
Average time for input loading = 4548.81 ms
Average time for lst::Event creation = 0.000794729 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     25.8      0.5      0.5      1.0      1.2      0.7      0.9      0.4      1.1      0.1      32.2       5.7+/-  1.8    3820.6   explicit_cutvalue[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      1.0      0.5      1.1      0.1       8.2       5.9+/-  2.2    3822.1   explicit_cutvalue[s=2]
   avg      3.5      0.6      0.6      1.1      1.3      0.8      1.4      0.6      1.2      0.1      11.3       6.9+/-  5.1    3842.1   explicit_cutvalue[s=4]
   avg      4.6      0.7      0.7      1.3      1.5      0.7      1.5      0.9      1.3      0.1      13.4       8.0+/-  8.0    3912.7   explicit_cutvalue[s=6]
   avg      5.1      0.9      0.8      1.3      1.6      0.8      2.6      0.9      1.5      0.1      15.5       9.6+/- 12.0    3809.8   explicit_cutvalue[s=8]

master (all outputs enabled)
Total Timing Summary
Average time for map loading = 606.334 ms
Average time for input loading = 4652 ms
Average time for lst::Event creation = 0.00113533 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     28.7      0.5      0.5      1.0      1.2      0.7      0.9      0.4      1.1      0.1      35.1       5.7+/-  1.8    2147.5   explicit_cutvalue[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      1.0      0.5      1.2      0.1       8.4       6.0+/-  2.5    2060.4   explicit_cutvalue[s=2]
   avg      2.3      0.5      0.6      1.1      1.3      0.7      1.2      0.6      1.2      0.1       9.6       6.6+/-  4.9    2005.6   explicit_cutvalue[s=4]
   avg      4.3      0.7      0.7      1.2      1.4      0.8      1.9      0.7      1.3      0.1      13.2       8.0+/-  8.4    2086.6   explicit_cutvalue[s=6]
   avg      5.2      0.8      0.8      1.4      1.5      0.8      2.1      1.0      1.5      0.1      15.2       9.2+/- 11.0    2089.0   explicit_cutvalue[s=8]

This PR (standard outputs enabled)
Total Timing Summary
Average time for map loading = 593.198 ms
Average time for input loading = 4504.92 ms
Average time for lst::Event creation = 0.000749315 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     25.7      0.5      0.5      1.0      1.1      0.7      0.9      0.4      1.1      0.0      31.9       5.6+/-  1.8     180.3   explicit[s=1]
   avg      1.7      0.5      0.6      1.0      1.2      0.7      0.9      0.5      1.1      0.0       8.1       5.8+/-  2.3     163.7   explicit[s=2]
   avg      2.4      0.5      0.6      1.0      1.2      0.7      1.1      0.6      1.2      0.0       9.4       6.4+/-  4.3     165.4   explicit[s=4]
   avg      3.4      0.7      0.7      1.2      1.4      0.8      1.5      0.7      1.3      0.0      11.6       7.4+/-  7.3     165.4   explicit[s=6]
   avg      4.7      0.9      0.7      1.4      1.5      0.8      2.1      0.9      1.5      0.0      14.5       9.0+/- 11.4     165.8   explicit[s=8]

master (standard outputs enabled)
Total Timing Summary
Average time for map loading = 609.423 ms
Average time for input loading = 4481.9 ms
Average time for lst::Event creation = 0.000703903 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     26.1      0.5      0.5      0.9      1.1      0.7      0.9      0.5      1.1      0.0      32.3       5.6+/-  1.8     120.3   explicit[s=1]
   avg      4.7      0.5      0.6      1.0      1.2      0.7      0.9      0.5      1.1      0.0      11.2       5.8+/-  2.2      92.4   explicit[s=2]
   avg      2.7      0.6      0.6      1.0      1.2      0.7      1.1      0.6      1.2      0.0       9.9       6.4+/-  4.3      92.1   explicit[s=4]
   avg      3.4      0.7      0.7      1.2      1.3      0.7      1.4      0.6      1.3      0.0      11.4       7.3+/-  6.8      93.4   explicit[s=6]
   avg      4.8      0.9      0.8      1.3      1.5      0.8      1.5      0.9      1.4      0.0      14.0       8.4+/-  9.4      92.0   explicit[s=8]

There is a noticeable slowdown even with the default outputs. Let me know if you think that more things should be toggled by a flag instead of being written by default.

@GNiendorf (Member) commented Aug 28, 2025

Where is the timing increase coming from? Are we storing more info in the standard ntuple, or is the code just slower? If it's the former, don't we have an -l flag that toggles saving low-level info in the ntuple? We could put the new branches for the low-level objects under that flag so we don't compute/store them by default. Edit: or just don't make --allobj the default, and save only the final objects.

@ariostas (Member Author)

The output ntuple is still more than 4 times larger, so that must be it. I'll see which branches could be put behind flags so that it doesn't save so much stuff by default.

@aashayarora

Hello, I was running this branch on the PU200RelVal sample on cgpu-1 using the CPU backend, and after around 1 hour the memory usage for the process was around 250 GB. I killed it so the system wouldn't run OOM. It seems there might be a memory leak somewhere.

@GNiendorf (Member)

Added some small comments; not sure if any are related to the memory issue above.

@ariostas (Member Author)

Thanks for catching those issues/typos, @GNiendorf! I'll go through the changes slowly to make sure there are no more surprises.

@aashayarora were you running with --allobj or just with the standard outputs?

@aashayarora

@aashayarora were you running with --allobj or just with the standard outputs?

I was running with the --md and --ls flags.
The exact command I ran is lst -i PU200RelVal -s 32 -o output_pu200.root --md --ls

@ariostas (Member Author)

Hmm, there does seem to be a memory leak. It also seems to be present already in master, but it is worse in this branch, probably because it's saving more stuff. I suspect the issue is with rooutil.

@ariostas (Member Author)

Actually, it seems like the issue is with #168. I'll look into it.

ariostas and others added 2 commits August 29, 2025 18:42
Co-authored-by: Philip Chang <[email protected]>
Co-authored-by: Hubert Pugzlys <[email protected]>
Co-authored-by: Gavin Niendorf <[email protected]>
@ariostas force-pushed the ariostas/lstod-rebase branch from fd11107 to d0b375b on August 29, 2025 18:44
@ariostas (Member Author)

Okay, hopefully everything is good now.

I fixed the issues Gavin pointed out, moved some extra branches behind flags, and fixed the memory leak.

Here's the new timing for the standard outputs on cgpu-1. The ntuple is now only about 10% larger.

This PR
Total Timing Summary
Average time for map loading = 582.945 ms
Average time for input loading = 18068.4 ms
Average time for lst::Event creation = 0.000579016 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     16.7      0.4      0.5      1.0      1.1      0.7      0.9      0.5      1.1      0.0      23.0       5.6+/-  1.6      23.0   explicit[s=1]
   avg      2.6      0.9      0.8      1.4      1.4      0.7      1.5      0.8      1.8      0.0      12.0       8.6+/-  2.1       6.0   explicit[s=2]
   avg      3.0      1.7      1.9      2.7      2.5      1.0      2.6      1.1      3.1      0.0      19.6      15.6+/-  2.8       5.0   explicit[s=4]
   avg      4.1      2.4      3.0      3.8      4.1      1.5      4.0      1.7      4.6      0.0      29.2      23.7+/-  5.0       5.0   explicit[s=6]
   avg      5.2      3.5      4.2      5.5      5.5      1.7      5.1      2.4      5.8      0.0      38.9      32.0+/-  6.8       5.0   explicit[s=8]

master
Total Timing Summary
Average time for map loading = 579.196 ms
Average time for input loading = 7751.67 ms
Average time for lst::Event creation = 0.00055631 ms
   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     15.9      0.5      0.5      1.0      1.1      0.7      0.9      0.5      1.1      0.0      22.1       5.6+/-  1.6      22.1   explicit[s=1]
   avg      2.6      0.9      0.9      1.4      1.5      0.7      1.5      0.8      1.7      0.0      12.0       8.7+/-  2.0       6.1   explicit[s=2]
   avg      2.9      1.6      1.9      2.6      2.6      1.0      2.6      1.2      3.1      0.0      19.5      15.6+/-  3.1       5.0   explicit[s=4]
   avg      4.0      2.6      2.9      3.9      4.1      1.4      4.0      1.7      4.6      0.0      29.2      23.8+/-  5.1       5.0   explicit[s=6]
   avg      5.1      3.4      4.1      5.5      5.6      1.7      4.7      2.4      5.8      0.0      38.3      31.6+/-  4.4       4.9   explicit[s=8]
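For completeness, here is a small sketch comparing the average "Event" times from the two tables above (values transcribed by hand from the tables; the ~10% file-size figure comes from the post itself, not from this snippet):

```python
# Average "Event" times in ms for the standard outputs, from the tables above.
pr     = {1: 23.0, 2: 12.0, 4: 19.6, 6: 29.2, 8: 38.9}  # this PR
master = {1: 22.1, 2: 12.0, 4: 19.5, 6: 29.2, 8: 38.3}  # master

worst = max(abs(pr[s] - master[s]) / master[s] for s in pr)
print(f"largest relative difference: {worst:.1%}")  # well under 5%
```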

/run gpu-all

@github-actions

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@github-actions

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     15.2      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.0       4.5+/-  1.2      20.1   explicit[s=1]
   avg      0.9      0.6      0.6      1.0      1.2      0.3      1.0      0.5      1.2      0.0       7.4       6.2+/-  1.5       3.7   explicit[s=2]
   avg      1.8      0.9      1.0      1.6      1.8      0.4      1.7      0.7      2.0      0.0      11.9       9.7+/-  2.2       3.1   explicit[s=4]
   avg      3.1      1.3      1.5      2.2      2.6      0.6      2.3      1.0      2.7      0.0      17.4      13.6+/-  3.4       3.0   explicit[s=6]
   avg      3.4      1.8      2.1      2.8      3.3      0.8      3.0      1.3      3.4      0.0      22.1      17.9+/-  4.1       5.6   explicit[s=8]
[this PR]
   avg     15.3      0.4      0.4      0.8      1.0      0.3      0.6      0.4      0.9      0.0      20.0       4.5+/-  1.2      20.1   explicit[s=1]
   avg      1.0      0.6      0.6      1.0      1.2      0.3      1.0      0.5      1.3      0.0       7.4       6.1+/-  1.5       3.7   explicit[s=2]
   avg      1.9      0.9      1.0      1.6      1.9      0.4      1.7      0.7      2.0      0.0      12.0       9.7+/-  2.3       3.1   explicit[s=4]
   avg      2.8      1.3      1.6      2.2      2.5      0.6      2.2      1.0      2.8      0.0      17.1      13.6+/-  3.1       2.9   explicit[s=6]
   avg      4.2      1.9      2.2      2.9      3.2      0.7      2.7      1.2      3.5      0.0      22.5      17.5+/-  3.5       2.9   explicit[s=8]

github-actions bot merged commit 27c51f0 into master on Sep 16, 2025
3 checks passed

9 participants