Conversation

@GNiendorf (Member) commented Nov 3, 2025

Introduces a new kernel that merges built T5s on top of T5 and pT5 Track Candidates. Changes the LST TC data structure to accommodate longer tracks.
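For orientation, here is a conceptual, host-side sketch of the merging step (my illustration only, not the GPU kernel added in this PR; the shared-hit threshold kMinSharedHits and the hit-index containers are assumptions):

#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kMinSharedHits = 8;  // assumed overlap requirement, illustrative only

// If the T5 shares at least kMinSharedHits hits with the candidate, append its
// remaining hits to the candidate; this is what lets a TC grow past its old length.
bool mergeT5IntoCandidate(std::vector<unsigned int>& candidateHits,
                          const std::vector<unsigned int>& t5Hits) {
  std::size_t shared = 0;
  for (unsigned int hit : t5Hits)
    if (std::find(candidateHits.begin(), candidateHits.end(), hit) != candidateHits.end())
      ++shared;
  if (shared < kMinSharedHits)
    return false;  // not a duplicate of this candidate, leave it alone
  for (unsigned int hit : t5Hits)
    if (std::find(candidateHits.begin(), candidateHits.end(), hit) == candidateHits.end())
      candidateHits.push_back(hit);  // the candidate gets longer, hence the TC data-structure change
  return true;
}

The actual kernel does the equivalent on the device over the TC/T5 pairs, which is why the TC data structure has to accommodate the extra hits.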

[plot: TC_avgOTlen_etacoarse]

Zoom in:
[plot: TC_avgOTlen_etacoarsezoom]

Histogram of Lengths:
[plot]

Purity of extensions looks good.
The plots below are the same; one just has the y-scale set to log. There is only a small decrease in pMatched of the track candidates.

[plots: pMatched comparison, linear and log y-scale]

@GNiendorf (Member Author) commented:

Fixed an error where the additional hits weren't being considered in the plots; you should see at least some differences now.

/run all

@GNiendorf (Member Author) commented:

/run standalone

SegmentLinking deleted a comment from github-actions bot Nov 4, 2025
@slava77 commented Nov 4, 2025

> Histogram of track lengths with this current draft PR.

Is this ttbar? How does this look in log scale? I'm interested to see the odd nHits entries (or is this for 100% matched tracks?)

@GNiendorf (Member Author) commented Nov 4, 2025

> Histogram of track lengths with this current draft PR.
>
> Is this ttbar? How does this look in log scale? I'm interested to see the odd nHits entries (or is this for 100% matched tracks?)

Sorry, that's just poor plot formatting if I'm understanding your confusion. There are only bins at x=0 (pLSs), 6 (pT3s), 10 (pT5s and T5s), 12 (extended once), and 14 (extended twice). Each extension adds one mini-doublet (2 hits), so the entries stay even. This doesn't factor in whether the hits are real or fake, just the raw length of the TCs.

@slava77 commented Nov 4, 2025

> There are only bins at x=0 (pLSs), 6 (pT3s), 10 (pT5s and T5s), 12 (extended once), and 14 (extended twice).

Is merging allowed on the same doublet module or in the same layer?
If not, I can understand why only even entries appear.
Otherwise there should be cases where two merged MDs have one common hit.

@slava77 commented Nov 5, 2025

/run standalone

@slava77 commented Nov 5, 2025

/run gpu-standalone

@GNiendorf (Member Author) commented:

Speeding up this kernel has been difficult. Moving the code into the existing duplicate-cleaning kernel did not give much benefit, so I'm trying to make the kernel less wasteful instead.

@GNiendorf (Member Author) commented:

/run gpu-standalone

@GNiendorf (Member Author) commented:

/run gpu-standalone

@GNiendorf (Member Author) commented:

/run gpu-standalone

@GNiendorf (Member Author) commented:

Timing is finally fixed. Code is janky though, and still only works in standalone.

@GNiendorf (Member Author) commented:

/run gpu-cmssw

@github-actions bot commented:

There was a problem while building and running with CMSSW on GPU. The logs can be found here.

@GNiendorf (Member Author) commented:

/run gpu-cmssw

@GNiendorf (Member Author) commented:

/run gpu-cmssw

@GNiendorf (Member Author) commented:

Plots look good from what I see.
[screenshots of comparison plots]

GNiendorf changed the title from "Work in Progress: LST Duplicate Merging" to "LST T5-T5 Duplicate Merging" on Dec 2, 2025
GNiendorf marked this pull request as ready for review on December 2, 2025 15:34
@GNiendorf (Member Author) commented:

Marking this PR as ready for review. Will push some final cleanup soon.

@github-actions bot commented:

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     35.9      0.4      0.5      0.5      0.9      0.3      0.6      0.6      0.4      1.2      0.0      41.4       5.1+/-  2.6      41.4   explicit[s=1]
   avg      1.4      0.6      0.6      0.8      1.2      0.3      0.9      1.0      0.5      1.6      0.0       8.9       7.2+/-  3.5       4.6   explicit[s=2]
   avg      2.4      1.0      1.1      1.3      1.7      0.4      1.5      1.9      0.9      2.6      0.1      15.0      12.1+/-  5.1       3.8   explicit[s=4]
   avg      1.7      1.6      1.6      2.0      2.3      0.6      2.2      2.8      1.2      3.2      0.2      19.5      17.2+/-  8.2       4.0   explicit[s=6]
   avg      2.2      2.1      1.7      3.2      3.4      0.7      3.4      3.2      2.4      4.7      0.1      27.2      24.2+/- 11.7       4.0   explicit[s=8]
[this PR]
   avg     36.8      0.4      0.4      0.5      0.9      0.3      0.6      0.7      0.4      1.5      0.0      42.5       5.4+/-  2.6      42.5   explicit[s=1]
   avg      0.9      0.6      0.6      0.8      1.1      0.3      0.9      1.0      0.5      2.0      0.0       8.7       7.4+/-  3.2       4.7   explicit[s=2]
   avg      2.5      0.9      1.0      1.3      1.7      0.4      1.4      1.7      1.0      3.2      0.1      15.2      12.3+/-  5.2       3.9   explicit[s=4]
   avg      3.7      1.6      1.7      2.3      2.4      0.6      2.2      2.8      1.4      5.1      0.1      23.8      19.6+/-  9.4       4.0   explicit[s=6]
   avg      4.5      1.9      2.1      3.3      3.1      0.7      3.0      3.2      1.5      5.0      0.4      28.5      23.4+/-  9.0       3.7   explicit[s=8]

@github-actions bot commented:

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf (Member Author) commented Dec 10, 2025

@slava77 Let me know if there are any remaining comments. Thanks for the comments so far; I think this PR is a good starting point for me to continue exploring duplicate merging (maybe start merging T3s, relax the hit-sharing requirements, look at adding hits on the same layer, etc.).

Comment on lines 738 to 751
const auto& threadIdx = alpaka::getIdx<alpaka::Block, alpaka::Threads>(acc);
const auto& blockDim = alpaka::getWorkDiv<alpaka::Block, alpaka::Threads>(acc);

// Flatten the 2D thread indices within the block (Y, X) into one index
const int threadIndexFlat = threadIdx[1u] * blockDim[2u] + threadIdx[2u];
const int blockDimFlat = blockDim[1u] * blockDim[2u];

// Scan over lower modules, striding by the flattened block size
for (int lowerModuleIndex = lowerModuleBegin + threadIndexFlat; lowerModuleIndex < lowerModuleEnd;
     lowerModuleIndex += blockDimFlat) {
  // ... (rest of the reviewed range, lines 738 to 751, not shown in this excerpt)
@slava77 commented:

(following on an earlier comment)

> Cleaned this code up a bit to be similar to what Yanxi does in the T5 build kernel, but I'm not sure if there's something cleaner we can replace this with.

I'm not sure that analogy applies. Wouldn't a simple

for (auto lowerModuleIndex : cms::alpakatools::uniform_elements(acc, lowerModuleEnd))

be enough?

@GNiendorf (Member Author) replied:

I don't think this code example works, because we start at lowerModuleBegin rather than 0, but I think this gets at the question of why I am using Acc3D when I flatten two of those dimensions. It looks like Acc1D works just as well; I'll push that soon.

@GNiendorf (Member Author) commented Dec 11, 2025:

I can't find any cms::alpakatools wrapper functions that would allow me to do the block-level work I use here, even in 1D (each block handles its own TC, with shared memory for that TC/block, unlike uniform_elements, which I think distributes over multiple blocks?). Let me know if you know of one; otherwise I think Acc1D is as clean as this can go.

@slava77 replied:

Since the block direction is aligned with the candidates, perhaps use a 2D accelerator and uniform_elements_x? The range can always run over lowerModuleEnd - lowerModuleBegin, with lowerModuleBegin added back as an offset.

Note that I expect the saving from having lowerModuleBegin instead of just 0 to be relatively minor, and it would go away once we start looking for overlap hits on the same layer.
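A minimal sketch of what that 2D variant could look like (my illustration, not the code in this PR), assuming the usual CMSSW alpaka setup (Acc2D and cms::alpakatools from HeterogeneousCore/AlpakaInterface); the kernel name, arguments, and per-module work are placeholders:

struct MergeT5sKernelSketch {
  ALPAKA_FN_ACC void operator()(Acc2D const& acc,
                                unsigned int lowerModuleBegin,
                                unsigned int lowerModuleEnd) const {
    // blocks along y map to track candidates (dimension 0 is y for a 2D accelerator)
    const unsigned int tcIndex = alpaka::getIdx<alpaka::Grid, alpaka::Blocks>(acc)[0u];

    // stride along x over the module range, adding lowerModuleBegin back as an offset
    for (auto offset : cms::alpakatools::uniform_elements_x(acc, lowerModuleEnd - lowerModuleBegin)) {
      const unsigned int lowerModuleIndex = lowerModuleBegin + offset;
      // ... look for T5s on lowerModuleIndex that overlap candidate tcIndex and merge them ...
      // (placeholder; the actual overlap/merge logic lives in the PR's kernel)
    }
  }
};

With this layout each block still works on a single TC (picked by its y index), so block-level shared memory for that TC remains possible, while the x loop no longer needs the hand-rolled index flattening.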

@slava77 commented Dec 10, 2025

> The full set of validation and comparison plots can be found here.

[plot: TC nHits distribution from the CMSSW validation]

This looks problematic.

  • 15-16 is expected, OK
  • 17-18 is probably a 7-layer case for eta between 1.6-1.8, where we can have 5 endcap + 2 barrel layers.
  • 24-26 looks like a bug; my guess is there are some cases with initialization to 0 or perhaps some out-of-bounds reads (I started a CPU test to see if the issue persists there as well).

@slava77 commented Dec 10, 2025

/run all

@github-actions bot commented:

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
   avg     30.6    379.7    274.2    117.5     52.4    701.5     11.8    123.2    133.3    190.5      1.5    2016.2    1284.1+/- 305.4     619.3   explicit[s=4] (target branch)
   avg     30.8    376.2    271.9    118.3     52.1    688.5     11.9    127.3    132.9    185.3      1.8    1996.8    1277.5+/- 304.3     618.7   explicit[s=4] (this PR)

@github-actions bot commented:

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf (Member Author) commented:

/run gpu-all

@github-actions bot commented:

The PR was built and ran successfully in standalone mode on GPU. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

Here is a timing comparison:

   Evt    Hits       MD       LS      T3       T5       pLS       pT5      pT3      TC       Reset    Event     Short             Rate
[target branch]
   avg     33.9      0.4      0.4      0.5      0.9      0.3      0.6      0.6      0.4      1.2      0.0      39.2       5.0+/-  2.5      39.2   explicit[s=1]
   avg      1.1      0.5      0.6      0.7      1.1      0.3      0.9      0.9      0.5      1.6      0.0       8.2       6.8+/-  3.0       4.2   explicit[s=2]
   avg      1.9      0.8      0.9      1.2      1.6      0.4      1.4      1.4      0.6      2.4      0.0      12.8      10.5+/-  3.8       3.4   explicit[s=4]
   avg      2.8      1.2      1.3      1.8      2.3      0.6      1.9      1.9      1.0      3.3      0.0      18.2      14.8+/-  4.6       3.2   explicit[s=6]
   avg      3.6      1.7      1.9      2.4      2.9      0.8      2.7      2.8      1.2      4.2      0.0      24.1      19.7+/-  5.3       3.1   explicit[s=8]
[this PR]
   avg     34.6      0.4      0.4      0.5      0.9      0.3      0.6      0.7      0.4      1.4      0.0      40.1       5.3+/-  2.6      40.1   explicit[s=1]
   avg      1.1      0.5      0.5      0.7      1.1      0.3      0.9      0.9      0.5      1.9      0.0       8.5       7.0+/-  2.8       4.3   explicit[s=2]
   avg      1.9      0.8      1.0      1.2      1.6      0.4      1.3      1.4      0.6      2.9      0.0      13.1      10.8+/-  3.8       3.4   explicit[s=4]
   avg      2.7      1.3      1.4      1.8      2.2      0.6      1.9      2.0      0.9      3.8      0.0      18.6      15.4+/-  4.8       3.2   explicit[s=6]
   avg      3.7      1.7      1.9      2.6      2.9      0.7      2.5      2.5      1.3      4.8      0.0      24.7      20.3+/-  4.8       3.2   explicit[s=8]

@GNiendorf (Member Author) commented Dec 11, 2025

> • 15-16 is expected, OK
> • 17-18 is probably a 7-layer case for eta between 1.6-1.8, where we can have 5 endcap + 2 barrel layers.
> • 24-26 looks like a bug; my guess is there are some cases with initialization to 0 or perhaps some out-of-bounds reads (I started a CPU test to see if the issue persists there as well).

I've seen this bump in the CMSSW plots since the beginning of this PR, for both CPU and GPU. I don't see the bump in the standalone plots, so I assumed it had something to do with the final fit or something similar that CMSSW does differently from standalone.

Edit: I guess there are 11 OT layers, so 22 OT hits plus 3-4 pixel hits would give this? So maybe there is some bug where all hits get read as non-empty?

@github-actions bot commented:

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@slava77 commented Dec 11, 2025

> Edit: I guess there are 11 OT layers, so 22 OT hits plus 3-4 pixel hits would give this? So maybe there is some bug where all hits get read as non-empty?

That's my guess.

I don't see an explosion in fakes:
[plot: fake rate comparison]
So this is relatively rare. I don't see any change in the CPU variant.
It would help to find these tracks and inspect/visualize them in some way: just counting the number of OT hits or layers in the LSTOutputConverter should be a good way to catch the candidates.

In the CMSSW setup we are supposedly using the LST candidates directly to then run a fit on them, so there is no path to add more hits (though there is a way to lose some due to fitting).
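A rough sketch of that counting idea (illustrative only; the containers and the threshold below are assumptions, not the actual LSTOutputConverter interface):

#include <cstdio>
#include <vector>

// Flag candidates whose outer-tracker hit count exceeds what the geometry should allow,
// so the suspicious 24-26-hit entries can be dumped and inspected.
void flagOverlongCandidates(const std::vector<std::vector<unsigned int>>& candidateHitIndices,
                            const std::vector<std::vector<bool>>& hitIsPixel,
                            std::size_t maxExpectedOTHits) {
  for (std::size_t tc = 0; tc < candidateHitIndices.size(); ++tc) {
    std::size_t nOTHits = 0;
    for (std::size_t h = 0; h < candidateHitIndices[tc].size(); ++h) {
      if (!hitIsPixel[tc][h])
        ++nOTHits;  // count only outer-tracker hits
    }
    if (nOTHits > maxExpectedOTHits)
      std::printf("candidate %zu has %zu OT hits (%zu hits total)\n",
                  tc, nOTHits, candidateHitIndices[tc].size());
  }
}

Dumping the hits of whatever this flags would make it easy to pull up and visualize the suspicious candidates.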

GNiendorf force-pushed the t5_t5_merging branch 2 times, most recently from 02e949a to d55821d, on December 11, 2025 15:58
SegmentLinking deleted a comment from github-actions bot Dec 11, 2025
@GNiendorf (Member Author) commented:

/run gpu-CMSSW

@slava77 commented Dec 11, 2025

@github-actions bot commented:

The PR was built and ran successfully with CMSSW on GPU. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

@GNiendorf (Member Author) commented:

[screenshot of the updated plot]

@slava77 Looks good now after adding resets to the pLS function plus some other cleanup.

@slava77 commented Dec 11, 2025

> Plots look good from what I see.
> [screenshot of comparison plots]

This kind of disproves my older "intuition" argument (from a different context) that adding hits to a pT5 should not improve the momentum resolution.
I will still, perhaps more quietly, think that it's the case, and speculate that the improvements here come from adding a hit to a pT5 that skipped B1, or perhaps from adding another hit to a T5. Not sure if someone wants to explore which category gains more.

github-actions bot merged commit 5e0394e into master on Dec 16, 2025 (1 check passed).