Track Embeddings for Improved Duplicate Removal in LST #48249

GNiendorf · 2025-06-04T18:26:16Z

This PR introduces a new method of duplicate removal using fully-connected neural networks to compute low-dimensional embeddings of tracks, creating a learned similarity measure for duplicate track rejection. Two small neural networks are trained to map pLS and T5 track features into a shared 6-dimensional embedding space using a contrastive loss function. Duplicate candidates are then identified by placing cuts on the Euclidean distance between tracks in the learned embedding space, replacing the current T5-T5, T5-pT5, and pLS-T5 delta-R based duplicate removal.
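As a rough illustration of the distance cut described above, here is a minimal C++ sketch, not the code added by this PR. The names `kEmbedDim`, `Embedding`, `embedDistance2`, and `isDuplicateCandidate` are hypothetical, and the cut value is left as a parameter because the description does not quote the thresholds. It compares squared distances, which gives the same decision as cutting on the Euclidean distance itself as long as the cut is squared as well.

```cpp
#include <array>
#include <cstddef>

// Assumption: both networks map into a shared 6-dimensional space, as stated
// in the PR description. The constant and type names are hypothetical.
constexpr std::size_t kEmbedDim = 6;
using Embedding = std::array<float, kEmbedDim>;

// Squared Euclidean distance between two embedding vectors.
inline float embedDistance2(const Embedding& a, const Embedding& b) {
  float d2 = 0.f;
  for (std::size_t i = 0; i < kEmbedDim; ++i) {
    const float diff = a[i] - b[i];
    d2 += diff * diff;
  }
  return d2;
}

// A pair is a duplicate candidate when its embeddings are closer than the cut.
// maxDistance2 is the squared cut value; the caller supplies it.
inline bool isDuplicateCandidate(const Embedding& a, const Embedding& b, float maxDistance2) {
  return embedDistance2(a, b) < maxDistance2;
}
```

Since each embedding is computed once per track, the pairwise step reduces to the short loop above.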

T5-T5 and pLS-T5 pairs with small angular separation (delta-R squared < 0.02) are used for DNN training. Cuts on the embedding distance introduced by this PR reduce the LST duplicate rate in the barrel by up to 50% and substantially increase displaced track efficiency. Timing differences from the additional embedding DNNs are negligible, in part because embeddings are computed per-track and this method only requires a simple pairwise Euclidean distance calculation between embedding vectors.
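The angular pre-selection of training pairs can be sketched as follows. This is an illustrative C++ version under stated assumptions, not the training notebook's code: the function names and the constant name `kMaxDeltaR2ForTraining` are hypothetical, and only the value 0.02 comes from the text above.

```cpp
#include <cmath>

// Wrap an azimuthal difference into [-pi, pi).
inline float deltaPhi(float phi1, float phi2) {
  constexpr float kPi = 3.14159265358979323846f;
  float dphi = std::fmod(phi1 - phi2 + kPi, 2.f * kPi);
  if (dphi < 0.f)
    dphi += 2.f * kPi;
  return dphi - kPi;
}

// Squared angular separation: deltaR^2 = deltaEta^2 + deltaPhi^2.
inline float deltaR2(float eta1, float phi1, float eta2, float phi2) {
  const float deta = eta1 - eta2;
  const float dphi = deltaPhi(phi1, phi2);
  return deta * deta + dphi * dphi;
}

// Value taken from the PR description; the constant name is hypothetical.
constexpr float kMaxDeltaR2ForTraining = 0.02f;

// True when a pair is close enough in angle to be used for training.
inline bool isTrainingPair(float eta1, float phi1, float eta2, float phi2) {
  return deltaR2(eta1, phi1, eta2, phi2) < kMaxDeltaR2ForTraining;
}
```

The phi wrap matters for pairs that straddle the ±pi boundary, which would otherwise look far apart.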

More details can be found here: Embed_T5_PLS.pdf

The DNN training notebook is also added to the standalone codebase in this PR, in line with previous ML-related PRs for LST: #47618, #46857, and #47995.

PR validation:

This PR was tested on CPU and GPU in the standalone configuration and runs without issue.

@slava77

cmsbuild · 2025-06-04T18:26:38Z

cms-bot internal usage

cmsbuild · 2025-06-04T18:28:19Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48249/45056

cmsbuild · 2025-06-04T18:28:37Z

A new Pull Request was created by @GNiendorf for master.

It involves the following packages:

RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

slava77 · 2025-06-04T18:35:55Z

test parameters:

enable_tests = gpu
workflows_gpu = 29634.704,29834.704
workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
relvals_opt = -w upgrade,standard
relvals_opt_gpu = -w upgrade,standard

slava77 · 2025-06-04T18:36:53Z

@cmsbuild please test

cmsbuild · 2025-06-04T21:11:55Z

+1

Size: This PR adds an extra 292KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47e4b3/46548/summary.html
COMMIT: ab51b65
CMSSW: CMSSW_15_1_X_2025-06-04-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48249/46548/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 238 differences found in the comparisons
DQMHistoTests: Total files compared: 57
DQMHistoTests: Total histograms compared: 4419056
DQMHistoTests: Total failures: 12640
DQMHistoTests: Total nulls: 16
DQMHistoTests: Total successes: 4406380
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 56 files compared)
Checked 240 log files, 206 edm output root files, 57 DQM output files
TriggerResults: no differences found

CUDA Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 1
DQMHistoTests: Total histograms compared: 0
DQMHistoTests: Total failures: 0
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 0
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
Checked 0 log files, 0 edm output root files, 1 DQM output files

ROCM Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 1
DQMHistoTests: Total histograms compared: 0
DQMHistoTests: Total failures: 0
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 0
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
Checked 0 log files, 0 edm output root files, 1 DQM output files

slava77 · 2025-06-04T22:02:08Z

@GNiendorf noticed that we do not have comparisons for the GPU workflows (.704).
I see that the workflows ran for the PR

@iarspider @smuzaffar
I don't see the baseline for the GPU .704 workflows (actually almost everything from the enable gpu tests is also missing).
These were extras requested in the test parameters.
I seem to recall that it worked in the past.
Are these blacklisted in some way?

smuzaffar · 2025-06-05T07:01:14Z

please test

@slava77, thanks for pointing this out. This should be fixed now. For baseline relvals, we first run runTheMatrix.py -n -w gpu --gpu required -l wfs on a non-GPU node. This started to fail when we added the --gpu required option (see the log below). For runTheMatrix.py -n we no longer include --gpu required.

20:42:44 + runTheMatrix.py -n -w gpu --gpu required -l 29634.704,29834.704
20:42:44 + grep -v ' workflows '
20:42:44 + grep '^[1-9][0-9]*\(.[0-9][0-9]*\|\)\s'
20:42:44 + sed 's| .*||'
20:42:44 Traceback (most recent call last):
20:42:44   File "/data/cmsbld/jenkins/workspace/ib-run-baseline/CMSSW_15_1_X_2025-06-04-1100/bin/el8_amd64_gcc12/runTheMatrix.py", line 474, in <module>
20:42:44     raise Exception('Launched with --gpu required and no GPU available!')
20:42:44 Exception: Launched with --gpu required and no GPU available!

jfernan2 · 2025-06-05T09:49:00Z

assign heterogeneous

cmsbuild · 2025-06-05T09:49:10Z

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild · 2025-06-05T09:52:30Z

+1

Size: This PR adds an extra 292KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47e4b3/46556/summary.html
COMMIT: ab51b65
CMSSW: CMSSW_15_1_X_2025-06-04-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48249/46556/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially added 10 lines to the logs
Reco comparison results: 236 differences found in the comparisons
DQMHistoTests: Total files compared: 57
DQMHistoTests: Total histograms compared: 4419056
DQMHistoTests: Total failures: 12637
DQMHistoTests: Total nulls: 16
DQMHistoTests: Total successes: 4406383
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 56 files compared)
Checked 240 log files, 206 edm output root files, 57 DQM output files
TriggerResults: no differences found

CUDA Comparison Summary

Summary:

You potentially added 1 lines to the logs
Reco comparison results: 277 differences found in the comparisons
DQMHistoTests: Total files compared: 9
DQMHistoTests: Total histograms compared: 117529
DQMHistoTests: Total failures: 11951
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 105578
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
Checked 32 log files, 36 edm output root files, 9 DQM output files
TriggerResults: no differences found

ROCM Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 273 differences found in the comparisons
DQMHistoTests: Total files compared: 9
DQMHistoTests: Total histograms compared: 117529
DQMHistoTests: Total failures: 11945
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 105584
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
Checked 32 log files, 36 edm output root files, 9 DQM output files

fwyzard · 2025-06-05T13:55:51Z

These comments are independent from the @cms-sw/heterogeneous-l2 review.

Why do you get an increase in efficiency from tuning the duplicate removal?
Does it mean that the current implementation is killing real tracks?
Why is the increase in fake rate not a concern?

fwyzard · 2025-06-05T14:03:14Z

RecoTracker/LSTCore/src/alpaka/NeuralNetwork.h

                                                     const float radius,
-                                                     const float betaIn) {
+                                                     const float betaIn,
+                                                     float (&output)[dnn::t3dnn::kOutputFeatures]) {


isn't

Suggested change

- float (&output)[dnn::t3dnn::kOutputFeatures]) {
+ float output[dnn::t3dnn::kOutputFeatures]) {

equivalent?

I see what you mean, but you do get a compile-time check from the reference form that the array passed to output is exactly kOutputFeatures elements.
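The difference between the two parameter forms can be shown with a small standalone C++ sketch. The function names and the array size used here are hypothetical and chosen only for illustration; the real extent is `dnn::t3dnn::kOutputFeatures`.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical extent, for illustration only.
constexpr std::size_t kOutputFeatures = 3;

// Reference-to-array parameter: the extent is part of the parameter type, so
// only a float array of exactly kOutputFeatures elements is accepted.
inline void fillByReference(float (&output)[kOutputFeatures]) {
  for (std::size_t i = 0; i < kOutputFeatures; ++i)
    output[i] = static_cast<float>(i);
}

// Array-syntax parameter: the type is adjusted to float*, so the written
// extent is ignored and any float array (or pointer) is accepted.
inline void fillByPointer(float output[kOutputFeatures]) {
  for (std::size_t i = 0; i < kOutputFeatures; ++i)
    output[i] = static_cast<float>(i);
}

// The adjusted function types make the difference explicit.
static_assert(std::is_same<decltype(&fillByPointer), void (*)(float*)>::value,
              "array-syntax parameter is adjusted to a pointer");
static_assert(std::is_same<decltype(&fillByReference), void (*)(float (&)[kOutputFeatures])>::value,
              "reference parameter keeps the array extent");
```

With these definitions, passing a `float[2]` to `fillByReference` fails to compile, while passing it to `fillByPointer` compiles and writes past the end of the array at run time.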

fwyzard · 2025-06-05T14:13:42Z

RecoTracker/LSTCore/src/alpaka/Quintuplet.h

                                                               float& dBeta1,
                                                               float& dBeta2,
                                                               bool& tightCutFlag,
+                                                               float (&t5Embed)[Params_T5::kEmbed],


Suggested change

- float (&t5Embed)[Params_T5::kEmbed],
+ float t5Embed[Params_T5::kEmbed],

Ah, I see.
OK then :-)

fwyzard · 2025-06-05T14:16:48Z

+heterogeneous

~~Although I would prefer that the arrays are passed without the extra &.~~

GNiendorf · 2025-06-05T14:55:31Z

> These comments are independent from the @cms-sw/heterogeneous-l2 review.
>
> Why do you get an increase in efficiency from tuning the duplicate removal?
> Does it mean that the current implementation is killing real tracks?

Yes, exactly. In the current delta-R based cleaning, two real tracks with small angular separation can be mistakenly treated as duplicates, causing one to be removed and lowering efficiency. If a fake track is near a real one and the fake has a higher score (the sum of a few chi-squared values), the real track can also be incorrectly removed. This can occur, for example, if the real track is displaced. This PR fixes both cases, which is also why the fake rate increases slightly. Fake-real pairs are always treated as non-duplicates during training, so some fake tracks that were previously cleaned away by the simple delta-R flag are no longer marked as duplicates.

> Why is the increase in fake rate not a concern?

The increase in fake rate is relatively small, and I don't think duplicate cleaning should be relied on to reduce the fake rate by removing fakes that happen to sit close to other fakes or true tracks in the detector. This behavior could probably be replicated in the embeddings by lowering the 75% threshold for real hits during training to something like 55%, so that a “fake” track with 60% of its hits matched to a sim track would be marked as a duplicate of a “real” track with more than 75% of its hits matched to the same sim track. But again, this could lower efficiency if the fake track gets chosen over the real one.
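The effect of the matched-hit threshold on the training labels can be sketched as below. This is a hypothetical C++ illustration, not the labeling code from the training notebook. The struct, its fields, and the function names are assumptions, as is the rule that a pair is a duplicate only when both tracks pass the threshold and match the same sim track.

```cpp
// Hypothetical truth-matching summary for one reconstructed track.
struct TruthMatch {
  int simIdx;            // index of the best-matched sim track, -1 if none
  float matchedHitFrac;  // fraction of the track's hits matched to that sim track
};

// A track counts as "real" when its matched-hit fraction exceeds the threshold.
inline bool isReal(const TruthMatch& t, float threshold) {
  return t.simIdx >= 0 && t.matchedHitFrac > threshold;
}

// Assumed labeling rule: duplicate only if both tracks are real and point to
// the same sim track. Any pair involving a fake is labeled non-duplicate.
inline bool isDuplicateLabel(const TruthMatch& a, const TruthMatch& b, float threshold = 0.75f) {
  return isReal(a, threshold) && isReal(b, threshold) && a.simIdx == b.simIdx;
}
```

Under this rule, a track with 60% matched hits paired with a track with 80% matched hits to the same sim track is a non-duplicate at the 75% threshold and becomes a duplicate if the threshold is lowered to 55%.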

fwyzard · 2025-06-05T15:00:33Z

Mhm... did you also consider implementing a duplicate removal based on shared hits?

GNiendorf · 2025-06-05T15:05:18Z

> Mhm... did you also consider implementing a duplicate removal based on shared hits?

Yes, many of the duplicate cleaning steps already check for shared hits. See below for an example:

cmssw/RecoTracker/LSTCore/src/alpaka/Kernels.h

Lines 175 to 177 in f79df57

    
    int nMatched = checkHitsT5(ix, jx, quintuplets);
    const int minNHitsForDup_T5 = 7;
    if (nMatched >= minNHitsForDup_T5) {
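A shared-hit count of the kind used in the excerpt above could look like the following standalone C++ sketch. It is not the actual `checkHitsT5` implementation. The names `kHitsPerT5`, `HitIndices`, `countSharedHits`, and `kMinSharedHitsForDup` are hypothetical, and the figure of 10 hits per T5 (five layers with two hits each) is an assumption. The sketch also assumes the hit indices within one track are distinct.

```cpp
#include <array>
#include <cstddef>

// Assumption: a T5 carries 10 hit indices (5 layers x 2 hits per layer).
constexpr std::size_t kHitsPerT5 = 10;
using HitIndices = std::array<unsigned int, kHitsPerT5>;

// Count how many hits of track a also appear in track b.
inline int countSharedHits(const HitIndices& a, const HitIndices& b) {
  int nMatched = 0;
  for (unsigned int hitA : a) {
    for (unsigned int hitB : b) {
      if (hitA == hitB) {
        ++nMatched;
        break;  // count each hit of a at most once
      }
    }
  }
  return nMatched;
}

// Mirrors the value of minNHitsForDup_T5 in the quoted excerpt.
constexpr int kMinSharedHitsForDup = 7;

inline bool sharesEnoughHits(const HitIndices& a, const HitIndices& b) {
  return countSharedHits(a, b) >= kMinSharedHitsForDup;
}
```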

jfernan2 · 2025-06-05T16:00:32Z

+1

cmsbuild · 2025-06-05T16:00:57Z

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @mandrenguyen, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

mandrenguyen · 2025-06-05T17:21:53Z

+1

t5, pls track embeddings for better duplicate removal in LST

ab51b65

cmsbuild added this to the CMSSW_15_1_X milestone Jun 4, 2025

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels Jun 4, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jun 4, 2025

cmsbuild added tests-started and removed tests-pending labels Jun 4, 2025

cmsbuild added tests-approved and removed tests-started labels Jun 4, 2025

cmsbuild added tests-started and removed tests-approved labels Jun 5, 2025

cmsbuild added the heterogeneous-pending label Jun 5, 2025

cmsbuild added tests-approved and removed tests-started labels Jun 5, 2025

fwyzard reviewed Jun 5, 2025

cmsbuild added heterogeneous-approved and removed heterogeneous-pending labels Jun 5, 2025

cmsbuild added reconstruction-approved fully-signed and removed reconstruction-pending pending-signatures labels Jun 5, 2025

cmsbuild added orp-approved and removed orp-pending labels Jun 5, 2025

cmsbuild merged commit 5fef658 into cms-sw:master Jun 5, 2025
17 checks passed

This was referenced Jun 6, 2025

GCC14: Update to version 14.3.1 cms-sw/cmsdist#9903

Merged

[GCC13] Update version to 13.4.0 cms-sw/cmsdist#9914

Merged

alexandertuna mentioned this pull request Jun 16, 2025

Vectorize data pre-processing when training track embeddings in LST #48335

Merged

alexandertuna deleted the t5_embed_new branch July 9, 2025 16:17

alexandertuna restored the t5_embed_new branch July 9, 2025 16:17
