Skip to content

Conversation

@GNiendorf
Copy link
Contributor

@GNiendorf GNiendorf commented Jun 4, 2025

This PR introduces a new method of duplicate removal using fully-connected neural networks to compute low-dimensional embeddings of tracks, creating a learned similarity measure for duplicate track rejection. Two small neural networks are trained to map pLS and T5 track features into a shared 6-dimensional embedding space using a contrastive loss function. Duplicate candidates are then identified by placing cuts on the Euclidean distance between tracks in the learned embedding space, replacing the current T5-T5, T5-pT5, and pLS-T5 delta-R based duplicate removal.

T5-T5 and pLS-T5 pairs with small angular separation (delta-R squared < 0.02) are used for DNN training. Cuts on the embedding distance introduced by this PR reduce the LST duplicate rate in the barrel by up to 50% and substantially increase displaced track efficiency. Timing differences from the additional embedding DNNs are negligible, in part because embeddings are computed per-track and this method only requires a simple pairwise Euclidean distance calculation between embedding vectors.

More details can be found here: Embed_T5_PLS.pdf

The DNN training notebook is also added to the standalone codebase in this PR, in-line with previous ML-related PR's for LST: #47618, #46857, and #47995.

Screenshot 2025-06-04 at 2 15 49 PM Screenshot 2025-06-04 at 2 17 39 PM

PR validation:

This PR was tested on CPU and GPU in the standalone configuration and runs without issue.

@slava77

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2025

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2025

A new Pull Request was created by @GNiendorf for master.

It involves the following packages:

  • RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Jun 4, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755,29634.757,29834.757
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@slava77
Copy link
Contributor

slava77 commented Jun 4, 2025

@cmsbuild please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 4, 2025

+1

Size: This PR adds an extra 292KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47e4b3/46548/summary.html
COMMIT: ab51b65
CMSSW: CMSSW_15_1_X_2025-06-04-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48249/46548/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@slava77
Copy link
Contributor

slava77 commented Jun 4, 2025

@GNiendorf noticed that we do not have comparisons for the GPU workflows (.704).
I see that the workflows ran for the PR

@iarspider @smuzaffar
I don't see the baseline for the GPU .704 workflows (actually almost everything from the enable gpu tests is also missing).
These were extras requested in the test parameters.
I seem to recall that it worked in the past.
Are these blacklisted in some way?

@smuzaffar
Copy link
Contributor

please test

@slava77 , thanks for pointing this out. This should be fixed now. For baseline relvals, we first run runTheMatrix.py -n -w gpu --gpu required -l wfs on a non-gpu node. This started to fail[a] when we add --gpu required option. This is fixed now and for runTheMatrix.py -n we do not include--gpu required any more.

20:42:44 + runTheMatrix.py -n -w gpu --gpu required -l 29634.704,29834.704
20:42:44 + grep -v ' workflows '
20:42:44 + grep '^[1-9][0-9]*\(.[0-9][0-9]*\|\)\s'
20:42:44 + sed 's| .*||'
20:42:44 Traceback (most recent call last):
20:42:44   File "/data/cmsbld/jenkins/workspace/ib-run-baseline/CMSSW_15_1_X_2025-06-04-1100/bin/el8_amd64_gcc12/runTheMatrix.py", line 474, in <module>
20:42:44     raise Exception('Launched with --gpu required and no GPU available!')
20:42:44 Exception: Launched with --gpu required and no GPU available!

@jfernan2
Copy link
Contributor

jfernan2 commented Jun 5, 2025

assign heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2025

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2025

+1

Size: This PR adds an extra 292KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-47e4b3/46556/summary.html
COMMIT: ab51b65
CMSSW: CMSSW_15_1_X_2025-06-04-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48249/46556/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 273 differences found in the comparisons
  • DQMHistoTests: Total files compared: 9
  • DQMHistoTests: Total histograms compared: 117529
  • DQMHistoTests: Total failures: 11945
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 105584
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 8 files compared)
  • Checked 32 log files, 36 edm output root files, 9 DQM output files

@fwyzard
Copy link
Contributor

fwyzard commented Jun 5, 2025

These comments are independent from the @cms-sw/heterogeneous-l2 review.

  • Why do you get an increase in efficiency tuning the duplicate removal ?
    Does it mean that the current implementation is killing real tracks ?

  • Why the increase in fake rate image is not a concern ?

const float radius,
const float betaIn) {
const float betaIn,
float (&output)[dnn::t3dnn::kOutputFeatures]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't

Suggested change
float (&output)[dnn::t3dnn::kOutputFeatures]) {
float output[dnn::t3dnn::kOutputFeatures]) {

equivalent ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but you do get a compile-time check from the reference form that the array passed to output is exactly kOutputFeatures elements.

float& dBeta1,
float& dBeta2,
bool& tightCutFlag,
float (&t5Embed)[Params_T5::kEmbed],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
float (&t5Embed)[Params_T5::kEmbed],
float t5Embed[Params_T5::kEmbed],

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.
OK then :-)

@fwyzard
Copy link
Contributor

fwyzard commented Jun 5, 2025

+heterogeneous

Although I would prefer that the arrays are passed without the extra &.

@GNiendorf
Copy link
Contributor Author

GNiendorf commented Jun 5, 2025

These comments are independent from the @cms-sw/heterogeneous-l2 review.

  • Why do you get an increase in efficiency tuning the duplicate removal ?
    Does it mean that the current implementation is killing real tracks ?

Yes, exactly. In the current delta-R based cleaning, two real tracks with small angular separation can be mistakenly treated as duplicates, causing one to be removed and lowering efficiency. If a fake track is near a real one and the fake has a higher score (the sum of a few chi-squared values), the real track can also be incorrectly removed. This can occur if the real track is displaced for example. This PR fixes both cases, which is also why the fake rate increases slightly. Fake-real pairs are always treated as non-duplicates during training, so some fake tracks that were previously being cleaned away by the simple delta-R flag are now no longer marked as duplicates.

  • Why the increase in fake rate is not a concern ?

The increase in fake rate is relatively small, and I think relying on duplicate cleaning to reduce the fake rate by cleaning away fakes close to other fakes or true tracks in the detector is not a great side effect to rely on. This behavior could probably be replicated in the embeddings by lowering the 75% threshold for real hits during training to something like 55%, so that a “fake” track with 60% matched hits to a sim track would be marked as a duplicate of a “real” track with more than 75% matched hits to the same sim track, but again this could lower efficiency if the fake track gets chosen over the real one.

@fwyzard
Copy link
Contributor

fwyzard commented Jun 5, 2025

Mhm... did you also consider implementing a duplicate removal based on shared hits ?

@GNiendorf
Copy link
Contributor Author

GNiendorf commented Jun 5, 2025

Mhm... did you also consider implementing a duplicate removal based on shared hits ?

Yes many of the duplicate cleaning steps check for shared hits already. See below for example

int nMatched = checkHitsT5(ix, jx, quintuplets);
const int minNHitsForDup_T5 = 7;
if (nMatched >= minNHitsForDup_T5) {

@jfernan2
Copy link
Contributor

jfernan2 commented Jun 5, 2025

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Jun 5, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @mandrenguyen, @sextonkennedy, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 5fef658 into cms-sw:master Jun 5, 2025
17 checks passed
@alexandertuna alexandertuna deleted the t5_embed_new branch July 9, 2025 16:17
@alexandertuna alexandertuna restored the t5_embed_new branch July 9, 2025 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants