Vectorize data pre-processing when training track embeddings in LST #48335

alexandertuna · 2025-06-16T23:14:34Z

PR description:

This PR vectorizes ("numpyfies") a few data operations in the notebook that derives track embeddings for improved duplicate removal in LST. The vectorized numpy operations are notably faster than the Python for-loops they replace.

Followup to: #48249
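
For readers unfamiliar with the pattern, here is a minimal sketch of what replacing a nested Python loop with numpy operations can look like when building candidate pairs within one collection (as in T5/T5 pair-making). It is not the notebook's code: the function names, the 1D input `x`, and the `max_delta` cut are hypothetical placeholders for whatever per-object quantities and selection the notebook actually uses.

```python
import numpy as np


def pairs_loop(x, max_delta):
    """Reference version: nested Python loops over all unordered pairs (i < j)."""
    out = []
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # Hypothetical selection: keep pairs whose values are close.
            if abs(x[i] - x[j]) < max_delta:
                out.append((i, j))
    return np.array(out, dtype=np.int64).reshape(-1, 2)


def pairs_vectorized(x, max_delta):
    """Same selection using array operations instead of loops."""
    # All index pairs with i < j, in the same order the nested loop visits them.
    i, j = np.triu_indices(len(x), k=1)
    keep = np.abs(x[i] - x[j]) < max_delta
    return np.stack([i[keep], j[keep]], axis=1)


# Toy input (made up for illustration).
x = np.array([0.10, 0.12, 1.50, 1.52, -0.80])
print(pairs_vectorized(x, max_delta=0.05))
```

`np.triu_indices(n, k=1)` enumerates the strict upper triangle in row-major order, so the vectorized output keeps the same pair ordering as the loop, which makes a direct comparison between the two straightforward.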

PR validation:

I ran the notebook locally and on a large computer, and it is indeed faster. I checked all events by hand to confirm the numpy operations give the same results as the Python operations. ChatGPT and Gemini also approve of the approach.
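
An equivalence check of this kind can also be automated. The sketch below is an illustration under stated assumptions, not the validation actually run for this PR: it builds pairs between two different collections (in the spirit of T5/PLS pair-making) with both a loop and a broadcast comparison, then asserts identical output on randomly generated toy "events". The function names, the random inputs, and the `max_delta` cut are all hypothetical.

```python
import numpy as np


def cross_pairs_loop(a, b, max_delta):
    """Reference version: nested loops over every (i, j) with i in a, j in b."""
    out = []
    for i in range(len(a)):
        for j in range(len(b)):
            if abs(a[i] - b[j]) < max_delta:
                out.append((i, j))
    return np.array(out, dtype=np.int64).reshape(-1, 2)


def cross_pairs_vectorized(a, b, max_delta):
    """Same selection via broadcasting: compare every a[i] with every b[j] at once."""
    close = np.abs(a[:, None] - b[None, :]) < max_delta  # shape (len(a), len(b))
    return np.argwhere(close)  # rows of (i, j), in row-major order


# Compare both implementations on randomly generated toy events (not real data).
rng = np.random.default_rng(seed=0)
n_checked = 0
for _ in range(20):
    a = rng.normal(size=rng.integers(0, 30))
    b = rng.normal(size=rng.integers(0, 30))
    assert np.array_equal(
        cross_pairs_loop(a, b, 0.1), cross_pairs_vectorized(a, b, 0.1)
    )
    n_checked += 1
print(f"{n_checked} toy events agree")
```

Note that the broadcast comparison allocates a `len(a) x len(b)` boolean array, so for very large collections memory use may become the limiting factor rather than loop overhead.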

cc @GNiendorf @slava77

cmsbuild · 2025-06-16T23:14:59Z

cms-bot internal usage

cmsbuild · 2025-06-16T23:28:31Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48335/45215

cmsbuild · 2025-06-16T23:28:53Z

A new Pull Request was created by @alexandertuna for master.

It involves the following packages:

RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

jfernan2 · 2025-06-17T07:31:45Z

please test

jfernan2 · 2025-06-17T07:31:54Z

+1
Standalone changes

cmsbuild · 2025-06-17T07:32:07Z

This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

cmsbuild · 2025-06-17T10:19:59Z

+1

Size: This PR adds an extra 224KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-95c34d/46782/summary.html
COMMIT: bdf1c8f
CMSSW: CMSSW_15_1_X_2025-06-16-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/48335/46782/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

You potentially removed 1 lines from the logs
Reco comparison results: 6 differences found in the comparisons
DQMHistoTests: Total files compared: 52
DQMHistoTests: Total histograms compared: 4279628
DQMHistoTests: Total failures: 140
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 4279468
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 51 files compared)
Checked 223 log files, 194 edm output root files, 52 DQM output files
TriggerResults: no differences found

mandrenguyen · 2025-06-18T07:17:13Z

+1

alexandertuna added 3 commits June 11, 2025 15:48

Vectorize T5/T5 pair-making

3a18017

Vectorize T5/PLS pair-making

4d88f24

Short comments

bdf1c8f

cmsbuild added this to the CMSSW_15_1_X milestone Jun 16, 2025

cmsbuild added reconstruction-pending pending-signatures tests-pending orp-pending code-checks-pending tracking labels Jun 16, 2025

cmsbuild added code-checks-approved and removed code-checks-pending labels Jun 16, 2025

cmsbuild added reconstruction-approved fully-signed tests-started and removed reconstruction-pending pending-signatures tests-pending labels Jun 17, 2025

cmsbuild added tests-approved and removed tests-started labels Jun 17, 2025

cmsbuild added orp-approved and removed orp-pending labels Jun 18, 2025

cmsbuild merged commit 089d3fd into cms-sw:master Jun 18, 2025
10 checks passed

cmsbuild mentioned this pull request Jun 18, 2025

Fix replacing extensions in WorkFlowRunner.py #48366

Merged

alexandertuna deleted the vectorize_embedding_training branch June 19, 2025 00:22

4 participants
