Vectorize data pre-processing when training track embeddings in LST #48335
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-48335/45215 |
|
A new Pull Request was created by @alexandertuna for master. It involves the following packages:
@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
|
please test |
|
+1 |
|
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2) |
|
+1 Size: This PR adds an extra 224KB to repository
|
+1 |
PR description:
This PR vectorizes a few data operations with NumPy in the notebook that derives track embeddings for improved duplicate removal in LST. The vectorized operations are notably faster than the equivalent Python for-loops.
Followup to: #48249
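For readers unfamiliar with the pattern, here is a minimal sketch of the kind of change described above: replacing a nested Python loop over track pairs with NumPy index arrays. It is not taken from the notebook. The function names, the `eta`/`phi`/`sim_idx` inputs, and the choice of pairwise features are all hypothetical and only illustrate the technique.

```python
import numpy as np


def pair_features_loop(eta, phi, sim_idx):
    """Reference version: explicit Python loops over all track pairs (i < j).

    Hypothetical inputs: per-track eta, phi, and matched sim-track index.
    """
    rows = []
    n = len(eta)
    for i in range(n):
        for j in range(i + 1, n):
            d_eta = eta[i] - eta[j]
            # Wrap the phi difference into [-pi, pi).
            d_phi = (phi[i] - phi[j] + np.pi) % (2 * np.pi) - np.pi
            # Two tracks matched to the same sim track count as duplicates.
            is_dup = sim_idx[i] == sim_idx[j]
            rows.append((i, j, d_eta, d_phi, is_dup))
    return rows


def pair_features_numpy(eta, phi, sim_idx):
    """Vectorized version: build all (i < j) pairs at once.

    np.triu_indices(n, k=1) yields the pairs in the same row-major
    order as the nested loops above.
    """
    i, j = np.triu_indices(len(eta), k=1)
    d_eta = eta[i] - eta[j]
    d_phi = (phi[i] - phi[j] + np.pi) % (2 * np.pi) - np.pi
    is_dup = sim_idx[i] == sim_idx[j]
    return i, j, d_eta, d_phi, is_dup
```

The vectorized version moves the per-pair arithmetic out of the Python interpreter and into NumPy, which is where the speedup comes from.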
PR validation:
I ran the notebook locally and on a large computer, and it is indeed faster. I checked all events by hand to confirm that the NumPy operations give results identical to those of the Python operations. ChatGPT and Gemini also approve of the approach.
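The per-event comparison could also be automated along the following lines. This is a sketch under assumptions, not the check that was actually run: `check_all_events` and the two toy selection functions are hypothetical stand-ins for the notebook's loop-based and NumPy-based operations.

```python
import numpy as np


def select_loop(pt, threshold):
    """Toy reference: indices of entries above threshold, via a Python loop."""
    return [i for i, value in enumerate(pt) if value > threshold]


def select_numpy(pt, threshold):
    """Toy vectorized equivalent of select_loop."""
    return np.flatnonzero(pt > threshold)


def check_all_events(events, loop_fn, numpy_fn, **kwargs):
    """Return the indices of events where the two implementations disagree.

    An empty list means both versions gave identical results for every event.
    """
    mismatches = []
    for n, event in enumerate(events):
        expected = np.asarray(loop_fn(event, **kwargs))
        actual = np.asarray(numpy_fn(event, **kwargs))
        # np.array_equal is False if the shapes or any element differ.
        if not np.array_equal(expected, actual):
            mismatches.append(n)
    return mismatches
```

Calling `check_all_events(events, select_loop, select_numpy, threshold=1.0)` on a list of per-event arrays returns `[]` when the two versions agree everywhere. Exact equality suits index- or integer-valued outputs; for floating-point reductions, where summation order can differ, `np.allclose` may be the more appropriate comparison.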
cc @GNiendorf @slava77