Skip to content

Conversation

@GNiendorf
Copy link
Contributor

PR description:

This PR introduces an additional DNN to the LST codebase for better fake rejection of pT3 and pT5 objects. The DNN has a similar architecture to the other DNN's already present in LST: #47618 for T3's and #46857 for T5's. It uses six input features from the existing pT3 and pT5 cuts and applies an additional DNN-based cut to further reduce the LST fake rate. The reduction in fake rate is most pronounced at high pT in the default (pT > 0.8 GeV) configuration, as shown below. The DNN cut has negligible impacts on timing and efficiency.

This PR also adds my training notebook to the codebase, in line with the other DNN notebooks already present.

A detailed summary of the improvements can be found here: PR_162.pdf

Other minor changes:

  1. The rz chi-squared value is now always computed, even for pT3 objects with pT > 5.0 GeV, to avoid overfitting on the previous default value of -1.
  2. Minor naming cleanups in NeuralNetwork.h.

Screenshot 2025-04-30 at 5 02 00 PM

PR validation:

This PR was tested on CPU and GPU in the standalone configuration and runs without issue.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 30, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47995/44663

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @GNiendorf for master.

It involves the following packages:

  • RecoTracker/LSTCore (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dgulhan, @felicepantaleo, @gpetruc, @missirol, @mmusich, @mtosi, @rovere this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@slava77
Copy link
Contributor

slava77 commented Apr 30, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@slava77
Copy link
Contributor

slava77 commented Apr 30, 2025

@cmsbuild please test

@jfernan2
Copy link
Contributor

jfernan2 commented May 1, 2025

assign heterogeneous

@cmsbuild
Copy link
Contributor

cmsbuild commented May 1, 2025

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77
Copy link
Contributor

slava77 commented May 1, 2025

@cmsbuild please test

it took 15 hours in the last attempt. I'm guessing something got stuck.

@iarspider
Copy link
Contributor

iarspider commented May 1, 2025

@slava77 tests were not ran overnight (java update, node that cms-bot uses needed to be reconnected by hand), plus rocm nodes were broken until about 5 minutes ago (thanks for @fwyzard for fixing then!)

@fwyzard
Copy link
Contributor

fwyzard commented May 1, 2025

it took 15 hours in the last attempt. I'm guessing something got stuck.

Indeed, there has been both a general issue and a ROCm-specific issue. @iarspider has been following up on them, and both should be resolved now.
Now there should only be some backlog to go though.

@fwyzard
Copy link
Contributor

fwyzard commented May 1, 2025

(also, it's a holiday here today)

@slava77
Copy link
Contributor

slava77 commented May 1, 2025

(also, it's a holiday here today)

sure, although I assume the right to not work does not apply to cms-bot; or does it?

@fwyzard
Copy link
Contributor

fwyzard commented May 1, 2025

well, it does apply to the people the maintain it.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 1, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ba1e61/45805/summary.html
COMMIT: f1fd6d5
CMSSW: CMSSW_15_1_X_2025-05-01-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/47995/45805/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@fwyzard
Copy link
Contributor

fwyzard commented May 7, 2025

+heterogeneous

@slava77
Copy link
Contributor

slava77 commented May 7, 2025

It can be reviewed as soon as @makortel or myself have the time to do it.

Thank you very much for the +1 already.

Unless it is particularly urgent and should skip in front of our "to do" list ? If that is the case, please explain the urgency.

There are a few active PRs in the pipeline (it's a somewhat stable flow); adding more bubbles in the pipeline is somewhat disruptive. So, being able to integrate something straightforward without significant delays would be quite helpful.

We are actively addressing the to-do list (#47793 is a clear and current evidence of that effort), but this is done in balance with the ongoing developments.
It would be nice if we can continue this way.

@slava77
Copy link
Contributor

slava77 commented May 8, 2025

@cmsbuild, please test

I see that almost 20 hours after the start request

cms/47995/el8_amd64_gcc12/relvals/cudaWaiting for status to be reported — Waiting for tests to start

Is there a problem with the CUDA testing infrastructure?

@iarspider
Copy link
Contributor

I have retriggered CUDA relvals, now checking why they didn't get triggered.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 8, 2025

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ba1e61/45896/summary.html
COMMIT: f1fd6d5
CMSSW: CMSSW_15_1_X_2025-05-06-2300/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47995/45896/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

CUDA Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

ROCM Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 1
  • DQMHistoTests: Total histograms compared: 0
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 0
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0 KiB( 0 files compared)
  • Checked 0 log files, 0 edm output root files, 1 DQM output files

@slava77
Copy link
Contributor

slava77 commented May 8, 2025

Comparison Summary

I'm not sure 29834.755 (LST building in HLT) https://tinyurl.com/2bw5h2l7 is expected.
image

The offline variant from 29834.703 looks more consistent with the expectation https://tinyurl.com/24oulncd (is supposedly an equivalent of the plot posted in the PR description)
image

@slava77
Copy link
Contributor

slava77 commented May 8, 2025

@mmusich @VourMa
is HLT:75e33_timing known to be reproducible (in particular in the context of 29834.755 wf)?

@mmusich
Copy link
Contributor

mmusich commented May 8, 2025

is HLT:75e33_timing known to be reproducible

In general, that one is.

(in particular in the context of 29834.755 wf)?

I didn't check the ones gated by the LST modifiers, so I would not bet on that, though I don't have indications of the contrary.

@slava77
Copy link
Contributor

slava77 commented May 8, 2025

I looked at absolute entries in num_reco_pT and num_assoc(recoToSim)_pT, which are supposed to define the inputs to the fake rate , starting from the first bin in the snapshot above

PR					Reference
all		ma		fk	al		ma		fk
2224		2198		26	2204		2169		35
1916		1899		17	1901		1880		21
1421		1413		8	1415		1406		9
920		914		6	916		909		7

.. Ah, it's the "Overlay+ratio" that's messed up. If I select just the overlay, the plots are as expected:
image

@mmusich
Copy link
Contributor

mmusich commented May 9, 2025

+hlt

@cmsbuild
Copy link
Contributor

cmsbuild commented May 9, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @mandrenguyen, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants