Minor improvement to TritonService #32861
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32861/21088
A new Pull Request was created by @kpedro88 (Kevin Pedro) for master. It involves the following packages: HeterogeneousCore/SonicTriton
@makortel, @cmsbuild, @fwyzard can you please review it and eventually sign? Thanks. cms-bot commands are listed here
please test
7bf9345 to 7d47581
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32861/21089
please test
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b7e177/12815/summary.html
+1
This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @silviodonato, @dpiparo, @qliphy (and backports should be raised in the release meeting by the corresponding L2)
+1
PR description:
While testing an algorithm with a significantly longer inference time than the one used for the SonicTriton unit test, I re-encountered the issue with the fallback server shutting down too early.
Adding some debugging info to `auto_stop`, I found that using `$PPID` for the fallback server did not actually get the PID of the `cmsRun` process, but rather the `sh` process spawned by the `popen` call. Apparently, this `sh` process hangs around long enough for the unit test to complete, if `auto_stop` is delayed by a few seconds (in the case of Singularity reading from cvmfs, which is slightly slower than a local read). However, this is not reliable or general.
Instead, I now pass the `cmsRun` PID directly when starting the fallback server. This works to avoid the previous failure (tested by setting `PMAX=1` in `cmsTriton`). I've retained the value `PMAX=5` in this PR just in case some other instability might arise.
PR validation:
Reran stress tests from #32576.
@makortel @silviodonato @qliphy it would be nice to get this into pre3 if the deadline has not passed (it's really just a minor bug fix).
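For readers unfamiliar with the `$PPID` pitfall described above, here is a minimal Python sketch of the general effect. It is not the code from this PR. The outer `sh -c` stands in for the shell that a `popen`-style call spawns, and the inner `sh -c` stands in for the launched script. The function names and the command strings are hypothetical, chosen only for illustration.

```python
import os
import subprocess


def ppid_seen_by_nested_script() -> int:
    """Return $PPID as seen by a script started through an intermediate shell."""
    # Outer shell: plays the role of the `sh` spawned by a popen-style call.
    # Inner shell: plays the role of the launched script reading $PPID.
    # The trailing `:` stops the outer shell from exec-ing the inner one in place.
    cmd = "sh -c 'echo $PPID'; :"
    out = subprocess.run(["sh", "-c", cmd], capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def pid_passed_explicitly() -> int:
    """Return the caller's PID after handing it to the script as an argument."""
    # The caller's PID travels as a positional argument through both shells,
    # so the script does not depend on what its direct parent happens to be.
    cmd = "sh -c 'echo $1' inner \"$1\"; :"
    out = subprocess.run(
        ["sh", "-c", cmd, "outer", str(os.getpid())],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


if __name__ == "__main__":
    print("caller PID:          ", os.getpid())
    print("$PPID in script:     ", ppid_seen_by_nested_script())
    print("PID passed explicitly:", pid_passed_explicitly())
```

In this sketch the value read from `$PPID` is the PID of the intermediate shell, which differs from the caller's PID, while the explicitly passed value matches the caller. This mirrors the motivation for passing the `cmsRun` PID directly.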