
Conversation


@hajohajo hajohajo commented Oct 6, 2020

This PR implements a TensorFlow based track classifier in RecoTracker/FinalTrackSelectors. It also changes the default behaviour when using Configuration/ProcessModifiers/trackdnn_cff. Since none of the standard workflows (as far as I am aware) use it, no changes are expected in the output. The main purpose of this PR is to get the implementation into CMSSW to let other people continue developing and testing the TF based networks.

@mtosi @JanFSchulte

Description
The implementation is such that the tensorflow::GraphDef is shared to multiple modules as an ESProduct, while only one tensorflow::Session per module is initialized in the first call to produce() and then kept as a member variable of the edm::stream module, based on the discussion at #28912.

The preprocessing of inputs happens inside the neural network, using custom layers to perform input scaling and transformations, to avoid the need to adjust CMSSW code too often. Only when the input variables (or their order) are changed should one need to adjust the code in RecoTracker/FinalTrackSelectors/plugins/TrackTfClassifier.cc.
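
As an illustration only (this is not the actual training code; the layer name, input dimension and scaling constants below are hypothetical placeholders), the idea of baking the preprocessing into the TF2 graph could look roughly like this:

import tensorflow as tf

class InputScaling(tf.keras.layers.Layer):
    # Applies a fixed (offset, scale) transformation to the raw track inputs,
    # so the scalings live inside the frozen graph and a retraining with
    # different scalings does not require touching the CMSSW side.
    def __init__(self, offsets, scales, **kwargs):
        super().__init__(**kwargs)
        self.offsets = tf.constant(offsets, dtype=tf.float32)
        self.scales = tf.constant(scales, dtype=tf.float32)

    def call(self, inputs):
        return (inputs - self.offsets) / self.scales

n_inputs = 29  # placeholder input dimension
inputs = tf.keras.Input(shape=(n_inputs,))
x = InputScaling(offsets=[0.0] * n_inputs, scales=[1.0] * n_inputs)(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)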

The selection thresholds for the different track qualities were collected into RecoTracker/IterativeTracking/python/dnnQualityCuts.py so they can be adjusted in one place for all the iterations. The different iterations read their DNN quality cut values from the same dictionary stored there.
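
For illustration, a minimal sketch of that dictionary (the iteration names are the usual tracking iterations, but the numbers below are placeholders rather than the tuned working points):

# Sketch of the shared dictionary in dnnQualityCuts.py; each iteration reads its
# [loose, tight, highPurity] DNN thresholds from here. Values are placeholders.
qualityCutDictionary = {
    'InitialStep':       [0.0, 0.3, 0.6],
    'LowPtQuadStep':     [0.0, 0.3, 0.6],
    'HighPtTripletStep': [0.0, 0.3, 0.6],
    'DetachedQuadStep':  [0.0, 0.3, 0.6],
    # ... one entry per tracking iteration
}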

A set of network weights for running the TrackTfClassifier can be found at:
cms-data/RecoTracker-FinalTrackSelectors#9

The 'frozen_graph.pb' file needs to be placed in the folder RecoTracker/FinalTrackSelectors/data in order to be found by the relevant code. The training was done using TF2, so a sufficiently recent CMSSW version is required to run it (11_X and onwards).

The classifier can be turned on by importing the trackdnn modifier in the reconstruction step:

from Configuration.ProcessModifiers.trackdnn_cff import trackdnn

and adding the IOV for the TfGraphRecord in the process:

process.tf_dummy_source = cms.ESSource("EmptyESSource", recordName = cms.string("TfGraphRecord"), firstValid = cms.vuint32(1), iovIsRunNotTime = cms.bool(True) )
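
Putting the two pieces together, a minimal configuration sketch could look like the following (the Run2_2018 era is only an example choice; any era appropriate for the workflow works):

import FWCore.ParameterSet.Config as cms
from Configuration.Eras.Era_Run2_2018_cff import Run2_2018
from Configuration.ProcessModifiers.trackdnn_cff import trackdnn

# Passing the modifier when building the Process swaps the DNN-based track
# classification into the tracking sequences.
process = cms.Process("RECO", Run2_2018, trackdnn)

# Dummy ESSource providing the IOV for the TfGraphRecord.
process.tf_dummy_source = cms.ESSource("EmptyESSource",
    recordName = cms.string("TfGraphRecord"),
    firstValid = cms.vuint32(1),
    iovIsRunNotTime = cms.bool(True)
)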

PR validation:

The basic tests run with

scram b runtests

and

runTheMatrix.py -l limited -i all --ibeos

without issues. The standard workflow 10824.1 was tested with the above changes applied to the reconstruction step ("step3"), and the expected changes were confirmed in the produced tracking validation plots.


cmsbuild commented Oct 6, 2020

The code-checks are being triggered in jenkins.

<use name="FWCore/PluginManager"/>
<use name="PhysicsTools/TensorFlow"/>
<use name="FWCore/Utilities"/>
<use name="RecoTracker/Record"/>

hi @hajohajo - these dependencies on RecoTracker and PhysicsTools are not allowed in DataFormats.


I think you should move TrackTfGraphDefProducer, TfGraphDefWrapper to RecoTracker/FinalTrackSelectors.


cmsbuild commented Oct 6, 2020

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31682/18820

Code check has found code style and quality issues which could be resolved by applying following patch(s)


slava77 commented Oct 6, 2020

The classifier can be turned on by importing the trackdnn modifier in the reconstruction step:

from Configuration.ProcessModifiers.trackdnn_cff import trackdnn

I thought that we had a matrix workflow for this; apparently not.

Code check has found code style and quality issues which could be resolved by applying following patch(s)

these have to be addressed before we can run further tests by the bot


mtosi commented Oct 6, 2020

once the dedicated workflow which enables the DNN approach is defined, a profiling test should be performed
@hajohajo could you please share your results? thanks!


jpata commented Oct 9, 2020

@hajohajo thanks for this PR, a few suggestions to move forward with the review:

  • please make a PR with the protobuf data file in https://github.com/cms-data/RecoTracker-FinalTrackSelectors, following the patterns in e.g. https://github.com/cms-data/RecoBTag-Combined/pulls
  • link this cms-data PR in the description of this PR, so it can be found later
  • address the code checks by running scram b code-format and scram b code-checks before committing the code
  • DataFormats should be reserved for classes to be stored in the event, please see the suggestions in Track classifier using TensorFlow #31682 (comment)
  • define a workflow that can be used to test this with a runTheMatrix.py call
  • provide a profiling output with the workflow by appending --command=" -n 50 --customise Validation/Performance/TimeMemoryInfo.py --nThreads 1" to the previous call (this and last point as suggested by Mia above)


cmsbuild commented Oct 9, 2020

The code-checks are being triggered in jenkins.


cmsbuild commented Oct 9, 2020

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-31682/18921


cmsbuild commented Oct 9, 2020

A new Pull Request was created by @hajohajo for master.

It involves the following packages:

RecoTracker/FinalTrackSelectors
RecoTracker/IterativeTracking
TrackingTools/Records

@perrotta, @jpata, @cmsbuild, @slava77 can you please review it and eventually sign? Thanks.
@makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @bellan, @mschrode, @lecriste, @gpetruc, @ebrondol, @mtosi, @dgulhan this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here


jpata commented Oct 15, 2020

@JanFSchulte @hajohajo please do some more fine-grained checks with igprof if possible, as suggested by Slava: #31682 (comment).

This and the code review comments are currently the open items to address before moving forward. A look by the ML prod group would also be nice to have.
Thanks!

class TfGraphDefWrapper {
public:
TfGraphDefWrapper(tensorflow::GraphDef*);
tensorflow::GraphDef* GetGraphDef() const;

This must return

Suggested change
tensorflow::GraphDef* GetGraphDef() const;
tensorflow::GraphDef const* getGraphDef() const;

to ensure the graph is used in a const, thread-safe manner.


// ------------ method called to produce the data ------------
std::unique_ptr<TfGraphDefWrapper> TfGraphDefProducer::produce(const TfGraphRecord& iRecord) {
return std::unique_ptr<TfGraphDefWrapper>(&wrapper_);

This leads to double-delete (once by the destructor of TFGraphDefProducer, and once by the EventSetup system).

I would actually suggest

Suggested change
return std::unique_ptr<TfGraphDefWrapper>(&wrapper_);
return make_unique<TfGraphDefWrapper>(tensorflow::loadGraphDef(...));

This way the graph is loaded only if some other module consumes it, instead of at construction time.

tensorflow::GraphDef* GetGraphDef() const;

private:
tensorflow::GraphDef* graphDef_;

How about

Suggested change
tensorflow::GraphDef* graphDef_;
std::unique_ptr<tensorflow::GraphDef> graphDef_;

?

: tfDnnLabel_(cfg.getParameter<std::string>("tfDnnLabel"))

{}
TfDnnCache* cache_ = new TfDnnCache();

Would

Suggested change
TfDnnCache* cache_ = new TfDnnCache();
TfDnnCache cache_;

work?

@@ -0,0 +1,3 @@
#include "PhysicsTools/TensorFlow/interface/TensorFlow.h"
#include "FWCore/Utilities/interface/typelookup.h"
TYPELOOKUP_DATA_REG(tensorflow::Session);

This should indeed be unnecessary.


// class declaration

class TfGraphDefProducer : public edm::ESProducer {

Just to note that if this class were made an ESSource, an EmptyESSource for the TfGraphRecord would not be needed. (It depends mostly on whether other ESProducers producing products for the TfGraphRecord that would not consume the product of TfGraphDefProducer are foreseen.)


Given that the ESProducer itself is generic (even if the Session is made part of the ESProduct), I can easily imagine other users in the future, in which case it would be better to keep using EmptyESSource and not make this module an ESSource.

void TfGraphDefProducer::fillDescriptions(edm::ConfigurationDescriptions& descriptions) {
edm::ParameterSetDescription desc;
desc.add<std::string>("ComponentName", "tfGraphDef");
desc.add<edm::FileInPath>("FileName", edm::FileInPath());

Leaving out the default would make it clear already in the Python configuration that a necessary parameter has not been set.

@JanFSchulte

Thanks to @slava77's help we now have a number of profiles available.

The profiles have been produced running on 50 events of workflow 10824.1

In this context, BDT is the current track selection used in RECO, DNN is the new selection developed by Joona.

CPU performance:
BDT baseline https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_perf_BDT
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_perf_DNN

Total memory:
BDT https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memtotal_BDT
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memtotal_DNN

Live memory after the 1st event:
BDT https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_BDT
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_DNN

Live memory after the 25th event:
BDT https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_25thEvent_BDT
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_25thEvent_DNN

Live memory after the 49th event:
BDT https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_49thEvent_BDT
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_49thEvent_DNN

As expected from Joona's previous studies, the memory footprint of this new selection is a bit worrisome. While the evaluation of the BDT does not make it into the top 1000 modules, for the new approach there is a small but certainly larger effect visible:
https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_DNN/899

CPU performance-wise, there is no large difference between the two, but the time increases slightly with the DNN using TensorFlow:

BDT https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_perf_BDT/983
DNN https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_perf_DNN/910


mtosi commented Oct 15, 2020

thanks for this further check
may I ask which PU scenario has been used? do you think it would be worth making a profile vs PU?

so, the timing seems under control (1 inference per track), while the memory load is increased significantly

what are the possible approaches?

thanks !

@JanFSchulte

This is 0 PU. I can rerun with PU 50 to see if that makes a large difference.

For the memory, while it increased significantly, I would like to understand how much of an issue that actually is. It comes in at 899th place, so I am not sure whether this is a "not nice, we would like to reduce it" or an "it can't possibly be deployed in this state" kind of situation.


slava77 commented Oct 15, 2020

As expected from Joona's previous studies, the memory footprint of this new selection is a bit worrisome.

please quantify this in MB.

  • what is the memory allocated once per job
  • what is the memory that comes per instance of iter tracking (TrackProducer?) module
  • what is the memory needed to run the inference per event

the last two will scale with the number of threads.
it sounds like igprof is now capable of recording memory from multiple threads; so, a 1-thread and an 8-thread check for comparison/confirmation of the expected scaling would be useful

@JanFSchulte

According to our discussion offline and a multi-threaded test I just finished, the situation is the following:

  • Memory is allocated for a new TensorFlow session at the beginning of each thread/stream. The size of this allocation is 7.4 MiB.
  • An additional 5.6 MiB is used for the evaluation of the TensorFlow network.
  • Both numbers scale linearly with the number of threads, as confirmed by a test with 8 threads I just ran.

In conclusion, the total memory cost is ~13 MiB per thread/stream. I found 104 MiB for the 8-thread test accordingly. As the memory cost for the BDT is too low to show up in the profile, which only shows the top 1000 entries, I don't know how large the increase actually is. But for the single-threaded case the overall memory use of the job increased by only 6 MiB, from 2,833,769,503 to 2,839,613,441 bytes (see https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_BDT compared to https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_BDT)

@mtosi I checked PU 50 and found no dependence on PU of these numbers.


mtosi commented Oct 16, 2020

thanks a lot for these prompt further checks!
@slava77 @jpata are these numbers acceptable for production? or do the developers need to find a way to reduce the memory load?


jpata commented Oct 16, 2020

Thank you for these comprehensive checks.

To me, 13 MiB/thread (about 0.45% of the ~2.8 GiB total quoted above) seems to be in the ballpark of what's to be expected from a DNN with O(100k) float32-s; quantizing float32->int8 could help here.
A linear scaling with threads is not ideal, but I'm not sure TF as it stands today would allow sharing e.g. the weights across threads.
Perhaps PPD can answer better whether an 8-thread production job going up by ~100 MiB would introduce significantly more failures on the grid?


VinInn commented Oct 16, 2020

One can always make it a OneProducer. My understanding is that it is not an issue of the constant weights alone: it is the network itself that needs memory to compute; experts can confirm.


slava77 commented Oct 16, 2020

As the memory cost for the BDT is too low to show up in the profile, which only shows the top 1000 entries,

https://slava77sk.web.cern.ch/slava77sk/reco/cgi-bin/igprof-navigator/jschulte/PR-31682-wf10824.1/igreport_memlive_1stEvent_BDT/1246
looks like the place for the current BDT costs (consumed from DB).
4.6 + 2.5 MiB

I'm still puzzled why the current case scales with nThreads, shouldn't there be just one session per job (covering all classifiers)?

@JanFSchulte

I am not expert enough on the framework, but maybe @makortel can comment on that, as I think the current solution is based on his ideas in #28912.


slava77 commented Oct 16, 2020

I found 104 MiB for the 8-thread test accordingly.

is there an igprof file for that?
let me know, I can upload

edm::ESHandle<TfGraphDefWrapper> tfDnnHandle;
es.get<TfGraphRecord>().get(tfDnnLabel_, tfDnnHandle);
tensorflow::GraphDef* graphDef_ = tfDnnHandle.product()->GetGraphDef();
cache_->session_ = tensorflow::createSession(

ah, ok, this creates a new session per stream.
This explains the scaling.


Ah, yes, I could have pointed you to that, sorry. I thought you were asking why it was implemented that way.


we need cache_ to be a part of the global cache, so the base class TrackMVAClassifierBase : public edm::stream::EDProducer<> needs some modification


I thought Session is not thread safe, and therefore may not be shared across stream instances.


all other TF sessions that we have are in streams; although I'm still not sure if this was done deliberately due to the session being not MT safe.
@Dr15Jones @riga is a session not thread safe?


makortel commented Oct 16, 2020

I'm still puzzled why the current case scales with nThreads, shouldn't there be just one session per job (covering all classifiers)?

My understanding is that there is one Graph per job (in an ESProduct in this case), and then one Session per "inference caller".

@makortel

In this case the number of Session objects also gets multiplied by the number of iterations where the DNN is used. And by construction the DNN modules run sequentially (except that jetCore is concurrent to everything else).

One option would be to use edm::ReusableObjectHolder in the ESProduct to create Sessions on demand and share them across the DNN modules. That would limit the number of Session objects to be <= 2 x N(streams).


jpata commented Oct 22, 2020

pinging @mtosi @JanFSchulte et al: please let us know your plans on how to take this forward. Code review items should be addressed (in a new PR?), while threading/memory efficiency can also be for a follow-up.

@JanFSchulte

@jpata We are in the process of preparing a new PR that will address the code comments. As this project has been taken over by a new PhD student with limited experience, it will take a little bit of time, but we plan to have this ready in 1-2 weeks so it will make it in time for pre9.


jpata commented Oct 22, 2020

Thanks for the quick reply! Once that PR exists, we will ask to close this one here.


jpata commented Oct 29, 2020

pinging :)


jpata commented Nov 5, 2020

just another ping @JanFSchulte et al, let us know if there's any issue or progress.

@JanFSchulte

Hi @jpata, sorry for the late reply. We are aiming for the new PR on Monday or Tuesday.


jpata commented Nov 16, 2020

-1

the continuation PR #32128 has been opened; this one can be closed by the author or the core team.

@kpedro88

please close

@cmsbuild cmsbuild closed this Nov 16, 2020