Description
For each of the following muon subsystems, create the JSON files AutoDQM_ML/metadata/histogram_lists/<subsystem>.json and AutoDQM_ML/metadata/datasets/<subsystem>.json. The datasets file should indicate any bad runs for that subsystem.
An example of the histogram list is the dt.json file from @chosila, and an example of the datasets JSON would be the following (a slight modification of the existing bad_dt.json from Si to reflect recent updates to the DataFetcher):
{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
Once the histogram and dataset lists are in place, we should proceed with training PCAs and AutoEncoders for each of the histograms. To start, the default PCA/AutoEncoder options are probably fine. Later on, we can come back and try optimizing hyperparameters.
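As a rough illustration of how a datasets JSON like the one above could be consumed, here is a minimal sketch that parses it and labels a run as good or bad. The helper names `collect_bad_runs` and `label_run` are hypothetical and are not part of AutoDQM_ML or the DataFetcher; the JSON is embedded as a string only to keep the example self-contained.

```python
import json

# Same content as the example datasets JSON above, embedded for self-containment.
DATASETS_JSON = """
{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
"""


def collect_bad_runs(config):
    """Hypothetical helper: map each year to the set of its bad run numbers."""
    return {
        year: set(info.get("bad_runs", []))
        for year, info in config["years"].items()
    }


def label_run(config, year, run):
    """Hypothetical helper: 'bad' if the run is listed for that year, else 'good'."""
    bad_runs_for_year = collect_bad_runs(config).get(str(year), set())
    return "bad" if str(run) in bad_runs_for_year else "good"


config = json.loads(DATASETS_JSON)
bad_runs = collect_bad_runs(config)

print(label_run(config, 2016, 281680))
print(label_run(config, 2016, 281681))
```

Run numbers are compared as strings because that is how they are stored in the example file.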
For PCAs, this should simply be a matter of running scripts/train.py with all of the relevant histograms (1 PCA per histogram). For AutoEncoders, we have the possibility of training a single AutoEncoder on multiple histograms simultaneously. I'd suggest we forgo this subtlety for now and just follow the PCA style of 1 AutoEncoder per histogram. Once PCAs/AutoEncoders are trained, the saved models (JSON/HDF5 files) should be placed in folders on GitHub, maybe AutoDQM_ML/data/models/<subsystem>/.
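To make the "1 model per histogram" idea concrete, here is a minimal pure-Python sketch that fits a single-component PCA to each histogram independently and scores a run by its reconstruction error. This is not what scripts/train.py does internally; the function names, the toy histogram name `hist_a`, and the restriction to one principal component (found by power iteration) are all assumptions made for illustration.

```python
import math


def fit_pca_1(rows, n_iter=100):
    """Fit a one-component PCA to rows (one list of bin contents per run)."""
    n_bins = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(n_bins)]
    centered = [[r[j] - mean[j] for j in range(n_bins)] for r in rows]

    # Start power iteration from the largest centered row, so the start
    # vector lies in the span of the data.
    start = max(centered, key=lambda c: sum(x * x for x in c))
    norm = math.sqrt(sum(x * x for x in start))
    if norm == 0.0:
        # All runs identical: no variance, reconstruction is just the mean.
        return {"mean": mean, "component": [0.0] * n_bins}
    v = [x / norm for x in start]

    for _ in range(n_iter):
        # w = X^T (X v), computed without forming the covariance matrix.
        proj = [sum(c[j] * v[j] for j in range(n_bins)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(n_bins)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return {"mean": mean, "component": v}


def sse(model, row):
    """Sum of squared errors between a histogram and its PCA reconstruction."""
    diff = [x - m for x, m in zip(row, model["mean"])]
    score = sum(d * c for d, c in zip(diff, model["component"]))
    return sum((d - score * c) ** 2 for d, c in zip(diff, model["component"]))


def train_per_histogram(histograms):
    """Train one independent model per histogram name."""
    return {name: fit_pca_1(rows) for name, rows in histograms.items()}


# Toy training data: one hypothetical histogram, four runs that differ
# only by an overall scale.
shape = [1.0, 2.0, 3.0, 2.0, 1.0]
training = {"hist_a": [[(1.0 + t) * b for b in shape] for t in (0.0, 0.1, 0.2, 0.3)]}
models = train_per_histogram(training)

sse_good = sse(models["hist_a"], training["hist_a"][1])
sse_bad = sse(models["hist_a"], [1.0, 2.0, 0.0, 2.0, 1.0])  # central bin emptied
print(sse_good, sse_bad)
```

In this toy case the run with an emptied central bin cannot be described by the learned component, so its SSE is much larger than that of a training run.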
❓ Maybe it makes more sense to place these directly in the AutoDQM repo and/or /eos?
Finally, we should perform a validation of both the PCAs and AutoEncoders and summarize these in a set of slides for each subsystem. At minimum, we would want the following:
- any relevant details on the histogram list
- any relevant details on the "bad runs" (what histograms are affected, what was the issue, etc.)
- plots of original and reconstructed histograms (with both PCA and AutoEncoder) for both good and bad runs
- SSE summary plot for each histogram (with both PCA and AutoEncoder) split by train/test sets and good/bad runs
- ROC curve and TPR vs. FPR table for both PCA and AutoEncoder
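For the ROC curve and TPR vs. FPR table, one possible way to build them from per-run SSE scores is sketched below. The scores are made-up toy numbers, the function names are hypothetical, and the convention that a run is flagged as anomalous when its SSE is at or above the threshold is an assumption.

```python
def roc_points(scores_good, scores_bad):
    """Return (threshold, FPR, TPR) tuples; a run is flagged if SSE >= threshold."""
    thresholds = sorted(set(scores_good) | set(scores_bad), reverse=True)
    # Start at the point where nothing is flagged.
    points = [(float("inf"), 0.0, 0.0)]
    for thr in thresholds:
        tpr = sum(s >= thr for s in scores_bad) / len(scores_bad)
        fpr = sum(s >= thr for s in scores_good) / len(scores_good)
        points.append((thr, fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy SSE scores (made up): good runs mostly reconstruct well, bad runs do not.
sse_good_runs = [0.1, 0.2, 0.3, 0.4]
sse_bad_runs = [0.35, 0.8, 0.9]

points = roc_points(sse_good_runs, sse_bad_runs)
roc_auc = auc(points)

# TPR vs. FPR table, one row per threshold.
print("threshold   FPR    TPR")
for thr, fpr, tpr in points:
    print(f"{thr:9.2f}  {fpr:5.2f}  {tpr:5.2f}")
print(f"AUC = {roc_auc:.3f}")
```

The same two functions can be applied separately to the PCA and AutoEncoder scores so that the two curves can be compared on the same axes.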
Throughout this process, we should stay in contact with the relevant DPG experts to make sure they agree with the physics side of things for each subsystem. The last step of validation would be running the studies by a DPG expert and asking for their feedback.
Relevant resources (for posterity, please add any additional links you find that may be useful!):
- DT
- RPC
- CSC
- EMTF
- GEM
Checklist: