Histogram and run lists for Muon subsystems

For each of the following muon subsystems, create two JSON files: `AutoDQM_ML/metadata/histogram_lists/<subsystem>.json`, listing the histograms, and `AutoDQM_ML/metadata/datasets/<subsystem>.json`, listing the datasets and indicating any bad runs for that subsystem.

An example of a histogram list is the `dt.json` file from @chosila. An example of a datasets `json` is shown here (the existing `bad_dt.json` from Si, slightly modified to reflect recent updates to the `DataFetcher`):
```json
{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
```
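As a quick sanity check on a new datasets file, something like the following could be used to read the structure above and look up whether a run is flagged. This is only a sketch: the helper names `bad_runs_by_year` and `label_run` are hypothetical and are not part of `AutoDQM_ML` or the `DataFetcher`; the only thing assumed is the JSON layout shown above.

```python
import json

# Same layout as the datasets JSON above, embedded here so the sketch is self-contained.
EXAMPLE = """
{
    "primary_datasets" : ["SingleMuon"],
    "years" : {
        "2016" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["281680", "281674", "281663", "273294"]
        },
        "2015" : {
            "productions" : ["PromptReco"],
            "bad_runs" : ["259464", "258335", "258320", "258313", "258312", "256445"]
        }
    }
}
"""


def bad_runs_by_year(config):
    """Return {year: sorted list of bad run numbers as ints} (hypothetical helper)."""
    return {
        year: sorted(int(run) for run in info.get("bad_runs", []))
        for year, info in config["years"].items()
    }


def label_run(config, year, run):
    """Return 'bad' if the run is listed under that year's bad_runs, else 'good'."""
    listed = config["years"].get(str(year), {}).get("bad_runs", [])
    return "bad" if str(run) in listed else "good"


config = json.loads(EXAMPLE)
bad = bad_runs_by_year(config)
print(bad["2016"])
print(label_run(config, 2015, 258335), label_run(config, 2016, 100000))
```

For a real file, `json.loads(EXAMPLE)` would be replaced by `json.load` on the opened `<subsystem>.json`.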

Once the histogram and dataset lists are in place, we should proceed with training PCAs and AutoEncoders for each of the histograms. To start, the default PCA/AutoEncoder options are probably fine. Later on, we can come back and try optimizing hyperparameters.

For PCAs, this should simply be a matter of running `scripts/train.py` with all of the relevant histograms (1 PCA per histogram). For AutoEncoders, we have the possibility of training a single AutoEncoder on multiple histograms simultaneously. I'd suggest we forgo this subtlety for now and just follow the PCA-style approach of 1 AutoEncoder per histogram. Once PCAs/AutoEncoders are trained, the saved models in `json`/`hdf5` files should be placed in folders on GitHub, maybe `AutoDQM_ML/data/models/<subsystem>/`.
:question: Maybe it makes more sense to place these directly in the `AutoDQM` repo and/or `/eos`?
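To make the "1 PCA per histogram" idea concrete, here is a rough NumPy-only sketch of fitting a PCA on the per-run contents of a single histogram and scoring runs by reconstruction error. It is not the `scripts/train.py` implementation: the functions `fit_pca`, `reconstruct`, and `sse`, the choice of 2 components, and the synthetic Gaussian-shaped histograms are all assumptions made for illustration.

```python
import numpy as np


def fit_pca(histograms, n_components=2):
    """Fit a PCA on rows of `histograms` (n_runs x n_bins) via SVD."""
    mean = histograms.mean(axis=0)
    _, _, vt = np.linalg.svd(histograms - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruct(histograms, mean, components):
    """Project onto the principal components and map back to bin space."""
    return (histograms - mean) @ components.T @ components + mean


def sse(original, reconstructed):
    """Sum of squared errors per run (one value per row)."""
    return ((original - reconstructed) ** 2).sum(axis=1)


# Synthetic stand-in for one histogram across many runs (NOT real DQM data):
# normalized Gaussian shapes whose width varies slightly from run to run.
rng = np.random.default_rng(0)
bins = np.linspace(-3, 3, 50)
widths = rng.uniform(0.9, 1.1, size=200)
good = np.exp(-0.5 * (bins / widths[:, None]) ** 2)
good /= good.sum(axis=1, keepdims=True)

# Synthetic "bad" runs: a dead region near the peak, then renormalized.
bad = good[:5].copy()
bad[:, 20:25] = 0.0
bad /= bad.sum(axis=1, keepdims=True)

train, test = good[:150], good[150:]
mean, components = fit_pca(train)
sse_good = sse(test, reconstruct(test, mean, components))
sse_bad = sse(bad, reconstruct(bad, mean, components))
print("max SSE (good test runs):", sse_good.max())
print("min SSE (bad runs):      ", sse_bad.min())
```

In this toy setup the bad runs reconstruct much worse than the held-out good runs, which is the behavior the SSE-based validation relies on.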

Finally, we should perform a validation of both the PCAs and AutoEncoders and summarize these in a set of slides for each subsystem. At minimum, we would want the following:
1. any relevant details on the histogram list
2. any relevant details on the "bad runs" (what histograms are affected, what was the issue, etc.)
3. plots of original and reconstructed histograms (with both PCA and AutoEncoder) for both good and bad runs
4. SSE summary plot for each histogram (with both PCA and AutoEncoder) split by train/test sets and good/bad runs
5. ROC curve and TPR vs. FPR table for both PCA and AutoEncoder.
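For items 4 and 5, the ROC curve and the TPR vs. FPR table can be derived directly from the per-run SSE values. The following is a minimal sketch under the assumption that bad runs are the positive class and that a higher SSE means "more anomalous"; the function names and the toy SSE values are made up for illustration and do not come from `AutoDQM_ML`.

```python
import numpy as np


def roc_curve(scores_good, scores_bad):
    """Return (fpr, tpr) arrays, flagging a run as anomalous when its SSE >= threshold."""
    thresholds = np.sort(np.concatenate([scores_good, scores_bad]))[::-1]
    tpr = np.array([(scores_bad >= t).mean() for t in thresholds])
    fpr = np.array([(scores_good >= t).mean() for t in thresholds])
    # Prepend the (0, 0) point corresponding to a threshold above every score.
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])


def auc(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def tpr_at_fpr(fpr, tpr, max_fpr):
    """Best TPR achievable while keeping the FPR at or below `max_fpr`."""
    return float(tpr[fpr <= max_fpr].max())


# Toy SSE values (NOT real results), one per run.
sse_good = np.array([0.1, 0.2, 0.3, 0.4])
sse_bad = np.array([0.35, 0.8, 0.9])

fpr, tpr = roc_curve(sse_good, sse_bad)
area = auc(fpr, tpr)
for target in (0.0, 0.25, 0.5):
    print(f"FPR <= {target:.2f}: TPR = {tpr_at_fpr(fpr, tpr, target):.3f}")
print(f"AUC = {area:.3f}")
```

The same functions could be run once with PCA SSE values and once with AutoEncoder SSE values to fill the comparison table.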

Throughout this process, we should stay in contact with the relevant DPG experts to make sure they agree with the physics side of things for each subsystem. The last step of validation would be running the studies by a DPG expert and asking for their feedback.

Relevant resources (for posterity, please add any additional links you find that may be useful!):
- DT
- RPC
- CSC
   - [CSC 101 Tutorials](https://indico.cern.ch/event/522500/)
   - [Tim Cox Data Monitoring Tutorial](https://indico.cern.ch/event/640826/)
   - [DQM Shifter Instructions](https://twiki.cern.ch/twiki/bin/view/CMS/CSCDPGDataMonitorShiftInstructions) and [additional info](https://indico.cern.ch/event/673046/contributions/2753854/attachments/1540531/2415572/CSC_DQM_Shifts.pdf)
- EMTF
- GEM

Checklist:
- DT
   - [x] Histograms
      - Added by @chosila in [`dt.json`](https://github.com/AutoDQM/AutoDQM_ML/blob/main/metadata/histogram_lists/dt.json)
   - [ ] Bad runs
   - [ ] PCAs
   - [ ] AutoEncoders
   - [ ] Validation
- RPC
   - [ ] Histograms
   - [ ] Bad runs
   - [ ] PCAs
   - [ ] AutoEncoders
   - [ ] Validation
- CSC
   - [ ] Histograms
   - [ ] Bad runs
   - [ ] PCAs
   - [ ] AutoEncoders
   - [ ] Validation
- EMTF
   - [ ] Histograms
   - [ ] Bad runs
   - [ ] PCAs
   - [ ] AutoEncoders
   - [ ] Validation
- GEM
   - [ ] Histograms
   - [ ] Bad runs
   - [ ] PCAs
   - [ ] AutoEncoders
   - [ ] Validation
