Introduction of New-All-In-One tool for Offline Validation of TrkAlignment #38304
Merged: cmsbuild merged 35 commits into cms-sw:master from cms-trackeralign:newAllInOneToolAligned on Nov 8, 2022.
Changes from all commits (35 commits):
- e8c4a29 first commit (TomasKello)
- f7ca225 Changed propertry reader from INFO format to json format (TomasKello)
- b44a198 small update of README.md (connorpa)
- ed5db72 Updated syntax (TomasKello)
- 229aec0 implementing tests (TomasKello)
- b9e4355 Condor submit script if cmsRun is used (TomasKello)
- 508bbad removing very old / deprecated file for old version of GCP (connorpa)
- bb79373 Move PlotAlignmentValidation class in src/interface dir to compile it… (TomasKello)
- e27af3d trying runtests (connorpa)
- 40633c6 Corrected wrong configuration (DaveBrun94)
- 9893110 first setup of unit tests for DMR including merge step for python exe… (TomasKello)
- cd84c28 post-merge fixes and code-formats + code-checks (TomasKello)
- c69022c Added option to run on full Run 2 (cardinia)
- ec4e9d6 Updated DMRTrends with fixed luminosity (TomasKello)
- 909a756 Allow jobs to overwrite jobFlavour (TomasKello)
- c2a62ff implement Split Vertex Validation in all-in-one tool (TomasKello)
- a3b987d GCP unit test (TomasKello)
- 83ce686 add missing comma to unit test json file (mmusich)
- 4830529 new PV trends implementation part1 (henriettepetersen)
- e3f39c0 new PV trends implementation part2 (henriettepetersen)
- e033219 Implementation for JetHT validation
- bc65543 small fix needed otherwise the SplitV unit tests fail (TomasKello)
- 2e981d6 Proper GCP condition loading (SenneVanPutte)
- 530cad3 Ensure python3 compability (DaveBrun94)
- b34f661 Changes to the data format used in the output tree (amecca)
- b83b720 Major update to new All-in-on: DMR (TomasKello)
- 737c859 cosmetic and other minor fixes for trends and fix for compatibility o… (TomasKello)
- e95da8a Better drawing style for dxy and dz
- 1afbe75 Fixing all unit tests before merge (TomasKello)
- 53832d9 code-format & code-checks (mmusich)
- 355b84d Removing "GraphicsMagick"-dependence of GCP unit test. (TomasKello)
- e2e8aac delete hanging pointers in GCP macro (mmusich)
- 46c79c4 First bunch of comments addressed. (TomasKello)
- be39c7c Second round of comments implemented. (TomasKello)
- 8eed557 Addressing user comments. (TomasKello)
@@ -0,0 +1,118 @@

# Validation

We use the Boost library (Program Options, Filesystem & Property Trees) to handle the config file.
Basic idea:
- a generic config file is "projected" for each validation (*e.g.* the geometry is changed, together with the plotting style);
- for each config file, a new condor config file is produced;
- a DAGMAN file is also produced in order to submit the whole validation at once (see the sketch after this list).
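To illustrate the last point, a DAGMan file simply declares one node per condor job plus the parent/child relations between them. A minimal sketch of what such a generated file can look like (the job and submit-file names here are invented for illustration, not the tool's actual naming scheme):

```
JOB    DMRsingle_315257  condor_DMRsingle_315257.sub
JOB    DMRsingle_315488  condor_DMRsingle_315488.sub
JOB    DMRmerge          condor_DMRmerge.sub
PARENT DMRsingle_315257 DMRsingle_315488 CHILD DMRmerge
```

With such a file, `condor_submit_dag <file>` submits the whole chain in one go.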
In principle, the `validateAlignments.py` command is enough to submit everything.
However, for local testing, one may want to make a dry run: all files will be produced, but the condor jobs will not be submitted;
then one can test any step locally, or modify any parameter, before simply submitting the DAGMAN.

## HOWTO use

The main script is `validateAlignments.py`. One can check the options with:
```
validateAlignments.py -h
usage: validateAlignments.py [-h] [-d] [-v] [-e] [-f]
                             [-j {espresso,microcentury,longlunch,workday,tomorrow,testmatch,nextweek}]
                             config

AllInOneTool for validation of the tracker alignment

positional arguments:
  config                Global AllInOneTool config (json/yaml format)

optional arguments:
  -h, --help            show this help message and exit
  -d, --dry             Set up everything, but don't run anything
  -v, --verbose         Enable standard output stream
  -e, --example         Print example of config in JSON format
  -f, --force           Force creation of enviroment, possible overwritten old configuration
  -j {espresso,microcentury,longlunch,workday,tomorrow,testmatch,nextweek}, --job-flavour {espresso,microcentury,longlunch,workday,tomorrow,testmatch,nextweek}
                        Job flavours for HTCondor at CERN, default is 'workday'
```
As input, the AllInOneTool config has to be provided in `yaml` or `json` file format. One working example can be found here: `Alignment/OfflineValidation/test/test.yaml`. To create the setup and submit everything to the HTCondor batch system, one can call

```
validateAlignments.py $CMSSW_BASE/src/Alignment/OfflineValidation/test/test.yaml
```

To create the setup without submitting jobs to HTCondor, one can use the dry-run option:

```
validateAlignments.py $CMSSW_BASE/src/Alignment/OfflineValidation/test/test.yaml -d
```

A more detailed example of a possible validation configuration can be found here: `Alignment/OfflineValidation/test/example_DMR_full.yaml`
## HOWTO implement

To implement a new validation, or to port an existing one to the new framework, two things need to be provided: executables and a python file providing the information for each job.

#### Executables

In the new framework, standalone executables do the actual work of the validations. They are designed to run independently of the setup created by `validateAlignments.py`; the executables only need a configuration file with the information required for the validation/plotting. One can implement a C++ or a python executable.

If a C++ executable is implemented, the source file of the executable needs to be placed in the `Alignment/OfflineValidation/bin` directory and the BuildFile.xml in this directory needs to be modified. For reading the configuration file, which is in JSON format, the property tree class from the Boost library is used. See `bin/DMRmerge.cc` as an example of a proper C++ implementation.

If a python executable is implemented, the source file needs to be placed in the `Alignment/OfflineValidation/scripts` directory. The first line of the python script must be a shebang like `#!/usr/bin/env python`, and the script itself must be made executable. In the python case the configuration file can be either JSON or YAML, because both are read into plain python dictionaries. See `Example of Senne when he finished it` as an example of a proper python implementation.
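As a rough illustration of such a python executable, the sketch below reads a per-job JSON or YAML config; the argument handling and config keys are invented for illustration and are not the tool's actual interface:

```python
#!/usr/bin/env python
# Sketch only: a standalone validation step reading its slimmed JSON/YAML config.
import argparse
import json

import yaml  # PyYAML; assumed to be available in the CMSSW environment


def read_config(path):
    """Read the per-job config, accepting either JSON or YAML."""
    with open(path) as f:
        if path.endswith((".yaml", ".yml")):
            return yaml.safe_load(f)
        return json.load(f)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Example validation executable")
    parser.add_argument("config", help="path to the per-job JSON/YAML config")
    args = parser.parse_args()

    config = read_config(args.config)
    # ... perform the actual validation/plotting using the settings in `config` ...
    print("Running with configuration:", config)
```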
For the special case of a cmsRun job, one only needs to provide the CMS python configuration. Since this is again python, both JSON and YAML are fine for the configuration file. Also in this case the execution via cmsRun is independent of the setup provided by `validateAlignments.py` and only needs the proper configuration file. See `python/TkAlAllInOneTool/DMR_cfg.py` as an example of a proper implementation.
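A minimal sketch of what such a cmsRun configuration can look like is shown below; how the per-job config path is passed (here: last command-line argument) and the key names inside it are assumptions for illustration, so refer to `python/TkAlAllInOneTool/DMR_cfg.py` for the real mechanism:

```python
import json
import sys

import FWCore.ParameterSet.Config as cms

# Assumption: the slimmed per-job JSON config is passed as the last argument to cmsRun.
with open(sys.argv[-1]) as f:
    config = json.load(f)

process = cms.Process("TrackerAlignmentValidation")

# Assumption: "dataset" points to a text file listing one input file per line,
# and "maxevents" limits the number of processed events (hypothetical key names).
with open(config["validation"]["dataset"]) as f:
    input_files = [line.strip() for line in f if line.strip()]

process.source = cms.Source("PoolSource", fileNames=cms.untracked.vstring(*input_files))
process.maxEvents = cms.untracked.PSet(
    input=cms.untracked.int32(int(config["validation"].get("maxevents", 1)))
)
```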
#### Python file for configuration

For each validation several jobs can be executed, because there are several steps (like nTupling, fitting, plotting) and several categories (like alignments or IOVs). This information is encoded in a global config provided by the aligner, see `Alignment/OfflineValidation/test/test.yaml` as an example. To figure out from the global config which and how many jobs should be prepared, a python file needs to be implemented which reads the global config, extracts the relevant information and yields smaller configs designed to be read by the respective executable. As an example see `python/TkAlAllInOneTool/DMR.py`.

There is a logic which needs to be followed. Each job needs to be a dictionary with a structure like this:

```
job = {
    "name": Job name                  ##Needs to be unique!
    "dir": workingDirectory           ##Also needs to be unique!
    "exe": Name of executable/or cmsRun
    "cms-config": path to CMS config if exe = cmsRun, else leave this out
    "dependencies": [name of jobs this job needs to wait for]   ##Empty list [] if no dependencies
    "config": Slimmed config from global config, only with the information needed for this job
}
```

The python file returns a list of such jobs to `validateAlignments.py`, which finally creates the directory structure, configuration files and DAG file. To let `validateAlignments.py` know that a validation implementation exists, import the respective python file and extend the if statement which starts at line 271. This is the only place where one needs to touch `validateAlignments.py`! A sketch of such a python file follows below.
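Purely as an illustration of this contract, a trimmed sketch of such a python file for a DMR-like validation could look as follows (the function name, directory layout and config keys are invented; `python/TkAlAllInOneTool/DMR.py` is the authoritative example):

```python
import copy
import os


def build_jobs(config):
    """Turn the global config into the list of per-job dictionaries
    expected by validateAlignments.py (sketch only)."""
    jobs = []
    single_names = []
    steps = config["validations"]["DMR"]

    # One cmsRun job per single step and per IOV.
    for name, options in steps.get("single", {}).items():
        for iov in options["IOV"]:
            job_name = "DMRsingle_{}_{}".format(name, iov)
            single_names.append(job_name)
            jobs.append({
                "name": job_name,
                "dir": os.path.join("DMR", "single", name, str(iov)),
                "exe": "cmsRun",
                "cms-config": "python/TkAlAllInOneTool/DMR_cfg.py",
                "dependencies": [],
                "config": copy.deepcopy(options),   # slimmed config for this job
            })

    # One merge job per merge step, waiting for all of its single jobs.
    for name, options in steps.get("merge", {}).items():
        jobs.append({
            "name": "DMRmerge_{}".format(name),
            "dir": os.path.join("DMR", "merge", name),
            "exe": "DMRmerge",
            "dependencies": single_names,
            "config": copy.deepcopy(options),
        })

    return jobs
```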
## TODO list

- improve exception handling (filesystem + own)
- unification of local configuration style based on the DMR/PV example
- plotting style options to be implemented
  - change marker size for trends
  - accept ROOT pre-defined encoding in config (kRed, kDotted, etc.)
- validations to implement:
  - PV (only average is missing)
  - Zµµ (single + merge)
  - MTS (single + merge)
  - overlap (single + merge + trend)
  - ...
- documentation (this README)
  - tutorial for SplitV and GCP
  - more working examples
  - instructions for developers
- details
  - results of PV validation do not end up in the results directory but one above
  - crab submission not available for all validations

(list from October 2022)

## DMR validation
For details read `README_DMR.md`

## PV validation
For details read `README_PV.md`

## JetHT validation
For details read `README_JetHT.md`

## General info about IOV/run arguments
For details read `README_IOV.md`
@@ -0,0 +1,103 @@

## DMR validation

### General info
```
validations:
  DMR:
    <step_type>:
      <step_name>:
        <options>
```

DMR validation runs in consecutive steps of 4 possible types:
- single (validation analysis by DMR_cfg.py)
- merge (DMRmerge macro)
- (optional) trends (DMRtrends macro)
- (optional) averaged (mkLumiAveragedPlots.py script)

The step name is an arbitrary string which will be used as a reference by subsequent steps.
A merge job will only start once all corresponding single jobs are done.
Trends/averaged jobs will start once all corresponding merge jobs are done.
Trends and averaged jobs run in parallel.
An averaged job consists of 3 types of sub-jobs (their submission is automatised internally).
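As an illustration of how these steps chain together, a trimmed sketch of the `validations` block might look like the following (the step and alignment names are invented, and the lowercase spelling of some keys is an assumption; `test.yaml` and `example_DMR_full.yaml` contain real, complete configurations):

```yaml
validations:
  DMR:
    single:
      MySingle:
        IOV:
          - 315257
          - 315488
        alignments:          # assumed spelling of the "Alignments" option documented below
          - prompt
        trackcollection: "ALCARECOTkAlMinBias"
    merge:
      MyMerge:
        singles:
          - MySingle
        methods: ["median", "rmsNorm"]
    trends:
      MyTrend:
        merges:
          - MyMerge
        firstRun: 315257
        lastRun: 315488
```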
### Single DMR jobs
Single jobs can be specified per run (or per IoV). In the case of MC, the IoV is arbitrarily set to 1.

Variable | Default value | Explanation/Options
-------- | ------------- | --------------------
IOV | None | List of IOVs/runs defined by integer value. IOV 1 is reserved for MC.
Alignments | None | List of alignments. A separate directory will be created for each.
dataset | See defaultInputFiles_cff.py | Path to a txt file containing the list of datasets to be used. If the file is missing at EOS or is corrupted, the job will eventually fail (most common issue).
goodlumi | cms.untracked.VLuminosityBlockRange() | Path to a json file containing lumi information about the selected IoV - must contain the list of runs under the particular IoV with lumiblock info. Format: `IOV_Vali_{}.json`
magneticfield | true | Is the magnetic field ON? Not really needed for validation...
maxevents | 1 | Maximum number of events before cmsRun terminates.
maxtracks | 1 | Maximum number of tracks per event before the next event is processed.
trackcollection | "generalTracks" | Track collection to be used, e.g. "ALCARECOTkAlMuonIsolated" or "ALCARECOTkAlMinBias" ...
tthrbuilder | "WithAngleAndTemplate" | Specify the TTRH Builder.
usePixelQualityFlag | True | Use the pixel quality flag?
cosmicsZeroTesla | False | Is this validation for cosmics with zero magnetic field?
vertexcollection | "offlinePrimaryVertices" | Specify the vertex collection to be used.

### Merge DMR job
Its name does not need to match the single job names, but the option `singles` must list all single jobs to be merged.
It needs to be specified in order to run averaged/trends jobs.
The DMR merged plot style can be adjusted from the global plotting style (see `Alignment/OfflineValidation/test/example_DMR_full.yaml`).

Variable | Default value | Explanation/Options
-------- | ------------- | --------------------
singles | None | List of strings matching the single job names to be merged in one plot.
methods | ["median","rmsNorm"] | List of types of plots to be produced. Available: median, mean, rms, meanNorm, rmsNorm + optional X/Y suffix.
curves | ["plain"] | List of additional plot type options. Available: plain, split, layers, layersSeparate, layersSplit, layersSplitSeparate
customrighttitle | "" | Top right title. (To be re-implemented)
legendheader | "" | Legend title.
usefit | false | Use a Gaussian function to fit the distribution; otherwise extract mean and rms directly from the histogram.
legendoptions | ["mean","rms"] | Distribution features to be displayed in the stat box: mean, meanerror, rms, rmserror, modules, all
minimum | 15 | Minimum number of hits requested.
bigtext | false | Enlarge the legend text size.
moduleid | None | Plot residuals for a selected list of module IDs. (debugging)

### Trends DMR job
Its name does not need to match the merge or single job names, but the option `merges` must list all merge jobs to be put in the trend plot.
The trend plot style is defined globally for all trend plots (see `Alignment/OfflineValidation/test/example_DMR_full.yaml`).

Variable | Default value | Explanation/Options
-------- | ------------- | --------------------
merges | None | List of merge job names to be processed for trends.
Variables | ["median"] | Trend plot type to be plotted: DrmsNR, median
firstRun | 272930 | Specify the first run to be plotted.
lastRun | 325175 | Specify the last run to be considered.
labels | None | List of string tags to be added to the output rootfile.
year | Run2 | Enforce the year tag to be included in the lumiInputFile option specified in the trend style (this is an extra safety measure).

### Averaged DMR job
Its name does not need to match the merge or single job names, but the option `merges` must list all merge jobs to be put in the averaged distribution.
Each merge job passed to the averager must consist of data OR MC single jobs exclusively (no mix of data and MC).
Some style options are accessible from the global style config (see `Alignment/OfflineValidation/test/example_DMR_full.yaml`).
DISCLAIMER: this tool is not to be used blindly. Averaged distributions will only make sense if the same number of events and tracks is considered for each IOV.

Variable | Default value | Explanation/Options
-------- | ------------- | --------------------
merges | None | List of merge job names to be processed for averaged distributions.
lumiPerRun | None | List of lumi-per-run files.
lumiPerIoV | None | List of lumi-per-IoV files.
maxfiles | 700 | Maximum number of files to be merged per sub-job.
lumiMC | None | Define scale factors to be used for the normalisation of MC from the list of merge jobs. Format: `["(<igroup>::)<merge_name>::<scale_factor>"]`. `<igroup>` is an optional integer in case multiple MC groups are to be merged.

Example 1:
```
lumiMC:
  - 1::PaperMC2018ideal::64482.432
  - 2::PaperMC2018realistic::64482.432
  - 2::PaperMC2016::20433.379
```
Group 2 will merge the PaperMC2018realistic simulation object with the PaperMC2016 simulation object, assuming the respective scale factors 64482.432/(64482.432+20433.379) and 20433.379/(64482.432+20433.379).
Group 1 consists of only one object and will be plotted alongside the Group 2 curve.

Example 2:
```
lumiMC:
  - PaperMC2018ideal::64482.432
  - PaperMC2018realistic::64482.432
  - PaperMC2016::20433.379
```
Only one group is considered, merging the 3 simulation objects with corresponding scale factors 64482.432/(64482.432+64482.432+20433.379), ...
@@ -0,0 +1,76 @@

The following document summarises the usage of IOV/run-driven validations.

## DMR and PV

Example A:
```
validations:
  DMR:
    single:
      TestSingleMC:
        IOV:
          - 1
        dataset: /path/to/dataset.txt
        ...
      TestSingleDataIOV:
        IOV:
          - 315257
          - 315488
          - 315489
        dataset: /path/to/dataset_IOV_{}.txt
        goodlumi: /path/to/IOV_Vali_{}.json
      TestSingleDataRun:
        IOV:
          - 315257
          - 315258
          - 315259
          ...
          - 315488
        dataset: /path/to/dataset_Run_{}.txt
        goodlumi: /path/to/Run_Vali_{}.json
      TestSingleDataFromFile:
        IOV: /path/to/listOfRunsOrIOVs.txt
        dataset: /path/to/listOfAllFiles.txt
```

TestSingleMC: Run number 1 is reserved for MC objects only.

TestSingleDataIOV/Run: In the case of data, the selected numbers can represent both IOV and run numbers. Luminosity is assigned from the 'goodlumi' json file, where you define whether a number should be understood as an IOV or a single run. The lumiblock structure should also be defined; for example `/afs/cern.ch/cms/CAF/CMSALCA/ALCA_TRACKERALIGN/MP/MPproduction/datasetfiles/UltraLegacy/Run2/forDMRweighted/MuonIsolated2018_new/IOV_Vali_320933.json` defines IOV=320933 consisting of 3 runs.
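For orientation, such a goodlumi file follows the usual CMS certification-JSON layout, mapping each run of the IOV to its lumiblock ranges; a sketch with invented run numbers and lumiblock ranges:

```
{
    "320933": [[1, 104]],
    "320934": [[1, 269]],
    "320936": [[1, 38]]
}
```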
TestSingleDataFromFile: If the list of IOVs/runs is too long, you can provide it in the form of a plain txt list (one number per line), as sketched below. The dataset file can also contain all input file names directly (no curly brackets); however this is NOT recommended, as it makes the jobs much longer.
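Such a run/IOV list is nothing more than one integer per line, e.g. (values arbitrary):

```
315257
315488
315489
```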
Example B:
```
trends:
  Run2trend:
    singles:
      - Run2018B
    firstRun: 317087
    lastRun: 317212
```
When defining a trend job you can also specify the first and last run to be plotted.

```
style:
  PV:
    merge:
      CMSlabel: Preliminary
    trends:
      CMSlabel: Internal
      Rlabel: 2018B
      lumiInputFile: /path/to/lumiperIOV.txt
```
`lumiInputFile` is used for the trend plotting step only and defines the integrated luminosity for each IOV/run considered. It needs to be in plain format with two columns (`<run> <lumi>`). The following schemes are supported:

```
RUN 1 <space> lumi for the only run in IOV 1
...
RUN 4 <space> lumi for the starting run (4) of IOV 4
RUN 5 <space> lumi for another run (5) of IOV 4
RUN 6 <space> lumi for another run (6) of IOV 4
```

or

```
IOV 1 <space> lumi for all runs in IOV 1, in this case could be just one run
...
IOV 4 <space> lumi for all runs in IOV 4, in this case the sum of the lumi for RUNs 4, 5 and 6
```
this is also in the `Alignment/OfflineValidation/bin/BuildFile.xml`. Is it needed in both places?
Seems not to be needed in /bin. Removing.