127 changes: 103 additions & 24 deletions README.md
@@ -5,6 +5,7 @@
2. [Data production](#dataprod)
1. [Skimming](#skim)
2. [Data sources](#sources)
3. [Job Submission](#job-submission)
3. [Reconstruction Chain](#org0bc224d)
1. [Cluster Size Studies](#orgc33e2a6)
4. [Event Visualization](#org44a4071)
@@ -99,6 +100,66 @@ This framework relies on photon-, electron- and pion-gun samples produced via CR

The `PU0` files above were merged and are stored under `/data_CMS/cms/alves/L1HGCAL/`, accessible to LLR users and under `/eos/user/b/bfontana/FPGAs/new_algos/`, accessible to all lxplus and LLR users. The latter is used since it is well interfaced with CERN services. The `PU200` files were merged and stored under `/eos/user/i/iehle/data/PU200/<particle>/`.

<a id="job-submission"></a>
## Job Submission
**Collaborator:** You could say explicitly that the job submission can be used for multiple purposes and that it is a good alternative to the local skimming procedure presented a few lines above in this README.

About this, I was trying to run `produce.py`, but I got the following error while reading the input file for the skimming, specified in `config.yaml`:

`Error in <TFile::TFile>: file /eos/user/b/bfontana/FPGAs/new_algos/photons_0PU_bc_stc_hadd.root does not exist`

Everything works if I run it locally. Do you know if we should add some additional configuration options to avoid this kind of access problem?

**Collaborator (author):** Yes, I'll mention the skimming procedure application too!

As for running it on EOS, have you confirmed that you can access /eos/user/b/bfontana/FPGAs/new_algos/photons_0PU_bc_stc_hadd.root from your terminal? I do remember that using files on EOS was a major headache, though. I'll look into the problem further, but for now can you check whether adding `/opt/exp_soft/cms/t3/eos-login -username $USER -init` to your script fixes the issue? This assumes that your LLR and CERN usernames match (mine don't); if yours don't, replace `$USER` directly.

**Collaborator:** Yes, I can access the file from my terminal. I will try adding the command you suggested directly to the script and let you know.

Job submission to HTCondor is handled through `bye_splits/production/submit_scripts/job_submit.py`, using the `job` section of `config.yaml` for its configuration. The configuration should include the usual condor variables, i.e. `user`, `proxy`, `queue`, and `local`, as well as a path to the `script` you would like to run on condor. The `arguments` sub-section should contain `key/value` pairs matching the expected arguments that `script` accepts. You can also pass arguments directly on the command line, in which case these values will supersede the defaults set in the configuration file.

The new `Arguments` class in `bye_splits/utils/job_helpers.py` verifies that the passed arguments are accepted by `script` and that all required arguments have assigned values. For now, this requires that `script` uses `Arguments` to import its arguments, via a dictionary called `arg_dict`; an example can be found in `tests/submission/dummy_submit.py`.

The variable that you would like to iterate over should be set in `iterOver`, and its value should correspond to a `key` in the `arguments` sub-section whose value is a list of the values the script should iterate over. The configuration then contains a section for each particle type, which should contain a `submit_dir`, i.e. the directory in which submission-related files are read and written, and `args_per_batch`, which can be any number between 1 and `len(arguments[<iterOver>])`. An example of the `job` configuration settings looks like this:

```yaml
job:
  user: iehle
  proxy: ~/.t3/proxy.cert
  queue: short
  local: False
  script: /grid_mnt/vol_home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/dummy_submit.py
  iterOver: gen_arg
  arguments:
    float_arg: 0.11
    str_arg: a_string
    gen_arg: [gen, 3.14, work, broke, 9, False, 12.9, hello]
  test:
    submit_dir: /home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/
    args_per_batch: 2
```
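
For orientation, here is a minimal sketch of what a condor-ready script in the spirit of `tests/submission/dummy_submit.py` could look like. The real helper is the `Arguments` class in `bye_splits/utils/job_helpers.py`; since its exact interface is not shown here, the sketch falls back on plain `argparse`, and only the `arg_dict` name and the argument names are taken from the example above:

```python
#!/usr/bin/env python
# Illustrative stand-in for a script submitted via job_submit.py.
# The repository's own scripts import their arguments through
# bye_splits.utils.job_helpers.Arguments; plain argparse is used here
# only to keep the sketch self-contained.
import argparse

# Defaults for every argument the script accepts; job_submit.py is described
# as checking the configured/passed arguments against a dictionary named arg_dict.
arg_dict = {
    "gen_arg": None,        # the value iterated over by condor (iterOver: gen_arg)
    "float_arg": 0.11,
    "str_arg": "a_string",
}

def main():
    parser = argparse.ArgumentParser(description="Dummy condor job.")
    for name, default in arg_dict.items():
        parser.add_argument(f"--{name}", default=default)
    args = parser.parse_args()
    print(f"gen_arg={args.gen_arg}, float_arg={args.float_arg}, str_arg={args.str_arg}")

if __name__ == "__main__":
    main()
```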

After setting the configuration variables, the jobs are created and launched via

```bash
python bye_splits/production/submit_scripts/job_submit.py
```

which will produce an executable `.sh` file in `<submit_dir>/subs/` that looks like:

```bash
#!/usr/bin/env bash
export VO_CMS_SW_DIR=/cvmfs/cms.cern.ch
export SITECONFIG_PATH=$VO_CMS_SW_DIR/SITECONF/T2_FR_GRIF_LLR/GRIF-LLR/
source $VO_CMS_SW_DIR/cmsset_default.sh
list=$1
cleaned_list=$(echo $list | tr -d '[]' | tr ';' '
')
while IFS=";" read -r val; do
    python /grid_mnt/vol_home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/dummy_submit.py --gen_arg "$val" --float_arg 0.11 --str_arg a_string
done <<< "$cleaned_list"
```

and the `.sub` file submitted to HTCondor in `<submit_dir>/jobs/` that looks like:

```
executable = /home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/subs/dummy_submit_exec_v5.sh
Universe = vanilla
Arguments = $(gen_arg) $(float_arg) $(str_arg)
output = /home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/logs/dummy_submit_C$(Cluster)P$(Process).out
error = /home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/logs/dummy_submit_C$(Cluster)P$(Process).err
log = /home/llr/cms/ehle/NewRepos/bye_splits/tests/submission/logs/dummy_submit_C$(Cluster)P$(Process).log
getenv = true
T3Queue = short
WNTag = el7
+SingularityCmd = ""
include: /opt/exp_soft/cms/t3/t3queue |
queue gen_arg, float_arg, str_arg from (
['gen';3.14], 0.11, a_string
['work';'broke'], 0.11, a_string
[9;False], 0.11, a_string
[12.9;'hello'], 0.11, a_string
)
```
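
To make the relation between `args_per_batch` and the `queue` block above concrete, the batching boils down to something like the following (an illustrative sketch, not the actual implementation in `job_submit.py`):

```python
# Sketch of how the iterated values are grouped into condor queue entries.
# gen_arg and args_per_batch are taken from the example configuration above.
gen_arg = ["gen", 3.14, "work", "broke", 9, False, 12.9, "hello"]
args_per_batch = 2

batches = [gen_arg[i:i + args_per_batch] for i in range(0, len(gen_arg), args_per_batch)]
for batch in batches:
    # Each batch is serialized as one '[v1;v2]' entry of the queue block;
    # the generated .sh wrapper then splits it back into per-value runs.
    print("[" + ";".join(repr(v) for v in batch) + "], 0.11, a_string")
```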

All logs, outputs, and errors are written to their respective files in `<submit_dir>/logs/`. Some primary uses of `job_submit.py` include running the [skimming procedure](#skimming), iterating over each particle type, and running the [cluster studies](#cluster-size-studies) over a list of radii.

<a id="org0bc224d"></a>
# Reconstruction Chain
@@ -120,33 +181,51 @@ The above will create `html` files with interactive outputs.

## Cluster Size Studies

The script `bye_splits/scripts/cluster_size.py` reads the configuration file `bye_splits/scripts/cl_size_params.yaml` and runs the Reconstruction Chain on the `.root` file corresponding to the chosen particle, repeating the clustering step for the range of cluster radii specified in the parameter file under `cl_size: Coeffs`.

The most convenient way of running the study is to do:

```bash
bash run_cluster_size.sh <username>
```

where `<username>` is your lxplus username. This creates `.hdf5` files containing Pandas DataFrames with cluster properties (notably energy, eta, and phi) and the associated gen-level particle information for each radius. The bash script acts as a wrapper around the python script, setting a few options that are convenient for the cluster size studies but are not the defaults of the general reconstruction chain. As of now, the output `.hdf5` files are written to your local directory using the structure:

```
├── /<base_dir>
│   ├── out
│   ├── data
│   │   ├── new_algos
```

with the files ending up in `new_algos/`. An option to send the files directly to your `eos/` directory is currently being implemented, assuming the structure:

```
├── /eos/user/<first_letter>/<username>
│   ├── out
│   ├── data
│   │   ├── PU0
│   │   │   ├── electrons
│   │   │   ├── photons
│   │   │   ├── pions
│   │   ├── PU200
│   │   │   ├── electrons
│   │   │   ├── photons
│   │   │   ├── pions
```

The optimization of the clustering radius is done via the scripts in `bye_splits/scripts/cluster_size/`, configured through the `clusterStudies` section of `config.yaml`. The initial steps of the reconstruction chain (fill, smooth, seed) are run via

```bash
python run_init_tasks.py --pileup <PU0/PU200>
```

which will produce the files required by `bye_splits/scripts/cluster_size/condor/run_cluster.py` (the default for `--pileup` is `PU0`). The script can be run on a single radius:

```bash
python run_cluster.py --radius <float> --particles <photons/electrons/pions> --pileup <PU0/PU200>
```

As the directory name suggests, if you wish to run over all radii, `run_cluster.py` can and should be run as the `script` passed to an HTCondor job, as described in [Job Submission](#job-submission). The configuration would look something like this:

```yaml
job:
  user: iehle
  proxy: ~/.t3/proxy.cert
  queue: short
  local: False
  script: /grid_mnt/vol_home/llr/cms/ehle/NewRepos/bye_splits/bye_splits/scripts/cluster_size/condor/run_cluster.py
  iterOver: radius
  arguments:
    radius: [0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
             0.01 , 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018,
             0.019, 0.02 , 0.021, 0.022, 0.023, 0.024, 0.025, 0.026, 0.027,
             0.028, 0.029, 0.03 , 0.031, 0.032, 0.033, 0.034, 0.035, 0.036,
             0.037, 0.038, 0.039, 0.04 , 0.041, 0.042, 0.043, 0.044, 0.045,
             0.046, 0.047, 0.048, 0.049, 0.05]
    particles: pions
    pileup: PU0
  photons:
    submit_dir: /data_CMS/cms/ehle/L1HGCAL/PU0/photons/
    args_per_batch: 10
  electrons:
    submit_dir: /data_CMS/cms/ehle/L1HGCAL/PU0/electrons/
    args_per_batch: 10
  pions:
    submit_dir: /data_CMS/cms/ehle/L1HGCAL/PU0/pions/
    args_per_batch: 10
```
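
Typing out the 50-entry `radius` list is error-prone; it is simply 0.001 to 0.05 in steps of 0.001, so it can be regenerated with a one-liner (a convenience sketch, not part of the repository):

```python
# Regenerate the radius list used above: 0.001 to 0.050 in steps of 0.001,
# rounded to avoid floating-point artifacts.
radii = [round(0.001 * i, 3) for i in range(1, 51)]
print(radii)  # [0.001, 0.002, ..., 0.049, 0.05]
```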

Running the condor jobs with this configuration will produce the output of `cluster.cluster_default()` for each radius. These files are then merged into one larger `.hdf5` file whose keys correspond to the various radii, and combined and normalized with the gen-level data via:

```bash
python run_combine.py
```

The optional `--file` argument performs the combination and normalization with the gen-level data on only `<file>`.
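
Once the combined file exists, it can be inspected by looping over its per-radius keys with pandas. The file name below is hypothetical and the exact key naming is an assumption; adapt both to the actual output of `run_combine.py`:

```python
# Minimal sketch for inspecting the combined cluster-size file.
# "combined_clusters.hdf5" is a placeholder name; the key scheme (one key
# per clustering radius) follows the description above.
import pandas as pd

with pd.HDFStore("combined_clusters.hdf5", mode="r") as store:
    for key in store.keys():   # e.g. one entry per radius
        df = store[key]
        print(key, df.shape, list(df.columns)[:5])
```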

<a id="org44a4071"></a>

94 changes: 0 additions & 94 deletions bye_splits/production/produce.cc

This file was deleted.
