6 changes: 6 additions & 0 deletions README.md
@@ -5,6 +5,7 @@
2. [Data production](#dataprod)
1. [Skimming](#skim)
2. [Data sources](#sources)
3. [Job Submission](#job-submission)
3. [Reconstruction Chain](#org0bc224d)
1. [Cluster Size Studies](#orgc33e2a6)
4. [Event Visualization](#org44a4071)
@@ -99,6 +100,11 @@ This framework relies on photon-, electron- and pion-gun samples produced via CR

The `PU0` files above were merged and are stored under `/data_CMS/cms/alves/L1HGCAL/`, accessible to LLR users and under `/eos/user/b/bfontana/FPGAs/new_algos/`, accessible to all lxplus and LLR users. The latter is used since it is well interfaced with CERN services. The `PU200` files were merged and stored under `/eos/user/i/iehle/data/PU200/<particle>/`.

<a id="job-submission"></a>
## Job Submission
**Collaborator:**

You could say explicitly that the job submission can be used for multiple purposes, and that it is a good alternative to the local skimming procedure presented just a few lines above in this README.

Related to this, I was trying to run produce.py, but I got the following error while reading the input file for the skimming specified in the config.yaml file:

Error in <TFile::TFile>: file /eos/user/b/bfontana/FPGAs/new_algos/photons_0PU_bc_stc_hadd.root does not exist

Everything works fine if I run it locally. Do you know if we should add some additional configuration options to avoid this kind of access problem?

**Collaborator Author:**

Yes, I'll mention the application to the skimming procedure too!

As for running it on EOS, have you confirmed that you can access /eos/user/b/bfontana/FPGAs/new_algos/photons_0PU_bc_stc_hadd.root directly from your terminal? I do remember that using files on EOS was a major headache, though. I'll look into the problem further, but for now can you check whether adding `/opt/exp_soft/cms/t3/eos-login -username $USER -init` to your script fixes the issue? This assumes that you have the same LLR/CERN username, which I don't, so if that's not the case, replace `$USER` directly.

**Collaborator:**

Yes, I can access this file from my terminal. I will try adding the command you suggested directly to the script and will let you know.


Job submission to HTCondor is handled through `bye_splits/production/submit_scripts/job_submit.py`, which reads the `job` section of `config.yaml` for its configuration. The configuration should include the usual condor variables, i.e. `user`, `proxy`, `queue`, and `local`, as well as a path to the `script` you would like to run on condor. The `arguments` sub-section should contain `key/value` pairs matching the arguments that `script` accepts. The variable you would like to iterate over should be set in `iterOver`, and its value should correspond to a `key` in the `arguments` sub-section whose value is a list of the values the script should iterate over. Finally, the section contains a block for each particle type, which should contain a `submit_dir`, i.e. the directory in which to read and write submission-related files, and `args_per_batch`, which can be any number between 1 and `len(arguments[<iterOver>])`.
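
A minimal sketch of what the `job` section could look like for the cluster-size script is shown below; the paths and values are illustrative placeholders rather than defaults shipped with the repository, and in practice one such particle block is needed for each of `photons`, `electrons`, and `pions`:

```yaml
job:
  user: <llr_username>                        # placeholder
  proxy: /path/to/x509_proxy                  # placeholder
  queue: long                                 # placeholder
  local: false                                # submit to condor instead of running the executable locally
  script: bye_splits/production/submit_scripts/condor_cluster_size.sh
  iterOver: radius                            # must name a key of the `arguments` sub-section
  arguments:
    radius: [0.01, 0.02, 0.03, 0.04, 0.05]    # list that is split into batches of `args_per_batch`
    particles: photons
    pileup: PU0
  photons:
    submit_dir: /path/to/photon/submissions/  # placeholder
    args_per_batch: 2                         # between 1 and len(arguments[iterOver])
```

With the configuration in place, running `python bye_splits/production/submit_scripts/job_submit.py` prepares the submission files for each particle type and launches the jobs.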


<a id="org0bc224d"></a>
# Reconstruction Chain
94 changes: 0 additions & 94 deletions bye_splits/production/produce.cc

This file was deleted.

31 changes: 31 additions & 0 deletions bye_splits/production/submit_scripts/condor_cluster_size.sh
@@ -0,0 +1,31 @@
#!/usr/bin/env bash

cd /home/llr/cms/${USER}/NewRepos/bye_splits/bye_splits/scripts/cluster_size/condor/

radius=()
particles=""
pileup=""

while [[ "$#" -gt 0 ]]; do
    case "$1" in
        --radius)
            IFS=";" read -ra radius <<< "${2:1:-1}"
            shift 2
            ;;
        --particles)
            particles="$2"
            shift 2
            ;;
        --pileup)
            pileup="$2"
            shift 2
            ;;
        *)
            echo "Unrecognized argument $1"
            exit 1;;
    esac
done

for rad in ${radius[@]}; do
    python run_cluster.py --radius "$rad" --particles "$particles" --pileup "$pileup"
done
220 changes: 220 additions & 0 deletions bye_splits/production/submit_scripts/job_submit.py
@@ -0,0 +1,220 @@
#!/usr/bin/env python

import os
import sys

parent_dir = os.path.abspath(__file__ + 5 * "../")
sys.path.insert(0, parent_dir)

from bye_splits.utils import params, common, job_helpers

from datetime import datetime
import subprocess
import yaml

class JobBatches:
    """Class for setting up job batches and setting configuration
    variables. The function setup_batches() will take the list in
    config[arguments[<iterOver>]] and return a list of lists containing
    <args_per_batch> values in each sublist. Example for five total values
    with <args_per_batch> = 2:
    [0.01, 0.02, 0.03, 0.04, 0.05] --> [[0.01, 0.02], [0.03, 0.04], [0.05]]"""

    def __init__(self, particle, config):
        self.particle = particle
        self.config = config
        self.iterOver = config["job"]["iterOver"]
        self.particle_var = lambda part, var: config["job"][part][var]

    def setup_batches(self):
        total = self.config["job"]["arguments"][self.iterOver]

        vals_per_batch = self.particle_var(self.particle, "args_per_batch")

        batches = [total[i: i + vals_per_batch] for i in range(0, len(total), vals_per_batch)]

        return batches

class CondJobBase:
    def __init__(self, particle, config):
        self.particle = particle
        self.script = config["job"]["script"]
        self.iterOver = config["job"]["iterOver"]
        self.args = config["job"]["arguments"]
        self.queue = config["job"]["queue"]
        self.proxy = config["job"]["proxy"]
        self.local = config["job"]["local"]
        self.user = config["job"]["user"]
        self.batch = JobBatches(particle, config)
        self.particle_dir = self.batch.particle_var(self.particle, "submit_dir")
        self.batches = self.batch.setup_batches()

    def _write_arg_keys(self, current_version):
        """Writes the line containing the argument
        names to the buffer list."""

        arg_keys = "Arguments ="
        for arg in self.args.keys():
            arg_keys += " $({})".format(arg)
        arg_keys += "\n"
**Collaborator:**

I would make this a bit more flexible. As it stands, if I want to run produce.py, for instance, I will get an error, since that script does not recognise the --radius option reported here.
It would be good if only the needed arguments were taken into account.

**Collaborator Author:**

Well, that's the point of the configuration file; it's up to the user to modify config.yaml with the arguments and values that `script` accepts.

**Collaborator:**

I partially agree with both comments: I would add a check that guarantees the arguments in the configuration file are supported by the script, raising an exception otherwise. This would make debugging much quicker.

In addition, I noticed the _write_arg_keys function is called only once, and it modifies the list being passed. When you define a function, you ideally want to make it crystal clear what it is supposed to do. In this case, the only way to know is to inspect it directly. This defeats the improved structuring that splitting the code into self-contained functions usually brings, since you force the reader to search for the function, decreasing the code's readability. In my opinion, given that you only use the function once, it would be better to include it directly where it is now being called.

I also suspect `if len(self.args) > 0:` is not needed, since `Arguments =` would be correct syntax even without arguments, but please correct me (I haven't tested it).

**Collaborator Author:**

Good points, thank you. The functionality of `_write_arg_keys` has been moved out of a dedicated function and into `prepare_multi_job_condor` directly. I've written a quick argument checker that looks like this:

    def _check_script_arguments(self):
        script_args = subprocess.run([self.script, "--args"], capture_output=True, text=True)
        if script_args.returncode != 0:
            result = script_args.stdout.strip().split("\n")
        else:
            result = []
        
        assert(result == list(self.args.keys())), "Provided arguments {} do not match accepted arguments {}.".format(list(self.args.keys()), result)

This requires that any passed script have a `--args` option that prints out the names of the arguments it accepts:

        --args)
           echo "radius"
           echo "particles"
           echo "pileup"
           exit 1;;

If you have any ideas on a more flexible way of doing this, or if you think the name should be changed, let me know.
I'll additionally write a dummy script that takes no arguments and check whether everything works as expected if we drop the `if len(self.args) > 0` check.

**Collaborator Author:**

Adding the assertion that the arguments in config.yaml match the arguments accepted by <script> is proving quite tricky. If either of you has ideas on how this can be done generally and robustly, please let me know; if not, this might deserve a dedicated PR.

**Collaborator:**

When a script is run from the CLI, you can access its required arguments via `--help`, which means the information can be retrieved. I think this is important for this PR, but we can discuss.
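
As an illustration of this suggestion, a check along these lines could be sketched as follows (not part of the PR; the helper name and the regex-based parsing of the `--help` text are assumptions, and they presume the target script prints a conventional usage message listing its long options):

```python
import re
import subprocess

def check_script_arguments(script, config_args):
    """Sketch: compare the keys of config["job"]["arguments"] against the
    long options advertised by `<script> --help`."""
    # Capture the script's usage text; assumes the script supports --help.
    help_text = subprocess.run([script, "--help"], capture_output=True, text=True).stdout
    # Pull out every option written as --name in the usage text.
    accepted = set(re.findall(r"--([\w-]+)", help_text))
    unknown = [arg for arg in config_args if arg not in accepted]
    if unknown:
        raise ValueError("Arguments {} are not accepted by {} (it advertises {})".format(
            unknown, script, sorted(accepted)))
```

For a Python script built on argparse, the same information could also be taken from the parser object directly, avoiding the text parsing altogether.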


        current_version.append(arg_keys)

    def _write_arg_values(self, current_version):
        """Adds the argument values, where the batch lists are converted
        to strings as [val_1, val_2, ...] --> "[val_1;val_2]".
        The choice of a semicolon as the delimiter is arbitrary but it
        cannot be a comma because this is the delimiter condor itself uses.
        Example:
        queue radius, particle from (
        [0.01, 0.02], photon
        )
        incorrectly assigns radius="[0.01", particle="0.02]"
        queue radius, particle from (
        [0.01;0.02], photon
        )
        correctly assigns radius="[0.01, 0.02]", particle="photon"
        """
        arg_keys = ", ".join(self.args.keys())
        arg_keys = "queue " + arg_keys + " from (\n"
        current_version.append(arg_keys)
        for batch in self.batches:
            sub_args = list(self.args.keys())[1:]
            arg_vals = [self.args[key] for key in sub_args]
            all_vals = ["{}".format(batch).replace(", ", ";")] + arg_vals
            all_vals = ", ".join(all_vals) + "\n"
            current_version.append(all_vals)

        current_version.append(")")

    def prepare_batch_submission(self):
        """Writes the .sh script that constitutes the executable
        in the .sub script. The basename will be the same as the running script, i.e.
        the script set in the configuration file. This is then followed by a version number.
        Stores the contents in a list that's used as a buffer and checked against the content
        in previous versions, only writing the file if an identical file doesn't already exist.
        The version number will be incremented in this case."""

        sub_dir = "{}subs/".format(self.particle_dir)

        common.create_dir(sub_dir)

        script_basename = os.path.basename(self.script).replace(".sh", "").replace(".py", "")

        submit_file_name_template = "{}{}_submit.sh".format(sub_dir, script_basename)
        submit_file_versions = job_helpers.grab_most_recent(submit_file_name_template, return_all=True)

        current_version = []
        current_version.append("#!/usr/bin/env bash\n")
        current_version.append("workdir={}/bye_splits/production/submit_scripts\n".format(parent_dir))
        current_version.append("cd $workdir\n")
        current_version.append("export VO_CMS_SW_DIR=/cvmfs/cms.cern.ch\n")
        current_version.append("export SITECONFIG_PATH=$VO_CMS_SW_DIR/SITECONF/T2_FR_GRIF_LLR/GRIF-LLR/\n")
        current_version.append("source $VO_CMS_SW_DIR/cmsset_default.sh\n")

        if len(self.args.keys()) > 0:
            args = ["bash {}".format(self.script)]
            for i, key in enumerate(self.args.keys()):
                args.append("--{} ${}".format(key, i + 1))
            args = " ".join(args)
            current_version.append(args)
        else:
            current_version.append("bash {}".format(self.script))

        # Write the file only if an identical file doesn't already exist
        self.sub_file = job_helpers.conditional_write(submit_file_versions, submit_file_name_template, current_version)

    def prepare_multi_job_condor(self):
        """Writes the .sub script that is submitted to HTCondor.
        Follows the same naming convention and conditional_write()
        procedure as the previous function."""

        log_dir = "{}logs/".format(self.particle_dir)

        script_basename = os.path.basename(self.script).replace(".sh", "").replace(".py", "")

        job_file_name_template = "{}jobs/{}.sub".format(self.particle_dir, script_basename)

        job_file_versions = job_helpers.grab_most_recent(job_file_name_template, return_all=True)

        current_version = []
        current_version.append("executable = {}\n".format(self.sub_file))
        current_version.append("Universe = vanilla\n")

        if len(self.args) > 0:
            self._write_arg_keys(current_version)

        current_version.append("output = {}{}_C$(Cluster)P$(Process).out\n".format(log_dir, script_basename))
        current_version.append("error = {}{}_C$(Cluster)P$(Process).err\n".format(log_dir, script_basename))
        current_version.append("log = {}{}_C$(Cluster)P$(Process).log\n".format(log_dir, script_basename))
        current_version.append("getenv = true\n")
        current_version.append("T3Queue = {}\n".format(self.queue))
        current_version.append("WNTag = el7\n")
        current_version.append('+SingularityCmd = ""\n')
        current_version.append("include: /opt/exp_soft/cms/t3/t3queue |\n")

        if len(self.args.keys()) > 0:
            self._write_arg_values(current_version)

        # Write the file only if an identical file doesn't already exist
        self.submission_file = job_helpers.conditional_write(job_file_versions, job_file_name_template, current_version)  # Save to launch later

class CondJob:
    """Creates the job directories and files
    with prepare_jobs() and runs the jobs with
    launch_jobs()."""

    def __init__(self, particle, config):
        self.base = CondJobBase(particle=particle, config=config)

    def prepare_jobs(self):

        config_dir = self.base.particle_dir + "configs"
        job_dir = self.base.particle_dir + "jobs"
        log_dir = self.base.particle_dir + "logs"

        for d in (config_dir, job_dir, log_dir):
            common.create_dir(d)

        self.base.prepare_batch_submission()
        self.base.prepare_multi_job_condor()

    def launch_jobs(self):

        if self.base.local:
            machine = "local"
        else:
            machine = "llrt3.in2p3.fr"

        sub_comm = ["condor_submit"]

        if not self.base.local:
            print(
                "\nSending {} jobs on {}".format(self.base.particle, self.base.queue + "@{}".format(machine))
            )
            print("===============")
            print("\n")

        sub_args = []

        sub_args.append(self.base.submission_file)

        if self.base.local:
            comm = sub_args
        else:
            comm = sub_comm + sub_args

        print(str(datetime.now()), " ".join(comm))
        status = subprocess.run(comm)

if __name__ == "__main__":
    with open(params.CfgPath, "r") as afile:
        config = yaml.safe_load(afile)

    for particle in ("photons", "electrons", "pions"):
        job = CondJob(particle, config)
        job.prepare_jobs()
        job.launch_jobs()