Improvements for gain selection automation #293

Merged: 86 commits, Oct 24, 2024
c472d4b
write a history file for each subrun and check its output
marialainez Jun 12, 2024
5aca805
move run_sacct_j function to job.py
marialainez Jun 12, 2024
db98603
add memory required by job to the cfg file
marialainez Jun 12, 2024
c7a1c01
improve gain_selection script
marialainez Jun 12, 2024
6b2957d
relaunch the jobs that finished in TIMEOUT
marialainez Jun 12, 2024
6d9350d
remove unnecessary imports
marialainez Jun 12, 2024
2c31aeb
give a warning and exit if run summary is empty
marialainez Jun 13, 2024
b89edcb
solve mistake
marialainez Jun 13, 2024
ec39423
avoid creating a list with the files of all subruns of a run
marialainez Jun 13, 2024
f7625db
make the check looking at the history files (instead of logs)
marialainez Jun 13, 2024
5dbe639
remove unnecessary imports and variables
marialainez Jun 13, 2024
b7c4696
add simulate option
marialainez Jun 14, 2024
f28a9ff
read all the paths from the cfg file
marialainez Jun 17, 2024
52d0f9d
change arg type of simulate option
marialainez Jun 17, 2024
ed3c59f
make the check for failed jobs runwise
marialainez Jun 17, 2024
9289364
create a history file for each successful run (after check)
marialainez Jun 17, 2024
6c637ec
fix mistake
marialainez Jun 17, 2024
c569991
fix mistake
marialainez Jun 17, 2024
3461fe1
solve mistake
marialainez Jun 17, 2024
cf242dd
Update src/osa/scripts/gain_selection.py
marialainez Jun 17, 2024
cf4045b
remove run_sacct_j function + update run_sacct function with an optio…
marialainez Jun 18, 2024
5341a0b
put memory requirement for both tools + solve mistake
marialainez Jun 18, 2024
0ee6594
move job_finished_in_timeout function to job.py
marialainez Jun 18, 2024
9c0bdf7
solve mistake
marialainez Jun 18, 2024
a6080e8
solve mistakes
marialainez Jun 18, 2024
ffca836
check if the files have been already copied (to be able to launch the…
marialainez Jun 20, 2024
687165a
fix issues
marialainez Jun 20, 2024
8530f66
avoid creating directories with --simulate option + add docstrings
marialainez Jun 26, 2024
669d12e
Update src/osa/scripts/gain_selection.py
marialainez Jun 26, 2024
7c36b63
add parameters type
marialainez Jun 26, 2024
c8429fc
consider case where no log files have been created yet
marialainez Jun 28, 2024
2d0e660
add verbose mode
marialainez Jun 28, 2024
9ba1ece
avoid creating history and log files for subruns that do not have 4 s…
marialainez Jul 1, 2024
0856ec4
add run_summary_table function
marialainez Jul 1, 2024
8f765a9
rename function run_already_copied
marialainez Jul 1, 2024
4fb91d0
remove unnecessary elements
marialainez Jul 1, 2024
4dbeac6
use run_summary_table in nightsummary
marialainez Jul 1, 2024
5ed8674
add line to create the directory output_dir
marialainez Jul 4, 2024
d170982
avoid creating the gain selection flag file if any run is not completed
marialainez Jul 4, 2024
299f7e7
use the same date format for all the scripts of lstosa (YYYY-MM-DD)
marialainez Jul 4, 2024
9dce1cb
add necessary import
marialainez Jul 4, 2024
789282a
check if sequencer jobs are running to be able to launch it several t…
marialainez Jul 4, 2024
2252e20
use only the run number to search the job names
marialainez Jul 4, 2024
7c72bdd
fix mistake
marialainez Jul 4, 2024
3b636d7
use the same date format
marialainez Jul 5, 2024
4642c1e
remove unused import
marialainez Jul 5, 2024
b2a6939
avoid creating the flag file if any run is still running
marialainez Jul 8, 2024
b8df966
adapt tests
marialainez Jul 8, 2024
386d10d
change docstring
marialainez Jul 9, 2024
a607bdd
check if sequencer is already running to allow to launch it several t…
marialainez Jul 10, 2024
874a841
adapt gain selection to catch warnings in the logs such as the one in…
marialainez Jul 10, 2024
b4bfe91
define base_dir
marialainez Jul 10, 2024
4f6f20e
exit early if gain sel finished flag exists
morcuended Jul 30, 2024
4100151
Apply suggestions from code review
morcuended Jul 30, 2024
40c0d95
rename warnings variable
morcuended Jul 30, 2024
c49529f
changing some info messages to debug level
morcuended Aug 1, 2024
7b1ae98
adapt tests
marialainez Aug 5, 2024
32b7aca
remove unused function
marialainez Aug 5, 2024
96f0f96
remove unused import
marialainez Aug 5, 2024
cf12d81
correct indent
marialainez Aug 5, 2024
1ea8719
remove unused import
marialainez Aug 5, 2024
513a2bb
allow watching the job status when launching sequencer with -s
marialainez Aug 6, 2024
3840989
stop gain selection check only in case FF heuristic identification is…
marialainez Aug 6, 2024
cfd9b0b
add option in the cfg to decide whether to use or not FF heuristic id…
marialainez Aug 6, 2024
92bab93
avoid creating the flag file if there is no data
marialainez Aug 8, 2024
6fddc64
solve mistake
marialainez Aug 8, 2024
ce62477
check if sequencer already finished before submitting the jobs
marialainez Aug 14, 2024
e9f1029
reduce memory requirement in the example cfg
marialainez Aug 14, 2024
f118abd
adapt tests
marialainez Aug 14, 2024
e38508d
return False if no jobs were launched yet
marialainez Aug 16, 2024
82b72f6
adapt warning message + create flag file even with warnings
marialainez Sep 11, 2024
e32878d
rename runwise .history files as .closed
marialainez Sep 17, 2024
bce3f41
update history files only if .closed file does not exist
marialainez Sep 17, 2024
a08c365
Update src/osa/scripts/sequencer.py
marialainez Sep 19, 2024
e0124b2
write the run number using always 5 digits
marialainez Sep 19, 2024
b364743
add necessary import
marialainez Sep 19, 2024
2328828
check if the R0G files can be opened with ctapipe_io_lst
marialainez Sep 23, 2024
d9b1e72
remove unused variable
marialainez Sep 23, 2024
56a2735
remove unnecessary check
marialainez Sep 30, 2024
7e81b4f
check the last job id (e.g. for timeouts)
marialainez Sep 30, 2024
2753bb9
rename function + rename force argument
marialainez Oct 1, 2024
f2ad14a
Update src/osa/scripts/sequencer.py
marialainez Oct 1, 2024
5ef4b5e
Update src/osa/job.py
marialainez Oct 1, 2024
6e85dbe
Update src/osa/job.py
marialainez Oct 1, 2024
424c073
change function name
marialainez Oct 1, 2024
d32dbba
fix name of force option
marialainez Oct 2, 2024
4 changes: 3 additions & 1 deletion src/osa/configs/sequencer.cfg
@@ -55,6 +55,7 @@ dl1_to_dl2: lstchain_dl1_to_dl2
 dl1a_config: /software/lstchain/data/lstchain_standard_config.json
 store_image_dl1ab: True
 merge_dl1_datacheck: True
+use_ff_heuristic_gain_selection: False
 dl1b_config: /software/lstchain/data/lstchain_standard_config.json
 dl2_config: /software/lstchain/data/lstchain_standard_config.json
 rf_models: /data/models/prod5/zenith_20deg/20201023_v0.6.3
@@ -71,7 +72,8 @@ electron: /path/to/DL2/electron_mc_testing.h5
 PARTITION_PEDCALIB: short, long
 PARTITION_DATA: short, long
 MEMSIZE_PEDCALIB: 3GB
-MEMSIZE_DATA: 16GB
+MEMSIZE_DATA: 6GB
+MEMSIZE_GAINSEL: 2GB
 WALLTIME: 1:15:00
 # Days from current day up to which the jobs are fetched from the queue.
 # Default is None (left empty).
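
For reference, a minimal sketch of how the new entries could be read with Python's standard `configparser`. The section names `[lstchain]` and `[SLURM]` are assumptions (the hunks above do not show section headers, although `job.py` does read `STARTTIME_DAYS_SACCT` from a `SLURM` section), and the embedded config text is a trimmed stand-in for the real file.

```python
from configparser import ConfigParser

# Assumption: section names are guesses; the diff hunks omit the headers.
EXAMPLE_CFG = """
[lstchain]
use_ff_heuristic_gain_selection: False

[SLURM]
MEMSIZE_PEDCALIB: 3GB
MEMSIZE_DATA: 6GB
MEMSIZE_GAINSEL: 2GB
WALLTIME: 1:15:00
"""

cfg = ConfigParser()
cfg.read_string(EXAMPLE_CFG)

# Option names are matched case-insensitively by ConfigParser.
gainsel_mem = cfg.get("SLURM", "MEMSIZE_GAINSEL")
data_mem = cfg.get("SLURM", "MEMSIZE_DATA")

# getboolean() converts "False"/"True" (and similar spellings) to a bool.
use_ff_heuristic = cfg.getboolean("lstchain", "use_ff_heuristic_gain_selection")
```

With the values from this diff, the gain selection jobs would request less memory than the data jobs, and the FF heuristic stays off unless the flag is switched on in the cfg.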
18 changes: 16 additions & 2 deletions src/osa/job.py
@@ -653,7 +653,7 @@ def get_squeue_output(squeue_output: StringIO) -> pd.DataFrame:
     return df


-def run_sacct() -> StringIO:
+def run_sacct(job_id: str = None) -> StringIO:
     """Run sacct to obtain the job information."""
     if shutil.which("sacct") is None:
         log.warning("No job info available since sacct command is not available")
@@ -668,13 +668,18 @@ def run_sacct() -> StringIO:
         "-o",
         ",".join(FORMAT_SLURM),
     ]

+    if job_id:
+        sacct_cmd.append("--jobs")
+        sacct_cmd.append(job_id)

     if cfg.get("SLURM", "STARTTIME_DAYS_SACCT"):
         days = int(cfg.get("SLURM", "STARTTIME_DAYS_SACCT"))
         start_date = (datetime.date.today() - datetime.timedelta(days=days)).isoformat()
         sacct_cmd.extend(["--starttime", start_date])

     return StringIO(sp.check_output(sacct_cmd).decode())


 def get_sacct_output(sacct_output: StringIO) -> pd.DataFrame:
     """
@@ -809,3 +814,12 @@ def update_sequence_state(sequence, filtered_job_info: pd.DataFrame) -> None:
         sequence.exit = "0:15"
     elif any("RUNNING" in job for job in filtered_job_info.State):
         sequence.state = "RUNNING"
+
+
+def job_finished_in_timeout(job_id: str) -> bool:
+    """Return True if the input job_id finished in TIMEOUT state."""
+    job_status = get_sacct_output(run_sacct(job_id=job_id))["State"]
+    if job_id and job_status.item() == "TIMEOUT":
+        return True
+    else:
+        return False
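
A self-contained sketch of the same idea (restrict `sacct` to one job, then test for `TIMEOUT`) that can be tried without a Slurm installation. It is not the merged implementation: `build_sacct_cmd`, `finished_in_timeout`, the reduced `FIELDS` list, and the canned output are all hypothetical, and the standard `csv` module stands in for the pandas-based `get_sacct_output`.

```python
import csv
from io import StringIO
from typing import List, Optional

# Assumption: a reduced field list; the real FORMAT_SLURM is not shown in the diff.
FIELDS = ["JobID", "JobName", "State"]


def build_sacct_cmd(job_id: Optional[str] = None) -> List[str]:
    """Assemble the sacct call, restricting it to one job when job_id is given."""
    cmd = ["sacct", "-n", "--parsable2", "--delimiter=,", "-o", ",".join(FIELDS)]
    if job_id:
        cmd.extend(["--jobs", job_id])
    return cmd


def finished_in_timeout(sacct_output: StringIO) -> bool:
    """Return True if any row of the parsed sacct output is in TIMEOUT state."""
    rows = csv.DictReader(sacct_output, fieldnames=FIELDS)
    return any(row["State"] == "TIMEOUT" for row in rows)


# Canned output standing in for a real sacct call (hypothetical job ID and name).
fake_output = StringIO("12345,example_job,TIMEOUT\n")
cmd = build_sacct_cmd("12345")
timed_out = finished_in_timeout(fake_output)
empty_result = finished_in_timeout(StringIO(""))
```

One design difference worth noting: the sketch uses `any(...)`, so an empty or multi-row result returns a boolean, whereas pandas `Series.item()` raises `ValueError` unless the Series holds exactly one element. Whether `get_sacct_output` already guarantees a single row is not visible in this diff.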