Merged
116 commits
8f6c16b
initial attempt
jpata Aug 26, 2025
2ede6c0
training and eval works on steps
jpata Aug 27, 2025
bb17fa6
add some logging
jpata Aug 27, 2025
570b274
fix optimizer loading
jpata Aug 27, 2025
4fe883d
cleanup epoch->step
jpata Aug 27, 2025
a683b03
work on scheduler restore
jpata Aug 27, 2025
81d11f7
log dataloader index, ensure lr_scheduler is correctly restored
jpata Aug 27, 2025
0a237e4
format
jpata Aug 27, 2025
7d2d186
restore dataloader state
jpata Aug 27, 2025
a23bdc1
format
jpata Aug 27, 2025
6646b99
improve logging
jpata Aug 27, 2025
173def3
better logging for eval
jpata Aug 27, 2025
87f0714
ensure model compilation
jpata Aug 27, 2025
9d562cf
batch size for test
jpata Aug 27, 2025
817a96e
added smi
jpata Aug 27, 2025
47c6e80
distributed sampler does not have state dict
jpata Aug 28, 2025
71e0c31
update ray train and test
jpata Aug 28, 2025
a238366
format
jpata Aug 28, 2025
981dc22
attempt to restore dataloader reproducibly
jpata Aug 28, 2025
39aa923
works without shuffle
jpata Aug 28, 2025
5eb838e
fix the loader state dict
jpata Aug 28, 2025
d9df51f
ensure tests pass, format
jpata Aug 29, 2025
fab32c3
disable tqdm in jobs
jpata Aug 29, 2025
a423f3a
enable dataloader fast forwarding
jpata Aug 29, 2025
d526f69
fix tests and ensure fast-forwarding works by resuming the sampler
jpata Aug 29, 2025
3d5cb5c
fix test
jpata Aug 29, 2025
1d6cb9e
added missing test
jpata Aug 29, 2025
3a6a433
fix
jpata Aug 29, 2025
d42c9c6
enable cmdline switch to lamb
jpata Aug 29, 2025
c16b06c
fix override
jpata Aug 29, 2025
cd55915
LUMI config
Aug 31, 2025
f227fda
increase batch size
jpata Aug 31, 2025
f2d54ff
configure optimizer
jpata Aug 31, 2025
5572399
fix tests
jpata Sep 1, 2025
9ded335
merge
jpata Sep 1, 2025
874d43f
added memory logging
jpata Sep 1, 2025
fcc14fc
propagate Pythia
jpata Sep 1, 2025
fd4c3df
format
jpata Sep 1, 2025
ee1b816
change dataset to 2.8.0
jpata Sep 1, 2025
2bc164c
version fallback for pythia vals
jpata Sep 1, 2025
d25109a
merge
Sep 1, 2025
3ce4e4c
fix tests for bs>1
jpata Sep 2, 2025
de032ea
update tuning script
jpata Sep 2, 2025
f411af9
revert datasets for now
jpata Sep 2, 2025
e0cd508
disable pu
Sep 2, 2025
cc192e5
fix
jpata Sep 2, 2025
08a8330
up
jpata Sep 2, 2025
0650999
work on the cms jet notebook
jpata Sep 4, 2025
8a98218
refactoring cms plotting ongoing
jpata Sep 4, 2025
e006d9d
fix sample names
jpata Sep 4, 2025
8b966cb
style has been improved
jpata Sep 4, 2025
8aa2480
improve plot style
jpata Sep 5, 2025
ef2e6a3
added nPV plot
jpata Sep 5, 2025
6e91cf5
update plots
jpata Sep 6, 2025
bfb63d5
Merge branch 'jp_20250826_stepopt' of https://github.com/jpata/partic…
jpata Sep 6, 2025
dfecc45
NANO plotting scripts
jpata Sep 6, 2025
eaa1791
correction plots
jpata Sep 8, 2025
0bf8f6f
consolidate
jpata Sep 8, 2025
b1646a1
update JEC plots
jpata Sep 8, 2025
e462f32
update plots
jpata Sep 8, 2025
f5ebbfc
format
jpata Sep 8, 2025
2116272
update plots
jpata Sep 9, 2025
903887a
remove raw line from reso plots
jpata Sep 9, 2025
52dfbef
refactored interleaved iterator
jpata Sep 9, 2025
d480275
show resolution as ratio
jpata Sep 9, 2025
35b996f
adjust plots
jpata Sep 9, 2025
1a5530e
update lumi training scripts to reduce batch size
Sep 11, 2025
375342d
Merge branch 'jp_20250826_stepopt' of https://github.com/jpata/partic…
jpata Sep 11, 2025
5311ab2
disable some plots and introduce logging
jpata Sep 11, 2025
8c53bac
remove redundant tensorboard logging
jpata Sep 12, 2025
cd00b2d
some cleanup
jpata Sep 12, 2025
f79d071
consolidate logging
jpata Sep 12, 2025
4931158
fix imports
jpata Sep 14, 2025
39fcf73
format
jpata Sep 14, 2025
df9ad09
fix resumable sampler reset
jpata Sep 14, 2025
fa120ec
update logging
Sep 15, 2025
4d6ac85
added tests for resumable sampler non-repeated values
jpata Sep 17, 2025
88f2f49
format
jpata Sep 17, 2025
8305c8f
Merge branch 'jp_20250826_stepopt' of https://github.com/jpata/partic…
Sep 21, 2025
86b556c
enable resetting the validation loader
jpata Sep 21, 2025
4595ee5
improve logging
jpata Sep 21, 2025
0701b3a
merge
Sep 22, 2025
a04400c
up
Sep 22, 2025
a853b95
Merge branch 'jp_20250826_stepopt' of https://github.com/jpata/partic…
Sep 22, 2025
2b9fa7a
up
jpata Sep 23, 2025
bffa48b
increase steps to 1M
jpata Sep 25, 2025
5b7dd51
add additional fiducial cuts
jpata Sep 25, 2025
eb25373
fix persistent workers
jpata Sep 25, 2025
d60079f
format
jpata Sep 25, 2025
b0779b3
run in torch container
jpata Sep 25, 2025
216f48d
get rid of tensorflow
jpata Sep 25, 2025
e762ad8
do not use latest
jpata Sep 25, 2025
3827236
install wget
jpata Sep 25, 2025
107430b
seems like tensorflow is required by tensorflow datasets
jpata Sep 25, 2025
7dc48d9
use devel image
jpata Sep 26, 2025
154af85
default rank
jpata Sep 26, 2025
76912ed
show PF and MLPF at the same time
jpata Sep 26, 2025
ad5d0a6
up
jpata Sep 26, 2025
de1f4da
finalize 13.6 and 14 tev comparison plots
jpata Sep 26, 2025
f71be69
updated plots
jpata Oct 1, 2025
c686590
format
jpata Oct 1, 2025
9357579
update plots
jpata Nov 4, 2025
90ea0bb
generate additional val samples changing ONLY the c.o.m.
jpata Nov 4, 2025
4638502
generate additional val samples changing ONLY the c.o.m.
jpata Nov 4, 2025
3eaa21a
freeze pyg-cms_20251006_094347_769570 training
Nov 4, 2025
f8aa19e
add val2 sample
jpata Nov 25, 2025
28a5aad
Merge branch 'jp_20250826_stepopt' of github.com:jpata/particleflow i…
jpata Nov 25, 2025
4d32fa1
v3 validation samples (Fikri's config)
jpata Nov 25, 2025
bfd8c35
up
jpata Nov 25, 2025
4cf762c
update plots
jpata Nov 26, 2025
265453c
v3 qcd
jpata Dec 22, 2025
55342cf
use torch runtime image
jpata Dec 22, 2025
472aa90
format
jpata Dec 22, 2025
e67d8ce
update docker image
jpata Dec 22, 2025
505702a
install gcc
jpata Dec 22, 2025
0ea8b1e
disable ray
jpata Dec 22, 2025
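Several commits above ("restore dataloader state", "enable dataloader fast forwarding", "fix resumable sampler reset", "added tests for resumable sampler non-repeated values") concern resuming a step-based training run in the middle of a pass over the data. The sampler code itself is not part of this excerpt, so the following is only a minimal sketch of the general idea, written without PyTorch: the class name `ResumableSampler`, its `state_dict` layout, and the seeding scheme are all hypothetical.

```python
import random


class ResumableSampler:
    """Hypothetical sketch: a seeded per-epoch permutation that can resume mid-epoch."""

    def __init__(self, num_samples, seed=0):
        self.num_samples = num_samples
        self.seed = seed
        self.epoch = 0
        self.position = 0  # number of indices already yielded in the current epoch

    def _permutation(self):
        # The order depends only on (seed, epoch), so it can be regenerated after a restart.
        rng = random.Random(self.seed * 1_000_003 + self.epoch)
        order = list(range(self.num_samples))
        rng.shuffle(order)
        return order

    def __iter__(self):
        order = self._permutation()
        while self.position < self.num_samples:
            index = order[self.position]
            self.position += 1
            yield index
        # Epoch exhausted: reset so the next iteration starts a fresh permutation.
        self.epoch += 1
        self.position = 0

    def __len__(self):
        return self.num_samples

    def state_dict(self):
        return {"seed": self.seed, "epoch": self.epoch, "position": self.position}

    def load_state_dict(self, state):
        self.seed = state["seed"]
        self.epoch = state["epoch"]
        self.position = state["position"]


if __name__ == "__main__":
    sampler = ResumableSampler(10, seed=42)
    it = iter(sampler)
    seen = [next(it) for _ in range(4)]
    checkpoint = sampler.state_dict()

    resumed = ResumableSampler(10)
    resumed.load_state_dict(checkpoint)
    remaining = list(resumed)
    # No index is repeated or skipped across the checkpoint boundary.
    assert sorted(seen + remaining) == list(range(10))
```

Restoring from the saved state skips the already-consumed indices instead of replaying them, which is the property the "non-repeated values" test commit appears to target. With a prefetching data loader the recorded position can run ahead of what the training loop has actually consumed, so a real implementation would need to account for that.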
62 changes: 23 additions & 39 deletions .github/workflows/test.yml
@@ -9,45 +9,29 @@ on:
   workflow_dispatch:

 jobs:
-  remove-unneeded-software:
-    runs-on: ubuntu-22.04
-    steps:
-      - uses: actions/checkout@v3
-      - run: |
-          sudo rm -rf \
-            "$AGENT_TOOLSDIRECTORY" \
-            /opt/google/chrome \
-            /opt/microsoft/msedge \
-            /opt/microsoft/powershell \
-            /opt/pipx \
-            /usr/lib/mono \
-            /usr/local/julia* \
-            /usr/local/lib/android \
-            /usr/local/lib/node_modules \
-            /usr/local/share/chromium \
-            /usr/local/share/powershell \
-            /usr/share/dotnet \
-            /usr/share/swift
-
-  deps-torch:
-    runs-on: ubuntu-22.04
-    needs: [remove-unneeded-software]
-    steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
-        with:
-          python-version: "3.10.12"
-          cache: "pip"
-      - run: pip install -r requirements.txt
+  test-in-container:
+    runs-on: ubuntu-latest
+    container:
+      image: pytorch/pytorch:2.9.1-cuda13.0-cudnn9-runtime

-  torch-pipeline:
-    runs-on: ubuntu-22.04
-    needs: [deps-torch]
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Cache pip packages
+        uses: actions/cache@v4
         with:
-          python-version: "3.10.12"
-          cache: "pip"
-      - run: pip install -r requirements.txt
-      - run: ./scripts/local_test_torch.sh
+          path: ~/.cache/pip
+          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
+          restore-keys: |
+            ${{ runner.os }}-pip-
+
+      - name: Install dependencies
+        run: pip install -r requirements.txt
+
+      - name: Install wget
+        run: |
+          apt-get update && apt-get install -y wget gcc g++ build-essential
+
+      - name: Run tests
+        run: ./scripts/local_test_torch.sh
79 changes: 79 additions & 0 deletions mlpf/data/cms/genjob_pu55to75_val_v2.sh
@@ -0,0 +1,79 @@
#!/bin/bash
set -e
set -x

OUTDIR=/local/joosep/mlpf/cms/20251001_cmssw_15_0_5_e42b72/pu55to75_val/
CMSSWDIR=/scratch/persistent/joosep/CMSSW_15_0_5/
MLPF_PATH=/home/joosep/particleflow/

#seed must be greater than 0
SAMPLE=$1
SEED=$2

WORKDIR=/scratch/local/joosep/$SLURM_JOBID/$SAMPLE/$SEED
#WORKDIR=`pwd`/$SAMPLE/$SEED
mkdir -p $WORKDIR
mkdir -p $OUTDIR/$SAMPLE/root

PILEUP=Run3_Flat55To75_PoissonOOTPU
PILEUP_INPUT=filelist:${MLPF_PATH}/mlpf/data/cms/pu_files_local.txt

N=50

env
source /cvmfs/cms.cern.ch/cmsset_default.sh

cd $CMSSWDIR
eval `scramv1 runtime -sh`
which python
which python3

env

cd $WORKDIR

#Generate the MC
cmsDriver.py $SAMPLE \
--conditions auto:phase1_2023_realistic \
--beamspot Realistic25ns13p6TeVEarly2023Collision \
-n $N \
--era Run3_2023 \
--eventcontent FEVTDEBUGHLT \
-s GEN,SIM,DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2023 \
--datatier GEN-SIM \
--geometry DB:Extended \
--pileup $PILEUP \
--pileup_input $PILEUP_INPUT \
--no_exec \
--fileout step2_phase1_new.root \
--customise Validation/RecoParticleFlow/customize_pfanalysis.customize_step2 \
--python_filename=step2_phase1_new.py

#Run the reco sequences
cmsDriver.py step3 \
--conditions auto:phase1_2023_realistic \
--beamspot Realistic25ns13p6TeVEarly2023Collision \
--era Run3_2023 \
-n -1 \
--eventcontent FEVTDEBUGHLT \
--runUnscheduled \
-s RAW2DIGI,L1Reco,RECO,RECOSIM \
--datatier GEN-SIM-RECO \
--geometry DB:Extended \
--no_exec \
--filein file:step2_phase1_new.root \
--fileout step3_phase1_new.root \
--customise Validation/RecoParticleFlow/customize_pfanalysis.customize_step3 \
--python_filename=step3_phase1_new.py

pwd
ls -lrt

echo "process.RandomNumberGeneratorService.generator.initialSeed = $SEED" >> step2_phase1_new.py
cmsRun step2_phase1_new.py > /dev/null
cp step2_phase1_new.root $OUTDIR/$SAMPLE/root/step2_${SEED}.root

cmsRun step3_phase1_new.py > /dev/null
cp pfntuple.root $OUTDIR/$SAMPLE/root/pfntuple_${SEED}.root

rm -Rf /scratch/local/joosep/$SLURM_JOBID
81 changes: 81 additions & 0 deletions mlpf/data/cms/genjob_pu55to75_val_v3.sh
@@ -0,0 +1,81 @@
#!/bin/bash
set -e
set -x

OUTDIR=/local/joosep/mlpf/cms/20251125_cmssw_15_0_5_117d32/pu55to75_val/
CMSSWDIR=/scratch/persistent/joosep/CMSSW_15_0_5/
MLPF_PATH=/home/joosep/particleflow/

#seed must be greater than 0
SAMPLE=$1
SEED=$2

WORKDIR=/scratch/local/joosep/$SLURM_JOBID/$SAMPLE/$SEED
#WORKDIR=`pwd`/$SAMPLE/$SEED
mkdir -p $WORKDIR
mkdir -p $OUTDIR/$SAMPLE/root

PILEUP=Run3_Flat55To75_PoissonOOTPU
PILEUP_INPUT=filelist:${MLPF_PATH}/mlpf/data/cms/pu_files_local_val2.txt

N=50

env
source /cvmfs/cms.cern.ch/cmsset_default.sh

cd $CMSSWDIR
eval `scramv1 runtime -sh`
which python
which python3

env

cd $WORKDIR

#Generate the MC
cmsDriver.py $SAMPLE \
--conditions 140X_mcRun3_2024_realistic_v26 \
--beamspot DBrealistic \
-n $N \
--era Run3_2024 \
--eventcontent FEVTDEBUGHLT \
-s GEN,SIM,DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2023 \
--datatier GEN-SIM \
--geometry DB:Extended \
--pileup $PILEUP \
--pileup_input $PILEUP_INPUT \
--no_exec \
--fileout step2_phase1_new.root \
--customise Validation/RecoParticleFlow/customize_pfanalysis.customize_step2 \
--python_filename=step2_phase1_new.py

# --customise_commands "process.mix.input.nbPileupEvents.probFunctionVariable = cms.vint32(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120) \n process.mix.input.nbPileupEvents.probValue = cms.vdouble(0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446,0.00826446)"

#Run the reco sequences
cmsDriver.py step3 \
--conditions 140X_mcRun3_2024_realistic_v26 \
--beamspot DBrealistic \
--era Run3_2024 \
-n -1 \
--eventcontent FEVTDEBUGHLT \
--runUnscheduled \
-s RAW2DIGI,L1Reco,RECO,RECOSIM \
--datatier GEN-SIM-RECO \
--geometry DB:Extended \
--no_exec \
--filein file:step2_phase1_new.root \
--fileout step3_phase1_new.root \
--customise Validation/RecoParticleFlow/customize_pfanalysis.customize_step3 \
--python_filename=step3_phase1_new.py

pwd
ls -lrt

echo "process.RandomNumberGeneratorService.generator.initialSeed = $SEED" >> step2_phase1_new.py
cmsRun step2_phase1_new.py > /dev/null
cp step2_phase1_new.root $OUTDIR/$SAMPLE/root/step2_${SEED}.root

cmsRun step3_phase1_new.py > /dev/null
cp pfntuple.root $OUTDIR/$SAMPLE/root/pfntuple_${SEED}.root

rm -Rf /scratch/local/joosep/$SLURM_JOBID
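The commented-out `--customise_commands` line in the v3 script above spells out a flat pileup profile by hand: pileup values 0 to 120, each with probability 0.00826446, which matches 1/121 rounded to eight decimals. As an illustration only, such a string could be generated rather than typed. The helper name `flat_pileup_commands` and its parameters are hypothetical and not part of the PR; only the two `process.mix.input.nbPileupEvents` attribute names are taken from the script.

```python
def flat_pileup_commands(max_pu=120, digits=8):
    """Build a customise_commands string for a flat pileup profile over 0..max_pu.

    Hypothetical helper: reproduces the layout of the commented-out line
    in genjob_pu55to75_val_v3.sh, with equal probability in every bin.
    """
    n_bins = max_pu + 1
    prob = round(1.0 / n_bins, digits)
    values = ",".join(str(i) for i in range(n_bins))
    probs = ",".join(f"{prob:.{digits}f}" for _ in range(n_bins))
    # "\\n" emits a literal backslash-n, as in the shell script's quoted string.
    return (
        f"process.mix.input.nbPileupEvents.probFunctionVariable = cms.vint32({values}) \\n "
        f"process.mix.input.nbPileupEvents.probValue = cms.vdouble({probs})"
    )


if __name__ == "__main__":
    print(flat_pileup_commands())
```

Because each probability is rounded, the 121 values sum to 0.99999966 rather than exactly 1.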
15 changes: 8 additions & 7 deletions mlpf/data/cms/prepare_args_val.py
@@ -3,15 +3,16 @@

 import os

-outdir = "/local/joosep/mlpf/cms/20250618_cmssw_15_0_5_f8ae2f/"
+outdir = "/local/joosep/mlpf/cms/20251125_cmssw_15_0_5_117d32/"

 samples = [
-    ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 710050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
-    ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 702050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
-    ("TTbar_13p6TeV_TuneCUETP8M1_cfi", 800000, 802050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
-    ("TTbar_13p6TeV_TuneCUETP8M1_cfi", 800000, 802050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
-    ("PhotonJet_Pt_10_13p6TeV_TuneCUETP8M1_cfi", 900000, 902050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
-    ("PhotonJet_Pt_10_13p6TeV_TuneCUETP8M1_cfi", 900000, 902050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
+    ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 701050, "genjob_pu55to75_val_v3.sh", outdir + "/pu55to75_val"),
+    # ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 710050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
+    # ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 702050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
+    # ("TTbar_13p6TeV_TuneCUETP8M1_cfi", 800000, 802050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
+    # ("TTbar_13p6TeV_TuneCUETP8M1_cfi", 800000, 802050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
+    # ("PhotonJet_Pt_10_13p6TeV_TuneCUETP8M1_cfi", 900000, 902050, "genjob_pu55to75_val.sh", outdir + "/pu55to75_val"),
+    # ("PhotonJet_Pt_10_13p6TeV_TuneCUETP8M1_cfi", 900000, 902050, "genjob_nopu_val.sh", outdir + "/nopu_val"),
 ]

 if __name__ == "__main__":
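The body of the `__main__` block in `prepare_args_val.py` is not visible in this diff. As a rough illustration of how a `samples` entry could map onto the generation scripts' `SAMPLE` and `SEED` arguments, the sketch below expands each tuple into one job per seed and skips seeds whose ntuple already exists. The function `expand_jobs`, the half-open seed range, and the skip-if-present behaviour are assumptions; only the output path pattern `<outdir>/<sample>/root/pfntuple_<seed>.root` is taken from the shell scripts.

```python
import os


def expand_jobs(samples, completed=os.path.isfile):
    """Yield (script, sample, seed) for every seed whose ntuple is not yet present.

    Hypothetical sketch: assumes seeds run over the half-open range
    [seed_start, seed_end) and that a job counts as done once its
    pfntuple_<seed>.root file exists under <outdir>/<sample>/root/.
    """
    for sample, seed_start, seed_end, script, outdir in samples:
        for seed in range(seed_start, seed_end):
            ntuple = os.path.join(outdir, sample, "root", f"pfntuple_{seed}.root")
            if not completed(ntuple):
                yield script, sample, seed


if __name__ == "__main__":
    outdir = "/local/joosep/mlpf/cms/20251125_cmssw_15_0_5_117d32/"
    samples = [
        ("QCDForPF_13p6TeV_TuneCUETP8M1_cfi", 700000, 701050, "genjob_pu55to75_val_v3.sh", outdir + "/pu55to75_val"),
    ]
    for script, sample, seed in expand_jobs(samples):
        print(script, sample, seed)
```

Under the half-open assumption, the single active entry covers 1,050 seeds. With `N=50` events per job in the generation script, that would correspond to 52,500 events.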