
Commit 7ab29f3

JulesVandenbroeck, riga, mafrahm, allcontributors[bot], and jomatthi authored
Get upstream changes (#114)
* Extend dy weight application to use btag multiplicity. (columnflow#739)
* Extend dy weight application to use btag multiplicity.
* Update docstring.
* Hotfix nbtags variable in dy weight producer.
* fix skipping data in CreateDatacards
* Add objects for interacting with CMS CAT meta data. (columnflow#740)
* Add objects for interacting with CAT meta data.
* Remove namespace for now.
* Cleanup.
* Update fixed law.
* Use cf.cms task namespace.
* Add CMSDatasetInfo.
* Allow pathlib input.
* Add dc pog to CATSnapshot.
* More flexible POG overrides.
* Typo.
* Simplify.
* Hotfix CAT metadata update check for missing POG dirs.
* add subplots_cfg in plot_all (columnflow#742)
Co-authored-by: Mathis Frahm <[email protected]>
* Update law.
* Refactor generator-level top and top decay product lookup (columnflow#741)
* Refactor gen top lookup.
* Add theory-based top pt weight method.
* Comments.
* Comments.
* Rename field wDecay -> wChildren.
* Update kept fields in gen_particles.py Removed 'status' and 'statusFlags' from kept generator particle fields.
* Fix gen part field transformations.
* Add suggestion by @jolange
* Add gen_higgs_lookup.
* Hotfix saving of columns in gen_particle lookups.
* Hotfix depth limit of gen particles.
* Add gen_dy_lookup.
* Hotfix multi-config lookup via patterns.
* Hotfix reduction to skip empty chunks.
* Hotfix higgs gen lookup, considering effective gluon/photon decays.
* Hotfix single shift selection in plotting.
* Allow patterns in get_shifts_from_sources.
* Hotfix save_div in plot scale factor.
* [cms] Update log in CheckCATUpdates task.
* Skip string columns in finiteness checks, fixes columnflow#743.
* Hotfix repo bunlding, add missing user config.
* [cms] Refactor egamma calibrators. (columnflow#745)
* docs: add Bogdan-Wiederspan as a contributor for review (columnflow#746)
* docs: update README.md [skip ci]
* docs: update .all-contributorsrc [skip ci]
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* docs: add aalvesan as a contributor for review (columnflow#747)
* docs: update README.md [skip ci]
* docs: update .all-contributorsrc [skip ci]
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* Add t->w->tau children in gen_top_lookup.
* Hotfix typo in gen_top lookup.
* Add and use sum_hists helper.
* Extend tes versions.
* [cms] Hotfix tau energy calibration, skip e-fake mask.
* [cms] Hotfix egamma calibrator, use same random numbers for all smearing variations.
* Add option to skip auto categories in track_category_changes.
* Add n_chunks entry to ChunkPosition.
* mutliple fixes regarding empty files or (almost) empty chunks (columnflow#750)
* mutliple fixes regarding empty files or (almost) empty chunks
* move chunk skip out of variable loop
* add AbsScEta to variable_map for backwards compatibility
* use last instead of first chunk for empty outputs
* Fix broadcasting with empty egamma collection.
---------
Co-authored-by: Mathis Frahm <[email protected]>
Co-authored-by: Marcel R. <[email protected]>
* Add simple column selection to UniteColumns.
* Remove unneeded columns in cms tec calibrator.
* Add variabble_repr to control paths. (columnflow#751)
* Hotfix tec, add back charge.
* Log broken parquet file paths.
* Cleanup of e/mu id, update law.
* Fix cf_inspect script after coffea update. (columnflow#753)
* Hotfix electron weight producer with nested working points.
* Hotfix attributes added by taf decorators.
* Rename max-runtime -> {htcondor,slurm}-runtime. (columnflow#755)
* Simplify requiring producers. (columnflow#756)
* Simplify requiring producers.
* Add same mechanism for calibrators.
* Revert pilot decisions.
* Add muon_sr calibrator. (columnflow#754)
* Hotfix version resolution from config.
* Hotfix required producers/calibrators for workflows.
* Persistent local files of BundleExternalFiles. (columnflow#752)
* Presistent local files of BundleExternalFiles.
* Fix files_dir property.
* Better caching.
* Preserve types.
* Ensure clean dir.
* Allow unpacking in remote envs.
* Pass-through workflow requirements in CreateHistograms.
* Feature/histogram user multiconfig (columnflow#709)
* make HistogramsUserBase compatible with multi-config
* backwards compatibility to single-config
* improve flexibility & runtime of helper functions
* make shifts a set
* add inputs as argument to load_histograms
---------
Co-authored-by: Marcel Rieger <[email protected]>
Co-authored-by: Mathis Frahm <[email protected]>
* update hist axis labels during histogram merging (columnflow#705)
* update labels during histogram merging
* move update_ax_labels to hist_util.py
* Linting
---------
Co-authored-by: Marcel Rieger <[email protected]>
Co-authored-by: Mathis Frahm <[email protected]>
* Fix variance of fake data in datacard writer, better logs.
* Update law.
* Fix mamba setup.
---------
Co-authored-by: Marcel Rieger <[email protected]>
Co-authored-by: Marcel R. <[email protected]>
Co-authored-by: Mathis Frahm <[email protected]>
Co-authored-by: Mathis Frahm <[email protected]>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: jomatthi <[email protected]>
Co-authored-by: juvanden <[email protected]>
1 parent e764265 commit 7ab29f3

34 files changed

Lines changed: 1015 additions & 440 deletions

analysis_templates/cms_minimal/law.cfg

Lines changed: 2 additions & 1 deletion
@@ -27,7 +27,7 @@ default_analysis: __cf_module_name__.config.analysis___cf_short_name_lc__.analys
 default_config: run2_2017_nano_v9
 default_dataset: st_tchannel_t_4f_powheg
 
-calibration_modules: columnflow.calibration.cms.{jets,met,tau}, __cf_module_name__.calibration.example
+calibration_modules: columnflow.calibration.cms.{jets,met,tau,egamma,muon}, __cf_module_name__.calibration.example
 selection_modules: columnflow.selection.empty, columnflow.selection.cms.{json_filter,met_filters}, __cf_module_name__.selection.example
 reduction_modules: columnflow.reduction.default, __cf_module_name__.reduction.example
 production_modules: columnflow.production.{categories,matching,normalization,processes}, columnflow.production.cms.{btag,electron,jet,matching,mc_weight,muon,pdf,pileup,scale,parton_shower,seeds,gen_particles}, __cf_module_name__.production.example
@@ -65,6 +65,7 @@ htcondor_flavor: $CF_HTCONDOR_FLAVOR
 htcondor_share_software: False
 htcondor_memory: -1
 htcondor_disk: -1
+htcondor_runtime: 3h
 slurm_flavor: $CF_SLURM_FLAVOR
 slurm_partition: $CF_SLURM_PARTITION
 
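
The new `htcondor_runtime: 3h` entry stores the job runtime as a duration string. As a minimal sketch of how such a value could be read and converted to hours with the standard library only: the `[analysis]` section name and the helper `parse_runtime_hours` are assumptions made for this illustration and are not taken from law or columnflow, which may parse the value differently.

```python
import configparser
import re

# reduced config text; the section name "analysis" is an assumption for this sketch
CFG_TEXT = """
[analysis]
htcondor_memory: -1
htcondor_disk: -1
htcondor_runtime: 3h
"""

# seconds per supported unit suffix
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_runtime_hours(value: str) -> float:
    """Convert strings such as '3h' or '90m' into hours; bare numbers are taken as hours."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([smhd]?)\s*", value)
    if match is None:
        raise ValueError(f"cannot parse runtime '{value}'")
    number, unit = match.groups()
    return float(number) * _UNIT_SECONDS[unit or "h"] / 3600


# interpolation is disabled so that values containing '$' or '%' are read verbatim
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CFG_TEXT)
runtime_hours = parse_runtime_hours(parser.get("analysis", "htcondor_runtime"))
```

With the text above, `runtime_hours` evaluates to `3.0`.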

bin/cf_inspect.py

Lines changed: 7 additions & 6 deletions
@@ -59,10 +59,13 @@ def _load_nano_root(fname: str, treepath: str | None = None, **kwargs) -> ak.Arr
     except:
         return uproot.open(fname)
 
-
-def _load_h5(fname: str, **kwargs):
-    import h5py
-    return h5py.File(fname, "r")
+    return coffea.nanoevents.NanoEventsFactory.from_root(
+        source,
+        treepath=treepath,
+        mode="eager",
+        runtime_cache=None,
+        persistent_cache=None,
+    ).events()
 
 
 def load(fname: str, **kwargs) -> Any:
@@ -78,8 +81,6 @@ def load(fname: str, **kwargs) -> Any:
         return _load_nano_root(fname, **kwargs)
     if ext == ".json":
         return _load_json(fname, **kwargs)
-    if ext in [".h5", ".hdf5"]:
-        return _load_h5(fname, **kwargs)
     raise NotImplementedError(f"no loader implemented for extension '{ext}'")
 
 
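
After this change, `load` no longer has an HDF5 branch, so `.h5` and `.hdf5` paths fall through to the `NotImplementedError`. The dispatch-by-extension behavior can be sketched as follows. This is a reduced, self-contained illustration that only knows a JSON loader; the `LOADERS` registry is a hypothetical restructuring, whereas the script itself uses a chain of `if` statements.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict


def _load_json(fname: str, **kwargs) -> Any:
    with open(fname, "r") as f:
        return json.load(f)


# hypothetical registry mapping file extensions to loader functions
LOADERS: Dict[str, Callable[..., Any]] = {
    ".json": _load_json,
}


def load(fname: str, **kwargs) -> Any:
    ext = os.path.splitext(fname)[1]
    try:
        loader = LOADERS[ext]
    except KeyError:
        raise NotImplementedError(f"no loader implemented for extension '{ext}'") from None
    return loader(fname, **kwargs)


# usage: a JSON file loads, while an HDF5 path is rejected before any file access
with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "example.json")
    with open(json_path, "w") as f:
        json.dump({"n_events": 3}, f)
    loaded = load(json_path)

try:
    load("example.h5")
    h5_supported = True
except NotImplementedError:
    h5_supported = False
```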

columnflow/calibration/__init__.py

Lines changed: 51 additions & 12 deletions
@@ -8,44 +8,82 @@
 
 import inspect
 
-from columnflow.types import Callable
+import law
+
 from columnflow.util import DerivableMeta
 from columnflow.columnar_util import TaskArrayFunction
+from columnflow.types import Callable, Sequence, Any
+
+
+class TaskArrayFunctionWithCalibratorRequirements(TaskArrayFunction):
+
+    require_calibrators: Sequence[str] | set[str] | None = None
+
+    def _req_calibrator(self, task: law.Task, calibrator: str) -> Any:
+        # hook to customize how required calibrators are requested
+        from columnflow.tasks.calibration import CalibrateEvents
+        return CalibrateEvents.req_other_calibrator(task, calibrator=calibrator)
 
+    def requires_func(self, task: law.Task, reqs: dict, **kwargs) -> None:
+        # no requirements for workflows in pilot mode
+        if callable(getattr(task, "is_workflow", None)) and task.is_workflow() and getattr(task, "pilot", False):
+            return
 
-class Calibrator(TaskArrayFunction):
+        # add required calibrators when set
+        if (calibs := self.require_calibrators):
+            reqs["required_calibrators"] = {calib: self._req_calibrator(task, calib) for calib in calibs}
+
+    def setup_func(
+        self,
+        task: law.Task,
+        reqs: dict,
+        inputs: dict,
+        reader_targets: law.util.InsertableDict,
+        **kwargs,
+    ) -> None:
+        if "required_calibrators" in inputs:
+            for calib, inp in inputs["required_calibrators"].items():
+                reader_targets[f"required_calibrator_{calib}"] = inp["columns"]
+
+
+class Calibrator(TaskArrayFunctionWithCalibratorRequirements):
     """
     Base class for all calibrators.
     """
 
     exposed = True
 
+    # register attributes for arguments accepted by decorator
+    mc_only: bool = False
+    data_only: bool = False
+
     @classmethod
     def calibrator(
         cls,
         func: Callable | None = None,
         bases: tuple = (),
         mc_only: bool = False,
         data_only: bool = False,
+        require_calibrators: Sequence[str] | set[str] | None = None,
         **kwargs,
     ) -> DerivableMeta | Callable:
         """
-        Decorator for creating a new :py:class:`~.Calibrator` subclass with additional, optional
-        *bases* and attaching the decorated function to it as ``call_func``.
+        Decorator for creating a new :py:class:`~.Calibrator` subclass with additional, optional *bases* and attaching
+        the decorated function to it as ``call_func``.
 
-        When *mc_only* (*data_only*) is *True*, the calibrator is skipped and not considered by
-        other calibrators, selectors and producers in case they are evalauted on a
-        :py:class:`order.Dataset` (using the :py:attr:`dataset_inst` attribute) whose ``is_mc``
-        (``is_data``) attribute is *False*.
+        When *mc_only* (*data_only*) is *True*, the calibrator is skipped and not considered by other calibrators,
+        selectors and producers in case they are evalauted on a :py:class:`order.Dataset` (using the
+        :py:attr:`dataset_inst` attribute) whose ``is_mc`` (``is_data``) attribute is *False*.
 
         All additional *kwargs* are added as class members of the new subclasses.
 
         :param func: Function to be wrapped and integrated into new :py:class:`Calibrator` class.
         :param bases: Additional bases for the new :py:class:`Calibrator`.
-        :param mc_only: Boolean flag indicating that this :py:class:`Calibrator` should only run on
-            Monte Carlo simulation and skipped for real data.
-        :param data_only: Boolean flag indicating that this :py:class:`Calibrator` should only run
-            on real data and skipped for Monte Carlo simulation.
+        :param mc_only: Boolean flag indicating that this :py:class:`Calibrator` should only run on Monte Carlo
+            simulation and skipped for real data.
+        :param data_only: Boolean flag indicating that this :py:class:`Calibrator` should only run on real data and
+            skipped for Monte Carlo simulation.
+        :param require_calibrators: Sequence of names of other calibrators to add to the requirements.
         :return: New :py:class:`Calibrator` subclass.
         """
         def decorator(func: Callable) -> DerivableMeta:
@@ -55,6 +93,7 @@ def decorator(func: Callable) -> DerivableMeta:
                 "call_func": func,
                 "mc_only": mc_only,
                 "data_only": data_only,
+                "require_calibrators": require_calibrators,
             }
 
             # get the module name
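
The diff above threads a new `require_calibrators` argument from the decorator into the class dict, and `requires_func` then turns each listed name into a requirement. The following is a minimal, self-contained sketch of that pattern. It is a simplified stand-in and not the columnflow implementation: there is no law task, no `DerivableMeta`, and the string `"task_for_<name>"` is a placeholder for whatever `CalibrateEvents.req_other_calibrator` would return.

```python
from typing import Any, Callable, Optional, Sequence


class Calibrator:
    """Simplified stand-in for a calibrator base class (hypothetical, for illustration)."""

    mc_only: bool = False
    data_only: bool = False
    require_calibrators: Optional[Sequence[str]] = None

    @classmethod
    def calibrator(
        cls,
        func: Optional[Callable] = None,
        mc_only: bool = False,
        data_only: bool = False,
        require_calibrators: Optional[Sequence[str]] = None,
        **kwargs: Any,
    ) -> Any:
        def decorator(func: Callable) -> type:
            # class members of the new subclass, mirroring the cls_dict in the diff
            cls_dict = {
                **kwargs,
                "call_func": staticmethod(func),
                "mc_only": mc_only,
                "data_only": data_only,
                "require_calibrators": require_calibrators,
            }
            # create a new subclass named after the decorated function
            return type(func.__name__, (cls,), cls_dict)

        # support both the bare and the parametrized decorator form
        return decorator(func) if func else decorator

    def requires_func(self, reqs: dict) -> None:
        # add one requirement per required calibrator, if any are set
        if (calibs := self.require_calibrators):
            reqs["required_calibrators"] = {calib: f"task_for_{calib}" for calib in calibs}


calibrator = Calibrator.calibrator


@calibrator(mc_only=True, require_calibrators=["jec"])
def example_calibrator(events):
    return events


reqs: dict = {}
example_calibrator().requires_func(reqs)
```

Here `example_calibrator` is a `Calibrator` subclass, and `reqs` ends up with a single `"required_calibrators"` entry keyed by `"jec"`. A calibrator declared without `require_calibrators` leaves `reqs` untouched.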

columnflow/calibration/cms/egamma.py

Lines changed: 17 additions & 20 deletions
@@ -23,7 +23,7 @@
 from columnflow.calibration import Calibrator, calibrator
 from columnflow.calibration.util import ak_random
 from columnflow.util import maybe_import, load_correction_set, DotDict
-from columnflow.columnar_util import set_ak_column, full_like
+from columnflow.columnar_util import TAFConfig, set_ak_column, full_like
 from columnflow.types import Any
 
 ak = maybe_import("awkward")
@@ -37,7 +37,7 @@
 
 
 @dataclasses.dataclass
-class EGammaCorrectionConfig:
+class EGammaCorrectionConfig(TAFConfig):
     """
     Container class to describe energy scaling and smearing configurations. Example:
 
@@ -54,7 +54,7 @@ class EGammaCorrectionConfig:
     smear_syst_correction_set: str
     scale_compound: bool = False
     smear_syst_compound: bool = False
-    systs: list[str] = dataclasses.field(default_factory=list)
+    systs: list[str] = dataclasses.field(default_factory=lambda: ["scale_down", "scale_up", "smear_down", "smear_up"])
     corrector_kwargs: dict[str, Any] = dataclasses.field(default_factory=dict)
 
 
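
The `systs` field now defaults to the four scale and smear variations instead of an empty list. Because the default is a mutable list, it has to be built through `default_factory`, which gives every instance its own copy. A minimal sketch of that behavior, using a hypothetical, reduced `CorrectionConfig` rather than the real `EGammaCorrectionConfig`:

```python
import dataclasses
from typing import Any


@dataclasses.dataclass
class CorrectionConfig:
    """Hypothetical, reduced stand-in for a correction config."""

    correction_set: str
    # mutable defaults must be created per instance through default_factory
    systs: list[str] = dataclasses.field(
        default_factory=lambda: ["scale_down", "scale_up", "smear_down", "smear_up"],
    )
    corrector_kwargs: dict[str, Any] = dataclasses.field(default_factory=dict)


first = CorrectionConfig("Scale")
custom = CorrectionConfig("Scale", systs=["scale_up"])

# mutating one instance does not leak into instances created later
first.systs.append("extra")
second = CorrectionConfig("Scale")
```

After this runs, `second.systs` still holds only the four default variations, while `custom.systs` contains the single entry that was passed explicitly.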

@@ -72,9 +72,10 @@ def _egamma_scale_smear(self: Calibrator, events: ak.Array, **kwargs) -> ak.Arra
     # gather inputs
     coll = events[self.collection_name]
     variable_map = {
-        "run": events.run,
+        "run": events.run if ak.sum(ak.num(coll, axis=1), axis=0) else [],
         "pt": coll.pt,
         "ScEta": coll.superclusterEta,
+        "AbsScEta": abs(coll.superclusterEta),
         "r9": coll.r9,
         "seedGain": coll.seedGain,
         **self.cfg.corrector_kwargs,
@@ -109,22 +110,21 @@ def get_inputs(corrector, **additional_variables):
     events = set_ak_column(events, f"{self.collection_name}.pt_smear_uncorrected", coll.pt)
     events = set_ak_column(events, f"{self.collection_name}.energyErr_smear_uncorrected", coll.energyErr)
 
-    # helper to compute random variables in the shape of the collection
-    def get_rnd(syst):
-        args = (full_like(coll.pt, 0.0), full_like(coll.pt, 1.0))
-        if self.use_deterministic_seeds:
-            args += (coll.deterministic_seed,)
-            rand_func = self.deterministic_normal[syst]
-        else:
-            # TODO: bit generator could be configurable
-            rand_func = np.random.Generator(np.random.SFC64((events.event + sum(map(ord, syst))).to_list())).normal
-        return ak_random(*args, rand_func=rand_func)
+    # compute random variables in the shape of the collection once
+    rnd_args = (full_like(coll.pt, 0.0), full_like(coll.pt, 1.0))
+    if self.use_deterministic_seeds:
+        rnd_args += (coll.deterministic_seed,)
+        rand_func = self.deterministic_normal
+    else:
+        # TODO: bit generator could be configurable
+        rand_func = np.random.Generator(np.random.SFC64((events.event).to_list())).normal
+    rnd = ak_random(*rnd_args, rand_func=rand_func)
 
     # helper to compute smeared pt and energy error values given a syst
     def apply_smearing(syst):
         # get smeared pt
         smear = self.smear_syst_corrector.evaluate(syst, *get_inputs(self.smear_syst_corrector))
-        smear_factor = 1.0 + smear * get_rnd(syst)
+        smear_factor = 1.0 + smear * rnd
         pt_smeared = coll.pt * smear_factor
         # get smeared energy error
         energy_err_smeared = (((coll.energyErr)**2 + (coll.energy * smear)**2) * smear_factor)**0.5
@@ -219,11 +219,8 @@ def _deterministic_normal(loc, scale, seed, idx_offset=0):
             for _loc, _scale, _seed in zip(loc, scale, seed)
         ])
 
-    self.deterministic_normal = {
-        "smear": functools.partial(_deterministic_normal, idx_offset=0),
-        "smear_up": functools.partial(_deterministic_normal, idx_offset=1),
-        "smear_down": functools.partial(_deterministic_normal, idx_offset=2),
-    }
+    # each systematic is to be evaluated with the same random number so use a fixed offset
+    self.deterministic_normal = functools.partial(_deterministic_normal, idx_offset=0)
 
 
 electron_scale_smear = _egamma_scale_smear.derive(
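
The smearing change draws the normal random numbers once and reuses them for every variation, so that the nominal, up, and down results differ only through the smear width and not through independent random draws. A minimal sketch of the effect, using flat lists and the standard-library `random` module instead of awkward arrays and NumPy generators; the helper `smear_pt` and the width values are made up for illustration.

```python
import random


def smear_pt(pt, smear_widths, seed):
    """Return the smeared pt per variation, using one shared set of random numbers."""
    rng = random.Random(seed)
    # draw one standard-normal number per object, once for all variations
    rnd = [rng.gauss(0.0, 1.0) for _ in pt]
    return {
        syst: [p * (1.0 + width * r) for p, r in zip(pt, rnd)]
        for syst, width in smear_widths.items()
    }


pt = [25.0, 40.0, 63.5]
widths = {"smear": 0.02, "smear_up": 0.03, "smear_down": 0.01}  # illustrative values only
smeared = smear_pt(pt, widths, seed=1234)

# shift of each variation relative to the unsmeared pt
shift = {syst: [s - p for s, p in zip(vals, pt)] for syst, vals in smeared.items()}
```

Because all variations share `rnd`, each object's shift scales linearly with the width: the `smear_up` shift is 1.5 times the nominal one and the `smear_down` shift is half of it. With independent draws per variation, as in the removed `get_rnd(syst)` helper, that relation would not hold object by object.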
