#LegoCards - a datacard maker tool
This tool produces combine datacards from config files (normally in YAML format).
YAML is a very simple, human-readable configuration format.
To learn some YAML syntax, see the Wikipedia article on YAML.
##The Main Configuration file
The main configuration file that ties everything together is, e.g., configs/datacard_8TeV_2e2mu.yaml for the 2e2mu category. The convention is to have one config file per event category.
Each main configuration (per final state) contains the following sections:

- category - name of the category (e.g. 2e2mu_UnTagged, 4e_UnTagged)
- setup - defines common things (may itself contain functions_and_definitions)
- functions_and_definitions - RooWSFactory expressions needed in the card, e.g. MH[125,120,130]
- systematics - a list of systematics or pointers to a systematics configuration
- observation - rate and source (path to tree and selection) of the observed events
- processes - contains one subsection per process (ggH, qqZZ, ...)

Each section is described in detail below.
Usually there are many different inputs, typically produced by different people and scattered among different configuration files (input fragments) such as:

- yields.yaml
- systematics.yaml
- shapes_parametrizations.yaml

Therefore, we produce one final configuration file that picks up all the information from the different fragments using the INSERT syntax, e.g.

    INSERT(yields.yaml:2e2mu:ggH)

This statement picks up the yield for ggH under the 2e2mu final state from yields.yaml:

    # yields.yaml
    #-------------
    2e2mu:
        ggH : 0.5
        qqZZ : 3.1
    ...

Both absolute paths and paths relative to the configuration file that contains the INSERT are supported.
More about the INSERT command later.
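
To make the lookup concrete, here is a minimal sketch of how an INSERT reference could be resolved. It is not the LegoCards implementation: the names `FRAGMENTS`, `lookup` and `resolve_inserts` are hypothetical, the fragment files are replaced by an already-parsed dictionary, and the handling of relative paths is left out.

```python
import re

# Hypothetical stand-in for parsed fragment files; in practice each entry
# would come from loading the YAML file found at that path.
FRAGMENTS = {
    "yields.yaml": {"2e2mu": {"ggH": 0.5, "qqZZ": 3.1}},
}

# Matches INSERT(<reference>) where the reference has no nested parentheses.
INSERT_PATTERN = re.compile(r"INSERT\(([^()]+)\)")


def lookup(reference, fragments):
    """Resolve 'file.yaml:key1:key2' against already-parsed fragments."""
    path, *keys = reference.split(":")
    node = fragments[path]
    for key in keys:
        node = node[key]
    return node


def resolve_inserts(text, fragments):
    """Replace every INSERT(...) in a string by the value it points to."""
    return INSERT_PATTERN.sub(
        lambda match: str(lookup(match.group(1), fragments)), text
    )


resolved = resolve_inserts("INSERT(yields.yaml:2e2mu:ggH)", FRAGMENTS)
print(resolved)
```

Because the substitution works on plain strings, the same sketch also handles an INSERT embedded in a larger expression.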
###The sections of the Main Configuration
####Section category
This section is a single word that names the category.
The category name is used as the bin name in the text datacard.
Example:

    category: 2e2mu_UnTagged

The value must be a string without whitespace, as in the example. This limitation comes from the text datacard format rules.
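
The whitespace rule is easy to check before building a card. The helper below is only an illustration; `validate_category` is a hypothetical name and not part of LegoCards.

```python
import re


def validate_category(name):
    """Return the name if it is a non-empty string without whitespace."""
    if not isinstance(name, str) or not name or re.search(r"\s", name):
        raise ValueError(f"invalid category name: {name!r}")
    return name


print(validate_category("2e2mu_UnTagged"))
```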
####Section setup
In this section we define some common things, like possible names of processes,
names of reserved sections, replacement words and possibly functions and definitions.
In the current implementation, setup is only used for the definition
of common variables and functions within the functions_and_definitions subsection.
Usually, the definitions in setup are common to multiple categories, so it is
natural to put them into a separate configuration file and pull them in with the INSERT
command.
Example of a setup section:

    setup:
        words_to_be_replaced: [SQRTS, PROCESS, CATEGORY]
        reserved_sections:
            - observation
            - functions_and_definitions
            - setup
            - category
            - systematics
            - processes
        process_names: [ggH, qqH, WH, ZH, ttH, ggZZ, qqZZ, zjets]
        functions_and_definitions:
            #setup level definitions (observables, mass range, lumi,
            #whatever is common for all cards)
            - lumi_8[19.712]
            - MH[125, 105,140]

####Section functions_and_definitions
This section contains a list of RooWorkspace::factory declarations.
One can define all variables, functions, process norms and pdfs within this section.
In the example

    functions_and_definitions:
        - MH[125,105,140]
        - expr::mean('125+0.99*(@0-125)', MH)

we define MH with an initial value of 125 and a range of 105-140,
and then an expression/formula mean which is a function of MH.
The functions_and_definitions section can be repeated in setup (setup-level), as one of the
sections at level-0 (together with category, observation, ...) and under each
process (level-1).
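
The three levels can be pictured with a small sketch that gathers all declarations from a parsed configuration. The function name `collect_definitions` is hypothetical, and the order used here (setup-level, then level-0, then level-1) is an assumption, not something taken from the LegoCards code.

```python
def collect_definitions(config):
    """Gather factory declarations: setup-level, then level-0, then level-1."""
    key = "functions_and_definitions"
    collected = []
    # setup-level definitions, common to all cards
    collected += config.get("setup", {}).get(key, [])
    # level-0 definitions, next to category, observation, ...
    collected += config.get(key, [])
    # level-1 definitions, one list per process
    for process in config.get("processes", {}).values():
        collected += process.get(key, [])
    return collected


config = {
    "setup": {"functions_and_definitions": ["lumi_8[19.712]", "MH[125,105,140]"]},
    "functions_and_definitions": ["expr::mean('125+0.99*(@0-125)', MH)"],
    "processes": {
        "ggH": {"functions_and_definitions": ["expr::ggH_norm('@0',r_2e2mu)"]},
        "qqZZ": {},
    },
}

definitions = collect_definitions(config)
for declaration in definitions:
    print(declaration)
```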
####Section systematics
In this section we find a list of systematics that concern particular processes.
A later entry in the list replaces the previously defined one.
The systematics section can be repeated at level-0 and at level-1, i.e. under each process.
Example of systematics:

    systematics:
        - hzz_pdf:
            type: lnN
            ggH : 1.026
            qqH : 1.026
            WH : 1.026
            ZH : 1.026
            ttH : 1.026
            qqZZ: 1.026
            ggZZ: 1.026
          cms_zz4l_bkg:
            type : param 0.0 1 [-3,3]
        - hzz_pdf:
            type: lnN
            ggH : 1.033

In the above example, hzz_pdf for ggH is finally evaluated to 1.033, since
the later number is treated as an updated value.
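
The override rule can be sketched as a simple merge over the list. This is an illustration only: `merge_systematics` is a hypothetical name, and the sketch assumes that a later entry updates individual values and leaves the other processes of the same systematic untouched, which the text above does not state explicitly.

```python
def merge_systematics(entries):
    """Merge a list of systematics entries; later values win."""
    merged = {}
    for entry in entries:
        for name, spec in entry.items():
            # Assumption: update per value instead of replacing the whole block.
            merged.setdefault(name, {}).update(spec)
    return merged


entries = [
    {
        "hzz_pdf": {"type": "lnN", "ggH": 1.026, "qqH": 1.026},
        "cms_zz4l_bkg": {"type": "param 0.0 1 [-3,3]"},
    },
    {"hzz_pdf": {"type": "lnN", "ggH": 1.033}},
]

merged = merge_systematics(entries)
print(merged["hzz_pdf"]["ggH"])  # 1.033
```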
####Section observation
In this section we provide the information about the observed data.
Both the rate and the source of the data are defined.
Example:

    observation:
        rate: 8
        source:
            path: inputs/data_observed_8TeV.root/passedEvents
            selection: (mass2e2mu>105.0 && mass2e2mu<140.0)
            observables:
                #the observables names correspond to branches in tree
                #and are interpreted with RooWSFactory. This list is first to be defined.
                - mass4l[INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:mass_range)]
                #- mass4l

In the source/observables subsection one defines the list of observables that are
used in the datacard. The names of the observables correspond to branches in the tree.
This list is evaluated by RooWSFactory first, before any functions_and_definitions.
If an entry carries a definition such as a range (as in the example), the variable
is defined in the workspace at this stage. If only the names are provided, only
the branches with the given names are imported, and the variables are not defined
in the workspace at this stage. To define them, one can use functions_and_definitions
sections later.
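
The difference between the two kinds of entries can be shown with a short sketch. The helper names are hypothetical, as are the observable `pt4l` and the range `[105,140]`, which only serve as illustrative values.

```python
def branch_name(observable):
    """Return the tree branch name of an observables entry."""
    return observable.split("[", 1)[0].strip()


def defines_variable(observable):
    """True if the entry carries a factory definition such as a range."""
    return "[" in observable


# Illustrative entries: one with a range, one with a bare name.
observables = ["mass4l[105,140]", "pt4l"]

for entry in observables:
    action = "define in workspace" if defines_variable(entry) else "import branch only"
    print(branch_name(entry), "->", action)
```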
####Section processes
This section contains a dict of processes, each with the following information:

- rate - expected rate of the process
- functions_and_definitions - declares additional variables, functions or pdfs used to build a shape or a normalization factor (in the form PROCESS_norm)
- shape - definition of a shape, either a RooWSFactory form or a histogram Template::PROCESS_NAME(obs1,obs2,obs3, path/to/file.root/histo_name)
- is_signal - a flag (1 or 0) to declare signal or background
- systematics - (optional) declares additional systematics for this process

Example:

    processes:
        ggH:
            is_signal : 1
            rate : INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:UnTagged:ggH)
            functions_and_definitions:
                #level-1 definitions (process norm, shape parameters, all related
                #to this process)
                - expr::ggH_norm('@0',r_2e2mu)
                - expr::mean_3_8('INSERT(inputs/signal_shape_parametrization_8TeV_2e2mu.yaml:UnTagged:mean)',MH)
                - expr::sigma_3_8('INSERT(inputs/signal_shape_parametrization_8TeV_2e2mu.yaml:UnTagged:sigma)',MH)
                - expr::alpha_3_8('0.956',MH)
                - expr::n_3_8('4.713',MH)
                - expr::alpha2_3_8('1.377',MH)
                - expr::n2_3_8('6.2383+(0.318)*(@0-125)',MH)
            shape : "RooDoubleCB::ggH(mass4l, mean_3_8, sigma_3_8, alpha_3_8, n_3_8, alpha2_3_8, n2_3_8)"
            systematics: #TODO
                one_extra_sys_ggH: lnN 1.03
                another_extra_sys_ggH: param 0.0 1 [-3,3]
        qqZZ:
            is_signal : 0
            rate : INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:UnTagged:qqZZ)
            shape : Template::qqZZ(mass4l, inputs/qqZZ_2e2mu_mass4l.root/m4l_mass4l_105.6_140.6)

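
To show how the is_signal flag and the rates end up in the text datacard, here is a rough sketch of the bin/process/rate rows. It follows the usual combine convention that signal processes get indices 0, -1, -2, ... and backgrounds get 1, 2, 3, ...; the function `process_rows` is hypothetical and not taken from build_datacard.py, and the rates are the ones from the yields.yaml example with the INSERT statements already resolved.

```python
def process_rows(category, processes):
    """Build the bin / process / process-index / rate rows of a datacard."""
    signals = [name for name, p in processes.items() if p["is_signal"]]
    backgrounds = [name for name, p in processes.items() if not p["is_signal"]]
    # Signals are numbered 0, -1, -2, ...; backgrounds 1, 2, 3, ...
    indexed = [(name, -i) for i, name in enumerate(signals)]
    indexed += [(name, i) for i, name in enumerate(backgrounds, start=1)]
    return {
        "bin": [category] * len(indexed),
        "process": [name for name, _ in indexed],
        "process_index": [index for _, index in indexed],
        "rate": [processes[name]["rate"] for name, _ in indexed],
    }


processes = {
    "ggH": {"is_signal": 1, "rate": 0.5},
    "qqZZ": {"is_signal": 0, "rate": 3.1},
}

rows = process_rows("2e2mu_UnTagged", processes)
# In the text datacard both the name row and the index row are labelled "process".
for label, key in [("bin", "bin"), ("process", "process"),
                   ("process", "process_index"), ("rate", "rate")]:
    print(label.ljust(10), "  ".join(str(value) for value in rows[key]))
```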
##Running LegoCards (example)
Before running, make sure you have PyYAML installed:

    wget http://pyyaml.org/download/pyyaml/PyYAML-3.11.tar.gz
    tar -xzvvf PyYAML-3.11.tar.gz
    cd PyYAML-3.11
    python setup.py install --user

- Update your configuration.
- Make sure you have a CMSSW environment and the HiggsCombination package (we load some RooFit libs from there).
- To build e.g. the 2e2mu datacard using the configuration configs/datacard_8TeV_2e2mu.yaml:

      python build_datacard.py --cfg configs/datacard_8TeV_2e2mu.yaml -v 10

- The builder produces text datacards that point to a RooWorkspace.
- Use these cards as you like: with combineCards.py, text2workspace.py, combine, ...

##RooWorkspace Factory Instructions
Process high-level object creation syntax.
Accepted forms of syntax are:

Creating variables

    x[-10,10]             - Create variable x with given range and put it in workspace
    x[3,-10,10]           - Create variable x with given range and initial value and put it in workspace
    x[3]                  - Create variable x with given constant value
    <numeric literal>     - Numeric literal expressions (0.5, -3 etc.) are converted to a RooConst(<numeric literal>)
                            wherever a RooAbsReal or RooAbsArg argument is expected

Creating categories

    c[lep,kao,nt1,nt2]    - Create category c with given state names
    tag[B0=1,B0bar=-1]    - Create category tag with given state names and index assignments

Creating functions and p.d.f.s

    MyPdf::g(x,m,s)       - Create p.d.f. or function of type MyPdf with name g and arguments x,m,s
                            Interpretation and number of arguments are mapped to the constructor arguments of the class
                            (after the name and title).
    MyPdf(x,m,s)          - As above, but with an implicitly defined (unique) object name

Creating sets and lists (to be used as inputs above)

    {a,b,c}               - Create RooArgSet or RooArgList (as determined by context) from given contents

Objects that are not created are assumed to exist in the workspace.
Object creation expressions as shown above can be nested, e.g. one can do

    RooGaussian::g(x[-10,10],m[0],3)

to create a p.d.f. and its variables in one go. This nesting can be applied recursively, e.g.

    SUM::model( f[0.5,0,1] * RooGaussian::g( x[-10,10], m[0], 3 ),
                RooChebychev::c( x, {a0[0.1],a1[0.2],a2[-0.3]} ))

creates the sum of a Gaussian and a Chebychev and all its variables.
A separate series of operator meta-types exists to simplify the construction of composite expressions.
Meta-types in all capitals (SUM) create p.d.f.s, meta-types in lower case (sum) create
functions.

    SUM::name(f1*pdf1,f2*pdf2,pdf3)   -- Create sum p.d.f. name with value f1*pdf1+f2*pdf2+(1-f1-f2)*pdf3
    RSUM::name(f1*pdf1,f2*pdf2,pdf3)  -- Create recursive sum p.d.f. name with value f1*pdf1 + (1-f1)(f2*pdf2 + (1-f2)pdf3)
    ASUM::name(f1*amp1,f2*amp2,amp3)  -- Create sum p.d.f. name with value f1*amp1+f2*amp2+(1-f1-f2)*amp3 where ampX are amplitudes of type RooAbsReal
    sum::name(a1,a2,a3)               -- Create sum function with value a1+a2+a3
    sum::name(a1*b1,a2*b2,a3*b3)      -- Create sum function with value a1*b1+a2*b2+a3*b3
    PROD::name(pdf1,pdf2)             -- Create product of p.d.f.s with 'name' from the given input p.d.f.s
    PROD::name(pdf1|x,pdf2)           -- Create product of conditional p.d.f. pdf1 given x and pdf2
    prod::name(a,b,c)                 -- Create product function with value a*b*c
    SIMUL::name(cat,a=pdf1,b=pdf2)    -- Create simultaneous p.d.f. with index category cat. Map pdf1 to state a, pdf2 to state b
    EXPR::name('expr',var,...)        -- Create a generic p.d.f. that interprets the given expression
    expr::name('expr',var,...)        -- Create a generic function that interprets the given expression

The functionality of high-level object creation tools like RooSimWSTool, RooCustomizer and RooClassFactory
is also interfaced through meta-types in the factory.

Interface to RooSimWSTool

    SIMCLONE::name( modelPdf, $ParamSplit(...),
                    $ParamSplitConstrained(...), $Restrict(...) )
        -- Clone-and-customize modelPdf according to ParamSplit and ParamSplitConstrained()
           specifications and return a RooSimultaneous p.d.f. of all built clones
    MSIMCLONE::name( masterIndex,
                     $AddPdf(mstate1, modelPdf1, $ParamSplit(...)),
                     $AddPdf(mstate2,modelPdf2),... )
        -- Clone-and-customize multiple models (modelPdf1,modelPdf2) according to ParamSplit and
           ParamSplitConstrained() specifications and return a RooSimultaneous p.d.f. of all built clones,
           using the specified master index to map prototype p.d.f.s to master states

Interface to RooCustomizer

    EDIT::name( orig, substNode=origNode, ... )               -- Create a clone of input object orig, with the specified replacement operations executed
    EDIT::name( orig, origNode=$REMOVE(), ... )               -- Create clone of input removing term origNode from all PROD() terms that contained it
    EDIT::name( orig, origNode=$REMOVE(prodname,...), ... )   -- As above, but restrict removal of origNode to PROD term(s) prodname,...

Interface to RooClassFactory

    CEXPR::name('expr',var,...)   -- Create a custom compiled p.d.f. that evaluates the given expression
    cexpr::name('expr',var,...)   -- Create a custom compiled function that evaluates the given expression

    $MetaType(...)   - Meta argument that does not result in construction of an object but is used to logically organize
                       input arguments in certain operator p.d.f. constructions. The defined meta arguments are context dependent.
                       The only meta argument that is defined globally is $Alias(typeName,aliasName) to
                       define aliases for type names. For the definition of meta arguments in operator p.d.f.s
                       see the ROOT RooFactoryWSTool documentation.

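
As a reading aid for the three variable-creation forms listed at the start of this section, the sketch below splits a declaration into name, value and range. It only mimics the syntax in pure Python and does not call ROOT; `parse_variable` is a hypothetical helper.

```python
import re

# name[comma-separated numbers], e.g. x[-10,10], x[3,-10,10] or x[3]
VARIABLE = re.compile(r"^\s*(\w+)\[([^\[\]]*)\]\s*$")


def parse_variable(spec):
    """Split x[min,max], x[val,min,max] or x[val] into its parts."""
    match = VARIABLE.match(spec)
    if match is None:
        raise ValueError(f"not a variable declaration: {spec!r}")
    name = match.group(1)
    numbers = [float(token) for token in match.group(2).split(",")]
    if len(numbers) == 1:    # x[3]: constant value, no range
        return {"name": name, "value": numbers[0], "range": None}
    if len(numbers) == 2:    # x[-10,10]: range only
        return {"name": name, "value": None, "range": tuple(numbers)}
    if len(numbers) == 3:    # x[3,-10,10]: initial value and range
        return {"name": name, "value": numbers[0], "range": tuple(numbers[1:])}
    raise ValueError(f"too many numbers in: {spec!r}")


for spec in ["MH[125, 105,140]", "x[-10,10]", "lumi_8[19.712]"]:
    print(parse_variable(spec))
```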
##Presentations and progress
- discussion on inputs and division of labour
- first prototype of datacards input format and base datacard maker
- update on format and usage of card maker LegoCards
- update on systematics treatment
- update on usage of GitLab repository