plestina edited this page Dec 18, 2015 · 14 revisions

#LegoCards - a datacard maker tool

This tool produces combine datacards from configuration files, normally written in YAML. YAML is a simple, human-readable configuration format; the Wikipedia article on YAML is a quick way to learn the basic syntax.

##The Main Configuration file

The main configuration file ties all inputs together, e.g. configs/datacard_8TeV_2e2mu.yaml for the 2e2mu category. The convention is to have one config file per event category.

Each main configuration (per final state) contains sections:

  • category - name of category (e.g. 2e2mu_UnTagged, 4e_UnTagged)
  • setup - defines common settings (may itself contain functions_and_definitions)
  • functions_and_definitions - RooWSFactory expressions needed in the card, e.g. MH[125,120,130]
  • systematics - a list of systematics or pointers to a systematics configuration
  • observation - rate and source (path to tree and selection) of observed events
  • processes - contains one subsection per process (ggH, qqZZ, ...)

The details of each section are described below.

The inputs are usually produced by different people and scattered across several configuration files (input fragments), such as:

  • yields.yaml
  • systematics.yaml
  • shapes_parametrizations.yaml

Therefore, we produce one final configuration file which picks up the information from the different fragments using the INSERT syntax, e.g.

    INSERT(yields.yaml:2e2mu:ggH)

This particular statement picks up the yield for ggH under the 2e2mu final state from yields.yaml:

    # yields.yaml
    #-------------
    2e2mu:
        ggH  : 0.5
        qqZZ : 3.1
        ...

Both absolute paths and paths relative to the configuration file that contains the INSERT statement are supported. The INSERT command is described in more detail later.
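The lookup performed by an INSERT statement can be sketched in a few lines of Python. This is only an illustration of the idea, not the LegoCards implementation: the names `FRAGMENTS`, `lookup` and `resolve_inserts` are hypothetical, and the fragment content is held in an in-memory dict instead of being loaded from yields.yaml.

```python
import re

# Hypothetical in-memory stand-in for parsed fragment files; in practice each
# entry would come from loading the YAML file with that name.
FRAGMENTS = {
    "yields.yaml": {"2e2mu": {"ggH": 0.5, "qqZZ": 3.1}},
}

# Matches INSERT(<file>:<key>:<key>...) anywhere inside a string.
INSERT_RE = re.compile(r"INSERT\(([^():]+)((?::[^():]+)+)\)")


def lookup(fragments, filename, keys):
    """Walk the nested dict of one fragment following the given keys."""
    node = fragments[filename]
    for key in keys:
        node = node[key]
    return node


def resolve_inserts(text, fragments):
    """Replace every INSERT(...) statement in text by the value it points to."""
    def substitute(match):
        filename = match.group(1)
        keys = match.group(2).lstrip(":").split(":")
        return str(lookup(fragments, filename, keys))
    return INSERT_RE.sub(substitute, text)


print(resolve_inserts("rate : INSERT(yields.yaml:2e2mu:ggH)", FRAGMENTS))
```

With the fragment shown above, the statement is replaced by the ggH yield of the 2e2mu final state. Resolving relative paths against the calling configuration file is left out of this sketch.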

###The sections of the Main Configuration

####Section category

This section is a single word that defines the name of the category. The category name is used as the bin name in the text datacard. Example:

    category: 2e2mu_UnTagged

The value must be a string without whitespace, as in the example. This limitation comes from the rules of the text datacard format.

####Section setup

In this section we define some common things, like possible names of processes, names of reserved sections, replacement words and possibly functions and definitions. In the current implementation, setup is only used for the definition of common variables and functions within the functions_and_definitions subsection. Usually, the definitions in setup are common to multiple categories, so it is natural to put them into a separate configuration file and pull them in with the INSERT command.

Example of setup section:

    setup:
        words_to_be_replaced: [SQRTS, PROCESS, CATEGORY]
        reserved_sections:
            - observation
            - functions_and_definitions
            - setup
            - category
            - systematics
            - processes

        process_names: [ggH, qqH, WH, ZH, ttH, ggZZ, qqZZ, zjets]

        functions_and_definitions:
            #setup level definitions (observables, mass range, lumi,
            #whatever is common for all cards)
            - lumi_8[19.712]
            - MH[125,105,140]

####Section functions_and_definitions

This section contains a list of RooWorkspace::factory declarations. All variables, functions, process norms and pdfs can be defined within this section.

In the example

    functions_and_definitions:
        - MH[125,105,140]
        - expr::mean('125+0.99*(@0-125)', MH)

we define MH with an initial value of 125 and a range of 105-140, and then an expression (formula) mean which is a function of MH.
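To make the two declarations concrete, here is a small plain-Python sketch. It does not use RooFit; it only mimics how a variable declaration of the form name[value,min,max] splits into its parts and what the mean formula evaluates to. The helper names `parse_variable` and `mean` are made up for this illustration.

```python
import re

# Matches the factory forms name[value], name[min,max] and name[value,min,max].
VAR_RE = re.compile(r"^\s*(\w+)\[([^\]]*)\]\s*$")


def parse_variable(declaration):
    """Return (name, value, lower, upper); items that are not given are None."""
    match = VAR_RE.match(declaration)
    if match is None:
        raise ValueError("not a variable declaration: %r" % declaration)
    name = match.group(1)
    numbers = [float(item) for item in match.group(2).split(",")]
    if len(numbers) == 1:      # x[3]: constant value
        return name, numbers[0], None, None
    if len(numbers) == 2:      # x[-10,10]: range only
        return name, None, numbers[0], numbers[1]
    if len(numbers) == 3:      # x[3,-10,10]: initial value and range
        return name, numbers[0], numbers[1], numbers[2]
    raise ValueError("expected 1 to 3 numbers: %r" % declaration)


def mean(mh):
    """Plain-Python version of expr::mean('125+0.99*(@0-125)', MH)."""
    return 125 + 0.99 * (mh - 125)


name, value, lower, upper = parse_variable("MH[125,105,140]")
print(name, value, lower, upper)
print(mean(value))
```

At the initial value MH = 125 the formula returns 125, and it moves by 0.99 for every unit change in MH.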

The functions_and_definitions section can be repeated in setup (setup level), as one of the sections at level-0 (together with category, observation, ...) and under each process (level-1).

####Section systematics

This section contains a list of systematics that concern particular processes. A later entry in the list replaces an earlier one with the same name. The systematics section can be repeated at level-0 and at level-1, i.e. under each process.

Example of systematics:

    systematics:
        - hzz_pdf:
                type: lnN
                ggH : 1.026
                qqH : 1.026
                WH  : 1.026
                ZH  : 1.026
                ttH : 1.026
                qqZZ: 1.026
                ggZZ: 1.026

          cms_zz4l_bkg:
                type : param 0.0 1 [-3,3]
        - hzz_pdf:
                type: lnN
                ggH : 1.033

In the above example, hzz_pdf for ggH is finally evaluated to 1.033, since the later number is treated as an updated value.
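The override rule can be sketched as follows. This is an illustration only, and it rests on an assumption: a repeated systematic updates the individual values it lists and leaves the other processes untouched. The text above does not say whether LegoCards merges per process or replaces the whole entry, and the function name `merge_systematics` is hypothetical.

```python
import copy


def merge_systematics(entries):
    """Merge a list of {name: settings} dicts; later entries update earlier ones."""
    merged = {}
    for entry in entries:
        for name, settings in entry.items():
            # Assumption: a repeated name updates individual keys and keeps the rest.
            merged.setdefault(name, {}).update(copy.deepcopy(settings))
    return merged


# The list as it would look after loading the YAML example (shortened).
systematics = [
    {
        "hzz_pdf": {"type": "lnN", "ggH": 1.026, "qqH": 1.026, "qqZZ": 1.026},
        "cms_zz4l_bkg": {"type": "param 0.0 1 [-3,3]"},
    },
    {"hzz_pdf": {"type": "lnN", "ggH": 1.033}},
]

merged = merge_systematics(systematics)
print(merged["hzz_pdf"]["ggH"])
```

Under this assumption the ggH value becomes 1.033 while qqH and qqZZ keep 1.026.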

####Section observation

This section provides the information about the observed data. Both the rate and the source of the data are defined. Example:

    observation:
        rate: 8
        source:
            path: inputs/data_observed_8TeV.root/passedEvents
            selection: (mass2e2mu>105.0 && mass2e2mu<140.0)
            observables:
            #the observables names correspond to branches in tree
            #and are interpreted with RooWSFactory. This list is first to be defined.
                - mass4l[INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:mass_range)]
                #- mass4l

The source/observables subsection defines the list of observables used in the datacard. The names of the observables correspond to branches in the tree. This list is evaluated by RooWSFactory before any functions_and_definitions, so it is the first thing to be defined. If an observable is given in RooWSFactory form (as mass4l[...] above), the variable is defined at this stage. If only the name is provided, the branch with that name is imported, but the variable is not defined in the workspace at this stage; it can be defined later in a functions_and_definitions section.

####Section processes

This section contains a dict of processes, each with the following information:

  • rate - expected rate of the process

  • functions_and_definitions - to declare additional variables, functions or pdfs used to build a shape or a normalization factor (in form PROCESS_norm).

  • shape - a definition of a shape, either a RooWSFactory form or a histogram Template::PROCESS_NAME(obs1,obs2,obs3, path/to/file.root/histo_name)

  • is_signal - a flag (1 or 0) to declare signal or background

  • systematics - (optional) to declare additional systematics for this process

Example:

    processes:
        ggH:
            is_signal : 1
            rate : INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:UnTagged:ggH)
            functions_and_definitions:
                #level-1 definitions (process norm, shape parameters, all related
                #to this process)
                - expr::ggH_norm('@0',r_2e2mu)
                - expr::mean_3_8('INSERT(inputs/signal_shape_parametrization_8TeV_2e2mu.yaml:UnTagged:mean)',MH)
                - expr::sigma_3_8('INSERT(inputs/signal_shape_parametrization_8TeV_2e2mu.yaml:UnTagged:sigma)',MH)
                - expr::alpha_3_8('0.956',MH)
                - expr::n_3_8('4.713',MH)
                - expr::alpha2_3_8('1.377',MH)
                - expr::n2_3_8('6.2383+(0.318)*(@0-125)',MH)
            shape : "RooDoubleCB::ggH(mass4l, mean_3_8, sigma_3_8, alpha_3_8, n_3_8, alpha2_3_8, n2_3_8)"
            systematics: #TODO
                one_extra_sys_ggH: lnN 1.03
                another_extra_sys_ggH: param 0.0 1 [-3,3]

        qqZZ:
            is_signal : 0
            rate : INSERT(inputs/yields_per_tag_category_13TeV_2e2mu.yaml:UnTagged:qqZZ)
            shape : Template::qqZZ(mass4l, inputs/qqZZ_2e2mu_mass4l.root/m4l_mass4l_105.6_140.6)
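A quick sanity check of a processes section might look like the sketch below. It is not part of LegoCards. The function `check_processes` is hypothetical, and treating rate, shape and is_signal as mandatory is an assumption based on the list above, where only systematics is marked as optional.

```python
# Assumed mandatory keys of every process; systematics is optional.
REQUIRED_KEYS = ("rate", "shape", "is_signal")


def check_processes(processes):
    """Return a list of human-readable problems found in a processes dict."""
    problems = []
    for name, config in processes.items():
        for key in REQUIRED_KEYS:
            if key not in config:
                problems.append("%s: missing '%s'" % (name, key))
        # A missing flag is already reported above, so None is skipped here.
        if config.get("is_signal") not in (0, 1, None):
            problems.append("%s: is_signal must be 0 or 1" % name)
    return problems


# A shortened processes dict as it would look after loading the YAML.
processes = {
    "ggH": {"is_signal": 1, "rate": 0.5, "shape": "RooDoubleCB::ggH(...)"},
    "qqZZ": {"is_signal": 0, "rate": 3.1},  # shape left out on purpose
}

for problem in check_processes(processes):
    print(problem)
```

Running such a check before building the card points to the offending process directly, instead of leaving the problem to surface later when the workspace is built.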

##Running LegoCards (example)

Before running, make sure you have PyYAML installed:

    wget http://pyyaml.org/download/pyyaml/PyYAML-3.11.tar.gz
    tar -xzvvf PyYAML-3.11.tar.gz
    cd PyYAML-3.11
    python setup.py install --user

  1. Update your configuration

  2. Make sure you have a CMSSW environment and the HiggsCombination package set up (some RooFit libraries are loaded from there)

  3. To build e.g. the 2e2mu datacard using the configuration configs/datacard_8TeV_2e2mu.yaml:

    python build_datacard.py --cfg configs/datacard_8TeV_2e2mu.yaml -v 10

  4. The builder produces text datacards that point to a RooWorkspace.

  5. Use these cards as you like: with combineCards.py, text2workspace.py, combine, ...
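Since the convention is one configuration per category, step 3 is usually repeated for several files. A minimal Python sketch of such a loop is shown below. The helper names are made up, and the 4e configuration path is a hypothetical example; only the 2e2mu path and the command-line options are taken from this page.

```python
import subprocess


def build_command(config_path, verbosity=10):
    """Return the argument list for one build_datacard.py invocation."""
    return ["python", "build_datacard.py", "--cfg", config_path, "-v", str(verbosity)]


def build_all(config_paths, dry_run=True):
    """Build one datacard per config; with dry_run only return the commands."""
    commands = [build_command(path) for path in config_paths]
    if not dry_run:
        for command in commands:
            # Stops at the first configuration that fails to build.
            subprocess.check_call(command)
    return commands


configs = [
    "configs/datacard_8TeV_2e2mu.yaml",
    "configs/datacard_8TeV_4e.yaml",  # hypothetical file name
]

for command in build_all(configs, dry_run=True):
    print(" ".join(command))
```

With dry_run=True the commands are only printed, which is a cheap way to check the list of configurations before starting the real builds.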

##RooWorkspace Factory Instructions

Process high-level object creation syntax
Accepted forms of syntax are

Creating variables

x[-10,10]             -  Create variable x with given range and put it in workspace
x[3,-10,10]           -  Create variable x with given range and initial value and put it in workspace
x[3]                  -  Create variable x with given constant value

<numeric literal>     - Numeric literal expressions (0.5, -3 etc.) are converted to a RooConst(<numeric literal>)
                        wherever a RooAbsReal or RooAbsArg argument is expected

Creating categories

c[lep,kao,nt1,nt2]    -  Create category c with given state names
tag[B0=1,B0bar=-1]    -  Create category tag with given state names and index assignments


Creating functions and p.d.f.s

MyPdf::g(x,m,s)       - Create p.d.f or function of type MyPdf with name g with argument x,m,s
                        Interpretation and number of arguments are mapped to the constructor arguments of the class
                        (after the name and title).

MyPdf(x,m,s)          - As above, but with an implicitly defined (unique) object name


Creating sets and lists (to be used as inputs above)

{a,b,c}               - Create RooArgSet or RooArgList (as determined by context) from given contents



Objects that are not created are assumed to exist in the workspace.
Object creation expressions as shown above can be nested, e.g. one can do

RooGaussian::g(x[-10,10],m[0],3)

to create a p.d.f and its variables in one go. This nesting can be applied recursively e.g.

SUM::model( f[0.5,0,1] * RooGaussian::g( x[-10,10], m[0], 3 ),
                        RooChebychev::c( x, {a0[0.1],a1[0.2],a2[-0.3]} ))

creates the sum of a Gaussian and a Chebychev and all its variables


A separate series of operator meta-types exists to simplify the construction of composite expressions.
Meta-types in all capitals (SUM) create p.d.f.s, meta-types in lower case (sum) create
functions.


SUM::name(f1*pdf1,f2*pdf2,pdf3)  -- Create sum p.d.f. name with value f1*pdf1+f2*pdf2+(1-f1-f2)*pdf3
RSUM::name(f1*pdf1,f2*pdf2,pdf3) -- Create recursive sum p.d.f. name with value f1*pdf1 + (1-f1)(f2*pdf2 + (1-f2)pdf3)
ASUM::name(f1*amp1,f2*amp2,amp3) -- Create sum p.d.f. name with value f1*amp1+f2*amp2+(1-f1-f2)*amp3 where ampX are amplitudes of type RooAbsReal
sum::name(a1,a2,a3)              -- Create sum function with value a1+a2+a3
sum::name(a1*b1,a2*b2,a3*b3)     -- Create sum function with value a1*b1+a2*b2+a3*b3

PROD::name(pdf1,pdf2)            -- Create product p.d.f. 'name' from the given input p.d.f.s
PROD::name(pdf1|x,pdf2)          -- Create product of conditional p.d.f. pdf1 given x and pdf2
prod::name(a,b,c)                -- Create product function with value a*b*c

SIMUL::name(cat,a=pdf1,b=pdf2)   -- Create simultaneous p.d.f. with index category cat. Map pdf1 to state a, pdf2 to state b

EXPR::name('expr',var,...)       -- Create a generic p.d.f. that interprets the given expression
expr::name('expr',var,...)       -- Create a generic function that interprets the given expression


The functionality of high level object creation tools like RooSimWSTool, RooCustomizer and RooClassFactory
is also interfaced through meta-types in the factory


Interface to RooSimWSTool

SIMCLONE::name( modelPdf, $ParamSplit(...),
                $ParamSplitConstrained(...), $Restrict(...) )            -- Clone-and-customize modelPdf according to ParamSplit and ParamSplitConstrained()
                                                                            specifications and return a RooSimultaneous p.d.f. of all built clones

MSIMCLONE::name( masterIndex,
                $AddPdf(mstate1, modelPdf1, $ParamSplit(...)),
                $AddPdf(mstate2,modelPdf2), ... )                        -- Clone-and-customize multiple models (modelPdf1,modelPdf2) according to ParamSplit and
                                                                            ParamSplitConstrained() specifications and return a RooSimultaneous p.d.f. of all built clones,
                                                                            using the specified master index to map prototype p.d.f.s to master states

Interface to RooCustomizer

EDIT::name( orig, substNode=origNode, ... )                              -- Create a clone of input object orig, with the specified replacement operations executed
EDIT::name( orig, origNode=$REMOVE(), ... )                              -- Create clone of input removing term origNode from all PROD() terms that contained it
EDIT::name( orig, origNode=$REMOVE(prodname,...), ... )                  -- As above, but restrict removal of origNode to PROD term(s) prodname,...


Interface to RooClassFactory

CEXPR::name('expr',var,...)       -- Create a custom compiled p.d.f. that evaluates the given expression
cexpr::name('expr',var,...)       -- Create a custom compiled function that evaluates the given expression


$MetaType(...)        - Meta argument that does not result in construction of an object but is used to logically organize
                        input arguments in certain operator p.d.f. constructions. The defined meta arguments are context dependent.

                        The only meta argument that is defined globally is $Alias(typeName,aliasName) to
                        define aliases for type names. For the definition of meta arguments in operator p.d.f.s
                        see the definitions below
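
The difference between the SUM and RSUM meta-types listed earlier in this section is easy to miss, so here is a short plain-Python illustration of the effective fraction that each component receives. The function names are made up for this sketch, which only reproduces the two formulas quoted above and does not call RooFit.

```python
def sum_fractions(coefficients):
    """Effective fractions for SUM: f1, f2, ..., 1 - (f1 + f2 + ...)."""
    return list(coefficients) + [1.0 - sum(coefficients)]


def rsum_fractions(coefficients):
    """Effective fractions for RSUM: f1, (1-f1)*f2, ..., product of all (1-fi)."""
    fractions = []
    remaining = 1.0  # part of the total not yet assigned to a component
    for f in coefficients:
        fractions.append(remaining * f)
        remaining *= 1.0 - f
    fractions.append(remaining)
    return fractions


coefficients = [0.5, 0.3]
print("SUM :", sum_fractions(coefficients))
print("RSUM:", rsum_fractions(coefficients))
```

For f1 = 0.5 and f2 = 0.3 the SUM fractions are roughly 0.5, 0.3 and 0.2, while the RSUM fractions are roughly 0.5, 0.15 and 0.35. In the recursive form every coefficient between 0 and 1 gives a valid set of fractions, whereas in the plain form the coefficients must also add up to at most 1.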

##Presentations and progress

  1. Giacomo Ortona 15/10/2015

    • discussion on inputs and division of labour
  2. Roko Plestina 28/10/2015

    • first prototype of datacards input format and base datacard maker
  3. Roko Plestina 06/11/2015

    • update on format and usage of card maker LegoCards
  4. Roko Plestina 20/11/2015

    • update on systematics treatment
  5. Roko Plestina 04/12/2015

    • update on usage of GitLab repository
