
More Dynamic Workflow #10

@jkguiang

Description


At the moment, it seems that the workflow for this analysis framework is essentially built around a configuration JSON. While this is nice for its simplicity, it may also suffer from its rigidity in the future. For example, I know that we have to do many checks and small studies for VBS HWW, wherein I have to more or less brutalize my code in order to get timely results. As such, I wonder if you would be open to moving towards a different workflow, one that we've been using for VBS HWW and that I've been experimenting with on the side. I think it could lend some flexibility that would prove generally useful.

The credit for this workflow is due to Philip. He has long organized his analyses into "cutflow" objects. These cutflows are constructed by stringing together "cuts" in a tree structure. Each cut is given a name and two lambda functions: one that contains the logic for the cut (i.e. returns pass or fail for the selection) and another that returns the event weight for that cut (e.g. the cross-section weight or b-tagging scale factors). Overall, this workflow allows for a lot of niceties; several of the "Pros" listed at the bottom apply to it as well. However, while I think it is nice, I also think it can be simplified and improved.
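To make this concrete, here is a minimal Python sketch of such a cut object. The class, field, and cut names here are hypothetical illustrations, not Philip's actual implementation or the RAPIDO API; the point is just that each cut bundles a name with a selection lambda and a weight lambda.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Cut:
    """One cut in a cutflow: a name plus two lambdas (selection + weight)."""
    name: str
    passes: Callable[[dict], bool]                        # selection logic: pass/fail
    weight: Callable[[dict], float] = lambda ev: 1.0      # per-event weight at this cut

# Hypothetical usage with an event represented as a plain dict:
presel = Cut(
    name="presel",
    passes=lambda ev: ev["n_jets"] >= 2,      # made-up selection variable
    weight=lambda ev: ev["xsec_weight"],      # made-up weight, e.g. xsec normalization
)

event = {"n_jets": 3, "xsec_weight": 0.5}
print(presel.passes(event), presel.weight(event))  # -> True 0.5
```

Because the two lambdas are just callables, swapping a selection or a scale factor for a quick study means changing one line rather than restructuring a configuration file.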

I would like to propose the following overall structure. A binary search tree seems more fitting, in general, than the multi-branch tree that we use in VBS HWW. That is, each cut is a node in the BST, and whether it returns true or false determines whether evaluation proceeds to the right or left child (respectively). The iteration in the event loop then terminates when a leaf is reached. For example, I took Philip's cutflow idea (and related tools) and put them into something I call RAPIDO. I would take a similar approach within Python, where much of this would be far simpler thanks to Python's dynamic typing (a lot of acrobatics is needed to achieve a similar effect in C++, particularly for writing to TTrees). That is, I would use a structure similar to what you have now, where selections are separated into their own functions, etc., but I would reorganize the overall flow around this cutflow structure.
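A minimal sketch of that traversal, again with hypothetical node and field names rather than the actual RAPIDO interface: a passing cut accumulates its weight and sends the event to the right child, a failing cut sends it to the left child, and the walk stops at a leaf.

```python
class CutNode:
    """A cut as a BST node: pass -> right child, fail -> left child."""
    def __init__(self, name, passes, weight=lambda ev: 1.0, right=None, left=None):
        self.name = name
        self.passes = passes   # selection lambda: event -> bool
        self.weight = weight   # weight lambda: event -> float
        self.right = right     # next cut when this one passes
        self.left = left       # next cut (e.g. a control region) when this one fails

def run_cutflow(root, event):
    """Walk the tree until falling off a leaf; return (cuts passed, total weight)."""
    node, weight, path = root, 1.0, []
    while node is not None:
        if node.passes(event):
            weight *= node.weight(event)
            path.append(node.name)
            node = node.right
        else:
            node = node.left
    return path, weight

# Hypothetical two-cut chain: preselection feeding a signal region.
sr = CutNode("signal_region", lambda ev: ev["mjj"] > 500.0)
root = CutNode("presel", lambda ev: ev["n_jets"] >= 2,
               weight=lambda ev: ev["xsec_weight"], right=sr)

path, w = run_cutflow(root, {"n_jets": 3, "mjj": 600.0, "xsec_weight": 0.5})
print(path, w)  # -> ['presel', 'signal_region'] 0.5
```

Attaching a different subtree to a `left` branch is how a control region would hang off a failed cut, which is what makes the multiple-region bookkeeping explicit.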

Let me conclude with a quick summary of the workflow along with a few "pros" I would like to highlight.

Workflow:
I propose that you organize the analysis as a BST cutflow, keeping one (or a few) common cutflow(s) in this repo. Contributors then clone this repo and make the necessary changes to these common cutflows to answer questions, run one-off studies, etc. Common object IDs and other tools (e.g. a Python equivalent of cmstas/NanoTools) are also kept here.

Pros:

  • More explicit organization (in the spirit of "explicit is better than implicit" from PEP 20), with less logic hidden behind nested objects
  • Cutflows are easily printable
  • Simple/diagnostic histograms can be filled while looping over events at different stages of the cutflow
  • Different signal regions or control regions are more easily/explicitly definable in the BST framework
  • The BDTs that you ultimately produce can also be exported to this BST format and analyzed as above
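To illustrate the "easily printable" point, here is a self-contained sketch (with made-up cut names, variables, and toy events) that tallies weighted yields after each cut in a simple sequential chain and prints the resulting cutflow table:

```python
# Hypothetical cuts as (name, selection-lambda) pairs.
cuts = [
    ("presel",   lambda ev: ev["n_jets"] >= 2),
    ("vbs_jets", lambda ev: ev["mjj"] > 500.0),
]

# Toy events; "weight" stands in for the accumulated event weight.
events = [
    {"n_jets": 3, "mjj": 620.0, "weight": 0.5},
    {"n_jets": 2, "mjj": 310.0, "weight": 0.5},
    {"n_jets": 1, "mjj": 700.0, "weight": 0.5},
]

# Apply cuts in order, recording the weighted yield surviving each one.
yields = []
surviving = events
for name, passes in cuts:
    surviving = [ev for ev in surviving if passes(ev)]
    yields.append((name, sum(ev["weight"] for ev in surviving)))

for name, y in yields:
    print(f"{name:>10s}  {y:.2f}")
# ->     presel  1.00
# ->   vbs_jets  0.50
```

The same per-node tallies fall out of the BST version for free, since every event visits exactly one path through the tree.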

Apologies for the long issue. Please let me know if this is at all interesting to you; I would be happy to meet with folks to discuss further.
