This PR will ultimately add a few functions for querying our data. The goal is an easier, more flexible system for accessing large amounts of data without writing many lines of code, which takes a lot of coder time and slows down the iterative process of looking at data. The queries will be declarative and resemble SQL queries (`SELECT fields FROM data-source WHERE condition`). These functions will take advantage of the structures/codebase we have built up:

- `dbetto` to dig through metadata and parameter databases
- `LH5Iterator` to scale well in terms of memory and to enable parallelism

So far, I have added a metadata query, with a data query, a data hist-query, and (maybe) an event query to come. This splits the query into a run query (based on period, run, datatype (e.g. `cal`, `phy`, etc.), and starttime) and a channel query (using anything from our databases, identified using shortcuts prepended with `@`). Right now that is `@det` for the detector database, `@par` for analysis parameters, and `@run` for run info; this can be extended. The needed databases are found using `dbetto` and are currently hard-coded. Due to the length of names here, the ability to alias them has been added. Here's roughly what the metadata query looks like:
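The call sketched below is illustrative only: the function name `query_metadata`, its import location, and the argument names are assumptions for the sake of the example, not the final API added by this PR.

```python
# Illustrative sketch, not the actual API: function name, import path and
# argument names are assumptions.
from lgdo import query_metadata  # hypothetical import location

chan_info = query_metadata(
    # run query: built from period, run, datatype (cal, phy, ...) and starttime
    run_query="period == 'p03' and datatype == 'cal'",
    # channel query: fields from @det (detector DB), @par (analysis parameters)
    # and @run (run info); long database paths can be aliased
    channel_query="@det.type == 'icpc' and mass > 1000",
    aliases={"mass": "@det.production.mass_in_g"},  # alias a long database path
    refprod="/path/to/ref-prod",  # or rely on the REFPROD environment variable
    library="pd",                 # return as pandas; 'ak' and 'lgdo' also possible
)
```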
This function uses the dataflow config (pointed to by a `refprod` argument or the `REFPROD` environment variable) to construct a legend metadata instance, and uses `meta.datasets.runinfo` to find and query the runs. The `channelmap` is used to loop over detectors and get information for `@det`. The paths pointed to in the config by `pars_*` are used for `@par`. Currently `eval` is used to evaluate the run and channel queries. The result can be returned as `pd`, `ak`, or `lgdo`, although `lgdo` won't always work due to unsupported data types.
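As a minimal sketch of the `eval`-based filtering idea (not the actual implementation), each run's info fields can be exposed as names visible to the user's query string, and runs are kept when the expression evaluates to true:

```python
# Minimal sketch of eval-based run-query filtering (illustrative only):
# the runinfo contents and query string here are made up for the example.
runinfo = {
    ("p03", "r000"): {"datatype": "cal", "starttime": 1690000000},
    ("p03", "r001"): {"datatype": "phy", "starttime": 1690100000},
}

run_query = "datatype == 'cal' and starttime > 1_680_000_000"

selected = [
    run for run, info in runinfo.items()
    if eval(run_query, {}, dict(info))  # declarative query -> Python expression
]
print(selected)  # [('p03', 'r000')]
```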
To-do/requests:

- Addition to `legend-testdata` of some directories structured like our productions, with config files, analysis parameters, metadata, etc.
- The `evt` tier is going to be a sticking point since it is structured differently than the others. This is the basis for my suggestion of adding dataset views to LH5 (Views legend-data-format-specs#13), which would benefit from having views from each detector's events to the corresponding events. This can be worked around for global trigger data, but is very challenging for `cal` data.