cms-2016-simulated-datasets: updates done #207
cmd = 'dasgoclient -query "'
if query != "dataset":
    cmd += query + ' '
cmd += 'dataset=' + dataset + '" -json'
BTW since we are using Python 3, expressions like this can be made more readable by using f-strings:
old:
cmd += 'dataset=' + dataset + '" -json'
new:
cmd += f'dataset={dataset}" -json'
The same technique for formatting strings with variable substitution could be used elsewhere in the code to simplify string concatenation.
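As a minimal sketch of how the whole command could be built with f-strings, assuming the same `query` and `dataset` variables as in the snippet under review (the helper name `build_das_command` is hypothetical and not part of the PR):

```python
def build_das_command(query: str, dataset: str) -> str:
    """Build the dasgoclient command line for a dataset query.

    Hypothetical helper: it mirrors the concatenation logic in the reviewed
    snippet, but assembles the string with f-strings instead of `+`.
    """
    # A plain "dataset" query needs no extra keyword in front of "dataset=...".
    prefix = "" if query == "dataset" else f"{query} "
    return f'dasgoclient -query "{prefix}dataset={dataset}" -json'


# Example with a made-up dataset name, for illustration only.
print(build_das_command("parent", "/Primary/Processed/MINIAODSIM"))
```

This only builds the command string; running it is left to whatever mechanism the script already uses.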
@tiborsimko Missing file added and tested. Updates are done (apart from the print format). Resulting record JSON of the six test datasets:
For the pileup (see cernopendata/opendata.cern.ch#3569)
@tiborsimko The HTML docs contain some process-specific information, so we need to run them for all.
Take care of the cases where DAS finds two parent datasets. This makes the script fail when AODSIM is taken as the parent (instead of MINIAODSIM) in https://github.com/cernopendata/data-curation/blob/cms-2016-sim-test/cms-2016-simulated-datasets/code/dataset_records.py#L456. Make sure that MINIAODSIM is picked. There were 56 such AODSIM error messages.
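One possible way to handle the two-parent case is sketched below. It is not the code in `dataset_records.py`; the function name `pick_miniaodsim_parent` and the example dataset names are hypothetical. It relies only on the CMS naming convention that the data tier is the last component of the dataset name.

```python
def pick_miniaodsim_parent(parents):
    """Return the single MINIAODSIM parent from a list of parent dataset names.

    Hypothetical helper: `parents` is assumed to be a list of dataset names
    such as "/Primary/Processed/MINIAODSIM", as returned by a DAS parent query.
    """
    # The data tier is the last "/"-separated component of the dataset name.
    candidates = [p for p in parents if p.rstrip("/").split("/")[-1] == "MINIAODSIM"]
    if len(candidates) != 1:
        # Fail loudly instead of silently picking an AODSIM parent.
        raise ValueError(f"expected exactly one MINIAODSIM parent, got {candidates!r}")
    return candidates[0]


# Made-up names, for illustration only.
parents = ["/Primary/Processed/AODSIM", "/Primary/Processed/MINIAODSIM"]
print(pick_miniaodsim_parent(parents))
```

Raising an error when no MINIAODSIM parent is found keeps the remaining failures visible in the logs rather than hiding them.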
Many datasets (also those with the gridpack available) are missing the LHE information (or only have the production script displayed). There's indeed no
This file is now at https://opendata.cern.ch/eos/opendata/cms/dataset-semantics/patsize.css, so inspectNanoFile should be updated to use it (I'm updating the existing doc html files under
For the record, the list of 2016 MC datasets that do not have CODP records yet is in /eos/user/c/cmsdpoa/data-curation/cms-2016-simulated-datasets/missing-2024-03.txt (with
An example record that should have mcdb info: https://opendata.cern.ch/record/72661
The directory has now been copied to Notes:
If we keep the directory structure as it is now, the code should
Some things are still missing in the provenance, unfortunately also among the ZZZ samples that get displayed first, e.g. https://opendata.cern.ch/record/75597. This is due to missing information in lhe_generator/2016-sim/gridpacks. The fixed code would find these, but the lhe_generator directory for those datasets was generated before the fix. I am opening a separate issue to fix the records that need to be completed, because I do not want to rerun the full record generation. As the code update is already done, this probably does not require changes in this PR. There are only 15 of them; see cernopendata/opendata.cern.ch#3652.
Uses O(1k) CMS 2016 MC datasets in order to have a richer dataset sample for testing of metadata extraction. Enriches the documentation and the global `.gitignore` file.
This commit brings numerous improvements and changes necessary to complete the first full data extraction run and the record generation run on the complete CMS 2016 data input.
…path search Closes #238
Rebasing and merging the set of scripts used to publish CMS 2016 SIM records, with several post-release updates.
Addresses #182
Adds code for all steps.
The logic has been changed to find the provenance through the production chain.
Input files are for testing only.
Tested on 3 datasets only; for them it works fine and gives the full provenance, LHE included.
Ready for the final updates in #182 (comment)