ROOT files with duplicated GUIDs observed on production T0 replay workflows #37240

@khurtado

This is related to a WMCore issue:

dmwm/WMCore#10870

Bug description
When deploying T0 replays with a large number of jobs, one of the WMCore components fails, complaining about duplicated LFNs. Our LFN patterns look like this:

/store/unmerged/HG2202_Val/RelValProdMinBias/GEN-SIM/HG2202_Val_OLD_Alanv4-v22/00000/2AE85F14-94A1-EC11-BBF5-FA163EC7AA59.root

where: 2AE85F14-94A1-EC11-BBF5-FA163EC7AA59 is the GUID extracted from the ROOT file through the framework XML job report.

So we are basically observing 2 different jobs generating files with the same GUID.
We get the GUID from the framework XML job report here:

And since the GUID from the FW report seems to be generated here:
https://github.com/cms-sw/cmssw/blob/master/FWCore/Utilities/src/Guid.cc#L18-L28

I'm reporting the issue here.
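
To make the failure condition concrete, here is a minimal sketch of how duplicated GUIDs could be spotted across jobs. It assumes only what the LFN pattern above shows, namely that the GUID is the file name without the `.root` extension. The helper names (`guid_from_lfn`, `find_duplicate_guids`) and the job identifiers are hypothetical and are not part of WMCore or CMSSW.

```python
import posixpath
from collections import defaultdict


def guid_from_lfn(lfn):
    """Return the file-name stem of an LFN (the GUID, per the pattern above)."""
    name = posixpath.basename(lfn)
    stem, _ext = posixpath.splitext(name)
    # Normalize case so that the comparison does not depend on formatting.
    return stem.upper()


def find_duplicate_guids(lfns_by_job):
    """Map each GUID produced by more than one job to the sorted job ids.

    lfns_by_job is a hypothetical dict: job id -> list of output LFNs.
    """
    jobs_by_guid = defaultdict(set)
    for job_id, lfns in lfns_by_job.items():
        for lfn in lfns:
            jobs_by_guid[guid_from_lfn(lfn)].add(job_id)
    return {
        guid: sorted(jobs)
        for guid, jobs in jobs_by_guid.items()
        if len(jobs) > 1
    }
```

Running `find_duplicate_guids` over the output LFNs of a replay would return a non-empty mapping exactly when two different jobs produced files with the same GUID, which is the situation reported here.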

How to reproduce
Deploy a Tier0 replay with a large number of jobs. I think @germanfgv can help with this if needed. So far this year, at least one incident per week has been reported recently.
