Skip to content
Open
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 55 additions & 3 deletions src/python/WMCore/MicroService/MSOutput/MSOutput.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
from WMCore.MicroService.MSOutput.MSOutputTemplate import MSOutputTemplate
from WMCore.WMException import WMException
from WMCore.Services.AlertManager.AlertManagerAPI import AlertManagerAPI
from WMCore.Services.DBS.DBS3Writer import DBS3Writer


class MSOutputException(WMException):
Expand Down Expand Up @@ -248,6 +249,53 @@ def _executeConsumer(self, summary):
self.logger.exception(msg)
self.updateReportDict(summary, "error", msg)

def setDBSStatus(self, workflow):
"""
The function to set the DBS status of outputs as VALID
:param workflow: a MSOutputTemplate object workflow
:return: the MSOutputTemplate object itself (with the necessary updates in place)
"""
if not isinstance(workflow, MSOutputTemplate):
msg = "Unsupported type object '{}' for workflows! ".format(type(workflow))
msg += "It needs to be of type: MSOutputTemplate"
raise UnsupportedError(msg)

dbs3Writer = DBS3Writer(url=self.msConfig["dbsReadUrl"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to initialize it in the __init__ method instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine now

writeUrl=self.msConfig["dbsWriteUrl"])

# if anything fail along the way, set it back to "pending"
dbsUpdateStatus = "done"
for dMap in workflow['OutputMap']:

if self.msConfig['enableDbsStatusChange']:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would remove this option (and the dry-run mode). When we have it implemented, I believe there is no reason to disable this feature in MSOutput, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it to make the switch from Unified to WMCore easy. So, once the PR is good to merge, we can merge it and put it to production with enableDbsStatusChange False and simply it will not take any action but to print some logs. Once we're ready to switch from Unified to WMCore, we can simply change this parameter to True without touching the source code. Does it make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could enable this DBS status change in MSOutput and keep both this service and Unified updating the DBS status for a day or two (I'm almost sure it's harmless and nothing fails). Then we have time to stop Unified.

At this stage, dry-run mode is just an over-complication of the code IMO.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. This option is removed in the latest commit.


res = dbs3Writer.setDBSStatus(dataset=dMap["Dataset"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that my comments on the DBS3Writer module will impact this logic.

Another comment, I failed to see where this self.msConfig['dbsStatus'] gets set in this module. Perhaps I would suggest to define an object variable like:

self.dbsOutputStatus = "VALID"

and simply refer to it here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also made it configurable: https://github.com/dmwm/deployment/pull/1044/files#diff-3a773e3fbfb5ff69f5fd806e7a4ab54a7dfaaf1f080e7e96ac1df9e73589c2ffR115 It's also possible to set it within the module. Which one do you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed the deployment changes. Thank for pointing it out.
I think it's fine to keep it in the service configuration file. Nonetheless, I would set a default value to "VALID" in this module, in case the configuration does not have it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since deployment has the value, I did not set a default value in this module to avoid duplicates. If you think, it's more suitable to keep the configuration in MSOutput module instead of deployment repository, that's also fine, I can change it if you'd like.

status=self.msConfig['dbsStatus']["valid"])

if res:
dMap["DBSStatus"] = self.msConfig['dbsStatus']["valid"]
else:
# There is at least one dataset whose dbs status update is unsuccessful
dbsUpdateStatus = "pending"

else:
msg = "DRY-RUN DBS: DBS status change for DID: {}, ".format(dMap['Dataset'])
self.logger.info(msg)

# Finally, update the MSOutput template document with either partial or
# complete dbs statuses
self.docKeyUpdate(workflow, OutputMap=workflow['OutputMap'])
workflow.updateTime()
if dbsUpdateStatus == "done":
self.logger.info("All the DBS status updates succeeded for: %s. Marking it as 'done'",
workflow['RequestName'])
self.docKeyUpdate(workflow, DBSUpdateStatus='done')
else:
self.logger.info("DBS status updates partially successful for: %s. Keeping it 'pending'",
workflow['RequestName'])

return workflow

def makeSubscriptions(self, workflow):
"""
The common function to make the final subscriptions
Expand Down Expand Up @@ -448,28 +496,32 @@ def msOutputConsumer(self):
# Done: To build it through a pipe
# Done: To write back the updated document to MonogoDB
msPipelineRelVal = Pipeline(name="MSOutputConsumer PipelineRelVal",
funcLine=[Functor(self.makeSubscriptions),
funcLine=[Functor(self.setDBSStatus),
Functor(self.makeSubscriptions),
Functor(self.makeTapeSubscriptions),
Functor(self.docUploader,
update=True,
keys=['LastUpdate',
'TransferStatus',
'DBSUpdateStatus',
'OutputMap']),
Functor(self.docDump, pipeLine='PipelineRelVal'),
Functor(self.docCleaner)])
msPipelineNonRelVal = Pipeline(name="MSOutputConsumer PipelineNonRelVal",
funcLine=[Functor(self.makeSubscriptions),
funcLine=[Functor(self.setDBSStatus),
Functor(self.makeSubscriptions),
Functor(self.makeTapeSubscriptions),
Functor(self.docUploader,
update=True,
keys=['LastUpdate',
'TransferStatus',
'DBSUpdateStatus',
'OutputMap']),
Functor(self.docDump, pipeLine='PipelineNonRelVal'),
Functor(self.docCleaner)])

wfCounterTotal = 0
mQueryDict = {'TransferStatus': 'pending'}
mQueryDict = { "$or": [ { "TransferStatus": "pending" }, { "DBSUpdateStatus": "pending" } ] }
pipeCollections = [(msPipelineRelVal, self.msOutRelValColl),
(msPipelineNonRelVal, self.msOutNonRelValColl)]
for pipeColl in pipeCollections:
Expand Down
10 changes: 7 additions & 3 deletions src/python/WMCore/MicroService/MSOutput/MSOutputTemplate.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ def docSchema(self):
'Copies': 1,
...}],
"TransferStatus": "pending"|"done,
"DBSUpdateStatus": "pending"|"done,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I would make it a boolean, thus something like:

"DBSStatusUpdated": False|True,

but then it would require more changes from your side. So I leave it up to you.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, this is applied in the last commit.

"RequestType": ""
}
:return: a list of tuples
Expand All @@ -116,7 +117,8 @@ def docSchema(self):
('IsRelVal', False, bool),
('OutputDatasets', [], list),
('OutputMap', [], list),
('TransferStatus', "pending", (bytes, str))]
('TransferStatus', "pending", (bytes, str)),
('DBSUpdateStatus', "pending", (bytes, str))]
return docTemplate

def outputMapSchema(self):
Expand All @@ -135,7 +137,8 @@ def outputMapSchema(self):
'DiskDestination': "",
'TapeDestination': "",
'DiskRuleID': "",
'TapeRuleID': ""}
'TapeRuleID': "",
'DBSStatus': ""}
:return: a list of tuples
"""
outMapTemplate = [
Expand All @@ -146,7 +149,8 @@ def outputMapSchema(self):
('DiskDestination', "", (bytes, str)),
('TapeDestination', "", (bytes, str)),
('DiskRuleID', "", (bytes, str)),
('TapeRuleID', "", (bytes, str))]
('TapeRuleID', "", (bytes, str)),
('DBSStatus', "", (bytes, str))]
return outMapTemplate

def _checkAttr(self, myDoc, update=False, throw=False, **kwargs):
Expand Down
29 changes: 29 additions & 0 deletions src/python/WMCore/Services/DBS/DBS3Reader.py
Original file line number Diff line number Diff line change
Expand Up @@ -971,3 +971,32 @@ def getParentDatasetTrio(self, childDataset):
frozenKey = frozenset(runLumiPair)
parentFrozenData[frozenKey] = fileId
return parentFrozenData

def getDBSStatus(self, dataset):
"""
The function to get the DBS status of outputs
:param dataset: dataset name
:return: DBS status of the given dataset
"""

allowedDbsStatuses = ["VALID", "INVALID", "PRODUCTION"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a dataset can have other statuses. If I'm not wrong, DELETED and DEPRECATED as well. However, I don't think we need to have a list of valid statuses, we just use whatever is provided by DBS.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also unsure whether it's necessary to put this extra check and honestly I did not have a strong reason to do so. So, it's okay to remove it to reduce the level of complexity.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed, this extra check is also removed.


response = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
response = None
dbsStatus = None

try:
response = self.dbs.listDatasets(dataset=dataset, dataset_access_type='*', detail=True)
except Exception as ex:
msg = "Exception while getting the status of following dataset on DBS: {} ".format(dataset)
msg += "Error: {}".format(str(ex))
self.logger.exception(msg)

if response:
dbsStatus = response[0]['dataset_access_type']
isAllowedStatus = dbsStatus in allowedDbsStatuses

if isAllowedStatus:
self.logger.info("%s is %s", dataset, dbsStatus)
return dbsStatus
else:
raise Exception("This is not an allowed DBS status: {}".format(str(dbsStatus)))
else:
return None
58 changes: 58 additions & 0 deletions src/python/WMCore/Services/DBS/DBS3Writer.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
#!/usr/bin/env python
"""
_DBSReader_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong copy/pasting? :-D You might just remove it as well.

Read/Write DBS Interface
"""
from __future__ import print_function, division
from builtins import str
import logging

from dbs.apis.dbsClient import DbsApi
from dbs.exceptions.dbsClientException import dbsClientException

from WMCore.Services.DBS.DBSErrors import DBSWriterError, formatEx3
from WMCore.Services.DBS.DBS3Reader import DBS3Reader


class DBS3Writer(DBS3Reader):
"""
_DBSReader_
General API for writing data to DBS
"""

def __init__(self, url, writeUrl, logger=None, **contact):

# instantiate dbs api object
try:
super(DBS3Writer, self).__init__(url=url)
self.dbs = DbsApi(writeUrl, **contact)
self.logger = logger or logging.getLogger(self.__class__.__name__)
except dbsClientException as ex:
msg = "Error in DBSWriter with DbsApi\n"
msg += "%s\n" % formatEx3(ex)
raise DBSWriterError(msg)

def setDBSStatus(self, dataset, status):
"""
The function to set the DBS status of an output dataset
:param dataset: Dataset name
:return: True if operation is successful, False otherwise
"""

try:
self.dbs.updateDatasetType(dataset=dataset,
dataset_access_type=status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation needs fixing. I would suggest to remove the blank line at 44 as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, we should fetch the output of this call and log the error in case it happens; or return it to the caller module.
By parsing the output of that call, we can also get rid of of the getDBSStatus implementation in the lines below.

Copy link
Author

@haozturk haozturk Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought that but the function returns None if the call is successful and throws an exception o/w. I would prefer it to return True or False respectively. Since it does not do that, I handled it with getDBSStatus calls to make it more intuitive (according to me :) ) . We can also handle it:

  • Return True if updateDatasetType returns None
  • Return False o/w

Which one do you prefer?

Btw, we already log the error in the except clause, in case the API fails

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, too bad it returns None in case the call is successful..

Return True if updateDatasetType returns None
Return False o/w

This is my preference. Given that - if there is an exception - the error would likely be on the server side, I would suggest to change the log level from exception to error, such that we do not pollute the service logs with unnecessary information.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied your suggestions Alan, but setDBSStatus still returns True if the API call is successful, False otherwise. This looks quite intuitive to me. Do you agree?

except Exception as ex:
msg = "Exception while setting the status of following dataset on DBS: {} ".format(dataset)
msg += "Error: {}".format(str(ex))
self.logger.exception(msg)

dbsStatus = super(DBS3Writer, self).getDBSStatus(dataset)

if dbsStatus == status:
return True
else:
return False
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,15 @@
{'block_name': '/Cosmics/Commissioning2015-PromptReco-v1/RECO#004ac3ba-d09e-11e4-afad-001e67ac06a0'}],
['listBlockParents',
{'block_name': '/Cosmics/Commissioning2015-v1/RAW#942d76fe-cf0e-11e4-afad-001e67ac06a0'}],
['listDatasets',
{'dataset_access_type': '*',
'dataset': '/ggXToJPsiJPsi_JPsiToMuMu_M6p2_JPCZeroMinusPlus_TuneCP5_13TeV-pythia8-JHUGen/RunIIFall17pLHE-93X_mc2017_realistic_v3-v2/LHE',
'detail': True}],
['listDatasets',
{'dataset_access_type': '*',
'dataset': '/DYToEE_M-50_NNPDF31_TuneCP5_14TeV-powheg-pythia8/Run3Summer19DRPremix-BACKFILL_2024Scenario_106X_mcRun3_2024_realistic_v4-v6/GEN-SIM-RAW',
'detail': True}],


# Exception throwing calls

Expand Down
Loading