AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

DrDomenicoMarson · 2022-09-21T10:48:38Z

Here I want to tackle an issue that may change the behavior of some existing scripts, but I think for the better.

Currently, in the AMBER parser, we have a function called file_validation (the name is pretty misleading because indeed it validates the file, but more importantly it returns either False or an object representation of the input AMBER file, on which then the other functions operate).
This function, which should return a "file parsed object FEData", whenever it encounters some problem reading the file, just prints a descriptive WARNING and switches an internal variable invalid to True, then continues reading the file as it nothing happened, only to return False if invalid was True instead of returning the parsed (and invalid) object.
This function is called when extracting dHdl and u_nk, and here is funny, because in both cases we do:

file_datum = file_validation(outfile)
if not file_validation(outfile):  # pragma: no cover
    return None

which in itself is redundant, as it was sufficient to check if not file_datum: return None without parsing the same file twice.

The flow right now then is that we read and parse the AMBER output file to create the FEData object, if the file it's ok we continue parsing the AMBER out to get the data we need, otherwise, both extract_* functions return None.

I'm proposing some changes here (which have not to be implemented altogether)

remove the redundant file_Validation call (this I think must be done 😆 )
in file_validation, as soon as problems that now make the function return False, instead of switching invalid=True, raise a proper Exception (maintaining the current output warning). In this way, we won't continue parsing a file we already decided is invalid... after that we have three scenarios:
2a. Maintain the current behavior, we catch the exception when we use file_validation, like
try: file_datum = file_validation(outfile) ; except ExceptionWeWillRise: return None
2b. We don't catch the exception, so instead of returning None when an AMBER output file was not read we let the file_validation interrupt the flow of the program and let the user be aware that a file was not parsed.
2c. An hybrid version of 3a and 3b, we catch the exception from file_validation inside the extract_* functions, and raise here a more informative exception, maybe printing both a general warning here and also the warning from the exception risen inside file_validation
change the name of the function file_validation to something more coherent to its behavior, I'm proposing file_initialiation, but I'm open to any suggestion.

I would certainly want to do 1., probably want to do 3. and I propend towards 2b. as it will let the users be aware that a file is not read, letting them be able to just catch the exception if they want to continue parsing more files.

Depending on the route we'll choose, also the tests for invalid files in test_amber.py will have to be updated, but this is fine because it addresses also issue #234 in a single PR (I could first address #234 to separate the common single-test in one test for each case of invalid file in the current implementation, and then change the tests accordingly to the changes made here, but it would be a waste of time I think, without a proper gain in traceability). I think the only advantage would be that if we revert this specific PR, we'll still have the separate tests if this was done in two separate PRs).

Please, let me know what you think!

The text was updated successfully, but these errors were encountered:

xiki-tempula · 2022-09-21T16:11:58Z

Thanks for posting this issue.
I will have a more close read of this when I have time. I have two quick questions.

Will the new implementation change the existing API? Like extract_dHdl/u_nk. I suppose no.
Will the new implementation speeds the parsing up?

Will probably have more questions when I have a more closer read.

DrDomenicoMarson · 2022-09-21T16:56:22Z

Regarding the first two questions:

Will the new implementation change the existing API? Like extract_dHdl/u_nk. I suppose no.

You suppose right, the only difference for the user is that (in the case of 2b) if he/she is trying to read an invalid file he gets an exception risen instead of a None. For an experienced user that has some automated pipeline, this is easily addressed with a try/except block, while for a novice I think it's better to get an error if a file is not parsed correctly.

Will the new implementation speeds the parsing up?

Ideally yes, but I don't think this will be a drastic change in execution time. The bigger advantage for me is to promptly notify to the user when the file he/she is trying to read is invalid.

xiki-tempula · 2022-09-25T10:04:46Z

This function, which should return a "file parsed object FEData", whenever it encounters some problem reading the file, just prints a descriptive WARNING and switches an internal variable invalid to True, then continues reading the file as it nothing happened, only to return False if invalid was True instead of returning the parsed (and invalid) object.

This looks weird to me as well. I suspect that this is from some legacy code and the file_validation is a wrapper around the old code.

remove the redundant file_Validation call (this I think must be done 😆 )

Agree

in file_validation, as soon as problems that now make the function return False, instead of switching invalid=True, raise a proper Exception (maintaining the current output warning). In this way, we won't continue parsing a file we already decided is invalid... after that we have three scenarios:
2b. We don't catch the exception, so instead of returning None when an AMBER output file was not read we let the file_validation interrupt the flow of the program and let the user be aware that a file was not parsed.
2c. An hybrid version of 3a and 3b, we catch the exception from file_validation inside the extract_* functions, and raise here a more informative exception, maybe printing both a general warning here and also the warning from the exception risen inside file_validation

2b is good enough for me but if it is done in a better sense, 2c would be better.

change the name of the function file_validation to something more coherent to its behavior, I'm proposing file_initialiation, but I'm open to any suggestion.

Agree

DrDomenicoMarson · 2022-09-25T10:52:38Z

Great, I admit all this section of the AMBER parser really bothers me somehow, so I'm really happy to change it! I'll address this issue when PR #240 is done, and maybe also when the PR with the new test is done in alchemtest, because here lot's of tests will have to be changed to reflect the modified behaviour (all the tests of invalid_files at least)!

orbeckst · 2022-10-19T17:44:20Z

Note that PR #253 addresses some of the issues.

orbeckst · 2022-10-19T17:47:45Z

General comment: I am in favor of raising an exception at the first problem — there's no point continuing if essential data are missing.

Thank you for taking on the task of cleaning up the AMBER parser.

* fix #238 Now raise exceptions for *any* problems in AMBER input files instead of issuing a warning and continuing and then perhaps failing at the end. * fix #227 All problems in AMBER parser are now properly logged. * removed double run of `file_validation` * added docstring to `file_validation` * update CHANGELOG * added tests for the logging and the new exceptions Co-authored-by: Oliver Beckstein <[email protected]>

xiki-tempula added testing AMBER MD engine labels Sep 25, 2022

DrDomenicoMarson mentioned this issue Oct 19, 2022

Use new testfiles and increase coverage to 100% for amber parser #253

Merged

orbeckst linked a pull request Oct 21, 2022 that will close this issue

Raise except instead of continuing #256

Merged

orbeckst mentioned this issue Oct 21, 2022

Raise except instead of continuing #256

Merged

orbeckst closed this as completed in #256 Oct 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

DrDomenicoMarson commented Sep 21, 2022

xiki-tempula commented Sep 21, 2022

DrDomenicoMarson commented Sep 21, 2022

xiki-tempula commented Sep 25, 2022

DrDomenicoMarson commented Sep 25, 2022

orbeckst commented Oct 19, 2022

orbeckst commented Oct 19, 2022

AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

Comments

DrDomenicoMarson commented Sep 21, 2022

xiki-tempula commented Sep 21, 2022

DrDomenicoMarson commented Sep 21, 2022

xiki-tempula commented Sep 25, 2022

DrDomenicoMarson commented Sep 25, 2022

orbeckst commented Oct 19, 2022

orbeckst commented Oct 19, 2022