Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMBER: are we too permissive in reading the files? (and also we are doing a double-reading? #238

Closed
DrDomenicoMarson opened this issue Sep 21, 2022 · 6 comments · Fixed by #256
Labels

Comments

@DrDomenicoMarson
Copy link
Contributor

Here I want to tackle an issue that may change the behavior of some existing scripts, but I think for the better.

Currently, in the AMBER parser, we have a function called file_validation (the name is pretty misleading because indeed it validates the file, but more importantly it returns either False or an object representation of the input AMBER file, on which then the other functions operate).
This function, which should return a "file parsed object FEData", whenever it encounters some problem reading the file, just prints a descriptive WARNING and switches an internal variable invalid to True, then continues reading the file as it nothing happened, only to return False if invalid was True instead of returning the parsed (and invalid) object.
This function is called when extracting dHdl and u_nk, and here is funny, because in both cases we do:

file_datum = file_validation(outfile)
if not file_validation(outfile):  # pragma: no cover
    return None

which in itself is redundant, as it was sufficient to check if not file_datum: return None without parsing the same file twice.

The flow right now then is that we read and parse the AMBER output file to create the FEData object, if the file it's ok we continue parsing the AMBER out to get the data we need, otherwise, both extract_* functions return None.

I'm proposing some changes here (which have not to be implemented altogether)

  1. remove the redundant file_Validation call (this I think must be done 😆 )
  2. in file_validation, as soon as problems that now make the function return False, instead of switching invalid=True, raise a proper Exception (maintaining the current output warning). In this way, we won't continue parsing a file we already decided is invalid... after that we have three scenarios:
    2a. Maintain the current behavior, we catch the exception when we use file_validation, like
    try: file_datum = file_validation(outfile) ; except ExceptionWeWillRise: return None
    2b. We don't catch the exception, so instead of returning None when an AMBER output file was not read we let the file_validation interrupt the flow of the program and let the user be aware that a file was not parsed.
    2c. An hybrid version of 3a and 3b, we catch the exception from file_validation inside the extract_* functions, and raise here a more informative exception, maybe printing both a general warning here and also the warning from the exception risen inside file_validation
  3. change the name of the function file_validation to something more coherent to its behavior, I'm proposing file_initialiation, but I'm open to any suggestion.

I would certainly want to do 1., probably want to do 3. and I propend towards 2b. as it will let the users be aware that a file is not read, letting them be able to just catch the exception if they want to continue parsing more files.

Depending on the route we'll choose, also the tests for invalid files in test_amber.py will have to be updated, but this is fine because it addresses also issue #234 in a single PR (I could first address #234 to separate the common single-test in one test for each case of invalid file in the current implementation, and then change the tests accordingly to the changes made here, but it would be a waste of time I think, without a proper gain in traceability). I think the only advantage would be that if we revert this specific PR, we'll still have the separate tests if this was done in two separate PRs).

Please, let me know what you think!

@xiki-tempula
Copy link
Collaborator

Thanks for posting this issue.
I will have a more close read of this when I have time. I have two quick questions.

  • Will the new implementation change the existing API? Like extract_dHdl/u_nk. I suppose no.
  • Will the new implementation speeds the parsing up?

Will probably have more questions when I have a more closer read.

@DrDomenicoMarson
Copy link
Contributor Author

Regarding the first two questions:

  • Will the new implementation change the existing API? Like extract_dHdl/u_nk. I suppose no.

You suppose right, the only difference for the user is that (in the case of 2b) if he/she is trying to read an invalid file he gets an exception risen instead of a None. For an experienced user that has some automated pipeline, this is easily addressed with a try/except block, while for a novice I think it's better to get an error if a file is not parsed correctly.

  • Will the new implementation speeds the parsing up?

Ideally yes, but I don't think this will be a drastic change in execution time. The bigger advantage for me is to promptly notify to the user when the file he/she is trying to read is invalid.

@xiki-tempula
Copy link
Collaborator

This function, which should return a "file parsed object FEData", whenever it encounters some problem reading the file, just prints a descriptive WARNING and switches an internal variable invalid to True, then continues reading the file as it nothing happened, only to return False if invalid was True instead of returning the parsed (and invalid) object.

This looks weird to me as well. I suspect that this is from some legacy code and the file_validation is a wrapper around the old code.

remove the redundant file_Validation call (this I think must be done 😆 )

Agree

in file_validation, as soon as problems that now make the function return False, instead of switching invalid=True, raise a proper Exception (maintaining the current output warning). In this way, we won't continue parsing a file we already decided is invalid... after that we have three scenarios:
2b. We don't catch the exception, so instead of returning None when an AMBER output file was not read we let the file_validation interrupt the flow of the program and let the user be aware that a file was not parsed.
2c. An hybrid version of 3a and 3b, we catch the exception from file_validation inside the extract_* functions, and raise here a more informative exception, maybe printing both a general warning here and also the warning from the exception risen inside file_validation

2b is good enough for me but if it is done in a better sense, 2c would be better.

change the name of the function file_validation to something more coherent to its behavior, I'm proposing file_initialiation, but I'm open to any suggestion.

Agree

@DrDomenicoMarson
Copy link
Contributor Author

Great, I admit all this section of the AMBER parser really bothers me somehow, so I'm really happy to change it! I'll address this issue when PR #240 is done, and maybe also when the PR with the new test is done in alchemtest, because here lot's of tests will have to be changed to reflect the modified behaviour (all the tests of invalid_files at least)!

@orbeckst
Copy link
Member

Note that PR #253 addresses some of the issues.

@orbeckst
Copy link
Member

General comment: I am in favor of raising an exception at the first problem — there's no point continuing if essential data are missing.

Thank you for taking on the task of cleaning up the AMBER parser.

@orbeckst orbeckst linked a pull request Oct 21, 2022 that will close this issue
orbeckst added a commit that referenced this issue Oct 25, 2022
* fix #238 
   Now raise exceptions for *any* problems in AMBER input files instead of issuing a warning and
   continuing and then perhaps failing at the end.
* fix #227 
   All problems in AMBER parser are now properly logged.
* removed double run of `file_validation` 
* added docstring to `file_validation`
* update CHANGELOG
* added tests for the logging and the new exceptions

Co-authored-by: Oliver Beckstein <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants