Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DataToolParameters accepting input with arbitrary datatype in tests #12073

Draft
wants to merge 24 commits into
base: dev
Choose a base branch
from

Conversation

bernt-matthias
Copy link
Contributor

@bernt-matthias bernt-matthias commented May 29, 2021

In the first place this PR added a test case (test/functional/tools/bam_input_for_sam.xml) showing that in tests (e.g. with planemo test) data parameters of tools accept input of arbitrary datatypes. The test shows this for a sam input which gets bam and bed in the two tests (the bam case might be interpreted as a case of a missing converter as @mvdbeek pointed out below).

Note that in the UI this works as expected, i.e. only sam is accepted.

The main fix is in lines:

match = dataset_matcher.hda_match(v, ensure_visible=False)
if match:
if match.implicit_conversion:
v.implicit_conversion = True
else:
raise ParameterValueError("Data set with invalid datatype supplied to input dataset parameter", self.name)

Previously the loop containing this code checked only if an implicit conversion is possible/necessary for one/more inputs. Now an exception is raised if the datasets do not match the datatype required by the parameter.
An additional necessary change is that dataset_matcher.hda_match now uses ensure_visible=False. The consequence is that also hidden datasets are considered. This was necessary since otherwise collection input does not get a match if the contained data sets are hidden (makes me wonder if implicit conversion ever worked for this case .. edit: seems so, but I don't understand why). Not sure about other possible side effects.

This required a quite few changes to tests where datatypes of inputs were not specified or the wrong datatype was shown. While fixing I realized that at different places ext, extension, and file_type are used .. would be nice if this would become consistent...

TODO

  • the same in necessary for DataCollectionToolParameter, somewhere here
    def from_json(self, value, trans, other_values=None):
    • add a test
    • add fix
  • adapt Exception text. Since a mismatch can alslo happen if an options filter kicks out datasets with wrong / missing dbkey

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these contributions under Galaxy's current license.
  • I agree to allow the Galaxy committers to license these and all my past contributions to the core galaxy codebase under the MIT license. If this condition is an issue, uncheck and just let us know why with an e-mail to [email protected].

@github-actions github-actions bot added this to the 21.09 milestone May 29, 2021
@mvdbeek
Copy link
Member

mvdbeek commented May 31, 2021

sam should accept bam, and we already have tests for that in sam_to_unsorted_bam.xml. The UI should be accepting bam, can you let me know how to reproduce that ?

@bernt-matthias
Copy link
Contributor Author

I tried with planemo serve.

sam should accept bam

But then a converter should run, or? You can see in this test that no converter runs.

@bernt-matthias
Copy link
Contributor Author

I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.

@mvdbeek
Copy link
Member

mvdbeek commented May 31, 2021

I don't see how https://github.com/galaxyproject/galaxy/blob/dev/test/functional/tools/sam_to_unsorted_bam.xml covers this. All tests upload sam.

You're right, we only have tests in the other direction and we don't have a bam to sam converter, oddly enough ... not sure how that is possible

@bernt-matthias
Copy link
Contributor Author

bernt-matthias commented May 31, 2021

Seems to me that there are two things to do:

  • find out why the test runs and does not complain about datatype mismatch
  • add a converter

I could need a hint for the first point. Converter seems easy.

@mvdbeek
Copy link
Member

mvdbeek commented May 31, 2021

  • find out why the test runs and does not complain about datatype mismatch

Maybe try targeting a running instance with the test ... if that fails maybe we're setting up some weird development variables in the testing framework ?

@bernt-matthias
Copy link
Contributor Author

I started a Galaxy instance with GALAXY_RUN_WITH_TEST_TOOLS=1 ./run.sh and executed the test with galaxy-tool-test -t bam4sam --galaxy-url http://localhost:8080 --key APIKEY

Has exactly the same result:

Executing test 'bam4sam/1.0-0'
Test 'bam4sam/1.0-0' failed
Traceback (most recent call last):
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/script.py", line 281, in run_test
    verify_tool(
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1050, in verify_tool
    raise e
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1046, in verify_tool
    job_stdio = _verify_outputs(testdef, test_history, jobs, tool_id, data_list, data_collection_list, galaxy_interactor, quiet=quiet)
  File "/home/berntm/projects/galaxy/.venv/lib/python3.8/site-packages/galaxy/tool_util/verify/interactor.py", line 1222, in _verify_outputs
    raise JobOutputsError(found_exceptions, job_stdio)
galaxy.tool_util.verify.interactor.JobOutputsError: Output outfile:  different than expected
Output file did not contain expected text 'sam' (output 'unsorted.bam
')
Report written to 'standard output'
Passed tool tests (0): []
Failed tool tests (1): ['bam4sam/1.0-0']
Skipped tool tests (0): []
Errored tool tests (0): []
Total tool test time: 0:00:21.492503

Any suggestion where to look, e.g. where the check for the input data types happens?

@bernt-matthias
Copy link
Contributor Author

Tried to find out where the input datatype is checked .. without success. I also tried to set a nonsense ftype="bed". This apparently leads to an infinite loop.

@bernt-matthias bernt-matthias force-pushed the topic/sam-input-accepts-bam branch 2 times, most recently from 8ee0728 to 7629054 Compare June 28, 2021 15:40
@bernt-matthias
Copy link
Contributor Author

bernt-matthias commented Jun 28, 2021

@mvdbeek do you think this (8285bd6) goes in a useful direction. Seems that it makes quite a lot of tests fail (a fix for one of them is here: 7629054).

@mvdbeek mvdbeek modified the milestones: 21.09, 22.01 Sep 13, 2021
@bernt-matthias bernt-matthias force-pushed the topic/sam-input-accepts-bam branch from 7629054 to 4f1e376 Compare October 6, 2021 15:36
@bernt-matthias bernt-matthias changed the title add a test case that shows that sam input accepts bam Add a test case that shows that sam input accepts bam Oct 6, 2021
@bernt-matthias bernt-matthias changed the title Add a test case that shows that sam input accepts bam Fix DataToolParameters accepting input with arbitrary datatype in tests Oct 7, 2021
@bernt-matthias
Copy link
Contributor Author

Nearly there (hopefully): only 1 selenium and 1 api test failing.

I also ran this on the IUC repo: galaxyproject/tools-iuc#4015 (comment): 291 failures + 3 errors. Uff. I checked a few of them .. seems that they are caused by tests that miss the ftype attribute in input params and the sniffer fails. Seems legit .. but also quite some work to fix this.

The planemo html output seems quite useless in these cases (no output at all) and one needs to check the console output which contains something like this

======================================================================
ERROR: ( aegean_locuspocus ) > Test-1
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 98, in test_tool
    self.do_it(tool_version=tool_version, test_index=test_index)
  File "/tmp/tmpgh3suwnl/galaxy-dev/test/functional/test_toolbox.py", line 35, in do_it
    verify_tool(tool_id, self.galaxy_interactor, resource_parameters=resource_parameters, test_index=test_index, tool_version=tool_version, register_job_data=register_job_data)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1110, in verify_tool
    raise e
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 1103, in verify_tool
    tool_response = galaxy_interactor.run_tool(testdef, test_history, resource_parameters=resource_parameters)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 468, in run_tool
    submit_response_object = ensure_tool_run_response_okay(submit_response, "execute tool", inputs_tree)
  File "/tmp/tmpgh3suwnl/galaxy-dev/lib/galaxy/tool_util/verify/interactor.py", line 811, in ensure_tool_run_response_okay
    raise RunToolException(message, inputs, dynamic_param_error=dynamic_param_error)
galaxy.tool_util.verify.interactor.RunToolException: parameter 'genesgff3': Data set with invalid datatype supplied to input dataset parameter
...

Would be helpful to have this in the html output.

@bernt-matthias
Copy link
Contributor Author

and the sniffer fails

we might also use the opportunity and try to improve the sniffers.

bernt-matthias added a commit to bernt-matthias/tools-iuc that referenced this pull request Oct 8, 2021
test data is sniffed as gff (because it really is gff2) and therefore can not be used as input which needs to be gff2

forcing gff3 by setting ftype since the aegan website insists that gff3 is
used (so allowing gff seems not an option)

xref galaxyproject/galaxy#12073
bernt-matthias added a commit to bernt-matthias/tools-iuc that referenced this pull request Oct 8, 2021
dbkey needs to be set since the dynamic options filter otherwise
removes the dataset here

https://github.com/galaxyproject/galaxy/blob/5ea4d7e647d09dde957e824a083b1bd6acaa6f17/lib/galaxy/tools/parameters/basic.py#L1961
https://github.com/galaxyproject/galaxy/blob/5ea4d7e647d09dde957e824a083b1bd6acaa6f17/lib/galaxy/tools/parameters/dataset_matcher.py#L156

Currently this only leads to not considering the dataset for implicit conversion.

With galaxyproject/galaxy#12073 this makes the
tool fail.
bernt-matthias and others added 21 commits October 7, 2022 13:56
otherwise if a tool is mapped over a collection where the
included datasets are hidden does not work.
I guess also implicit conversion did not work so far for this case.

Not sure about possible side effects. I tested manually
that the tool form does not include hidden datasets
and potentially one selenium test
seems to fix api/selenium problems
output does not matter anyway because of expect_failure
Co-authored-by: Marius van den Beek <[email protected]>
by using content instead of value
as suggested here galaxyproject#12073 (comment)

and distinguish HDCA and DCE
This reverts commit a1625e2.
@nsoranzo
Copy link
Member

My suggestion for the upload problem would be to deprecate the display_in_upload attribute of the datatypes (inspired by discussion with @nsoranzo during EGD22 .. If I got him right :). For the following reasons:

  • the datatypes with a sniffer and display_in_upload="false" can be uploaded (tested for the cm datatype)

  • datatypes can be changed also to unuploadable datatypes, i.e. datatypes without a sniffer and display_in_upload="false" can be uploaded if an uploadable datatype/auto is chosen in the upload dialog and the datatype is manually changed to the desired (unoploadable) datatype

So display_in_upload seems a bit useless.

Summary of the discussion at the latest Backend WG meeting: it was agreed that display_in_upload is not very useful currently, but some were still concerned about datatypes that we shouldn't allow users to upload. The suggested way forward is:

  • Set display_in_upload="true" for all datatypes needed as input for tool testing that are deemed safe
  • For the datatypes that are not considered safe, this should be documented with a comment in lib/galaxy/config/sample/datatypes_conf.xml.sample
  • Improve checks for unsafe datatypes done during upload and metadata editing of datasets, and extend test coverage

I hope I've not forgot/misunderstood important details from the meeting, feel free to correct!

bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Dec 11, 2022
Like in the filter for selects with dynamic options.
Note actually in the select case we check if the referred
dataset has the dbkey set while we do this here for the
dataset itself. I guess this could be changed but I could
not find a good way to do this.

Discovered while fixing galaxyproject/tools-iuc#4023
for galaxyproject#12073

where this leads to a bug because filtered datasets are not matching:
https://github.com/galaxyproject/galaxy/blob/5ea4d7e647d09dde957e824a083b1bd6acaa6f17/lib/galaxy/tools/parameters/dataset_matcher.py#L132
@dannon dannon modified the milestones: 23.0, 23.1 Jan 10, 2023
bernt-matthias added a commit to bernt-matthias/tools-iuc that referenced this pull request Jan 10, 2023
there is no scool datatype in Galaxy. scool is currently sniffed as h5
which leads to an error with galaxyproject/galaxy#12073

possible solutions:

- replace scool by h5 and lower the profile
- add scool dataype to Galaxy and set the profile to the Galaxy version
  where it was added
bernt-matthias added a commit to galaxyproject/tools-iuc that referenced this pull request Apr 14, 2023
* schicexplorer: missing scool datatype

there is no scool datatype in Galaxy. scool is currently sniffed as h5
which leads to an error with galaxyproject/galaxy#12073

possible solutions:

- replace scool by h5 and lower the profile
- add scool dataype to Galaxy and set the profile to the Galaxy version
  where it was added

* rename version token

* schicexplorer: bump profile

scool datattype became available in 22.05 galaxyproject/galaxy#14480

* schicexplorer fixes:

scHicQualityControl: cp input file
scHicAdjustMatrix: fix test

* remove unused parameter
@mvdbeek mvdbeek removed this from the 23.1 milestone Jun 21, 2023
bernt-matthias added a commit to bernt-matthias/galaxy that referenced this pull request Mar 10, 2024
Like in the filter for selects with dynamic options.
Note actually in the select case we check if the referred
dataset has the dbkey set while we do this here for the
dataset itself. I guess this could be changed but I could
not find a good way to do this.

Discovered while fixing galaxyproject/tools-iuc#4023
for galaxyproject#12073

where this leads to a bug because filtered datasets are not matching:
https://github.com/galaxyproject/galaxy/blob/5ea4d7e647d09dde957e824a083b1bd6acaa6f17/lib/galaxy/tools/parameters/dataset_matcher.py#L132
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants