Busco (5.8.0) extra test to indicate bug #7209
Hi @bgruening, that should be what I wrote in #6664.
Oh, true, no need to run it again; the issue says it all.
What are the names of the folders matching
In the case I reported, it's the following:
whereas in the case where there is only one (top-level) dir, only the linked dir is present.
Can you ask upstream which of the two should be used? Also, the concatenations happening at the end of the tool's command block seem questionable in this context.
What do you mean? Whether you can tell, prior to running it (or seeing the output), which of the dirs you will need? Indeed, I have been looking at the cat part too... I'm currently looking into whether I can 'solve' it in the command block of the wrapper.
The question would be why there are sometimes multiple output dirs (or symlinks). Shouldn't there be only one in the auto-lineage case as well?
Ah, in that sense... yeah, good point, but I don't know, tbh. Are you hinting that we should (could?) bring this to the BUSCO people first before implementing a fix in Galaxy?
I've been quickly browsing through their docs, and this seems to be intended behaviour:
It seems it will come down to handling this in the wrapper (might be doable in the command block?).
What if we add something like the following to the command block (after the run but before the concatenation step):
That would remove the linked dir, and only in the case where multiple dirs are present. This of course assumes that only the result of the most specific clade is reported, which is a valid assumption in most (all?) cases, as that is exactly what one wants to achieve with the auto-lineage parameter.
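For illustration only, the proposed step (drop the symlinked run dir, but only when more than one run_* entry exists) could look like the Python sketch below; in the wrapper it would of course be a shell line in the command block. The actual snippet from this thread is not preserved, the helper name prune_linked_run_dirs is hypothetical, and it assumes the run folders sit directly in the BUSCO output directory.

```python
from pathlib import Path


def prune_linked_run_dirs(outdir):
    """Hypothetical helper: if outdir holds more than one run_* entry,
    delete the entries that are symlinks and keep the real directories.

    Returns the sorted names of the run_* entries that remain.
    """
    out = Path(outdir)
    runs = sorted(out.glob("run_*"))
    if len(runs) > 1:
        for run in runs:
            if run.is_symlink():
                # unlink() removes only the link, never the directory it points to
                run.unlink()
    # With a single entry nothing is touched, even if it is a link
    return sorted(p.name for p in out.glob("run_*"))
```

Note that this encodes the same assumption as above: when several dirs exist, the non-linked one is the result we want to report.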
Adding this to the command block in the wrapper seems to make both tests pass (including the one that failed with multiple dirs).
Not sure about the docs:
So far so good :)
Then why do we have a folder + symlink? Which case are we in?
This we also need to keep in mind.
Ah, in the case where the placement could be done and a more specific BUSCO lineage was found.
Those are indeed present in the top folder but are, as far as I can see, not used in the wrapper for collecting the output. The wrapper gets the final output from the subdirs directly and not from those top folder files.
I assume it's these parts of the wrapper (in the output section) that cause the issue:
Thanks for sharing the issue! I agree with your last comment; it is likely the cause. I think it would be great to keep all the outputs, as some lineages are not very good (e.g. mollusca), so we sometimes refer to Eukaryota even if a more specific one is available. What if we renamed the files after the analysis, for example:
Do you think that would be a viable solution? Is the file you are using that causes the crash publicly available?
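A hypothetical Python sketch of the renaming idea: copy one file out of every run_<lineage> folder and suffix it with the lineage, so the Eukaryota result and the more specific one are both kept. The file name full_table.tsv, the run_<lineage> layout and the helper collect_renamed are assumptions for illustration, not the wrapper's actual names.

```python
import shutil
from pathlib import Path


def collect_renamed(outdir, destdir, filename="full_table.tsv"):
    """Copy <outdir>/run_<lineage>/<filename> to <destdir>/<stem>_<lineage><suffix>.

    Hypothetical helper; the file name and folder layout are assumptions.
    Returns the names of the copied files in sorted lineage order.
    """
    dest = Path(destdir)
    dest.mkdir(parents=True, exist_ok=True)
    stem, suffix = Path(filename).stem, Path(filename).suffix
    copied = []
    for run in sorted(Path(outdir).glob("run_*")):
        src = run / filename
        if not src.is_file():
            continue  # skip run folders that lack this file
        lineage = run.name[len("run_"):]
        target = dest / f"{stem}_{lineage}{suffix}"
        # Paths through a symlinked run folder resolve normally here
        shutil.copy(src, target)
        copied.append(target.name)
    return copied
```

With the lineage in the file name, nothing has to be removed, at the cost of the number of outputs only being known after the run.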
We can keep all outputs. Then my suggestion would be to replace all
I'm just afraid that this cannot be used easily in automatic workflows anymore. Before, we had a single static output (which can be connected to downstream tools); after this, we would have dynamic outputs (even if in most cases there will be only one) that are only known after the tool has finished. This will be tricky or impossible to use in workflows. Maybe we can keep the static output and add dynamic ones for the cases where they are needed. But there needs to be some logic to determine which folder/file is the primary one.
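One possible rule for picking the primary folder, sketched in Python: prefer the specific lineage and fall back to the generic one. This assumes BUSCO's top-level summaries are named short_summary.<kind>.<lineage>.<label>.txt with <kind> being specific or generic; that naming and the function pick_primary_lineage are assumptions to check against the real output.

```python
def pick_primary_lineage(filenames):
    """Return (lineage, kind) for the run that should feed the static output,
    or None if no summary file is recognised.

    Assumes names like short_summary.<kind>.<lineage>.<label>.txt where
    <kind> is "specific" or "generic"; "specific" wins when both exist.
    """
    found = {}
    for name in filenames:
        parts = name.split(".")
        if (len(parts) >= 4 and parts[0] == "short_summary"
                and parts[1] in ("specific", "generic")):
            # Keep the first lineage seen for each kind
            found.setdefault(parts[1], parts[2])
    for kind in ("specific", "generic"):
        if kind in found:
            return found[kind], kind
    return None
```

The static output could then come from run_<lineage> for the returned lineage, while every other run folder goes to the dynamic outputs.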
Guess so.
Yes. The test description is here and the test fasta here. I could also look up the command line in the CI if needed.
Not sure what you mean by
BUSCO outputs are usually among the final outputs of a workflow; they are rarely used as inputs. I think only MultiQC (and maybe BlobTools) could use BUSCO output. But yes, we should still be careful not to break workflows.
Yes! What about something like
What do you think?
I can try to code this and ping you for a review?
Sounds feasible.
Excellent :)
Valid point indeed! And I already felt a bit "annoyed" to have to remove info/results.
I'm a bit lost, to be honest. Anyway, I'm eager to see what you can throw together code-wise :-)
Hi!
Sounds like a good idea to me indeed. I assume the more admin-like people (@bgruening, @bernt-matthias) have the best view on this, but from my side it's a go to include it in your PR.
The changes were merged into #7220.
All resolved in #7220.
FOR CONTRIBUTOR:
This is linked to #6664.
The stderr of the planemo test run indicates what the problem is:
Not sure how to fix this though.
Perhaps it should become a discover_datasets-style output capture, so that data can be collected from multiple output folders?
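Galaxy's discover_datasets names each collected dataset after the designation group of its regular expression. The Python sketch below only emulates that regex step over relative paths, to show which files a pattern like the hypothetical one here would pick up; it is not Galaxy code, and both the pattern and the discover helper are assumptions.

```python
import re

# Hypothetical pattern in the spirit of discover_datasets: one dataset per
# run_<lineage>/full_table.tsv, named after the "designation" group.
PATTERN = re.compile(r"run_(?P<designation>[^/]+)/full_table\.tsv")


def discover(relative_paths, pattern=PATTERN):
    """Return {designation: path} for every path that fully matches pattern."""
    hits = {}
    for path in sorted(relative_paths):
        match = pattern.fullmatch(path)
        if match:
            hits[match.group("designation")] = path
    return hits
```

If a lineage shows up both as a real folder and as a symlink under different names, a pattern like this would collect it twice, so the duplicate question raised in this thread still applies.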