Compleasm: use symlink instead of copying busco data #6679

Open
wants to merge 2 commits into
base: main

Conversation

abretaud
Contributor

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

I looked at the compleasm code and found a way to avoid copying the BUSCO dataset. The copy can take a while and makes compleasm jobs run longer than BUSCO jobs.

cp -r '${busco_database.fields.path}/lineages/${lineage_dataset}/' 'galaxy_db/' &&
mkdir -p 'galaxy_db/' &&
ln -s '${busco_database.fields.path}/lineages/${lineage_dataset}/' 'galaxy_db/${lineage_dataset}' &&
touch 'galaxy_db/${lineage_dataset}.done' &&
Contributor

Is this an empty file also existing in the DB folder? Then I would prefer a symlink.

Either way, a short explanatory comment would be great.

Contributor Author

It's just an empty file specific to compleasm; it tells compleasm that the lineage is already present and should not be downloaded again. I'm going to add a comment.
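
For illustration, here is a minimal, self-contained sketch of the symlink-plus-marker pattern from the diff above. The paths and the lineage name `example_odb10` are hypothetical stand-ins created in temporary directories. That compleasm treats `<lineage>.done` as a "do not redownload" marker is taken from this discussion, not verified here.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-in for the shared BUSCO data directory.
src_root=$(mktemp -d)
lineage="example_odb10"   # illustrative name, not a real lineage
mkdir -p "$src_root/lineages/$lineage"
echo "placeholder" > "$src_root/lineages/$lineage/dataset.cfg"

# Hypothetical job working directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/galaxy_db"

# Link the lineage into place instead of copying it.
ln -s "$src_root/lineages/$lineage" "$workdir/galaxy_db/$lineage"

# Empty marker file; per the discussion above, compleasm is assumed
# to read it as "lineage already downloaded".
touch "$workdir/galaxy_db/$lineage.done"

ls -l "$workdir/galaxy_db"
```

The link is created immediately regardless of dataset size, whereas `cp -r` scales with the number and size of files in the lineage.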

@@ -2,5 +2,5 @@
# - value
# - name
# - version
# - /path/to/data
eukaryota_odb10 eukaryota 5.4.6 ${__HERE__}/test-db/busco_downloads
Contributor

May I ask why you updated the test (data)?

Contributor Author

To keep the test-data dir as small as possible: entomoplasmatales_odb10 is a much smaller lineage than eukaryota_odb10.

Contributor

True, but changing it will increase the size of the repo. The "problem" with git repos is that everything that is in it will be there forever.

Contributor

But changing it might still be a good idea if the runtime is reduced significantly.
