fix: Allow concurrent and large refgenie downloads #1172
Open

Austin-s-h wants to merge 21 commits into snakemake:master from sansterbioanalytics:master
Changes from 10 commits
Commits (21)
61758c3 perf: bump Salmon to v1.10.1 (Austin-s-h)
8eb7a1d fix: reorder to prioritize bioconda (Austin-s-h)
60f066f Merge branch 'snakemake:master' into master (Austin-s-h)
1c99871 fix: Allow for large assets in refgenie (Austin-s-h)
d0dcc61 fix: read-only conf access for concurrent rules dl (Austin-s-h)
4d3758c fix: handle refgenconf write lock error (Austin-s-h)
562df5f Merge pull request #1 from snakemake/master (Austin-s-h)
c074c68 doc: Remove comments (Austin-s-h)
ee3bad5 fix: formatting refgenie (Austin-s-h)
fce4bb0 feat: add logs to refgenie rules (Austin-s-h)
73e3a02 fix: remove input string from rsem wrapper (Austin-s-h)
a3b02b8 perf: reorganize inputs and add back input_string (Austin-s-h)
0d10e6a fix: switch to Path in order to resolve basename more robustly (Austin-s-h)
fe7b05a fix: force string in reference_prefix (Austin-s-h)
c2230b2 Merge pull request #2 from snakemake/master (Austin-s-h)
b3d832b Merge branch 'snakemake:master' into master (Austin-s-h)
5078fa2 Merge pull request #3 from snakemake/master (Austin-s-h)
b927c4f chore: release 1.26.0 (github-actions[bot])
6b3d6e3 Merge pull request #4 from sansterbioanalytics/release-v1.26.0 (Austin-s-h)
64bd5e6 ci: update workflows to self-hosted (Austin-s-h)
decd1a5 perf: resolve conflict and adjust force_large to param (Austin-s-h)
```diff
@@ -1,11 +1,25 @@
 rule obtain_asset:
     output:
         # the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
-        fai="refs/genome.fasta"
+        fai="refs/genome.fasta",
+        # Multiple outputs/seek keys are possible here.
     params:
         genome="human_alu",
         asset="fasta",
-        tag="default"
+        tag="default",
+    log:
+        "logs/refgenie/obtain_large_asset.log",
     wrapper:
         "master/bio/refgenie"
+
+
+rule obtain_large_asset:
+    output:
+        star_index=directory("refs/star_index/hg38/star_index"),
+    params:
+        genome="hg38",
+        asset="star_index",
+        tag="default",
+    log:
+        "logs/refgenie/obtain_large_asset.log",
+    wrapper:
+        "master/bio/refgenie"
```
I couldn't really find out what the exact implications of `skip_read_lock=True` are, but it seems dangerous to use, to me. Have you also tried increasing `wait_max=` as an alternative?
I didn't attempt to, but I suspect that this might not be a great choice either. If someone is downloading an asset over a slow connection, even raising `wait_max=` from its default of 60 to 600 might not make a difference, and would result in a hard-to-diagnose timeout error.
I'm not sure whether this was also some sort of conflict with the Snakemake locking system. If we rely on that to protect other files, then the wrapper either produces the output file, or the rule fails with a RefgenconfError and recommends setting the `skip_read_lock=True` param to try to fix the issue.
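The failure mode described above can be sketched roughly as follows. This is a hedged illustration, not the wrapper's actual code: `RefgenconfError` here is a stand-in class (the real one lives in refgenconf), and `pull_asset`/`pull_fn` are hypothetical names:

```python
# Stand-in for refgenconf's RefgenconfError; defined locally so the
# sketch is self-contained.
class RefgenconfError(Exception):
    pass

def pull_asset(pull_fn, skip_read_lock=False):
    """Attempt a pull; surface a skip_read_lock hint on a lock error.

    pull_fn is a callable taking skip_read_lock and performing the pull.
    On a config-lock failure, fail the rule with an actionable message
    rather than a bare traceback.
    """
    try:
        return pull_fn(skip_read_lock)
    except RefgenconfError as e:
        raise RuntimeError(
            "refgenie pull failed while the genome config was locked; "
            "consider setting skip_read_lock=True in the rule's params"
        ) from e
```

Either the pull succeeds and the rule produces its output, or the rule fails with a message pointing at `skip_read_lock=True`, which matches the behavior described in the comment above.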
From what I gathered by poking around a little, I think that the lock only happens while something is written to the conf file. So I would think that this lock is not in place the whole time you are doing the download, and that `wait_max=` should already help. But the documentation on this is not very clear and I didn't immediately find the mechanism in the code, so I might be misunderstanding this lock.
Do you have the possibility to try `wait_max=` in your use case and test whether this actually helps?
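The `wait_max=` alternative discussed here amounts to bounded polling: keep retrying the locked conf file for a limited time instead of skipping the read lock entirely. A minimal sketch under that assumption (`wait_for_lock` and `is_locked` are hypothetical names, not refgenconf's API):

```python
import time

def wait_for_lock(is_locked, wait_max=60, interval=1.0):
    """Poll until the conf lock is released or wait_max seconds elapse.

    is_locked is a callable returning True while the lock is held.
    Returns True if the lock was released in time, False on timeout --
    the timeout case corresponds to the hard-to-diagnose error the
    reply above worries about when downloads outlast wait_max.
    """
    deadline = time.monotonic() + wait_max
    while time.monotonic() < deadline:
        if not is_locked():
            return True
        time.sleep(interval)
    return False
```

If the lock is really only held during short writes to the conf file (not for the whole download), this kind of bounded wait should succeed quickly; if the lock spans the download itself, no reasonable `wait_max=` saves a slow connection, which is the crux of the disagreement above.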