Open
Changes from 10 commits
21 commits
61758c3
perf: bump Salmon to v1.10.1
Austin-s-h Mar 25, 2023
8eb7a1d
fix: reorder to prioritize bioconda
Austin-s-h Mar 25, 2023
60f066f
Merge branch 'snakemake:master' into master
Austin-s-h Mar 27, 2023
1c99871
fix: Allow for large assets in refgenie
Austin-s-h Mar 27, 2023
d0dcc61
fix: read-only conf acces for concurrent rules dl
Austin-s-h Mar 27, 2023
4d3758c
fix: handle refgenconf write lock error
Austin-s-h Mar 28, 2023
562df5f
Merge pull request #1 from snakemake/master
Austin-s-h Mar 28, 2023
c074c68
doc: Remove comments
Austin-s-h Mar 28, 2023
ee3bad5
fix: formatting refgenie
Austin-s-h Mar 28, 2023
fce4bb0
feat: add logs to refgenie rules
Austin-s-h Mar 28, 2023
73e3a02
fix: remove input string from rsem wrapper
Austin-s-h Mar 31, 2023
a3b02b8
perf: reorganize inputs and add back input_string
Austin-s-h Mar 31, 2023
0d10e6a
fix: switch to Path in order to resolve basename more robustly
Austin-s-h Mar 31, 2023
fe7b05a
fix: force string in reference_prefix
Austin-s-h Mar 31, 2023
c2230b2
Merge pull request #2 from snakemake/master
Austin-s-h Apr 3, 2023
b3d832b
Merge branch 'snakemake:master' into master
Austin-s-h Apr 11, 2023
5078fa2
Merge pull request #3 from snakemake/master
Austin-s-h Apr 17, 2023
b927c4f
chore: release 1.26.0
github-actions[bot] Apr 17, 2023
6b3d6e3
Merge pull request #4 from sansterbioanalytics/release-v1.26.0
Austin-s-h Apr 17, 2023
64bd5e6
ci: update workflows to self-hosted
Austin-s-h Apr 17, 2023
decd1a5
perf: resolve conflict and adjust force_large to param
Austin-s-h Feb 7, 2025
18 changes: 16 additions & 2 deletions bio/refgenie/test/Snakefile
@@ -1,11 +1,25 @@
 rule obtain_asset:
     output:
         # the name refers to the refgenie seek key (see attributes on http://refgenomes.databio.org)
-        fai="refs/genome.fasta"
+        fai="refs/genome.fasta",
         # Multiple outputs/seek keys are possible here.
     params:
         genome="human_alu",
         asset="fasta",
-        tag="default"
+        tag="default",
+    log:
+        "logs/refgenie/obtain_large_asset.log",
     wrapper:
         "master/bio/refgenie"
+
+rule obtain_large_asset:
+    output:
+        star_index=directory("refs/star_index/hg38/star_index"),
+    params:
+        genome="hg38",
+        asset="star_index",
+        tag="default",
+    log:
+        "logs/refgenie/obtain_large_asset.log",
+    wrapper:
+        "master/bio/refgenie"
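
Both rules in the test Snakefile above declare the same log path, logs/refgenie/obtain_large_asset.log, which may be a copy-paste slip. A minimal sketch of a check that would surface such collisions, assuming the rule-to-log mapping has already been collected into a plain dict (the helper name `find_shared_logs` and the dict layout are hypothetical, not part of Snakemake or this PR):

```python
from collections import defaultdict


def find_shared_logs(rule_logs):
    """Return {log_path: [rule names]} for every log path used by more than one rule.

    rule_logs is a hypothetical mapping of rule name -> log path.
    """
    rules_by_path = defaultdict(list)
    for rule_name, log_path in rule_logs.items():
        rules_by_path[log_path].append(rule_name)
    return {
        path: sorted(rules)
        for path, rules in rules_by_path.items()
        if len(rules) > 1
    }


# Log paths exactly as they appear in the diff above.
shared = find_shared_logs(
    {
        "obtain_asset": "logs/refgenie/obtain_large_asset.log",
        "obtain_large_asset": "logs/refgenie/obtain_large_asset.log",
    }
)
print(shared)
```

If the two rules are meant to log separately, giving obtain_asset its own path (for example one named after the rule) would make this check come back empty.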
16 changes: 13 additions & 3 deletions bio/refgenie/wrapper.py
@@ -5,17 +5,27 @@

 import os
 import refgenconf
+from refgenconf.exceptions import RefgenconfError

 genome = snakemake.params.genome
 asset = snakemake.params.asset
 tag = snakemake.params.tag

 conf_path = os.environ["REFGENIE"]

-rgc = refgenconf.RefGenConf(conf_path, writable=True)

+# BUG If there are multiple concurrent refgenie commands, this will fail due to
+# unable to acquire lock of the config file.
+try:
+    rgc = refgenconf.RefGenConf(conf_path, writable=True)
+except RefgenconfError:
+    # If read lock timeout, attempt to skip the read lock
+    rgc = refgenconf.RefGenConf(
+        conf_path, writable=True, skip_read_lock=True, genome_exact=False
Member

I couldn't find out what the exact implications of skip_read_lock=True are, but it seems dangerous to me. Have you also tried increasing wait_max= as an alternative?

Contributor Author

I didn't try that, but I suspect it might not be a great choice either. If someone is downloading an asset over a slow connection, even raising wait_max from its default of 60 to 600 might not make a difference, and the result would be a hard-to-diagnose timeout error.

I'm also not sure whether this was some sort of conflict with the Snakemake locking system. If we rely on that to protect other files, then the wrapper either produces the output file, or the rule fails with a RefgenconfError that recommends setting the skip_read_lock=True param to try to fix the issue.

Member

From what I gathered by poking around a little, I think the lock is only held while something is being written to the conf file. So I would expect that the lock is not in place for the whole download, and that wait_max= should already help. But the documentation on this is not very clear and I didn't immediately find the mechanism in the code, so I might be misunderstanding this lock.

Do you have the possibility to try wait_max= in your use case and test whether this actually helps?
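
One way to combine the two suggestions in this thread is to retry with progressively longer lock timeouts and only fall back to skipping the read lock if explicitly allowed. The sketch below is an illustration under stated assumptions, not the wrapper's code: `open_with_escalating_wait` is a hypothetical helper, `open_conf` stands in for a constructor such as `refgenconf.RefGenConf` with the config path already bound, and it assumes that constructor accepts `wait_max` and `skip_read_lock` as keyword arguments, as the comments above imply. The wait schedule values are arbitrary.

```python
def open_with_escalating_wait(
    open_conf,
    lock_error,
    wait_schedule=(60, 300, 600),
    allow_skip_read_lock=False,
):
    """Open a config, retrying with longer lock timeouts before giving up.

    open_conf  -- callable accepting wait_max= (and optionally skip_read_lock=);
                  a stand-in for the real constructor (assumption).
    lock_error -- exception type raised on a lock timeout
                  (RefgenconfError in the diff above).
    """
    waits = list(wait_schedule)
    if not waits:
        raise ValueError("wait_schedule must contain at least one timeout")

    last_error = None
    for wait_max in waits:
        try:
            return open_conf(wait_max=wait_max)
        except lock_error as err:
            # Remember the failure and try the next, longer timeout.
            last_error = err

    if allow_skip_read_lock:
        # Last resort, opt-in only: bypass the read lock entirely.
        return open_conf(wait_max=waits[-1], skip_read_lock=True)
    raise last_error


# Hypothetical usage inside the wrapper (not executed here):
#   from functools import partial
#   rgc = open_with_escalating_wait(
#       partial(refgenconf.RefGenConf, conf_path, writable=True),
#       RefgenconfError,
#   )
```

Keeping the skip_read_lock fallback behind an explicit flag means the riskier path is never taken silently. Whether longer timeouts help at all depends on how long the lock is actually held, which the thread leaves open.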

+    )
 # pull asset if necessary
-gat, archive_data, server_url = rgc.pull(genome, asset, tag, force=False)
+gat, archive_data, server_url = rgc.pull(
+    genome, asset, tag, force=False, force_large=True
+)

 for seek_key, out in snakemake.output.items():
     path = rgc.seek(genome, asset, tag_name=tag, seek_key=seek_key, strict_exists=True)
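
The diff shown here hardcodes force_large=True, while the last commit in the list (decd1a5) mentions adjusting force_large to a param. That later change is not visible in this view, so the following is only a sketch of how an optional rule param might be turned into keyword arguments for the pull call. The helper name `build_pull_kwargs`, the param name `force_large`, and the default of False are assumptions for illustration; a plain dict stands in for `snakemake.params`.

```python
def build_pull_kwargs(params, default_force_large=False):
    """Build keyword arguments for a pull call from rule params.

    params is any mapping with a .get method; here a dict stands in
    for snakemake.params (assumption). force=False mirrors the diff above.
    """
    return {
        "force": False,
        "force_large": bool(params.get("force_large", default_force_large)),
    }


# A rule that opts in to large assets, and one that leaves the param unset.
print(build_pull_kwargs({"genome": "hg38", "force_large": True}))
print(build_pull_kwargs({"genome": "human_alu"}))
```

With this shape, the call in the wrapper could become rgc.pull(genome, asset, tag, **kwargs), so only rules that explicitly request large assets would download them.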