vep download cache: Add cache url #366

@lczech

Is your feature request related to a problem? Please describe.
The VEP documentation does not make it obvious how to use genomes other than Homo sapiens, and it was hard to figure out why my attempts to run VEP on a plant species failed. Eventually, I found that one needs to pass a specific (and not easy to find) FTP URL to the script, telling it where to download the VEP data from, so that the data can be found.

Hence, I suggest adding this capability to the vep download cache wrapper, and perhaps documenting a bit better how one can select different genomes. The same applies to the fasta URL, if the user decides to download that data as well. Downloading the fasta would then trigger issue 365, but my suggested code below solves that as well.

Describe the solution you'd like
Something like:

from snakemake.shell import shell

# Get params. By default, we run only cache (--AUTO c), unlike the original wrapper,
# which also requested fasta (--AUTO cf), which would then mess up the check 
# in the vep annotation wrapper that the subdirectory of the cache contains a single directory.
# See https://github.com/snakemake/snakemake-wrappers/issues/365
automode = snakemake.params.get("automode", "c")
extra = snakemake.params.get("extra", "")

# Extra optional cache and fasta url
cacheurl = snakemake.params.get("cacheurl", "")
if cacheurl:
    cacheurl = "--CACHEURL \"{}\"".format(cacheurl)
fastaurl = snakemake.params.get("fastaurl", "")
if fastaurl:
    fastaurl = "--FASTAURL \"{}\"".format(fastaurl)

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Compared to the original wrapper, we add the two urls, and also use a newer version
# of vep install, which uses --CACHE_VERSION instead of --VERSION.
# This requires changing the environment to use vep 104.
shell(
    "vep_install --AUTO {automode} "
    "--SPECIES {snakemake.params.species} "
    "--ASSEMBLY {snakemake.params.build} "
    "--CACHE_VERSION {snakemake.params.release} "
    "--CACHEDIR {snakemake.output} "
    "--CONVERT "
    "--NO_UPDATE "
    "{cacheurl} {fastaurl} "
    "{extra} {log}"
)

I am currently using this replacement for the wrapper myself, and it gets the job done. Note that it solves issue 365 as well, and that I updated vep to version 104, which would need to be changed in the environment.yaml. Currently, the cache and the annotate wrappers use different versions of vep (101 and 102), which is probably not ideal.
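For illustration, the way the optional URL flags are assembled could also be checked outside of Snakemake. The following is only a minimal sketch, not part of the wrapper: the helper names `optional_flag` and `build_vep_install_command` are hypothetical, the species, build, release, and URL values are placeholders that I have not verified against any real Ensembl release, and it uses `shlex.quote` instead of manual double quotes as an assumption about what would be safest.

```python
import shlex


def optional_flag(flag, value):
    """Return 'FLAG value' with the value shell-quoted, or '' if value is empty."""
    return "{} {}".format(flag, shlex.quote(value)) if value else ""


def build_vep_install_command(species, build, release, cachedir,
                              automode="c", cacheurl="", fastaurl="", extra=""):
    """Assemble the vep_install call using the same flags as the code above.

    Hypothetical helper for illustration only; it does not run anything.
    """
    parts = [
        "vep_install",
        "--AUTO", automode,
        "--SPECIES", shlex.quote(species),
        "--ASSEMBLY", shlex.quote(build),
        "--CACHE_VERSION", str(release),
        "--CACHEDIR", shlex.quote(cachedir),
        "--CONVERT",
        "--NO_UPDATE",
        # Both URL flags are left out entirely when no URL is given.
        optional_flag("--CACHEURL", cacheurl),
        optional_flag("--FASTAURL", fastaurl),
        extra,
    ]
    # Drop empty pieces so the command has no stray double spaces.
    return " ".join(p for p in parts if p)


# Placeholder values; the URL is NOT a real VEP cache location.
cmd = build_vep_install_command(
    species="arabidopsis_thaliana",
    build="TAIR10",
    release=104,
    cachedir="resources/vep/cache",
    cacheurl="ftp://example.org/placeholder/vep",
)
print(cmd)
```

With no `cacheurl` or `fastaurl` given, the resulting command contains neither flag, so the default behaviour for Homo sapiens would stay unchanged.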
