Skip to content

Commit

Permalink
Merge pull request #25 from cokelaer/main
Browse files Browse the repository at this point in the history
add minimap2
  • Loading branch information
cokelaer authored Jan 30, 2024
2 parents 2070740 + 6ba7b48 commit eea37ec
Show file tree
Hide file tree
Showing 10 changed files with 253 additions and 85 deletions.
1 change: 1 addition & 0 deletions .github/workflows/apptainer.yml
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ jobs:
- name: install package itself
run: |
pip install .
pip install "pulp==2.7.0" --no-deps
- name: testing
run: |
Expand Down
1 change: 1 addition & 0 deletions .github/workflows/main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ jobs:
shell: bash -l {0}
run: |
pip install .
pip install "pulp==2.7.0" --no-deps
- name: Install dependencies
Expand Down
33 changes: 33 additions & 0 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@

files: '\.(py|rst|sh)$'
fail_fast: false

repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
#- id: check-executables-have-shebangs
- id: check-ast

- repo: https://github.com/pycqa/flake8
rev: 6.1.0
hooks:
- id: flake8
args: ["-j8", "--ignore=E203,E501,W503,E722", "--max-line-length=120", "--exit-zero"]

- repo: https://github.com/psf/black
rev: 22.10.0
hooks:
- id: black
args: ["--line-length=120"]
exclude: E501

- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
args: ["--profile", "black"] # solves conflicts between black and isort

27 changes: 17 additions & 10 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ if you decide to use snakemake manually, do not forget to add singularity option

snakemake -s variant_calling.rules -c config.yaml --cores 4 --stats stats.txt --use-singularity --singularity-prefix ~/.sequana/apptainers --singularity-args "-B /home:/home"



Requirements
~~~~~~~~~~~~
Expand All @@ -91,6 +91,7 @@ This pipelines requires the following executable(s):
- freebayes
- picard (picard-tools)
- sambamba
- minimap2
- samtools
- snpEff you will need 5.0 or 5.1d (note the d); 5.1 does not work.

Expand All @@ -105,7 +106,7 @@ written by Erik Garrison. Input reads (paired or single) are mapped using
`bwa <http://bio-bwa.sourceforge.net/>`_ and sorted with
`sambamba-sort <http://lomereiter.github.io/sambamba/docs/sambamba-sort.html>`_.
PCR duplicates are marked with
`sambamba-markdup <http://lomereiter.github.io/sambamba/docs/sambamba-sort.html>`_.
`sambamba-markdup <http://lomereiter.github.io/sambamba/docs/sambamba-sort.html>`_.
`Freebayes <https://github.com/ekg/freebayes>`_ is used to detect SNPs and short
INDELs. The INDEL realignment and base quality recalibration are not necessary
with Freebayes. For more information, please refer to a post by Brad Chapman on
Expand All @@ -130,15 +131,22 @@ Changelog
========= ======================================================================
Version Description
========= ======================================================================
1.1.2 * add -Xmx8g option in snpeff rule at the build stage.
1.2.0 * -Xmx8g option previously added is not robust. Does not work with
snpEff 5.1 for instance.
* add minimap aligner
* add --nanopore and --pacbio to automatically set minimap2 as the
aligner and the minimap options (map-pb or map-ont)
* add minimap2 container.
* add missing resources in snpeff section
1.1.2 * add -Xmx8g option in snpeff rule at the build stage.
* add resources (8G) in the snpeff rule at run stage
* fix missing output_directory in sequana_coverage rule
* fix joint calling (regression) input function and inputs
1.1.1 * Fix regression in coverage rule
1.1.0 * add specific apptainer for freebayes (v1.2.0)
* Update API to use click
* Update API to use click
1.0.2 * Fixed failure in multiqc if coverage and snpeff are off
1.0.1 * automatically fill the bwa index algorithm and fix bwa_index rule to
1.0.1 * automatically fill the bwa index algorithm and fix bwa_index rule to
use the options in the config file (not the harcoded one)
1.0.0 * use last warppers and graphviz apptainer
0.12.0 * set all apptainers containers and add vcf to bcf conversions
Expand All @@ -149,7 +157,7 @@ Version Description
0.9.5 * fix typo in the onsuccess and update sequana requirements to use
most up-to-date snakemake rules
0.9.4 * fix typo related to the reference-file option new name not changed
everyhere in the pipeline.
everyhere in the pipeline.
0.9.3 * use new framework (faster --help, --from-project option)
* rename --reference into --reference-file and --annotation to
--annotation-file
Expand All @@ -159,7 +167,7 @@ Version Description
samplesnpeff)
* add multiqc to show sequana_coverage and snpeff summary sections
* cleanup onsuccess section
* more options sanity checks and options (e.g.,
* more options sanity checks and options (e.g.,
* genbank_file renamed into annotation_file in the config
* use --legacy in freebayes options
* fix coverage section to use new sequana api
Expand All @@ -172,7 +180,6 @@ Version Description
Contribute & Code of Conduct
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To contribute to this project, please take a look at the
`Contributing Guidelines <https://github.com/sequana/sequana/blob/maib/CONTRIBUTING.rst>`_ first. Please note that this project is released with a
To contribute to this project, please take a look at the
`Contributing Guidelines <https://github.com/sequana/sequana/blob/maib/CONTRIBUTING.rst>`_ first. Please note that this project is released with a
`Code of Conduct <https://github.com/sequana/sequana/blob/main/CONDUCT.md>`_. By contributing to this project, you agree to abide by its terms.

1 change: 1 addition & 0 deletions environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ dependencies:
- picard>2.26
- samtools>=1.15
- bamtools
- minimap2
- pip
- pip:
- sequana
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"

[tool.poetry]
name = "sequana-variant-calling"
version = "1.1.2"
version = "1.2.0"
description = "A multi-sample variant calling pipeline"
authors = ["Sequana Team"]
license = "BSD-3"
Expand Down
33 changes: 26 additions & 7 deletions sequana_pipelines/variant_calling/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,20 +8,33 @@
# use it, otherwise use input_samples.
# ============================================================================
#
sequana_wrappers: "v23.12.5"
sequana_wrappers: "v24.1.14"


# Mandatory fields
input_directory:
input_readtag: _R[12]_
input_pattern: '*fastq.gz'
annotation_file:
reference_file:


###########################################################################
# General section:
#
# :Parameters:
#
#
# - aligner_choice: either bwa or minimap2
#
general:
aligner_choice: bwa
annotation_file:
reference_file:

apptainers:
sequana_coverage: "https://zenodo.org/record/10209929/files/sequana_0.16.1.img"
sequana_tools: "https://zenodo.org/record/7963917/files/sequana_tools_0.15.1.img"
graphviz: "https://zenodo.org/record/7928262/files/graphviz_7.0.5.img"
minimap2: "https://zenodo.org/record/7341710/files/sequana_tools_0.14.5.img"
multiqc: "https://zenodo.org/record/10205070/files/multiqc_1.16.0.img"
freebayes: "https://zenodo.org/record/10259279/files/freebayes_1.2.0.img"

Expand All @@ -32,6 +45,12 @@ add_read_group:
options: ''


minimap2:
options: "-x map-ont"
threads: 4
resources:
mem: 8G


##############################################################################
# samtools_depth
Expand Down Expand Up @@ -78,12 +97,12 @@ bwa_mem:
# - do: if unchecked, this rule is ignored.
# - reference: genbank file.
# - options: any options recognised by snpEff.
# - build_options: use at the build stage of snpEff.
# - build_options: use at the build stage of snpEff.
# depending on the GFF, you may need to add options to skip some sanity checks
# such as noCheckProtein or -noCheckCds. Note also that newest version of snpeff
# may need more memory. To do so, you must set the -XmxNg option where N is to be
# replaced by the required memory in Gb. Version of snpEff 5.1 does not seem to accept
# that option.
# that option though.
# Requires the annotation file
#
# Results filter options:
Expand All @@ -95,8 +114,8 @@ bwa_mem:
#
snpeff:
do: true
build_options: " -Xmx8g " # -noCheckCds -noCheckProtein
options: -no-downstream -no-upstream
build_options: # -noCheckCds -noCheckProtein
options: -no-downstream -no-upstream
resources:
mem: 8G

Expand Down
95 changes: 78 additions & 17 deletions sequana_pipelines/variant_calling/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,16 @@
# documentation: http://sequana.readthedocs.io
#
##############################################################################
import sys
import os
import sys

import rich_click as click
import click_completion
import rich_click as click

click_completion.init()

from sequana_pipetools.options import *
from sequana_pipetools import SequanaManager

from sequana_pipetools.options import *

NAME = "variant_calling"

Expand All @@ -30,11 +29,14 @@
groups={
"Pipeline Specific": [
"--reference-file",
"--aligner-choice",
"--annotation-file",
"--circular",
"--do-coverage",
"--do-joint-calling",
"--freebayes-ploidy",
"--nanopore",
"--pacbio",
],
},
)
Expand All @@ -45,11 +47,58 @@
@include_options_from(ClickSlurmOptions)
@include_options_from(ClickInputOptions)
@include_options_from(ClickGeneralOptions)
@click.option("--annotation-file", "annotation", default=None, help="The annotation for snpeff. This is optional but highly recommended to obtain meaningful HTML report.")
@click.option("--do-coverage", "do_coverage", is_flag=True, default=False, show_default=True, help="perform the coverage analysis using sequana_coverage.")
@click.option(
"--aligner-choice",
"aligner",
type=click.Choice(["bwa", "minimap2"]),
default="bwa",
show_default=True,
help="The aligner in bwa / minimap2 .",
)
@click.option(
"--do-coverage",
"do_coverage",
is_flag=True,
default=False,
show_default=True,
help="perform the coverage analysis using sequana_coverage.",
)
@click.option(
"--annotation-file",
"annotation",
default=None,
help="The annotation for snpeff. This is optional but highly recommended to obtain meaningful HTML report.",
)
@click.option(
"--do-coverage",
"do_coverage",
is_flag=True,
default=False,
show_default=True,
help="perform the coverage analysis using sequana_coverage.",
)
@click.option(
"--nanopore",
is_flag=True,
default=False,
show_default=True,
help="if set, fix minimap2 as the aligner, and uses -x map-ont for the minimap2 options",
)
@click.option(
"--pacbio",
is_flag=True,
default=False,
show_default=True,
help="if set, fix minimap2 as the aligner, and uses -x map-pb for the minimap2 options",
)
@click.option("--do-joint-calling", "do_joint_calling", is_flag=True, help="do the joint calling analysis")
@click.option("--freebayes-ploidy", type=int, default=1, show_default=True,
help="""For population, or eukaryotes, change the ploidy to the correct values. For population, you may set it to 10.""")
@click.option(
"--freebayes-ploidy",
type=int,
default=1,
show_default=True,
help="""For population, or eukaryotes, change the ploidy to the correct values. For population, you may set it to 10.""",
)
@click.option("-o", "--circular", is_flag=True, help="Recommended for bacteria genomes and circularised genomes")
@click.option("--reference-file", "reference", required=True, help="The input reference to mapped reads onto")
def main(**options):
Expand All @@ -66,11 +115,17 @@ def main(**options):
def fill_annotation_file():
if options.annotation:
cfg.snpeff.do = True
cfg.annotation_file = os.path.abspath(options.annotation)
cfg.general.annotation_file = os.path.abspath(options.annotation)
# manager.exists(cfg.annotation_file)
# print("." in cfg.annotation_file, cfg.annotation_file.split(".")[-1])
if "." not in cfg.annotation_file or cfg.annotation_file.split(".")[-1] not in ["gbk", "gff", "gff3"]:
click.echo("The annotation file must in in .gbk or .gff or .gff3. You provided {cfg.annotation_file}")
if "." not in cfg.general.annotation_file or cfg.general.annotation_file.split(".")[-1] not in [
"gbk",
"gff",
"gff3",
]:
click.echo(
"The annotation file must in in .gbk or .gff or .gff3. You provided {cfg.general.annotation_file}"
)
sys.exit(1)
else:
cfg.snpeff.do = False
Expand All @@ -90,12 +145,18 @@ def fill_ploidy_freebayes():

def fill_reference_file():
# required argument
cfg.reference_file = os.path.abspath(options.reference)
cfg.general.reference_file = os.path.abspath(options.reference)

# first if option --long-read is used (overwritten by other options)
if options["nanopore"]:
cfg.general.aligner_choice = "minimap2"
cfg.minimap2.options = "-x map-ont"
elif options["pacbio"]:
cfg.general.aligner_choice = "minimap2"
cfg.minimap2.options = "-x map-pb"

if options["from_project"]:
if "--reference-file" in options:
fill_reference_file()

raise NotImplementedError
else:
fill_annotation_file()
fill_do_coverage()
Expand All @@ -104,10 +165,10 @@ def fill_reference_file():
fill_ploidy_freebayes()
fill_reference_file()

# Given the reference, let us compute its length and st the index algorithm
# Given the reference, let us compute its length and the index algorithm
from sequana import FastA

f = FastA(cfg.reference_file)
f = FastA(cfg.general.reference_file)
N = f.get_stats()["total_length"]

# seems to be a hardcoded values in bwa according to the documentation
Expand Down
Loading

0 comments on commit eea37ec

Please sign in to comment.