bump version to v0.12.0
AroneyS committed Jul 24, 2024
1 parent 71d89e8 commit 3f9b07f
Showing 5 changed files with 97 additions and 5 deletions.
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -17,6 +17,6 @@ authors:
given-names: Ben J.
orcid: https://orcid.org/0000-0003-0670-7480
title: "Bin Chicken: targeted recovery of low abundance metagenome assembled genomes through intelligent coassembly"
version: 0.11.0
version: 0.12.0
doi: 10.5281/zenodo.10511708
date-released: 2024-06-10
date-released: 2024-07-24
2 changes: 1 addition & 1 deletion binchicken/__init__.py
@@ -1 +1 @@
__version__ = "0.11.0"
__version__ = "0.12.0"
54 changes: 52 additions & 2 deletions docs/tools/coassemble.md
@@ -37,18 +37,32 @@ Important options:
- Assembly and recovery running options:
- Run directly through Aviary (`--run-aviary`)
- Run Aviary commands manually (see `coassemble/commands` in output)
- Run coassemblies with differential-abudance-binning samples with the tool of your choice (see `coassemble/target/elusive_clusters.tsv` in output)
- Run coassemblies with differential-abundance-binning samples with the tool of your choice (see `coassemble/target/elusive_clusters.tsv` in output)
- The taxa of the considered sequences can be filtered to target a specific taxon (e.g. `--taxa-of-interest "p__Planctomycetota"`).
- Differential-abundance binning samples for single-assembly can also be found (`--single-assembly`)

Paired-end reads of the form `reads_1.1.fq`, `reads_1_1.fq` and `reads_1_R1.fq`, where `reads_1` is the sample name, are automatically detected and matched to their basename.
Most intermediate files can be provided to skip intermediate steps (e.g. SingleM otu tables, read sizes or genome transcripts; see `binchicken coassemble --full-help`).
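
The read-name matching described above can be sketched as follows. This is a minimal illustration, not Bin Chicken's actual implementation: the `FORWARD_READ` pattern and the `sample_name` helper are hypothetical names, and accepting `.fastq` and `.gz` extensions is an assumption beyond the three forms listed.

```python
import os
import re

# Hypothetical pattern: a sample name, then one of the forward-read suffixes
# named in the text (".1", "_1" or "_R1"), then a FASTQ extension.
# Accepting ".fastq" and an optional ".gz" is an assumption.
FORWARD_READ = re.compile(
    r"^(?P<sample>.+)(?:\.1|_1|_R1)\.(?:fq|fastq)(?:\.gz)?$"
)


def sample_name(path):
    """Return the sample name for a forward-read file, or None if no match."""
    match = FORWARD_READ.match(os.path.basename(path))
    return match.group("sample") if match else None


for filename in ["reads_1.1.fq", "reads_1_1.fq", "reads_1_R1.fq"]:
    print(filename, "->", sample_name(filename))
```

All three filenames resolve to the sample name `reads_1`, because only the final read-direction suffix is stripped.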

## Abundance weighting

By default, coassemblies are ranked by the number of feasibly-recovered target sequences they contain.
Instead, `--abundance-weighted` can be used to weight target sequences by their average abundance across samples.
This prioritises recovery of the most abundant lineages.
The samples for which abundances are calculated can be restricted using `--abundance-weighted-samples`.
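
The difference between the two ranking modes can be sketched as below. This is a toy illustration under stated assumptions, not Bin Chicken's scoring code: the functions `cluster_score` and `rank_clusters`, the data layout, and the choice to sum per-sequence mean abundances are all hypothetical.

```python
from statistics import mean


def cluster_score(targets, abundance=None, samples=()):
    """Score one candidate coassembly (hypothetical scoring function).

    Without abundance data, the score is the number of target sequences.
    With it, each target contributes its mean abundance across `samples`.
    """
    if abundance is None:
        return float(len(targets))
    if not samples:
        raise ValueError("samples must be given when weighting by abundance")
    return sum(
        mean([abundance[target].get(sample, 0.0) for sample in samples])
        for target in targets
    )


def rank_clusters(clusters, abundance=None, samples=()):
    """Return cluster names, best score first (ties broken by name)."""
    return sorted(
        clusters,
        key=lambda name: (-cluster_score(clusters[name], abundance, samples), name),
    )


# Made-up example data: target sequence -> {sample: abundance}.
abundance = {
    "seqA": {"s1": 0.30, "s2": 0.10},
    "seqB": {"s1": 0.02, "s2": 0.02},
    "seqC": {"s1": 0.01, "s2": 0.20},
}
clusters = {"c1": ["seqA"], "c2": ["seqB", "seqC"]}

print(rank_clusters(clusters))                         # count-based ranking
print(rank_clusters(clusters, abundance, ["s1", "s2"]))  # weighted, all samples
print(rank_clusters(clusters, abundance, ["s2"]))        # weighted, restricted
```

In this toy data, the count-based ranking prefers `c2` (two targets), weighting across both samples prefers `c1` (one abundant target), and restricting the weighting to `s2` prefers `c2` again.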

## Kmer preclustering

Clustering groups of more than 1000 samples quickly leads to memory issues due to combinatorics.
Kmer preclustering can be used (default if >1000 samples are provided, or use `--kmer-precluster always`) to reduce the number of combinations that are considered.
This greatly reduces memory usage and allows scaling up to at least 250k samples.
Kmer preclustering can be disabled with `--kmer-precluster never`.
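
The combinatorial growth mentioned above can be made concrete with a rough count. This is a back-of-the-envelope sketch only, not Bin Chicken's algorithm: the helper names are hypothetical, and the precluster bound assumes each sample is only combined with the other members of its own precluster.

```python
from math import comb


def all_combinations(n_samples, coassembly_size):
    """Number of distinct sample sets of a given size, with no preclustering."""
    return comb(n_samples, coassembly_size)


def preclustered_upper_bound(n_samples, coassembly_size, precluster_size):
    """Upper bound on sample sets when each sample is only combined with
    the other members of its own precluster (an assumption for illustration)."""
    return n_samples * comb(precluster_size - 1, coassembly_size - 1)


for n in (1_000, 10_000, 250_000):
    print(
        n,
        all_combinations(n, 3),
        preclustered_upper_bound(n, 3, precluster_size=100),
    )
```

Under these assumptions the unrestricted count grows with the cube of the sample number for coassemblies of three samples, while the preclustered bound grows only linearly.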

## Cluster submission

Snakemake profiles can be used to automatically submit jobs to HPC clusters (`--snakemake-profile`).
Note that Aviary assemble commands are submitted to the cluster, while Aviary recover commands are run locally such that Aviary handles cluster submission.
The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel within `--local-cores`.
The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel by setting `--local-cores` to greater than 1.
This is required to prevent `--local-cores` from limiting the number of threads per submitted job.

# OPTIONS
@@ -257,6 +271,42 @@ This is required to prevent `--local-cores` from limiting the number of threads

<!-- -->

**\--abundance-weighted**

Weight sequences by mean sample abundance when ranking clusters
[default: False]

<!-- -->

**\--abundance-weighted-samples** *ABUNDANCE_WEIGHTED_SAMPLES* [*ABUNDANCE_WEIGHTED_SAMPLES* \...]

Restrict sequence weighting to these samples. Remaining samples will
still be used for coassembly [default: use all samples]

<!-- -->

**\--abundance-weighted-samples-list** *ABUNDANCE_WEIGHTED_SAMPLES_LIST*

Restrict sequence weighting to these samples, newline separated.
Remaining samples will still be used for coassembly [default: use
all samples]

<!-- -->

**\--kmer-precluster** {never,large,always}

Run kmer preclustering using unbinned window sequences as kmers.
[default: large; perform preclustering when given \>1000 samples]

<!-- -->

**\--precluster-size** *PRECLUSTER_SIZE*

\# of samples within each sample\'s precluster [default: 5 \*
max-recovery-samples]

<!-- -->

**\--prodigal-meta**

Use prodigal \"-p meta\" argument (for testing)
36 changes: 36 additions & 0 deletions docs/tools/iterate.md
@@ -302,6 +302,42 @@ Automatically excludes previous coassemblies.

<!-- -->

**\--abundance-weighted**

Weight sequences by mean sample abundance when ranking clusters
[default: False]

<!-- -->

**\--abundance-weighted-samples** *ABUNDANCE_WEIGHTED_SAMPLES* [*ABUNDANCE_WEIGHTED_SAMPLES* \...]

Restrict sequence weighting to these samples. Remaining samples will
still be used for coassembly [default: use all samples]

<!-- -->

**\--abundance-weighted-samples-list** *ABUNDANCE_WEIGHTED_SAMPLES_LIST*

Restrict sequence weighting to these samples, newline separated.
Remaining samples will still be used for coassembly [default: use
all samples]

<!-- -->

**\--kmer-precluster** {never,large,always}

Run kmer preclustering using unbinned window sequences as kmers.
[default: large; perform preclustering when given \>1000 samples]

<!-- -->

**\--precluster-size** *PRECLUSTER_SIZE*

\# of samples within each sample\'s precluster [default: 5 \*
max-recovery-samples]

<!-- -->

**\--prodigal-meta**

Use prodigal \"-p meta\" argument (for testing)
6 changes: 6 additions & 0 deletions docs/tools/update.md
@@ -88,6 +88,12 @@ binchicken update --coassemble-output coassemble_dir --sra \
Download reads from SRA (read argument still required). Also sets
\--run-qc.

<!-- -->

**\--download-limit** *DOWNLOAD_LIMIT*

Parallel download limit [default: 3]

# COASSEMBLY OPTIONS

**\--coassemble-output** *COASSEMBLE_OUTPUT*