bump version to v0.12.0

AroneyS · Jul 24, 2024 · 3f9b07f
1 parent 71d89e8
commit 3f9b07f

Showing 5 changed files with 97 additions and 5 deletions.
diff --git a/CITATION.cff b/CITATION.cff
@@ -17,6 +17,6 @@ authors:
     given-names: Ben J.
     orcid: https://orcid.org/0000-0003-0670-7480
 title: "Bin Chicken: targeted recovery of low abundance metagenome assembled genomes through intelligent coassembly"
-version: 0.11.0
+version: 0.12.0
 doi: 10.5281/zenodo.10511708
-date-released: 2024-06-10
+date-released: 2024-07-24
diff --git a/binchicken/__init__.py b/binchicken/__init__.py
@@ -1 +1 @@
-__version__ = "0.11.0"
+__version__ = "0.12.0"
diff --git a/docs/tools/coassemble.md b/docs/tools/coassemble.md
@@ -37,18 +37,32 @@ Important options:
 - Assembly and recovery running options:
   - Run directly through Aviary (`--run-aviary`)
   - Run Aviary commands manually (see `coassemble/commands` in output)
-  - Run coassemblies with differential-abudance-binning samples with the tool of your choice (see `coassemble/target/elusive_clusters.tsv` in output)
+  - Run coassemblies with differential-abundance-binning samples with the tool of your choice (see `coassemble/target/elusive_clusters.tsv` in output)
 - The taxa of the considered sequences can be filtered to target a specific taxon (e.g. `--taxa-of-interest "p__Planctomycetota"`).
 - Differential-abundance binning samples for single-assembly can also be found (`--single-assembly`)
 
 Paired end reads of form reads_1.1.fq, reads_1_1.fq and reads_1_R1.fq, where reads_1 is the sample name are automatically detected and matched to their basename.
 Most intermediate files can be provided to skip intermediate steps (e.g. SingleM otu tables, read sizes or genome transcripts; see `binchicken coassemble --full-help`).
 
+## Abundance weighting
+
+By default, coassemblies are ranked by the number of feasibly-recovered target sequences they contain.
+Instead, `--abundance-weighted` can be used to weight target sequences by their average abundance across samples.
+This prioritises recovery of the most abundant lineages.
+The samples for which abundances are calculated can be restricted using `--abundance-weighted-samples`.
+
+## Kmer preclustering
+
+Clustering groups of more than 1000 samples quickly leads to memory issues due to combinatorics.
+Kmer preclustering can be used (default if >1000 samples are provided, or use `--kmer-precluster always`) to reduce the number of combinations that are considered.
+This greatly reduces memory usage and allows scaling up to at least 250k samples.
+Kmer preclustering can be disabled with `--kmer-precluster never`.
+
 ## Cluster submission
 
 Snakemake profiles can be used to automatically submit jobs to HPC clusters (`--snakemake-profile`).
 Note that Aviary assemble commands are submitted to the cluster, while Aviary recover commands are run locally such that Aviary handles cluster submission.
-The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel within `--local-cores`.
+The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel by setting `--local-cores` to greater than 1.
 This is required to prevent `--local-cores` from limiting the number of threads per submitted job.
 
 # OPTIONS
@@ -257,6 +271,42 @@ This is required to prevent `--local-cores` from limiting the number of threads
 
 <!-- -->
 
+**\--abundance-weighted**
+
+  Weight sequences by mean sample abundance when ranking clusters
+    [default: False]
+
+<!-- -->
+
+**\--abundance-weighted-samples** *ABUNDANCE_WEIGHTED_SAMPLES* [*ABUNDANCE_WEIGHTED_SAMPLES* \...]
+
+  Restrict sequence weighting to these samples. Remaining samples will
+    still be used for coassembly [default: use all samples]
+
+<!-- -->
+
+**\--abundance-weighted-samples-list** *ABUNDANCE_WEIGHTED_SAMPLES_LIST*
+
+  Restrict sequence weighting to these samples, newline separated.
+    Remaining samples will still be used for coassembly [default: use
+    all samples]
+
+<!-- -->
+
+**\--kmer-precluster** {never,large,always}
+
+  Run kmer preclustering using unbinned window sequences as kmers.
+    [default: large; perform preclustering when given \>1000 samples]
+
+<!-- -->
+
+**\--precluster-size** *PRECLUSTER_SIZE*
+
+  \# of samples within each sample\'s precluster [default: 5 \*
+    max-recovery-samples]
+
+<!-- -->
+
 **\--prodigal-meta**
 
   Use prodigal \"-p meta\" argument (for testing)

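The abundance weighting described in the `coassemble.md` changes above can be illustrated with a small sketch. This is not Bin Chicken's implementation: the function `rank_clusters`, its data layout, and the example numbers are all hypothetical. It only contrasts ranking by the count of target sequences with ranking by the sum of their mean abundance across samples.

```python
from statistics import mean


def rank_clusters(clusters, abundances=None):
    """Rank candidate coassemblies, best first (hypothetical data layout).

    clusters:   dict mapping cluster name -> set of target sequence IDs
    abundances: optional dict mapping sequence ID -> per-sample abundances;
                when given, each sequence is weighted by its mean abundance
    """
    def score(targets):
        if abundances is None:
            # Unweighted: every target sequence counts once.
            return float(len(targets))
        # Weighted: each target sequence contributes its mean abundance.
        return sum(mean(abundances[t]) for t in targets)

    return sorted(clusters, key=lambda name: score(clusters[name]), reverse=True)


# Made-up example: cluster A has more targets, cluster B has more abundant ones.
clusters = {"A": {"s1", "s2", "s3"}, "B": {"s4", "s5"}}
abundances = {
    "s1": [1.0, 1.0],
    "s2": [1.0, 3.0],
    "s3": [0.5, 0.5],
    "s4": [10.0, 6.0],
    "s5": [4.0, 2.0],
}

unweighted = rank_clusters(clusters)
weighted = rank_clusters(clusters, abundances)
```

In this toy data the unweighted ranking prefers cluster A (three targets against two), while the weighted ranking prefers cluster B, whose targets are far more abundant.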
diff --git a/docs/tools/iterate.md b/docs/tools/iterate.md
@@ -302,6 +302,42 @@ Automatically excludes previous coassemblies.
 
 <!-- -->
 
+**\--abundance-weighted**
+
+  Weight sequences by mean sample abundance when ranking clusters
+    [default: False]
+
+<!-- -->
+
+**\--abundance-weighted-samples** *ABUNDANCE_WEIGHTED_SAMPLES* [*ABUNDANCE_WEIGHTED_SAMPLES* \...]
+
+  Restrict sequence weighting to these samples. Remaining samples will
+    still be used for coassembly [default: use all samples]
+
+<!-- -->
+
+**\--abundance-weighted-samples-list** *ABUNDANCE_WEIGHTED_SAMPLES_LIST*
+
+  Restrict sequence weighting to these samples, newline separated.
+    Remaining samples will still be used for coassembly [default: use
+    all samples]
+
+<!-- -->
+
+**\--kmer-precluster** {never,large,always}
+
+  Run kmer preclustering using unbinned window sequences as kmers.
+    [default: large; perform preclustering when given \>1000 samples]
+
+<!-- -->
+
+**\--precluster-size** *PRECLUSTER_SIZE*
+
+  \# of samples within each sample\'s precluster [default: 5 \*
+    max-recovery-samples]
+
+<!-- -->
+
 **\--prodigal-meta**
 
   Use prodigal \"-p meta\" argument (for testing)

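The note in the docs that clustering more than 1000 samples runs into memory issues "due to combinatorics" can be made concrete with a rough count. The sketch below assumes brute-force enumeration of every sample group up to a given size. The helper `candidate_combinations` and its `max_size` parameter are hypothetical, and the tool's actual clustering may consider far fewer candidates.

```python
from math import comb


def candidate_combinations(n_samples, max_size):
    """Count sample groups of size 2..max_size drawn from n_samples.

    Assumes naive enumeration of all groups; this is an illustration of
    growth, not a description of how Bin Chicken clusters samples.
    """
    return sum(comb(n_samples, k) for k in range(2, max_size + 1))


# Growth in candidate groups as the sample count rises.
for n in (100, 1000, 10000):
    print(n, candidate_combinations(n, 2), candidate_combinations(n, 3))
```

Allowing groups of three instead of two already multiplies the candidate count several hundred-fold at 1000 samples, which is the kind of growth that restricting each sample to a smaller precluster is meant to limit.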
diff --git a/docs/tools/update.md b/docs/tools/update.md
@@ -88,6 +88,12 @@ binchicken update --coassemble-output coassemble_dir --sra \
   Download reads from SRA (read argument still required). Also sets
     \--run-qc.
 
+<!-- -->
+
+**\--download-limit** *DOWNLOAD_LIMIT*
+
+  Parallel download limit [default: 3]
+
 # COASSEMBLY OPTIONS
 
 **\--coassemble-output** *COASSEMBLE_OUTPUT*