update docs and citation
AroneyS committed Dec 2, 2024
1 parent cdf57b8 commit 3338652
Showing 6 changed files with 76 additions and 27 deletions.
8 changes: 4 additions & 4 deletions CITATION.cff
@@ -16,7 +16,7 @@ authors:
- family-names: Woodcroft
given-names: Ben J.
orcid: https://orcid.org/0000-0003-0670-7480
-title: "Bin Chicken: targeted recovery of low abundance metagenome assembled genomes through intelligent coassembly"
-version: 0.12.5
-doi: 10.5281/zenodo.10511708
-date-released: 2024-09-06
+title: "Bin Chicken: targeted metagenomic coassembly for the efficient recovery of novel genomes"
+version: 0.12.6
+doi: 10.1101/2024.11.24.625082
+date-released: 2024-12-02
6 changes: 6 additions & 0 deletions README.md
@@ -14,3 +14,9 @@ Bin Chicken - recovery of low abundance and taxonomically targeted metagenome as
Documentation can be found at https://AroneyS.github.io/binchicken/

Logo by Georgina H. Joyce | www.georginajoyce.com
+
+## Citation
+
+Samuel T. N. Aroney, Rhys J. P. Newell, Gene W. Tyson and Ben J. Woodcroft.
+Bin Chicken: targeted metagenomic coassembly for the efficient recovery of novel genomes.
+bioRxiv (2024): 2024-11. https://doi.org/10.1101/2024.11.24.625082
31 changes: 20 additions & 11 deletions docs/tools/coassemble.md
@@ -41,10 +41,10 @@ Important options:
- The taxa of the considered sequences can be filtered to target a specific taxon (e.g. `--taxa-of-interest "p__Planctomycetota"`).
- Differential-abundance binning samples for single-assembly can also be found (`--single-assembly`)

-Paired end reads of form reads_1.1.fq, reads_1_1.fq and reads_1_R1.fq, where reads_1 is the sample name are automatically detected and matched to their basename.
+Paired end reads of form \*.1.fq, \*_1.fq and \*_R1.fq, where \* represents the sample name are automatically detected and matched to their basename.
Most intermediate files can be provided to skip intermediate steps (e.g. SingleM otu tables, read sizes or genome transcripts; see `binchicken coassemble --full-help`).
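
As an illustration of the paired-end naming patterns above, here is a minimal Python sketch that recovers the sample name from a forward-read filename. The regular expression and the `sample_name` helper are hypothetical and written only for this example; they are not Bin Chicken's own detection code, and only the three `.fq` suffixes named in the text are handled.

```python
import re
from pathlib import Path

# Forward-read suffixes named in the text: ".1.fq", "_1.fq" and "_R1.fq".
# Hypothetical pattern for illustration only, not Bin Chicken's implementation.
FORWARD_READ = re.compile(r"^(?P<sample>.+)(?:\.1|_1|_R1)\.fq$")


def sample_name(forward_read):
    """Return the sample name for a forward-read path, or None if no pattern matches."""
    match = FORWARD_READ.match(Path(forward_read).name)
    return match.group("sample") if match else None


# All three naming styles resolve to the same sample name, "reads_1".
names = [sample_name(p) for p in ("data/reads_1.1.fq", "reads_1_1.fq", "reads_1_R1.fq")]
```

A file such as `unpaired.fq`, which carries none of the three suffixes, returns `None` under this sketch.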

-## Abundance weighting
+## Abundance weighting (experimental)

By default, coassemblies are ranked by the number of feasibly-recovered target sequences they contain.
Instead, `--abundance-weighted` can be used to weight target sequences by their average abundance across samples.
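
The difference between the two ranking modes can be sketched with a toy example. This is an assumption-laden illustration, not Bin Chicken's scoring code: the `rank_coassemblies` function, the candidate names and the abundance values are all hypothetical, and the weighted score is taken to be the sum of each target's mean abundance.

```python
from statistics import mean


def rank_coassemblies(candidates, abundance=None):
    """Rank candidate coassemblies, best first (toy model only).

    candidates: dict of coassembly name -> set of target sequence ids
    abundance:  optional dict of target id -> list of per-sample abundances
    """
    def score(targets):
        if abundance is None:
            # Default mode: count the target sequences.
            return len(targets)
        # Weighted mode (assumed form): sum of mean abundances across samples.
        return sum(mean(abundance[t]) for t in targets)

    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


# Hypothetical data: coassembly_0 has more targets, coassembly_1 has more abundant ones.
candidates = {"coassembly_0": {"t1", "t2", "t3"}, "coassembly_1": {"t4", "t5"}}
abundance = {"t1": [0.1, 0.1], "t2": [0.1, 0.3], "t3": [0.2, 0.2],
             "t4": [2.0, 4.0], "t5": [1.0, 1.0]}

by_count = rank_coassemblies(candidates)
by_abundance = rank_coassemblies(candidates, abundance)
```

In this toy data the count-based ranking prefers `coassembly_0` (three targets), while the abundance-weighted ranking prefers `coassembly_1` (fewer but more abundant targets).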
@@ -58,13 +58,6 @@ Kmer preclustering can be used (default if >1000 samples are provided, or use `-
This greatly reduces memory usage and allows scaling up to at least 250k samples.
Kmer preclustering can be disabled with `--kmer-precluster never`.

-## Cluster submission
-
-Snakemake profiles can be used to automatically submit jobs to HPC clusters (`--snakemake-profile`).
-Note that Aviary assemble commands are submitted to the cluster, while Aviary recover commands are run locally such that Aviary handles cluster submission.
-The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel by setting `--local-cores` to greater than 1.
-This is required to prevent `--local-cores` from limiting the number of threads per submitted job.
-
# OPTIONS

# BASE INPUT ARGUMENTS
@@ -206,14 +199,15 @@ This is required to prevent `--local-cores` from limiting the number of threads
**\--taxa-of-interest** *TAXA_OF_INTEREST*

Only consider sequences from this GTDB taxa (e.g.
-p\_\_Planctomycetota) [default: all]
+p\_\_Planctomycetota, or

<!-- -->

**\--appraise-sequence-identity** *APPRAISE_SEQUENCE_IDENTITY*

Minimum sequence identity for SingleM appraise against reference
-database [default: 86%, Genus-level]
+database. e.g. 96% for Species-level or 86% Genus-level [default:
+0.96]

<!-- -->

@@ -300,6 +294,13 @@ This is required to prevent `--local-cores` from limiting the number of threads

<!-- -->

+**\--precluster-distances** *PRECLUSTER_DISTANCES*
+
+Distance file in the format of \`sourmash scripts pairwise\`. If
+provided, kmer sketching and clustering is skipped.
+
+<!-- -->
+
**\--precluster-size** *PRECLUSTER_SIZE*

\# of samples within each sample\'s precluster [default: 5 \*
@@ -353,6 +354,14 @@ This is required to prevent `--local-cores` from limiting the number of threads

<!-- -->

+**\--prior-assemblies** *PRIOR_ASSEMBLIES*
+
+Prior assemblies to use for Aviary recovery. tsv file with header:
+name [tab] assembly. Only possible with single-sample or update.
+[default: generate assemblies through Aviary assemble]
+
+<!-- -->
+
**\--cluster-submission**

Flag that cluster submission will occur through
21 changes: 19 additions & 2 deletions docs/tools/iterate.md
@@ -16,6 +16,7 @@ binchicken iterate --coassemble-output coassemble_dir \
```

Defaults to using genomes (from the provided coassemble outputs) with at least 70% complete and at most 10% contamination as estimated by CheckM2.
+Alternatively, selected genomes can be provided directly with `--new-genomes`.
Automatically excludes previous coassemblies.

# OPTIONS
@@ -237,14 +238,15 @@ Automatically excludes previous coassemblies.
**\--taxa-of-interest** *TAXA_OF_INTEREST*

Only consider sequences from this GTDB taxa (e.g.
-p\_\_Planctomycetota) [default: all]
+p\_\_Planctomycetota, or

<!-- -->

**\--appraise-sequence-identity** *APPRAISE_SEQUENCE_IDENTITY*

Minimum sequence identity for SingleM appraise against reference
-database [default: 86%, Genus-level]
+database. e.g. 96% for Species-level or 86% Genus-level [default:
+0.96]

<!-- -->

@@ -331,6 +333,13 @@ Automatically excludes previous coassemblies.

<!-- -->

+**\--precluster-distances** *PRECLUSTER_DISTANCES*
+
+Distance file in the format of \`sourmash scripts pairwise\`. If
+provided, kmer sketching and clustering is skipped.
+
+<!-- -->
+
**\--precluster-size** *PRECLUSTER_SIZE*

\# of samples within each sample\'s precluster [default: 5 \*
@@ -384,6 +393,14 @@ Automatically excludes previous coassemblies.

<!-- -->

+**\--prior-assemblies** *PRIOR_ASSEMBLIES*
+
+Prior assemblies to use for Aviary recovery. tsv file with header:
+name [tab] assembly. Only possible with single-sample or update.
+[default: generate assemblies through Aviary assemble]
+
+<!-- -->
+
**\--cluster-submission**

Flag that cluster submission will occur through
29 changes: 19 additions & 10 deletions docs/tools/single.md
@@ -27,7 +27,7 @@ Important options:
- Run assemblies with differential-abundance-binning samples with the tool of your choice (see `coassemble/target/elusive_clusters.tsv` in output)
- The taxa of the considered sequences can be filtered to target a specific taxon (e.g. `--taxa-of-interest "p__Planctomycetota"`).

-Paired end reads of form reads_1.1.fq, reads_1_1.fq and reads_1_R1.fq, where reads_1 is the sample name are automatically detected and matched to their basename.
+Paired end reads of form \*.1.fq, \*_1.fq and \*_R1.fq, where \* represents the sample name are automatically detected and matched to their basename.
Most intermediate files can be provided to skip intermediate steps (e.g. SingleM otu tables, read sizes or genome transcripts; see `binchicken coassemble --full-help`).

## Kmer preclustering
@@ -37,13 +37,6 @@ Kmer preclustering can be used (default if >1000 samples are provided, or use `-
This greatly reduces memory usage and allows scaling up to at least 250k samples.
Kmer preclustering can be disabled with `--kmer-precluster never`.

-## Cluster submission
-
-Snakemake profiles can be used to automatically submit jobs to HPC clusters (`--snakemake-profile`).
-Note that Aviary assemble commands are submitted to the cluster, while Aviary recover commands are run locally such that Aviary handles cluster submission.
-The `--cluster-submission` flag sets the local Aviary recover thread usage to 1, to enable multiple runs in parallel by setting `--local-cores` to greater than 1.
-This is required to prevent `--local-cores` from limiting the number of threads per submitted job.
-
# OPTIONS

# BASE INPUT ARGUMENTS
@@ -185,14 +178,15 @@ This is required to prevent `--local-cores` from limiting the number of threads
**\--taxa-of-interest** *TAXA_OF_INTEREST*

Only consider sequences from this GTDB taxa (e.g.
-p\_\_Planctomycetota) [default: all]
+p\_\_Planctomycetota, or

<!-- -->

**\--appraise-sequence-identity** *APPRAISE_SEQUENCE_IDENTITY*

Minimum sequence identity for SingleM appraise against reference
-database [default: 86%, Genus-level]
+database. e.g. 96% for Species-level or 86% Genus-level [default:
+0.96]

<!-- -->

@@ -279,6 +273,13 @@ This is required to prevent `--local-cores` from limiting the number of threads

<!-- -->

+**\--precluster-distances** *PRECLUSTER_DISTANCES*
+
+Distance file in the format of \`sourmash scripts pairwise\`. If
+provided, kmer sketching and clustering is skipped.
+
+<!-- -->
+
**\--precluster-size** *PRECLUSTER_SIZE*

\# of samples within each sample\'s precluster [default: 5 \*
@@ -332,6 +333,14 @@ This is required to prevent `--local-cores` from limiting the number of threads

<!-- -->

+**\--prior-assemblies** *PRIOR_ASSEMBLIES*
+
+Prior assemblies to use for Aviary recovery. tsv file with header:
+name [tab] assembly. Only possible with single-sample or update.
+[default: generate assemblies through Aviary assemble]
+
+<!-- -->
+
**\--cluster-submission**

Flag that cluster submission will occur through
8 changes: 8 additions & 0 deletions docs/tools/update.md
@@ -199,6 +199,14 @@ binchicken update --coassemble-output coassemble_dir --sra \

<!-- -->

+**\--prior-assemblies** *PRIOR_ASSEMBLIES*
+
+Prior assemblies to use for Aviary recovery. tsv file with header:
+name [tab] assembly. Only possible with single-sample or update.
+[default: generate assemblies through Aviary assemble]
+
+<!-- -->
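
For readers preparing a `--prior-assemblies` file, the following Python sketch writes a two-column TSV with the `name` and `assembly` header described above. The `write_prior_assemblies` helper, the sample name and the FASTA path are hypothetical, and the sketch assumes `name` holds the sample name and `assembly` the path to its assembly; check `--full-help` for the exact expectations.

```python
import csv
import io


def write_prior_assemblies(rows, handle):
    """Write (name, assembly_path) pairs as a tab-separated file with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "assembly"])  # header stated in the option help
    writer.writerows(rows)


# Hypothetical sample name and assembly path, for illustration only.
buffer = io.StringIO()
write_prior_assemblies([("sample_1", "assemblies/sample_1.fasta")], buffer)
tsv_text = buffer.getvalue()
```

To produce a file on disk, pass an open file handle (for example `open("prior_assemblies.tsv", "w", newline="")`) instead of the in-memory buffer.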

**\--cluster-submission**

Flag that cluster submission will occur through