broadinstitute/Bb_pangenome

Complex exchanges among plasmids and clonal expansion of lineages shape the population structure and virulence of Borrelia burgdorferi

Preprint: https://doi.org/10.1101/2025.01.29.635312

This repository contains all code and sequences used in the above manuscript.

Ensure that Git LFS is installed so you can access all files and data (instructions below).

Total repository size: 23G

Directory Structure

  • assemblies/ - contains all assemblies and annotations used in our analyses. Each subdirectory is named after the Strain_ID from table_s1.
    • Each assembly has the following contents:
      • *.embl
      • *.faa
      • *.ffn
      • *.fna
      • *.gbff
      • *.gff3
      • *.hypotheticals.faa
      • *.hypotheticals.tsv
      • *.inference.tsv
      • *.json
      • *.log
      • *.tsv
      • *.txt
  • containers/ - all containers used in these analyses are defined here. Contains submodule: mjf-containers
  • group2BB/ - Code used to map roary pangene groups back to B31 genes.
  • metadb/ - Code used to set up and query an sqlite3 database containing all assemblies and their annotations.
  • notebooks/ - Jupyter notebooks used throughout development. Some notebooks may be superseded by scripts present in scripts/
  • output/ - Contains all output used in the analysis and figure generation.
    • alignments/ - contains all-vs-all and all-vs-B31 alignments.
    • genotyping/ - contains OspC, MLST, RST typing results and plasmid calls.
    • homology_networks/ - contains homology network graphs and network.json used to render graph in Blender.
    • reports/ - contains AGAT stats, kraken2, quast, and multiqc reports.
    • results/ - contains roary pangenome output for split and non-split paralogs.
  • ref/ - contains database of assemblies, B31 reference genome, and replicon references used in classification.
  • scripts/ - contains scripts and commands used in various analyses.
  • snakefiles/ - contains snakefiles and their configs.
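
The per-assembly file listing above can be turned into a quick completeness check. The following is a minimal sketch, not part of the repository: the function name check_assembly_dir is hypothetical, and it assumes that every assembly directory is expected to hold at least one file of each listed type.

```shell
# Hypothetical helper (not part of this repository): report which of the
# file types listed above are missing from one assembly directory,
# e.g. assemblies/<Strain_ID>.
check_assembly_dir() {
    dir=$1
    missing=""
    for ext in embl faa ffn fna gbff gff3 hypotheticals.faa hypotheticals.tsv \
               inference.tsv json log tsv txt; do
        # Expand the glob into the positional parameters; if nothing
        # matches, "$1" is either the literal pattern or empty, and the
        # existence test below fails.
        set -- "$dir"/*."$ext"
        [ -e "$1" ] || missing="$missing $ext"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Calling check_assembly_dir on a directory prints ok when every type is present, or a line starting with missing: followed by the absent extensions. Because *.tsv also matches *.hypotheticals.tsv and *.inference.tsv, the plain tsv check is only a loose one.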

If you cannot locate a piece of code or data here, refer to our working repository.


Using LFS to access files and data

To get started, clone the repository and set up LFS:

git clone https://github.com/broadinstitute/Bb_pangenome.git
cd Bb_pangenome
git lfs install 

To retrieve LFS tracked files individually:

git lfs fetch --include="path/to/lfs-tracked-file.ext"
git lfs checkout

To retrieve LFS tracked directories:

git lfs fetch --include="path/to/lfs-tracked-directory/"
git lfs checkout

To retrieve all files and data:

git lfs fetch --all
git lfs checkout
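
After fetching, it can be useful to confirm which files are still LFS pointers rather than real data. The following is a minimal sketch under one assumption: that a file which has not been downloaded is a small text pointer whose first line begins with "version https://git-lfs.github.com/spec/v1", as described in the Git LFS pointer specification. The function names is_lfs_pointer and list_lfs_pointers are hypothetical and not part of this repository.

```shell
# Hypothetical helper: succeed if the file is still a Git LFS pointer,
# i.e. its real content has not been downloaded yet.
is_lfs_pointer() {
    # Pointer files are small text files whose first line names the LFS spec.
    head -n 1 "$1" 2>/dev/null |
        grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Hypothetical helper: print every pointer file under a directory
# (default: current directory), skipping Git's own metadata.
list_lfs_pointers() {
    find "${1:-.}" -type f ! -path '*/.git/*' | while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "$f"
        fi
    done
}
```

For example, list_lfs_pointers assemblies would print the files under assemblies/ that still need to be fetched; no output suggests that everything there has been downloaded. The built-in git lfs ls-files command reports similar information for tracked files.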

About

All required code and data for pangenomic analysis of Borrelia burgdorferi. Lemieux Lab 2025.
