Base-resolution pan-genome revealed hidden structural variants and eliminated substantial false positive short variants in Clostridioides difficile
Base-resolution pan-genomics enables comprehensive detection of SNPs, indels, and structural variants (SVs), but is underused in bacterial studies due to limitations of long-read sequencing. This project demonstrates that linked-read sequencing offers a scalable alternative, enabling near-complete de novo assemblies in Clostridioides difficile. Using graph-based pan-genomics, we achieved high-resolution variant detection and improved accuracy over conventional short-read methods, especially near SVs. The resulting pan-genome graph provides a valuable resource for future C. difficile genomic research.
This repository contains the data and analysis scripts needed to reproduce the results in the manuscript.
The repository is structured as follows:
- README.md: this file.
- analysis.md: instructions to reproduce results from the manuscript.
- data: input data needed by the analysis scripts.
- src: analysis scripts.
- figures: figures generated by the scripts.
To reproduce the figures from the manuscript, install the following Python 3 packages via pip:
pip install matplotlib pandas numpy seaborn click
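Before running the analysis, it can help to confirm that the packages above are importable in the active environment. The following is a minimal sketch, not part of the repository: the function name `missing_packages` is hypothetical, and it assumes each package's import name matches its pip name, which holds for the five packages listed.

```python
import importlib.util

# Import names of the packages from the pip command above.
REQUIRED = ["matplotlib", "pandas", "numpy", "seaborn", "click"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located for import (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

If any names are reported as missing, rerun the pip command in the same Python environment that will be used for the analysis scripts.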
Clone this repo:
git clone git@github.com:x-lab/linked-reads-cdiff.git
cd linked-reads-cdiff
# then follow the analysis.md
The data used in this study have been deposited in the Sequence Read Archive database under accession code PRJNA1133827 (links to be added).
The code for reproducing the figures and tables is available on GitHub in the linked-reads-cdiff repository.
This research is a collaborative work by:
- Hangzhou Medical College: Dazhi Jin
- Peking University: Xiao Li
- Hangzhou Center for Disease Control: Jun Li
To cite this work, use the following:
(to be updated)
Submit an issue here if you have questions.