Objective: We developed an in-house bioinformatics pipeline to improve the detection of respiratory pathogens in metagenomic sequencing data. This pipeline addresses the need for short-time analysis, high accuracy, scalability, and reproducibility in a high-performance computing environment.
Results: We evaluated our pipeline using ninety synthetic metagenomes designed to simulate nasopharyngeal swab samples. The pipeline successfully identified 177 out of 204 respiratory pathogens present in the compositions, with an average processing time of approximately 4 minutes per sample (processing 1 million paired-end reads of 150 base pairs). For the estimation of all the 470 taxa included in the compositions, the pipeline demonstrated high accuracy, identifying 420 and achieving a correlation of 0.9 between their actual and predicted relative abundances. Among the identified taxa, 27 were significantly underestimated or overestimated, including only three clinically relevant pathogens. These findings underscore the pipeline's effectiveness in pathogen detection and highlight its potential utility in respiratory pathogen surveillance.
Our work performed the following steps:
- Generation of Synthetic Metagenomes
- Defining the samples composition
- Collecting the taxonomic ranks metadata
- Defining each taxon abundance
- Downloading the genomes of these taxa
- Generation of the synthetic metagenomes
- Execution of Analysis Pipeline
- Adapter Trimming and Quality Filtering
- Host Decontamination
- Taxa Annotation
- Species-Level taxa abundance retrieval
- Creating and Plotting Results
Install the necessary software using the following commands:
# Python
sudo apt update
sudo apt install python3 python3-pip make
pip3 install biopython
# Install Fastp
sudo apt install fastp
# Install HISAT2
sudo apt install hisat2
# Install Bowtie2
sudo apt install bowtie2
# Install Samtools
sudo apt install samtools
# Install Kraken2
sudo apt install kraken2
# Install Bracken
wget https://github.com/jenniferlu717/Bracken/archive/refs/tags/v2.9.tar.gz
tar -xvzf v2.9.tar.gz
cd Bracken-2.9
./install_bracken.sh
sudo mv bracken /usr/local/bin/
sudo mv bracken-build /usr/local/bin/
sudo mv src/kmer2read_distr /usr/local/bin/
sudo mv src/est_abundance.py /usr/local/bin/
sudo mv src/generate_kmer_distribution.py /usr/local/bin/
- Clone the repository
git clone https://github.com/cidacslab/aesop-metagenomics-pipeline.git
cd aesop-metagenomics-pipeline
- Execute the pipeline
To start the pipeline edit the parameters for the file locations, like explained in the methods, and execute the following command:
./src/start_pipeline_job.sh
If you use this pipeline in your research, please cite the following paper:
Viana, P. A. B.; Tschoeke, D. A.; de Moraes, L.; Santos, L. A.; Barral-Netto, M.; Khouri, R.; Ramos, P. I. P.; Meirelles, P. M.; (2024). Design and Implementation of a Metagenomic Analytical Pipeline for Respiratory Pathogen Detection. BMC Res Notes 17, 291 (2024). https://doi.org/10.1186/s13104-024-06964-9
- Corresponding Author: Pedro M Meirelles ([email protected])
- On any code issues, correspond to: Pablo Viana ([email protected])