Hackathon team: Lead: Steve Tsang - SysAdmins: Greg Fedewa, Daniel Quang, Sherif Farag - Writers: Matthew Moss, Alexey V. Rakov
Antibiotic resistance of bacterial pathogens remains a major threat to public health around the world. Fast and reliable extraction of antimicrobial resistance genomic signatures from large raw sequencing datasets obtained from human metagenomes is a key task for bioinformatics. NastyBugs is a versatile workflow for rapid extraction of antimicrobial resistance genomic signatures from metagenomes.
Objective: Create a reusable, reproducible, scalable, interoperable workflow to locate antimicrobial resistance genomic signatures in SRA shotgun sequencing (metagenomics) datasets.
Software:
Sources used to build the BLAST databases:
- NCBI GRCh37/UCSC hg19 human reference genome
- CARD (Comprehensive Antibiotic Resistance Database)
- RefSeq Reference Bacterial Genomes
The pipeline uses three databases, which should be downloaded with the script:
- The GRCh37/hg19 human reference genome database is used to align reads and filter out those of human origin from metagenomics samples.
- The CARD database is used to search for resistance genomic signatures in the subset of reads that did not align to the human genome.
- The RefSeq reference bacterial genomes database is used to search for and assign 16S rRNA taxonomic labels to the subset of reads that did not align to the human genome.
Step 1. Mapping the sample (SRR accession) to the human genome using Magic-BLAST:
>magicblast13 -sra SRRXXXXXXX -db ~/references/human -num_threads 12 -score 50 -penalty -3 -out ~/test_run/SRRXXXXXXX_human.sam
Step 2. Filtering out reads mapped to the human genome using SAMtools (removal of host (human) reads from the metagenomics data):
>samtools fasta -f 4 -1 SRRXXXXXXX_read1.fasta -2 SRRXXXXXXX_read2.fasta -0 SRRXXXXXXX_read0.fasta SRRXXXXXXX_human.sam
>fastx_clipper [-i INFILE] [-o OUTFILE]
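The `-f 4` option in the `samtools fasta` command keeps only reads whose SAM FLAG has the 0x4 ("read unmapped") bit set, which is how reads that did not align to the human reference are retained. The sketch below illustrates that bit test in plain shell; the function name `is_unmapped` and the example FLAG values are hypothetical and are not part of the pipeline.

```shell
# Return success (exit status 0) when a SAM FLAG value has the 0x4 bit set,
# i.e. the read itself is unmapped. This is the per-read condition `-f 4` requires.
is_unmapped() {
  [ $(( $1 & 4 )) -ne 0 ]
}

# Hypothetical FLAG values for illustration:
#   77 = 1 + 4 + 8 + 64   (paired, read unmapped, mate unmapped, first in pair)
#   99 = 1 + 2 + 32 + 64  (paired, properly aligned, mate on reverse strand, first in pair)
for flag in 77 99; do
  if is_unmapped "$flag"; then
    echo "FLAG $flag: kept (did not align to the human reference)"
  else
    echo "FLAG $flag: dropped (aligned to the human reference)"
  fi
done
```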
Step 3. Searching for 16S rRNA taxonomic labels in the RefSeq reference bacterial genomes database to identify microbial species present in the metagenome using Magic-BLAST:
>magicblast13 -infmt fasta -query ~/test_run/SRRXXXXXXX_read1.fasta -query_mate ~/test_run/SRRXXXXXXX_read2.fasta -num_threads 12 -score 50 -penalty -3 -out ~/test_run/SRRXXXXXXX_refseq.sam -db ~/references/REFSEQ
Step 4. Searching the metagenome for genes and SNPs from the CARD database using Magic-BLAST:
>magicblast13 -infmt fasta -query ~/test_run/SRRXXXXXXX_read1.fasta -query_mate ~/test_run/SRRXXXXXXX_read2.fasta -num_threads 12 -score 50 -penalty -3 -out ~/test_run/SRRXXXXXXX_CARD_SNP.sam -db ~/references/CARD_variant
>magicblast13 -infmt fasta -query SRRXXXXXXX_read1.fasta -query_mate SRRXXXXXXX_read2.fasta -num_threads 12 -score 50 -penalty -3 -out SRRXXXXXXX_CARD_gene.sam -db ~/references/CARD_gene
Step 5. Converting SAM to BAM format and sorting using SAMtools:
>samtools view -bS SRRXXXXXXX_CARD_SNP.sam | samtools sort - -o SRRXXXXXXX_CARD_SNP.bam
>samtools view -bS SRRXXXXXXX_CARD_gene.sam | samtools sort - -o SRRXXXXXXX_CARD_gene.bam
Step 6. Producing detailed output file(s) including names of detected bacterial species and resistance genes with statistical metrics in text and graphical formats.
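The commands behind Step 6 are not shown here. As a rough sketch of one possible starting point, mapped reads can be tallied per reference sequence directly from SAM text with `awk`. The function name `count_hits_per_reference` is hypothetical, and the approach is an assumption rather than the workflow's actual reporting code.

```shell
# Count mapped reads per reference sequence from SAM text on stdin.
# Output: one "reference<TAB>count" line per reference, sorted by name.
count_hits_per_reference() {
  awk -F '\t' '
    /^@/ { next }                      # skip SAM header lines
    int($2 / 4) % 2 == 1 { next }      # skip reads with the unmapped bit (0x4) set
    $3 != "*" { hits[$3]++ }           # column 3 holds the reference name
    END { for (ref in hits) printf "%s\t%d\n", ref, hits[ref] }
  ' | sort
}
```

With SAMtools installed, it could be fed from one of the sorted BAM files, for example `samtools view -h SRRXXXXXXX_CARD_gene.bam | count_hits_per_reference`.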
Documented workflow with containerized tools in Docker
sudo docker images
sudo docker pull stevetsa/docker-magicblast
sudo docker run -it stevetsa/docker-magicblast
sudo docker ps -a
main.sh -S SRA -o output_directory
SRA accession numbers (ERR or SRR) or FASTQ files
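Because the input can be either an accession or a file, a wrapper may need to tell the two apart. The sketch below assumes that an accession is `SRR` or `ERR` followed only by digits; the function name `is_sra_accession` and the sample inputs are hypothetical, and `main.sh` may handle its input differently.

```shell
# Return success when the argument looks like an SRR/ERR run accession:
# the prefix SRR or ERR followed by one or more digits (an assumption).
is_sra_accession() {
  case "$1" in
    SRR*|ERR*) digits=${1#?RR} ;;   # strip the three-letter prefix
    *) return 1 ;;
  esac
  case "$digits" in
    ''|*[!0-9]*) return 1 ;;        # empty, or contains a non-digit
  esac
  return 0
}

# Hypothetical inputs for illustration.
for input in SRR1234567 sample_R1.fastq; do
  if is_sra_accession "$input"; then
    echo "$input: treated as an SRA accession"
  else
    echo "$input: treated as a file path"
  fi
done
```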
- Table (in CSV or TAB-delimited format) with the following columns:
  - RefSeq accession number (Nucleotide)
  - Genus
  - Resistance gene
  - ARO (Antibiotic Resistance Ontology)
  - Score (number of mapped reads per 1kb)
- Dot plot showing relative abundance of antimicrobial resistance/bacterial species in metagenomic sample.
- Pie chart visualization of bacterial abundance in the given dataset (Ondov BD, Bergman NH, and Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30; 12(1):385).
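The Score column in the output table is described as the number of mapped reads per 1kb. The exact normalization is not spelled out, so the sketch below assumes it is the mapped read count divided by the reference length in kilobases; the function name `reads_per_kb` and the example numbers are hypothetical.

```shell
# Compute mapped reads per 1 kb of reference sequence.
# $1 = number of mapped reads, $2 = reference length in base pairs.
# Assumed formula: reads / (length_bp / 1000).
reads_per_kb() {
  awk -v reads="$1" -v len="$2" 'BEGIN {
    if (len <= 0) exit 1               # guard against a zero or negative length
    printf "%.2f\n", reads / (len / 1000)
  }'
}

# Example: 250 reads mapped to a 2500 bp reference.
reads_per_kb 250 2500
```

Normalizing by length keeps long reference sequences from ranking above short ones simply because they offer more positions for reads to map to.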
- Steve Tsang, NCI/NIH, Gaithersburg, MD
- Greg Fedewa, UCSF, San Francisco, CA
- Sherif Farag, UNC, Chapel Hill, NC
- Matthew Moss, CSHL, Cold Spring Harbor, NY
- Daniel Quang, UCI, Irvine, CA
- Alexey V. Rakov, UPenn, Philadelphia, PA