Install Software - Run Realistic Analysis make VCF file error #228

Cmbarrows · 2022-08-16T16:53:56Z

I apologize if this is a redundant question; I can't figure this out from the ongoing FASTQ topic thread.

I'm working through a fresh install and am on part 3. I've created and changed into my work directory, but then this happens:

$ conda activate bioinfo
(bioinfo)

$ curl -s http://data.biostarhandbook.com/make/snpcall.mk > Makefile
(bioinfo)

$ make
makefile:61: *** "Program: bio not found.". Stop.
(bioinfo)

I get the same "Program: bio not found." error when running "make vcf" as well.
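The `EXE` check in the Makefile runs `which` on each program and aborts at the first one it cannot find, so the error only ever names one program even if several are missing. As a hedged diagnostic (the `check_tools` function name is hypothetical, not part of the handbook), this bash sketch performs the same lookup with `command -v` and reports every missing program at once:

```shell
#!/usr/bin/env bash
# Report which required tools are visible on PATH, mirroring the Makefile's EXE check.
# check_tools is a hypothetical helper; its exit status is the number of missing tools.
check_tools() {
  local exe missing=0
  for exe in "$@"; do
    if command -v "$exe" > /dev/null 2>&1; then
      echo "ok: $exe -> $(command -v "$exe")"
    else
      echo "MISSING: $exe"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Same program list as the Makefile; run this inside the activated bioinfo environment.
check_tools bio bwa samtools bcftools snpEff trimmomatic || echo "$? program(s) missing"
```

If this lists `bio` as missing while the environment is active, the installation step that provides `bio` most likely did not complete.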

I tried installing sratools to see if that would fix it.

$ curl http://data.biostarhandbook.com/sratools.sh | bash
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   898  100   898    0     0   4513      0 --:--:-- --:--:-- --:--:--  4535
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 81.8M  100 81.8M    0     0  10.7M      0  0:00:07  0:00:07 --:--:-- 12.2M

Installation complete

Close and reopen the terminal.

In the new terminal activate the bioinfo environment.

(bioinfo)

So I did that and ran the command again, but I still get the same error message.

If I drop the "> Makefile" redirection so the file prints to the screen, I get this:

$ curl -s http://data.biostarhandbook.com/make/snpcall.mk

# Make sure we run in bash.
SHELL = bash

# Accession number for reference genome.
ACC ?= AF086833

# Accession number for the SRA run.
SRR ?= SRR1553425

# Number of compute cores.
CPU ?= 4

# Number of test reads used when running testdata.
LIMIT ?=10000

# GenBank reference file.
GBK=${ACC}.gbk

# Fasta reference file.
REF=refs/${ACC}.fa

# GFF reference file.
GFF=refs/${ACC}.gff

# Temporary directory used only with fasterq-dump.
# See also: https://www.moritzs.de/linux/thank_you_sra/

# All names will have this common element.
ROOT=${SRR}-${ACC}

# Alignment file.
BAM=bam/${ROOT}.bam

# Variant file.
VCF=vcf/${ROOT}.vcf.gz

# The annotated VCF file.
ANN=vcf/${ROOT}.annotated.vcf.gz

# The SNPeff database.
SNPEFF = data/${ACC}/snpEffectPredictor.bin

# Original read names.
R1=reads/${SRR}_1.fastq
R2=reads/${SRR}_2.fastq

# Name of the quality trimmed read pairs.
Q1=reads/${SRR}_1P.fq
Q2=reads/${SRR}_2P.fq

# The BWA index.
IDX=${REF}.amb

# Phony targets instruct make to use these as rules only.
.PHONY: data vcf

# Check that required executables exist.
EXE = bio bwa samtools bcftools snpEff trimmomatic
K := $(foreach exec,$(EXE),\
        $(if $(shell which $(exec)),ok,$(error "Program: $(exec) not found.")))

# Print usage.
all:
	@echo "USAGE:"
	@echo
	@echo " make vcf # generates a VCF file"
	@echo
	@echo " make clean # removes test data"
	@echo " make realclean # removes all intermediate files"
	@echo " make data # downloads the full dataset"
	@echo ""
	@echo "DEFAULTS:"
	@echo ""
	@echo " ACC=${ACC} SRR=${SRR} CPU=${CPU} LIMIT=${LIMIT}"
	@echo ""

# Generate the annotated VCF.
vcf: ${ANN}

# Removes all files created by operations in this file.
clean:
	rm -rf reads/${SRR}* ${TMP}

# Removes all files created by operations in this file.
realclean:
	rm -rf refs reads ${TMP} vcf bam data tmp snpEff.config

# Rule to make the GFF file.
${GFF}:
	mkdir -p refs
	bio fetch ${ACC} -format gff > ${GFF}

# Rule to make the reference genome.
# We stick the GFF here as a dependency so that it gets downloaded early on.
${REF}: ${GFF}
	mkdir -p refs
	bio fetch ${ACC} -format fasta > ${REF}
	samtools faidx ${REF}

# The index depends on the reference genome.
${IDX}: ${REF}
	bwa index ${REF}
	echo ${IDX}

# Download a subset (test) of the SRA data.
${R1} ${R2}:
	mkdir -p reads
	fastq-dump -F -X ${LIMIT} --split-files -O reads ${SRR}

# Download the full SRA run.
data:
	mkdir -p reads ${TMP}
	fasterq-dump -e ${CPU} -t ${TMP} -f -x -p -O reads --split-files ${SRR}

# Apply quality control to the reads.
${Q1} ${Q2}: ${R1} ${R2}
	# Will generate both adapters.
	echo ">illumina" > reads/adapter.fa
	echo "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" >> reads/adapter.fa
	echo ">nextera" >> reads/adapter.fa
	echo "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC" >> reads/adapter.fa

	# Apply the trimming.
	trimmomatic PE -threads ${CPU} -phred33 -basein reads/${SRR}_1.fastq -baseout reads/${SRR}.fq \
		ILLUMINACLIP:reads/adapter.fa:2:30:5 SLIDINGWINDOW:4:15 MINLEN:50

# BAM file depends on the BWA index and reads.
${BAM}: ${IDX} ${Q1} ${Q2}
	mkdir -p bam
	# Note how we filter alignment for mapped reads only.
	bwa mem -t ${CPU} ${REF} ${Q1} ${Q2} | samtools view -b -F 4 | samtools sort -@ ${CPU} > ${BAM}
	samtools index ${BAM}
	samtools flagstat ${BAM}

# The VCF file depends on the BAM file.
${VCF}: ${BAM}
	mkdir -p vcf
	bcftools mpileup -O v -f ${REF} ${BAM} | bcftools call --ploidy 1 -mv -O z -o ${VCF}
	bcftools index ${VCF}
	@echo -e "\n# VCF file: ${VCF}\n"

# Building a custom snpEff database.
${SNPEFF}:
	# Snpeff needs the files in specific folders.
	mkdir -p data/${ACC}

	# Download the GenBank file, has to be called genes.gbk.
	bio fetch ${ACC} > data/${ACC}/genes.gbk

	# Append an entry for the current genome to the config.
	echo "${ACC}.genome : ${ACC}" >> snpEff.config

	# Build the snpEff database.
	snpEff build -v ${ACC}

# Uses snpEff to annotate the variants.
${ANN}: ${VCF} ${SNPEFF}
	snpEff -v ${ACC} ${VCF} | bcftools view -O z > ${ANN}
	bcftools index ${ANN}
	@mv snpEff_* vcf/
	@echo -e "\n# Annotated VCF file: ${ANN}\n"

I apologize for the data dump. It's probably something silly, but I am really stuck.
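The defaults at the top of the Makefile (`ACC`, `SRR`, `CPU`, `LIMIT`) are set with `?=`, which in GNU make assigns a value only when the variable is not already defined. As a hedged illustration (the throwaway Makefile and its `show` target are hypothetical, not part of snpcall.mk), this sketch shows how those defaults can be overridden:

```shell
#!/usr/bin/env bash
# Hypothetical throwaway Makefile using the same ?= pattern as snpcall.mk.
unset CPU                      # start from a clean environment for the demo
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT      # remove the temporary Makefile when the script exits
printf '.PHONY: show\nCPU ?= 4\nshow:\n\t@echo CPU=$(CPU)\n' > "$demo"

make -s -f "$demo" show            # CPU=4  (the ?= default applies)
make -s -f "$demo" show CPU=8      # CPU=8  (command-line assignments win)
CPU=2 make -s -f "$demo" show      # CPU=2  (an environment variable counts as already defined)
```

By the same logic, something like `make vcf CPU=8` should change only the core count for that run without editing the file.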

ialbert (Maintainer) · 2022-08-16T17:10:05Z

It is a bit strange, and I will investigate; it seems one installation step has not completed (others may have failed as well).

In an activated environment, run:

conda activate bioinfo
pip install bio --upgrade
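`pip install` places the `bio` command in whichever environment that `pip` belongs to, which is why it matters that the environment is activated first. A hedged way to confirm the fix (the `in_active_env` helper is hypothetical; it assumes conda exports `CONDA_PREFIX` on activation, as current conda versions do, and a Linux or macOS shell) is to check that `bio` resolves to a path under the active environment:

```shell
#!/usr/bin/env bash
# Report whether a command resolves to a path inside the active conda environment.
# Assumption: conda has exported CONDA_PREFIX when the environment was activated.
in_active_env() {
  local found
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "no conda environment is active"
    return 3
  fi
  found=$(command -v "$1") || { echo "$1: not on PATH"; return 1; }
  case "$found" in
    "$CONDA_PREFIX"/*) echo "$1: $found (inside $CONDA_PREFIX)" ;;
    *) echo "$1: $found (outside the active environment)"; return 2 ;;
  esac
}

# Intended to run after: conda activate bioinfo && pip install bio --upgrade
if ! in_active_env bio; then
  echo "bio is missing or outside the environment; reinstall it with pip inside the activated environment"
fi
```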


Cmbarrows (Author) · Aug 16, 2022

Thank you, that seemed to help. After upgrading, I closed and reopened the terminal and ran the Makefile command again. I still got the error message, so I tried a second time and, instead of closing the terminal, just proceeded. The "make" command now works and shows the usage information described in the instructions, but "make vcf" now fails with an error.
I'll check the other discussion comments to see if there's an answer for this.

$ make vcf
mkdir -p refs
bio fetch AF086833 -format gff > refs/AF086833.gff
mkdir -p refs
bio fetch AF086833 -format fasta > refs/AF086833.fa
samtools faidx refs/AF086833.fa
samtools: error while loading shared libraries: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
make: *** [makefile:100: refs/AF086833.fa] Error 127
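The failure here comes from the dynamic loader, which could not start `samtools` because `libcrypto.so.1.0.0` is missing; make then reports the resulting exit status 127. As a hedged diagnostic for Linux (it relies on `ldd`; macOS would need `otool -L` instead, and the `missing_libs` name is hypothetical), this sketch lists the shared libraries a program needs but cannot load:

```shell
#!/usr/bin/env bash
# List shared libraries that a program links against but the dynamic loader cannot find.
# Linux only: relies on ldd. missing_libs is a hypothetical helper name.
missing_libs() {
  local exe
  exe=$(command -v "$1") || { echo "$1: not on PATH"; return 1; }
  # ldd marks unresolvable libraries with "=> not found".
  if ldd "$exe" 2> /dev/null | grep "not found"; then
    return 2
  fi
  echo "$1: all shared libraries resolved"
}

missing_libs samtools || echo "samtools needs attention (see above)"
```

On the system in this thread, one would expect `libcrypto.so.1.0.0` to appear in that list, which points to an incomplete or mismatched installation rather than a problem with the Makefile itself.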

Thank you so much!

ialbert (Maintainer) · 2022-08-16T17:45:19Z

Yes, I was a bit worried about this. I think the installation did not complete as originally intended, and it seems there are more problems.

I would recommend starting fresh and reinstalling.

This will delete the current install:

curl http://data.biostarhandbook.com/uninstall.sh | bash

Then please rerun the entire installation. Alternatively, you could follow the section that installs the software one step at a time.


Cmbarrows (Author) · Aug 16, 2022

Thank you! After multiple "https0000" errors, I had to go through the step-by-step section.
I finally got everything working, and the Makefile now prints the correct information.
Thanks again for your time and quick response!
