Replies: 2 comments 2 replies
-
It is a bit strange, but I will investigate. It seems one step has not completed (others may have failed as well). In an activated environment do a
-
Yes, I was a bit worried about this. I think the installation did not complete as originally intended, and it seems there are more problems there. I would recommend starting fresh and reinstalling. This will delete the current install
Then please rerun the entire package. Or you could follow the section that goes one step at a time.
-
I apologize if this is a redundant question; I can't seem to figure this out from the ongoing FASTQ topic thread.
I'm making my way through a fresh install and am on part 3. I've created and set my work directory, but then this happens:
$ conda activate bioinfo
(bioinfo)
$ curl -s http://data.biostarhandbook.com/make/snpcall.mk > Makefile
(bioinfo)
$ make
makefile:61: *** "Program: bio not found.". Stop.
(bioinfo)
I get the same error, "Program: bio not found.", when I run "make vcf" as well.
I tried to install the sratools to see if that makes it work.
$ curl http://data.biostarhandbook.com/sratools.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 898 100 898 0 0 4513 0 --:--:-- --:--:-- --:--:-- 4535
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 81.8M 100 81.8M 0 0 10.7M 0 0:00:07 0:00:07 --:--:-- 12.2M
Installation complete
Close and reopen the terminal.
In the new terminal activate the bioinfo environment.
(bioinfo)
So I do that and run the command again, but I still get the same error message.
If I drop the "> Makefile" redirect, I get this:
$ curl -s http://data.biostarhandbook.com/make/snpcall.mk
# Make sure we run in bash.
SHELL = bash
# Accession number for reference genome.
ACC ?= AF086833
# Accession number for the SRA run.
SRR ?= SRR1553425
# Number of compute cores.
CPU ?= 4
# Number of test reads used when running testdata.
LIMIT ?=10000
# GenBank reference file.
GBK=${ACC}.gbk
# Fasta reference file.
REF=refs/${ACC}.fa
# GFF reference file.
GFF=refs/${ACC}.gff
# Temporary directory used only with fasterq-dump.
# See also: https://www.moritzs.de/linux/thank_you_sra/
TMP=tmp
# All names will have this common element.
ROOT=${SRR}-${ACC}
# Alignment file.
BAM=bam/${ROOT}.bam
# Variant file.
VCF=vcf/${ROOT}.vcf.gz
# The annotated VCF file.
ANN=vcf/${ROOT}.annotated.vcf.gz
# The SNPeff database.
SNPEFF = data/${ACC}/snpEffectPredictor.bin
# Original read names.
R1=reads/${SRR}_1.fastq
R2=reads/${SRR}_2.fastq
# Name of the quality trimmed read pairs.
Q1=reads/${SRR}_1P.fq
Q2=reads/${SRR}_2P.fq
# The BWA index.
IDX=${REF}.amb
# Phony targets instruct make to use these as rules only.
.PHONY: data vcf
# Check that required executables exist.
EXE = bio bwa samtools bcftools snpEff trimmomatic
K := $(foreach exec,$(EXE),\
	$(if $(shell which $(exec)),ok,$(error "Program: $(exec) not found.")))
# Print usage.
all:
	@echo "USAGE:"
	@echo
	@echo " make vcf # generates a VCF file"
	@echo
	@echo " make clean # removes test data"
	@echo " make realclean # removes all intermediate files"
	@echo " make data # downloads the full dataset"
	@echo ""
	@echo "DEFAULTS:"
	@echo ""
	@echo " ACC=${ACC} SRR=${SRR} CPU=${CPU} LIMIT=${LIMIT}"
	@echo ""
# Generate the annotated VCF
vcf: ${ANN}
# Removes all files created by operations in this file.
clean:
	rm -rf reads/${SRR}* ${TMP}
# Removes all files created by operations in this file.
realclean:
	rm -rf refs reads ${TMP} vcf bam data tmp snpEff.config
# Rule to make the GFF file.
${GFF}:
	mkdir -p refs
	bio fetch ${ACC} -format gff > ${GFF}
# Rule to make the reference genome.
# We stick the GFF here as a dependency so that it gets downloaded early on.
${REF}: ${GFF}
	mkdir -p refs
	bio fetch ${ACC} -format fasta > ${REF}
	samtools faidx ${REF}
# The index depends on the reference genome.
${IDX}: ${REF}
	bwa index ${REF}
	echo ${IDX}
# Download a subset (test) of the SRA data.
${R1} ${R2}:
	mkdir -p reads
	fastq-dump -F -X ${LIMIT} --split-files -O reads ${SRR}
# Download the full SRA run
data:
	mkdir -p reads ${TMP}
	fasterq-dump -e ${CPU} -t ${TMP} -f -x -p -O reads --split-files ${SRR}
# Apply quality control to the reads.
${Q1} ${Q2}: ${R1} ${R2}
# BAM file depends on the BWA index and reads.
${BAM}: ${IDX} ${Q1} ${Q2}
	mkdir -p bam
	# Note how we filter alignment for mapped reads only.
	bwa mem -t ${CPU} ${REF} ${Q1} ${Q2} | samtools view -b -F 4 | samtools sort -@ ${CPU} > ${BAM}
	samtools index ${BAM}
	samtools flagstat ${BAM}
# The VCF file depends on the BAM file.
${VCF}: ${BAM}
	mkdir -p vcf
	bcftools mpileup -O v -f ${REF} ${BAM} | bcftools call --ploidy 1 -mv -O z -o ${VCF}
	bcftools index ${VCF}
	@echo -e "\n# VCF file: ${VCF}\n"
# Building a custom snpEff database.
${SNPEFF}:
	# Snpeff needs the files in specific folders.
	mkdir -p data/${ACC}
# Uses snpEff to annotate the variants.
${ANN}: ${VCF} ${SNPEFF}
	snpEff -v ${ACC} ${VCF} | bcftools view -O z > ${ANN}
	bcftools index ${ANN}
	@mv snpEff_. vcf/
	@echo -e "\n# Annotated VCF file: ${ANN}\n"
I apologize for the data dump. It's probably something silly, but I am really stuck.
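The "Program: bio not found." message comes from the executable check near the top of the Makefile, which stops at the first program it cannot find. A minimal sketch of the same check run directly in the shell is below; it lists every missing program at once. The program names are copied from the EXE line of the Makefile. The function name `missing_programs` is made up for this example, and it uses `command -v` where the Makefile uses `which`.

```shell
#!/usr/bin/env bash
# Program names copied from the EXE line of the Makefile in this post.
REQUIRED="bio bwa samtools bcftools snpEff trimmomatic"

# Hypothetical helper: print each name from the argument list
# that cannot be resolved on the current PATH.
missing_programs() {
    local prog
    for prog in "$@"; do
        # command -v exits non-zero when the name is not found.
        if ! command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
        fi
    done
}

# Report the result for the programs the Makefile expects.
# $REQUIRED is left unquoted on purpose so it splits into separate names.
missing=$(missing_programs $REQUIRED)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All required programs were found."
fi
```

Run it inside the activated bioinfo environment. Any name it prints is a program that the Makefile check would also reject.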