That is a bit strange, but I will investigate. It seems one step has not completed (others may have failed as well). In an activated environment do a
Yes, I was a bit worried about this. I think the installation did not quite complete as originally intended, and it seems there are more problems there. I would recommend starting fresh and reinstalling. This will delete the current install
then please rerun the entire package. Or you could follow the section that goes one step at a time.
I apologize if this is a redundant question; I can't seem to figure this out from the ongoing FASTQ topic thread.
I'm making my way through a fresh install and am on part 3. I've created and set my work directory, but then this happens:
$ conda activate bioinfo
(bioinfo)
$ curl -s http://data.biostarhandbook.com/make/snpcall.mk > Makefile
(bioinfo)
$ make
makefile:61: *** "Program: bio not found.". Stop.
(bioinfo)
I get the same error, "Program: bio not found.", when I run "make vcf" as well.
I tried installing sratools to see if that would make it work.
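The message comes from the check of required programs in the Makefile, which stops at the first program it cannot find on the PATH. A minimal sketch of the same check in plain shell, reporting every missing program rather than only the first. The function name `check_programs` is made up for illustration, and the program list is copied from the EXE line of the Makefile quoted later in this post:

```shell
#!/usr/bin/env bash

# Hypothetical helper: report which of the given programs are not on the PATH.
# Prints one "missing: NAME" line per absent program and returns
# non-zero if at least one program is missing.
check_programs() {
    local status=0
    local prog
    for prog in "$@"; do
        if ! command -v "$prog" > /dev/null 2>&1; then
            echo "missing: $prog"
            status=1
        fi
    done
    return "$status"
}

# Program list copied from the EXE line of snpcall.mk.
check_programs bio bwa samtools bcftools snpEff trimmomatic || true
```

Run inside the activated environment, this should print nothing if all six programs are installed; any "missing:" line names a program the Makefile would also reject.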
$ curl http://data.biostarhandbook.com/sratools.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 898 100 898 0 0 4513 0 --:--:-- --:--:-- --:--:-- 4535
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 81.8M 100 81.8M 0 0 10.7M 0 0:00:07 0:00:07 --:--:-- 12.2M
Installation complete
Close and reopen the terminal.
In the new terminal activate the bioinfo environment.
(bioinfo)
So I did that and ran the command again, but I still get the same error message.
If I remove the "> Makefile" part of the command, I get this:
$ curl -s http://data.biostarhandbook.com/make/snpcall.mk
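One way to confirm that the reopened terminal really has the environment active, and to see where (if anywhere) the shell finds bio. This is only a sketch: the function name `env_report` is made up, and it assumes conda sets the `CONDA_DEFAULT_ENV` variable on activation, which it normally does:

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the active conda environment and the
# location of the bio program, if the shell can find it.
env_report() {
    # CONDA_DEFAULT_ENV is set by `conda activate`; fall back to "none".
    echo "active env: ${CONDA_DEFAULT_ENV:-none}"
    if command -v bio > /dev/null 2>&1; then
        echo "bio found at: $(command -v bio)"
    else
        echo "bio is not on the PATH"
    fi
}

env_report
```

If the first line does not say bioinfo, the environment is not active in that terminal; if it does and bio is still not found, the program itself was likely never installed into the environment.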
# Make sure we run in bash.
SHELL = bash
# Accession number for reference genome.
ACC ?= AF086833
# Accession number for the SRA run.
SRR ?= SRR1553425
# Number of compute cores.
CPU ?= 4
# Number of test reads used when running testdata.
LIMIT ?=10000
# GenBank reference file.
GBK=${ACC}.gbk
# Fasta reference file.
REF=refs/${ACC}.fa
# GFF reference file.
GFF=refs/${ACC}.gff
# Temporary directory used only with fasterq-dump.
# See also: https://www.moritzs.de/linux/thank_you_sra/
TMP=tmp
# All names will have this common element.
ROOT=${SRR}-${ACC}
# Alignment file.
BAM=bam/${ROOT}.bam
# Variant file.
VCF=vcf/${ROOT}.vcf.gz
# The annotated VCF file.
ANN=vcf/${ROOT}.annotated.vcf.gz
# The SNPeff database.
SNPEFF = data/${ACC}/snpEffectPredictor.bin
# Original read names.
R1=reads/${SRR}_1.fastq
R2=reads/${SRR}_2.fastq
# Name of the quality trimmed read pairs.
Q1=reads/${SRR}_1P.fq
Q2=reads/${SRR}_2P.fq
# The BWA index.
IDX=${REF}.amb
# Phony targets instruct make to use these as rules only.
.PHONY: data vcf
# Check that required executables exist.
EXE = bio bwa samtools bcftools snpEff trimmomatic
OK := $(foreach exec,$(EXE),\
	$(if $(shell which $(exec)),ok,$(error "Program: $(exec) not found.")))
# Print usage.
all:
	@echo "USAGE:"
	@echo
	@echo " make vcf # generates a VCF file"
	@echo
	@echo " make clean # removes test data"
	@echo " make realclean # removes all intermediate files"
	@echo " make data # downloads the full dataset"
	@echo ""
	@echo "DEFAULTS:"
	@echo ""
	@echo " ACC=${ACC} SRR=${SRR} CPU=${CPU} LIMIT=${LIMIT}"
	@echo ""
# Generate the annotated VCF
vcf: ${ANN}
# Removes all files created by operations in this file.
clean:
	rm -rf reads/${SRR}* ${TMP}
# Removes all files created by operations in this file.
realclean:
	rm -rf refs reads ${TMP} vcf bam data tmp snpEff.config
# Rule to make the GFF file.
${GFF}:
	mkdir -p refs
	bio fetch ${ACC} -format gff > ${GFF}
# Rule to make the reference genome.
# We stick the GFF here as a dependency so that it gets downloaded early on.
${REF}: ${GFF}
	mkdir -p refs
	bio fetch ${ACC} -format fasta > ${REF}
	samtools faidx ${REF}
# The index depends on the reference genome.
${IDX}: ${REF}
	bwa index ${REF}
	echo ${IDX}
# Download a subset (test) of the SRA data.
${R1} ${R2}:
	mkdir -p reads
	fastq-dump -F -X ${LIMIT} --split-files -O reads ${SRR}
# Download the full SRA run
data:
	mkdir -p reads ${TMP}
	fasterq-dump -e ${CPU} -t ${TMP} -f -x -p -O reads --split-files ${SRR}
# Apply quality control to the reads.
${Q1} ${Q2}: ${R1} ${R2}
# BAM file depends on the BWA index and reads.
${BAM}: ${IDX} ${Q1} ${Q2}
	mkdir -p bam
	# Note how we filter alignment for mapped reads only.
	bwa mem -t ${CPU} ${REF} ${Q1} ${Q2} | samtools view -b -F 4 | samtools sort -@ ${CPU} > ${BAM}
	samtools index ${BAM}
	samtools flagstat ${BAM}
# The VCF file depends on the BAM file.
${VCF}: ${BAM}
	mkdir -p vcf
	bcftools mpileup -O v -f ${REF} ${BAM} | bcftools call --ploidy 1 -mv -O z -o ${VCF}
	bcftools index ${VCF}
	@echo -e "\n# VCF file: ${VCF}\n"
# Building a custom snpEff database.
${SNPEFF}:
	# Snpeff needs the files in specific folders.
	mkdir -p data/${ACC}
# Uses snpEff to annotate the variants.
${ANN}: ${VCF} ${SNPEFF}
	snpEff -v ${ACC} ${VCF} | bcftools view -O z > ${ANN}
	bcftools index ${ANN}
	@mv snpEff_. vcf/
	@echo -e "\n# Annotated VCF file: ${ANN}\n"
I apologize for the data dump. It's probably something silly, but I am really stuck.