Reference sequence databases: Private BOLD
MartenHoogeveen edited this page May 21, 2019 · 13 revisions
This page describes how to create a BLAST database from private BOLD data.
First, download your sequences in FASTA format.
The headers should look like this:
>NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|Poales|Cyperaceae|Schoenoplectus|Schoenoplectus lacustris
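Before continuing, it can help to check that every header follows this layout. The sketch below splits a header on "|" into named fields. The field names are assumptions inferred from the example header above (record ID, species, phylum, class, order, family, genus, species); they are not an official BOLD schema.

```python
from typing import Dict

# Assumed field names, inferred from the example header (hypothetical labels).
FIELDS = [
    "record_id", "species_label", "phylum", "class",
    "order", "family", "genus", "species",
]


def parse_header(line: str) -> Dict[str, str]:
    """Split a '>'-prefixed, pipe-delimited FASTA header into named fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    parts = line[1:].rstrip("\n").split("|")
    if len(parts) != len(FIELDS):
        raise ValueError(
            "expected %d fields, found %d: %r" % (len(FIELDS), len(parts), line)
        )
    return dict(zip(FIELDS, parts))


example = (
    ">NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|"
    "Poales|Cyperaceae|Schoenoplectus|Schoenoplectus lacustris"
)
record = parse_header(example)
print(record["record_id"], record["family"])
```

Headers with a different number of fields raise a ValueError, which makes malformed records easy to spot before the later steps.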
Download the GBIF backbone data; it is used later to add the kingdom:
wget http://rs.gbif.org/datasets/backbone/backbone-current.zip
Unzip the taxonomy file:
unzip -j backbone-current.zip "Taxon.tsv"
Extract only the necessary columns:
awk -F "\t" '{print $18"\t"$19"\t"$20"\t"$21"\t"$22"\t"$23}' Taxon.tsv > gbif_taxonomy.tsv
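For reference, the same column selection can be sketched in Python. This is only an illustrative equivalent of the awk command, not part of the original workflow. It assumes that columns 18 to 23 of Taxon.tsv hold kingdom, phylum, class, order, family and genus, which is worth confirming against the header line of your download. The function names are hypothetical.

```python
def select_columns(line: str, first: int = 18, last: int = 23) -> str:
    """Return columns first..last (1-based, inclusive) of a tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so missing trailing fields become empty strings, as in awk.
    fields += [""] * (last - len(fields))
    return "\t".join(fields[first - 1:last])


def extract(src: str, dst: str) -> None:
    """Write the selected columns of every line in src to dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(select_columns(line) + "\n")


# Synthetic 23-column row, used only to show which fields are kept.
demo = "\t".join("c%d" % i for i in range(1, 24))
print(select_columns(demo))
```

Calling extract("Taxon.tsv", "gbif_taxonomy.tsv") should produce a file comparable to the awk output.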
Execute the script to add the kingdom:
python3 add_taxonomy_private_bold.py -i bold_naturalis_ITS.fas -g gbif_taxonomy.tsv -o bold_naturalis_ITS_taxonomy.fa
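The source of add_taxonomy_private_bold.py is not shown on this page, so the following is only a hypothetical sketch of how a kingdom lookup could work, not the script's actual logic. It assumes each row of gbif_taxonomy.tsv is kingdom, phylum, class, order, family, genus, and that the kingdom is inserted before the phylum field of the header. Both the function names and the output layout are assumptions.

```python
from typing import Dict, Iterable


def build_kingdom_lookup(rows: Iterable[str]) -> Dict[str, str]:
    """Map every taxon name in a row (phylum to genus) to that row's kingdom."""
    lookup: Dict[str, str] = {}
    for row in rows:
        kingdom, *ranks = row.rstrip("\n").split("\t")
        if not kingdom:
            continue
        for name in ranks:
            if name:
                # Keep the first kingdom seen for a name; homonyms are not resolved.
                lookup.setdefault(name, kingdom)
    return lookup


def add_kingdom(header: str, lookup: Dict[str, str], unknown: str = "unknown kingdom") -> str:
    """Insert the kingdom before the phylum field of a pipe-delimited header."""
    parts = header.rstrip("\n").split("|")
    kingdom = unknown
    # Try the most specific rank first: genus, family, order, class, phylum.
    for name in reversed(parts[2:7]):
        if name in lookup:
            kingdom = lookup[name]
            break
    return "|".join(parts[:2] + [kingdom] + parts[2:])


# Illustrative GBIF-style row and the example header from this page.
lookup = build_kingdom_lookup(
    ["Plantae\tTracheophyta\tLiliopsida\tPoales\tCyperaceae\tSchoenoplectus"]
)
header = (
    ">NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|"
    "Poales|Cyperaceae|Schoenoplectus|Schoenoplectus lacustris"
)
result = add_kingdom(header, lookup)
print(result)
```

One limitation of this sketch: a genus name shared between kingdoms (a homonym) is assigned whichever kingdom appears first in the GBIF file.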
Create the database
makeblastdb -in bold_naturalis_ITS_taxonomy.fa -dbtype nucl -blastdb_version 5
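If the steps are driven from a Python pipeline, the makeblastdb call above can be wrapped as in the minimal sketch below. It uses the same flags as the command on this page; the function names are hypothetical, and it assumes makeblastdb is available on the PATH.

```python
import subprocess
from typing import List


def makeblastdb_command(fasta: str) -> List[str]:
    """Build the argument list for the makeblastdb call shown above."""
    return ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-blastdb_version", "5"]


def build_database(fasta: str) -> None:
    """Run makeblastdb and raise CalledProcessError if it exits with an error."""
    subprocess.run(makeblastdb_command(fasta), check=True)


command = makeblastdb_command("bold_naturalis_ITS_taxonomy.fa")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names, and check=True stops the pipeline if the database build fails.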