Reference sequence databases: Private BOLD

MartenHoogeveen edited this page May 21, 2019 · 13 revisions

This page describes how to create a BLAST database from private BOLD data.

First, download your sequences.

The headers should look like this:

>NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|Poales|Cyperaceae|Schoenoplectus|Schoenoplectus lacustris
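Before going further it can help to confirm that every header follows this layout. The sketch below splits a header on `|` and fails loudly when the field count is off. The field names are assumptions inferred from the example header (record ID, name, phylum, class, order, family, genus, species); they are not defined anywhere on this page.

```python
# Field names are assumptions inferred from the example header above.
FIELDS = ["record_id", "name", "phylum", "class", "order", "family", "genus", "species"]


def parse_header(line):
    """Split a '>'-prefixed FASTA header into a dict keyed by FIELDS."""
    parts = line.rstrip("\n").lstrip(">").split("|")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    return dict(zip(FIELDS, parts))


example = (
    ">NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|Poales|"
    "Cyperaceae|Schoenoplectus|Schoenoplectus lacustris"
)
record = parse_header(example)
print(record["record_id"], record["family"])
```

Running `parse_header` over every line that starts with `>` in the downloaded file is one way to catch malformed headers before the later steps.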

Download the GBIF backbone taxonomy, which is used later to add the kingdom

wget http://rs.gbif.org/datasets/backbone/backbone-current.zip

Extract Taxon.tsv from the archive

unzip -j backbone-current.zip "Taxon.tsv"

Extract only the necessary columns

awk -F "\t" '{print $18"\t"$19"\t"$20"\t"$21"\t"$22"\t"$23}' Taxon.tsv > gbif_taxonomy.tsv
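The awk command selects columns by position, so it silently picks the wrong fields if the column order of Taxon.tsv differs from what this page was written against. A minimal Python sketch of the same selection is shown below; applying it to the first (header) line of Taxon.tsv shows which column names positions 18 to 23 correspond to. That these are the kingdom through genus columns is an assumption to confirm against your own download. The sample row is synthetic and only illustrates the positions kept.

```python
def select_columns(line, first=18, last=23):
    """Return 1-based columns first..last of a tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    return fields[first - 1:last]


# Synthetic row with 25 numbered columns, only to show which positions are kept.
sample = "\t".join(f"col{i}" for i in range(1, 26)) + "\n"
selected = select_columns(sample)
print(selected)

# To inspect the real file, pass its header line instead, for example:
# with open("Taxon.tsv", encoding="utf-8") as handle:
#     print(select_columns(handle.readline()))
```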

Run the script to add the kingdom to each header

python3 add_taxonomy_private_bold.py -i bold_naturalis_ITS.fas -g gbif_taxonomy.tsv -o bold_naturalis_ITS_taxonomy.fa
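The source of `add_taxonomy_private_bold.py` is not shown on this page, so the following is only a hypothetical sketch of the general idea, not the script itself. It assumes the taxonomy rows are ordered kingdom, phylum, class, order, family, genus, that the kingdom is looked up by genus, and that it is inserted just before the phylum field. The function names, the `unknown kingdom` placeholder, and the taxonomy row are all illustrative.

```python
def build_kingdom_lookup(rows):
    """Map genus -> kingdom from rows of (kingdom, phylum, class, order, family, genus)."""
    lookup = {}
    for row in rows:
        kingdom, genus = row[0], row[5]
        if kingdom and genus:
            lookup.setdefault(genus, kingdom)  # keep the first kingdom seen per genus
    return lookup


def add_kingdom(header, lookup, unknown="unknown kingdom"):
    """Insert the kingdom before the phylum field of a '|'-separated header."""
    parts = header.lstrip(">").split("|")
    genus = parts[6]  # assumed position of the genus field
    kingdom = lookup.get(genus, unknown)
    return ">" + "|".join(parts[:2] + [kingdom] + parts[2:])


# Illustrative taxonomy row, not copied from the GBIF file.
rows = [("Plantae", "Tracheophyta", "Liliopsida", "Poales", "Cyperaceae", "Schoenoplectus")]
lookup = build_kingdom_lookup(rows)
header = (
    ">NLFLM009-12|Schoenoplectus lacustris|Magnoliophyta|Liliopsida|Poales|"
    "Cyperaceae|Schoenoplectus|Schoenoplectus lacustris"
)
new_header = add_kingdom(header, lookup)
print(new_header)
```

A genus-only lookup is the simplest choice but can be ambiguous, because the same genus name may exist in more than one kingdom; matching on family and genus together would be stricter.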

Create the BLAST database

makeblastdb -in bold_naturalis_ITS_taxonomy.fa -dbtype nucl -blastdb_version 5