Lukas Mueller edited this page Sep 19, 2023 · 8 revisions

BLAST Databases

BLAST database location

On the BTI servers, most BLAST databases are stored on bergamot at /data/data1/prod/blast/databases/current.

In Docker containers, this directory is usually mounted at /home/production/blast.

Formatting BLAST databases

To format a BLAST database, use makeblastdb as follows:

makeblastdb -in database.fasta -dbtype [prot|nucl] -parse_seqids

The -parse_seqids option is required to retrieve sequences from the database file, and is used by the on-line BLAST system to display matches.
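
As a minimal sketch, the formatting step can be wrapped in a small shell function that rejects an invalid -dbtype value before calling makeblastdb. The function name format_blast_db and its exit code 2 are hypothetical (they are not part of the sgn repo), and the sketch assumes that makeblastdb is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the sgn repo): validates its arguments,
# then runs the makeblastdb command shown above.
format_blast_db() {
    local fasta="${1:-}" dbtype="${2:-}"
    case "$dbtype" in
        prot|nucl) ;;
        *)
            echo "usage: format_blast_db <fasta> <prot|nucl>" >&2
            return 2
            ;;
    esac
    makeblastdb -in "$fasta" -dbtype "$dbtype" -parse_seqids
}

# Example call (commented out so that sourcing this file has no side effects):
# format_blast_db database.fasta nucl
```

Because the database was built with -parse_seqids, a single sequence can afterwards be retrieved with blastdbcmd -db database.fasta -entry followed by a sequence ID, which is a quick way to confirm that the IDs were indexed.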

IMPORTANT

Only use makeblastdb to format databases. formatdb is no longer supported.

Troubleshooting

If you see the error message:

Error: mdb_env_open: No locks available

it means that statd is not available on your NFS server. Enable and start it on the NFS host system using:

sudo systemctl enable rpc-statd  # Enable statd on boot
sudo systemctl start rpc-statd  # Start statd for the current session
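
One way to check whether statd is registered is to look for the status service (RPC program number 100024) in the output of rpcinfo -p. The helper below is only a sketch and not part of the original instructions: the function name statd_registered is hypothetical, and nfs-host is a placeholder for the hostname of your NFS server.

```shell
# Hypothetical helper: reads `rpcinfo -p` output on stdin and succeeds
# if the statd service (RPC program 100024, listed as "status") is registered.
statd_registered() {
    awk '$1 == 100024 { found = 1 } END { exit !found }'
}

# Example call (nfs-host is a placeholder hostname):
# rpcinfo -p nfs-host | statd_registered && echo "statd is registered"
```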

Adding BLAST databases to the database using a script

You can use the script load_blast.pl in the sgn repo.

You will need to prepare an Excel-formatted file containing the information to load into the database. For details, run perldoc bin/load_blast.pl in the sgn repo.

Adding BLAST databases to the database manually

To add a BLAST database manually, connect to the SGN database and insert the metadata for the new BLAST database.

Connect to postgres

psql -h localhost -U postgres

Connect to the database

\c cxgn

Let's take a look at the tables we are going to work with to see some examples:

\d sgn.blast_db
select * from sgn.blast_db;
select * from sgn.blast_db_group;
select * from sgn.blast_db_blast_db_group;

Find the blast_db_group where your database will be included and get the blast_db_group_id for future operations.

cxgn=# select * from sgn.blast_db_group order by ordinal;

 blast_db_group_id |                   name                    | ordinal 
-------------------+-------------------------------------------+---------
                18 | Popular datasets                          |      10
                28 | Tomato Genome (Current version)           |      20
                10 | Potato Genome (Current version)           |      30
                21 | Pepper Genome (Current version)           |      40
                29 | Eggplant Genome (Current version)         |      50
                32 | Petunia Sps. Genomes (Current version)    |      60
                 9 | N.benthamiana Genome (Current version)    |      70
                22 | N.tabacum Genomes (Current version)       |      80
                23 | Other Nicotiana Genomes (Current version) |      90
                12 | Tomato Wild Species                       |     100
                30 | Coffee Genome                             |     115
                19 | Tomato Inbred Lines                       |     120
                31 | Genome Sequences                          |     125
                 5 | Markers                                   |     130
                 6 | Organelle Genomes                         |     135
                 1 | Tomato Genome (other datasets)            |     140
                27 | N.benthamiana Genomes (Previous version)  |     145
                26 | N.tabacum Genomes (Previous version)      |     146
                17 | Transcriptome Projects                    |     155
                20 | Proteome Projects                         |     160
                 8 | Combined Sets                             |     170
                 3 | SGN Unigenes (current version)            |     180
                11 | SGN Unigenes (previous versions)          |     185
                 2 | SGN ESTs                                  |     200
                13 | Gene Family Sets                          |     210
                 7 | NCBI Sets                                 |     220
                 4 | Arabidopsis (TAIR)                        |     230
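
If you already know the group name, the ID can also be looked up non-interactively. The following is a sketch only: group_id_sql is a hypothetical helper that prints the query, and the psql connection settings are the ones used earlier on this page.

```shell
# Hypothetical helper: prints a query that returns the blast_db_group_id
# for an exact group name. Single quotes in the name are doubled for SQL.
group_id_sql() {
    local name
    name=$(printf '%s' "${1:-}" | sed "s/'/''/g")
    printf "SELECT blast_db_group_id FROM sgn.blast_db_group WHERE name = '%s';\n" "$name"
}

# Example call: -A and -t make psql print only the bare value.
# group_id_sql 'Tomato Genome (Current version)' | psql -h localhost -U postgres -d cxgn -At
```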

Let's insert the information about the BLAST database into the SGN database. It is stored in the blast_db table of the sgn schema. Remember to use the right blast_db_group_id for your data:

BEGIN;

INSERT INTO sgn.blast_db (file_base,title,type,source_url,update_freq,index_seqs,blast_db_group_id,web_interface_visible,description) VALUES ('tomato_genome/my_blast_db_file_name', 'my blast db name', 'nucleotide', 'ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/my_blast_db_file_name','manual','t','28','t','description of the dataset');

COMMIT;

If your blast_db belongs to a new group, you can create the group, adding an ordinal that sets its position in the list of blast_db_groups in the BLAST interface on the website:

BEGIN;

INSERT INTO sgn.blast_db_group (name,ordinal) VALUES ('my blast db group',110);

COMMIT;

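Link the new blast_db entry to its blast_db_group. Replace 333 with the blast_db_id of the row you inserted into sgn.blast_db, and 28 with your blast_db_group_id: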

BEGIN;

INSERT INTO sgn.blast_db_blast_db_group (blast_db_id,blast_db_group_id) VALUES (333,28);

COMMIT;
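
The insert and link steps can also be combined so that the new blast_db_id does not have to be looked up by hand. This is a sketch of an alternative, not the procedure described above: it uses a PostgreSQL data-modifying CTE with RETURNING, and it assumes that sgn.blast_db generates blast_db_id automatically. The function name link_new_db_sql is hypothetical, and the inserted values are the example values from this page.

```shell
# Hypothetical helper: prints one transaction that inserts a blast_db row
# and links it to the given group, reusing the generated blast_db_id.
link_new_db_sql() {
    local group_id="${1:-}"
    # Accept only a plain integer, since the value is pasted into the SQL.
    case "$group_id" in
        ''|*[!0-9]*) echo "group id must be an integer" >&2; return 2 ;;
    esac
    cat <<SQL
BEGIN;
WITH new_db AS (
    INSERT INTO sgn.blast_db (file_base, title, type, source_url, update_freq, index_seqs, blast_db_group_id, web_interface_visible, description)
    VALUES ('tomato_genome/my_blast_db_file_name', 'my blast db name', 'nucleotide', 'ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/my_blast_db_file_name', 'manual', 't', $group_id, 't', 'description of the dataset')
    RETURNING blast_db_id  -- assumption: this column is auto-generated
)
INSERT INTO sgn.blast_db_blast_db_group (blast_db_id, blast_db_group_id)
SELECT blast_db_id, $group_id FROM new_db;
COMMIT;
SQL
}

# Example call: ON_ERROR_STOP makes psql stop at the first failing statement.
# link_new_db_sql 28 | psql -h localhost -U postgres -d cxgn -v ON_ERROR_STOP=1
```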

If you are interested in creating a link to connect BLAST output with JBrowse, you will need to insert the information for jbrowse_src in sgn.blast_db. For more information, see the SetUpJbrowseBlastLink wiki page.