Blast
On the BTI servers, most BLAST databases are stored on bergamot at /data/data1/prod/blast/databases/current.
In Docker containers, this directory is usually mounted at /home/production/blast.
To format a BLAST database, use makeblastdb as follows:
makeblastdb -in database.fasta -dbtype [prot|nucl] -parse_seqids
The -parse_seqids option is required to retrieve sequences from the database file, and is used by the online BLAST system to display matches.
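As a rough sketch, the formatting step could be wrapped in a small shell helper that also checks the result. The function names dbtype_flag and format_blast_db are hypothetical, and accepting the longer names "nucleotide" and "protein" is a convenience assumed here, not something makeblastdb itself does. The final blastdbcmd -info call prints a summary of the newly formatted database.

```shell
# Map a descriptive sequence type to the value expected by makeblastdb -dbtype.
# (Hypothetical helper; the long names are an assumption made for convenience.)
dbtype_flag() {
  case "$1" in
    nucleotide|nucl) echo "nucl" ;;
    protein|prot)    echo "prot" ;;
    *) echo "unknown sequence type: $1" >&2; return 1 ;;
  esac
}

# Format a FASTA file as a BLAST database, then print a summary of the result.
# Arguments: fasta_file sequence_type
format_blast_db() {
  fbd_fasta="$1"
  fbd_dbtype="$(dbtype_flag "$2")" || return 1
  [ -r "$fbd_fasta" ] || { echo "cannot read $fbd_fasta" >&2; return 1; }
  makeblastdb -in "$fbd_fasta" -dbtype "$fbd_dbtype" -parse_seqids || return 1
  # Without -out, the database takes the name of the input file.
  blastdbcmd -db "$fbd_fasta" -info
}
```

For example, format_blast_db database.fasta nucleotide would run the makeblastdb command shown above with -dbtype nucl.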
IMPORTANT
Only use makeblastdb to format databases. formatdb is no longer supported.
If you see the error message:
Error: mdb_env_open: No locks available
it means that your NFS server is not running the statd service. Enable and start it on the NFS host system using:
sudo systemctl enable rpc-statd # Enable statd on boot
sudo systemctl start rpc-statd # Start statd for the current session
To load the information about a BLAST database into the SGN database, you can use the script load_blast.pl in the sgn repo.
You will need to prepare an Excel-formatted file with the information to load; see perldoc bin/load_blast.pl in the sgn repo for details.
To insert the metadata for the new BLAST databases by hand, connect to the SGN database as follows.
Connect to postgres
psql -h localhost -U postgres
Connect to the database
\c cxgn
Let's take a look at the tables we are going to work with to see some examples:
\d sgn.blast_db
select * from sgn.blast_db;
select * from sgn.blast_db_group;
select * from sgn.blast_db_blast_db_group;
Find the blast_db_group where your database will be included and get the blast_db_group_id for future operations.
cxgn=# select * from sgn.blast_db_group order by ordinal;
 blast_db_group_id |                   name                    | ordinal
-------------------+-------------------------------------------+---------
                18 | Popular datasets                          |      10
                28 | Tomato Genome (Current version)           |      20
                10 | Potato Genome (Current version)           |      30
                21 | Pepper Genome (Current version)           |      40
                29 | Eggplant Genome (Current version)         |      50
                32 | Petunia Sps. Genomes (Current version)    |      60
                 9 | N.benthamiana Genome (Current version)    |      70
                22 | N.tabacum Genomes (Current version)       |      80
                23 | Other Nicotiana Genomes (Current version) |      90
                12 | Tomato Wild Species                       |     100
                30 | Coffee Genome                             |     115
                19 | Tomato Inbred Lines                       |     120
                31 | Genome Sequences                          |     125
                 5 | Markers                                   |     130
                 6 | Organelle Genomes                         |     135
                 1 | Tomato Genome (other datasets)            |     140
                27 | N.benthamiana Genomes (Previous version)  |     145
                26 | N.tabacum Genomes (Previous version)      |     146
                17 | Transcriptome Projects                    |     155
                20 | Proteome Projects                         |     160
                 8 | Combined Sets                             |     170
                 3 | SGN Unigenes (current version)            |     180
                11 | SGN Unigenes (previous versions)          |     185
                 2 | SGN ESTs                                  |     200
                13 | Gene Family Sets                          |     210
                 7 | NCBI Sets                                 |     220
                 4 | Arabidopsis (TAIR)                        |     230
Let's insert the information about the BLAST database into the SGN database. It is stored in the sgn schema, in the blast_db table. Remember to use the right blast_db_group_id for your data:
BEGIN;
INSERT INTO sgn.blast_db (file_base,title,type,source_url,update_freq,index_seqs,blast_db_group_id,web_interface_visible,description) VALUES ('tomato_genome/my_blast_db_file_name', 'my blast db name', 'nucleotide', 'ftp://ftp.solgenomics.net/genomes/Solanum_lycopersicum/my_blast_db_file_name','manual','t','28','t','description of the dataset');
COMMIT;
If your blast_db belongs to a new group, you can create the group first. The ordinal determines where it appears in the list of blast_db_groups on the website's BLAST interface:
BEGIN;
INSERT INTO sgn.blast_db_group (name,ordinal) VALUES ('my blast db group',110);
COMMIT;
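Link the blast_db and blast_db_group tables, using the blast_db_id of the row you inserted and the blast_db_group_id of its group (333 and 28 are example values):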
BEGIN;
INSERT INTO sgn.blast_db_blast_db_group (blast_db_id,blast_db_group_id) VALUES (333,28);
COMMIT;
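The manual steps above can be sketched as a shell helper that prints the SQL for review before it is sent to the database. The function names sql_quote and blast_db_sql are hypothetical. The sketch assumes that blast_db_id is the key column of sgn.blast_db and that file_base identifies the new row uniquely, so the link row can be created without looking up the ID by hand. It also hard-codes the update_freq, index_seqs, and web_interface_visible values from the example above.

```shell
# Quote a value as a SQL string literal, doubling any embedded single quotes.
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print the statements that register one BLAST database and link it to a group.
# Arguments: file_base title type source_url blast_db_group_id description
blast_db_sql() {
  # Reject a group id that is empty or contains anything other than digits.
  case "$5" in
    ''|*[!0-9]*) echo "blast_db_group_id must be an integer: $5" >&2; return 1 ;;
  esac
  printf 'BEGIN;\n'
  printf 'INSERT INTO sgn.blast_db (file_base,title,type,source_url,update_freq,index_seqs,blast_db_group_id,web_interface_visible,description)\n'
  printf "  VALUES (%s, %s, %s, %s, 'manual', 't', %s, 't', %s);\n" \
    "$(sql_quote "$1")" "$(sql_quote "$2")" "$(sql_quote "$3")" \
    "$(sql_quote "$4")" "$5" "$(sql_quote "$6")"
  # Assumption: file_base is unique, so this selects only the row just inserted.
  printf 'INSERT INTO sgn.blast_db_blast_db_group (blast_db_id,blast_db_group_id)\n'
  printf '  SELECT blast_db_id, %s FROM sgn.blast_db WHERE file_base = %s;\n' \
    "$5" "$(sql_quote "$1")"
  printf 'COMMIT;\n'
}
```

Run the function on its own first and read the output. Once it looks right, it can be piped to psql -h localhost -U postgres -d cxgn.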
If you are interested in connecting BLAST output with JBrowse, you will need to insert the information for jbrowse_src in sgn.blast_db. For more information, see the SetUpJbrowseBlastLink wiki page.