Functional Analysis through Hidden Markov Models (FATHMM) is a precomputed dataset used by ICGC to do functional impact prediction on mutations.
FATHMM is a project with its own website which can be found here: http://fathmm.biocompute.org.uk/
The dataset is a MySQL database dump and is available here: http://fathmm.biocompute.org.uk/database/fathmm.v2.3.SQL.gz
However, since the DCC uses PostgreSQL for relational storage, we converted the dump into a PostgreSQL-compatible format. The converted dump can be found here:
https://artifacts.oicr.on.ca/artifactory/dcc-binaries/fathmm-db/2.3/fathmm.v2.3.postgres.tar.gz
FATHMM is used to predict the functional impact of mutations whose consequence type is missense_variant. We feed the predictor a translation_id and the mutation's aa_change, and it returns one of {TOLERATED, HIGH, UNKNOWN}.
Import the dataset into a PostgreSQL database
- Extract the database dump files from the tar.gz archive
- Copy the appropriate dump files to the database server
- Run psql fathmm < fathmm.v2.3.postgres.sql. The dump file is around 20GB uncompressed; expect the import to take 3-4 hours.
- The SQL dump may have a few issues, so some of the tables and sequences need to be patched manually before FATHMM can be used; see the Patch section below.
If within OICR, a backup of the FATHMM dump files and the patch can be found under /nfs/backups/workspace/fathmm.
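The import steps above could be scripted roughly as follows. This is a minimal sketch, not the DCC's actual tooling: the function names `extract_dump`, `psql_import_command`, and `import_dump` are hypothetical, and it assumes `psql` is on the PATH and can connect to a database named fathmm without extra options. It uses `psql -d fathmm -f <file>` as the equivalent of the `psql fathmm < <file>` redirect.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_dump(archive: Path, dest: Path) -> list[Path]:
    """Unpack the .tar.gz archive into dest and return the extracted .sql files."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return sorted(dest.rglob("*.sql"))


def psql_import_command(dump: Path, database: str = "fathmm") -> list[str]:
    """Build the argv equivalent of `psql <database> < <dump>`."""
    return ["psql", "-d", database, "-f", str(dump)]


def import_dump(dump: Path, database: str = "fathmm") -> None:
    """Run the import; for the full dump this is expected to take hours."""
    subprocess.run(psql_import_command(dump, database), check=True)
```

Calling `import_dump(Path("fathmm.v2.3.postgres.sql"))` on the database server would then start the import. The sketch does not apply the patch.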
DCC_CACHE is a cache table that stores {translation_id, aa_change} -> prediction key-value pairs (the aa_change is stored in the aa_mutation column). Before running the actual prediction lookup, we first check whether the result already exists in DCC_CACHE. Note that when updating to another version of the FATHMM dataset, DCC_CACHE must be cleared to ensure data correctness.
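The check-the-cache-first pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the DCC implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs on its own, the function names are hypothetical, and `predict` stands for whatever performs the real FATHMM lookup. With a PostgreSQL driver the `?` placeholders would typically become `%s`.

```python
import sqlite3
from typing import Callable, Optional, Tuple

# (score, prediction), matching the "score" and "prediction" columns
Prediction = Tuple[Optional[str], Optional[str]]


def create_cache(conn: sqlite3.Connection) -> None:
    """Create a DCC_CACHE table with the same columns as the patch."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "DCC_CACHE" ('
        '"translation_id" varchar(64) NOT NULL, '
        '"aa_mutation" varchar(64) NOT NULL, '
        '"score" varchar(16), '
        '"prediction" varchar(16))'
    )


def cached_predict(
    conn: sqlite3.Connection,
    translation_id: str,
    aa_mutation: str,
    predict: Callable[[str, str], Prediction],
) -> Prediction:
    """Return the cached result if present; otherwise compute and store it."""
    row = conn.execute(
        'SELECT "score", "prediction" FROM "DCC_CACHE" '
        'WHERE "translation_id" = ? AND "aa_mutation" = ?',
        (translation_id, aa_mutation),
    ).fetchone()
    if row is not None:
        return row[0], row[1]
    # Cache miss: run the (expensive) lookup and remember the answer.
    score, prediction = predict(translation_id, aa_mutation)
    conn.execute(
        'INSERT INTO "DCC_CACHE" '
        '("translation_id", "aa_mutation", "score", "prediction") '
        "VALUES (?, ?, ?, ?)",
        (translation_id, aa_mutation, score, prediction),
    )
    conn.commit()
    return score, prediction


def clear_cache(conn: sqlite3.Connection) -> None:
    """Empty the cache, e.g. after loading a new FATHMM dataset version."""
    conn.execute('DELETE FROM "DCC_CACHE"')
    conn.commit()
```

Because the patch grants the dcc role only select and insert on DCC_CACHE, clearing the cache on the real database would need a role with delete privileges.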
The patch (also in the tarball) adds indexes to the DOMAINS table, creates the DCC_CACHE table, and makes sure the permissions are correct.
-- Indexes on the DOMAINS table
create index i1 on "DOMAINS" (id);
create index i2 on "DOMAINS" (hmm);
-- Cache of prediction results, keyed by (translation_id, aa_mutation)
create table "DCC_CACHE" (
"translation_id" varchar(64) NOT NULL,
"aa_mutation" varchar(64) NOT NULL,
"score" varchar(16),
"prediction" varchar(16)
);
create index c1 on "DCC_CACHE" (translation_id, aa_mutation);
-- Grant the dcc role select and insert access to the FATHMM tables and the cache
grant select, insert on "DOMAINS" to dcc;
grant select, insert on "LIBRARY" to dcc;
grant select, insert on "PHENOTYPES" to dcc;
grant select, insert on "PROBABILITIES" to dcc;
grant select, insert on "PROTEIN" to dcc;
grant select, insert on "SEQUENCE" to dcc;
grant select, insert on "VARIANTS" to dcc;
grant select, insert on "WEIGHTS" to dcc;
grant select, insert on "DCC_CACHE" to dcc;