Subtype Classification of Tumors and Derived Lab Grown Models

Molecular subtyping using the TMP Toolkit

Quickstart Guide

Setup

Install requirements - detailed instructions are found on the Requirements page:

Install Python 3+
Install GDC Data Transfer Tool Client

Ensure that all steps on the Requirements page are completed (this includes creating the working environment, signing in, and manually downloading the required data).
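Before moving on, it can help to confirm that the required tools are reachable from the shell. The sketch below is not part of the repository: `require_tool` is a hypothetical helper, and it assumes Python is available as `python3` and that the GDC Data Transfer Tool is installed under its usual executable name, `gdc-client`.

```shell
# require_tool NAME: succeed if NAME is an executable on PATH, otherwise warn.
# Hypothetical helper for illustration; not a script shipped with this repo.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing required tool: $1" >&2
    return 1
}

# Assumed executable names: python3 and gdc-client.
# Uncomment to check a local setup:
# require_tool python3 && require_tool gdc-client && echo "Setup looks complete"
```

The check only verifies that the executables exist; it does not cover the manual steps (sign-in, data download) described on the Requirements page.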

Download Data from Manifest File Using the GDC Client

Download Gene Expression Data

bash scripts/gdc_download.sh PAAD

This creates subfolders in data-raw/GEXP and places the GDC molecular matrices there.

Options for the cancer cohort include ALL, BLCA, BRCA, COADREAD, ESO, HNSC, KID, LGGGBM, LIHCCHOL, LUNG, OV, PAAD, SARC, SKCM, and UCEC.

For more details on each cancer cohort option, see the Cohort Options page.
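Because the scripts take the cohort as a plain positional argument, a typo in the cohort name may only surface partway through a run. A minimal sketch of a guard is shown here; `is_valid_cohort` is a hypothetical helper (not part of the repository), and the list is copied from the cohort options above.

```shell
# Cohort names copied from the documented options.
VALID_COHORTS="ALL BLCA BRCA COADREAD ESO HNSC KID LGGGBM LIHCCHOL LUNG OV PAAD SARC SKCM UCEC"

# is_valid_cohort NAME: succeed only if NAME exactly matches a documented cohort.
# Hypothetical helper for illustration.
is_valid_cohort() {
    for known in $VALID_COHORTS; do
        if [ "$known" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Example guard around the download step (commented out so nothing runs here):
# is_valid_cohort "PAAD" && bash scripts/gdc_download.sh PAAD
```

Note that component names such as LUAD are rejected by this guard, since only the listed cohort options are accepted.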

Run Processing Pipeline

Example shown for running PAAD cohort

bash scripts/process.sh PAAD data/prep

This creates the file data/prep/<CANCER>_GEXP/<CANCER>_GEXP_prep2_<TYPE>.tsv, which is prepped for distance calculations.

Options for the cancer cohort include ALL, BLCA, BRCA, COADREAD, ESO, HNSC, KID, LGGGBM, LIHCCHOL, LUNG, OV, PAAD, SARC, SKCM, and UCEC.

For more details on each cancer cohort option, see the Cohort Options page.
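The output naming pattern can be turned into a quick check that processing produced the expected files. This is a sketch under stated assumptions: `prep_path` and `check_prep` are hypothetical helpers, and `<TYPE>` is assumed to take the values Tumor and Model, as in the classifier example in this guide.

```shell
# prep_path CANCER TYPE [PREP_DIR]: print the expected path of a prepped matrix.
# Pattern: <PREP_DIR>/<CANCER>_GEXP/<CANCER>_GEXP_prep2_<TYPE>.tsv
prep_path() {
    printf '%s/%s_GEXP/%s_GEXP_prep2_%s.tsv\n' "${3:-data/prep}" "$1" "$1" "$2"
}

# check_prep CANCER [PREP_DIR]: report whether the Tumor and Model matrices exist.
# Assumes TYPE is either Tumor or Model.
check_prep() {
    for sample_type in Tumor Model; do
        prepped=$(prep_path "$1" "$sample_type" "${2:-data/prep}")
        if [ -f "$prepped" ]; then
            echo "found:   $prepped"
        else
            echo "missing: $prepped"
        fi
    done
}

# Example: check_prep PAAD data/prep
```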

Sample Subtype Classification Using Gene Expression Data

The goal of this analysis is to obtain cancer subtype predictions for HCMI samples (organoids, cell cultures, xenografts, etc.). To accomplish this, we use the top-performing pre-trained machine learning models (dockerized TMP models trained on pre-processed TCGA data). Specifically, we are interested in using gene expression from the HCMI samples and eventually comparing primary tumors to their corresponding models (organoids, cell cultures, xenografts, etc.).

The TMP models (pre-trained models) are specific to TCGA cancer cohorts (TCGA abbreviations); therefore, we split the HCMI data into TCGA cancer cohorts (based on sample metadata).

Run gene expression classifier pipeline:

# Arguments, in order: cancer, tumor-file, model-file, transformed-dir
bash scripts/run_classify_GEXP.sh \
    PAAD \
    data/prep/PAAD_GEXP/PAAD_GEXP_prep2_Tumor.tsv \
    data/prep/PAAD_GEXP/PAAD_GEXP_prep2_Model.tsv \
    data/classifier_gexp/ml_ready_qrank

Results can be found in data/classifier_gexp/ml_predictions_qrank/combo/HCMI_TMPsubtype_qRank_<CANCER>.tsv
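To classify several cohorts, the PAAD invocation can be generalized. The following dry-run sketch only prints the commands and the expected result paths; it does not execute the pipeline. `gexp_classify_cmd` and `gexp_result_path` are hypothetical helpers, and the sketch assumes every cohort follows the same file naming as the PAAD example.

```shell
# gexp_classify_cmd CANCER: print (do not run) the classifier command for one
# cohort, using the same argument order as the PAAD example.
gexp_classify_cmd() {
    printf 'bash scripts/run_classify_GEXP.sh %s %s %s %s\n' \
        "$1" \
        "data/prep/${1}_GEXP/${1}_GEXP_prep2_Tumor.tsv" \
        "data/prep/${1}_GEXP/${1}_GEXP_prep2_Model.tsv" \
        "data/classifier_gexp/ml_ready_qrank"
}

# gexp_result_path CANCER: print where the predictions are expected to be written.
gexp_result_path() {
    printf 'data/classifier_gexp/ml_predictions_qrank/combo/HCMI_TMPsubtype_qRank_%s.tsv\n' "$1"
}

# Dry run over a few cohorts; nothing is executed.
for cohort in PAAD SKCM BRCA; do
    gexp_classify_cmd "$cohort"
    echo "  expected results: $(gexp_result_path "$cohort")"
done
```

Printing the commands first makes it easy to review the file paths before committing to a long classification run.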

Note: The combination cohorts LUNG (includes LUAD and LUSC) and ESO (includes GEA and ESCC) are handled as their component cohorts during transformation and classification, then merged in the post-classification summary.
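The relationship between combination cohorts and their components can be written down as a small lookup. This is only an illustration of the note, not the pipeline's actual implementation (which is not shown in this guide); `subcohorts` is a hypothetical helper.

```shell
# subcohorts COHORT: print the component cohorts of a combination cohort,
# or the cohort itself when it is not LUNG or ESO.
# Mapping taken from the note; the helper itself is illustrative only.
subcohorts() {
    case "$1" in
        LUNG) echo "LUAD LUSC" ;;
        ESO)  echo "GEA ESCC" ;;
        *)    echo "$1" ;;
    esac
}

# Example: list the units that would be classified for LUNG.
for unit in $(subcohorts LUNG); do
    echo "classify separately: $unit"
done
```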

Sample Subtype Classification Using DNA Methylation Data

The goal of this analysis is to obtain cancer subtype predictions for HCMI samples (organoids, cell cultures, xenografts, etc.). To accomplish this, we use the top-performing pre-trained machine learning models (dockerized TMP models trained on pre-processed TCGA data). Specifically, we are interested in using DNA methylation from the HCMI samples and eventually comparing primary tumors to their corresponding models (organoids, cell cultures, xenografts, etc.).

The TMP models (pre-trained models) are specific to TCGA cancer cohorts (TCGA abbreviations); therefore, we split the HCMI data into TCGA cancer cohorts (based on sample metadata).

Run DNA methylation classifier pipeline:

# Arguments, in order: cancer, feature-matrix-file
bash scripts/run_classify_METHYL.sh \
    SKCM \
    data/classifier_methyl/processed/20231211_HCMI_TMP_subtype_prediction_feature_matrix_SKCM.tsv

Results can be found in data/classifier_methyl/ml_predictions/combo/HCMI_METH_TMPsubtypes.<CANCER>.tsv

Note: The combination cohorts LUNG (includes LUAD and LUSC) and ESO (includes GEA and ESCC) are handled as their component cohorts during transformation and classification, then merged in the post-classification summary.

Second Example for Combination Cohort

bash scripts/run_classify_METHYL.sh \
    LUNG \
    data/classifier_methyl/processed/20231211_HCMI_TMP_subtype_prediction_feature_matrix_LUNG.tsv
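The two methylation examples differ only in the cohort name, so the command can be parameterized. The dry-run sketch below prints commands rather than running them. `methyl_classify_cmd` and `methyl_result_path` are hypothetical helpers, and the date prefix 20231211 is assumed to be shared by all feature matrices only because both examples use it; pass a different prefix as the second argument if your files differ.

```shell
# methyl_classify_cmd CANCER [DATE_PREFIX]: print (do not run) the methylation
# classifier command. DATE_PREFIX defaults to the one used in the examples.
methyl_classify_cmd() {
    printf 'bash scripts/run_classify_METHYL.sh %s %s\n' \
        "$1" \
        "data/classifier_methyl/processed/${2:-20231211}_HCMI_TMP_subtype_prediction_feature_matrix_${1}.tsv"
}

# methyl_result_path CANCER: print where the predictions are expected to be written.
methyl_result_path() {
    printf 'data/classifier_methyl/ml_predictions/combo/HCMI_METH_TMPsubtypes.%s.tsv\n' "$1"
}

# Dry run for one single cohort and one combination cohort.
for cohort in SKCM LUNG; do
    methyl_classify_cmd "$cohort"
    echo "  expected results: $(methyl_result_path "$cohort")"
done
```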
