fchen365/surf

Latest commit: b77fd94 by Fan Chen, Jun 14, 2021

SURF

The Statistical Utility for RBP Functions (SURF) is an integrative analysis framework for identifying alternative splicing (AS), alternative transcription initiation (ATI), and alternative polyadenylation (APA) events regulated by individual RNA-binding proteins (RBPs), and for elucidating the protein-RNA interactions that govern these events. We used SURF to analyze data for 104 RBPs (K562 cells, available from ENCODE).

A detailed vignette is available here.

Installation

You can install the development version of surf from GitHub with:

# install.packages("devtools")
devtools::install_github("fchen365/surf")
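
If you run the installation from a script, you may prefer to install only what is missing. The following is a minimal sketch of that pattern; the wrapper name install_surf_if_missing is made up for illustration and is not part of surf.

```r
# Hypothetical convenience wrapper; not provided by surf or devtools.
install_surf_if_missing <- function() {
  # Install devtools only when it is not already available.
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # Install surf from GitHub only when it is not already available.
  if (!requireNamespace("surf", quietly = TRUE)) {
    devtools::install_github("fchen365/surf")
  }
  # Report (invisibly) whether surf can now be loaded.
  invisible(requireNamespace("surf", quietly = TRUE))
}

# install_surf_if_missing()  # uncomment to run; needs network access
```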

What can you do with SURF?

SURF supports event-centric analysis of alternative transcriptional regulation (ATR). Depending on the data you provide, SURF can perform four tasks:

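|   | Data               | Format            | Task                                  |
|---|--------------------|-------------------|---------------------------------------|
| 1 | genome annotation  | any (gtf, gff, …) | parse ATR events                      |
| 2 | + RNA-seq          | alignment (bam)   | detect differential ATR events        |
| 3 | + CLIP-seq         | alignment (bam)   | detect functional association         |
| 4 | + external RNA-seq | summarized table  | differential transcriptional activity |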

SURF Pipeline

— One task at one call

The four tasks of the SURF pipeline are designed to be streamlined. Once you have the data in hand (see the following sub-section), each task can be performed with a single function call, and each call takes the result of the previous one:

library(surf)

event <- parseEvent(anno_file)                              # task 1
drr <- drseq(event, rna_seq_sample)                         # task 2
far <- faseq(drr, clip_seq_sample)                          # task 3
dar <- daseq(far, getRankings(exprMat), ext_sample)         # task 4

Here, anno_file, rna_seq_sample, clip_seq_sample, and ext_sample are data descriptions, and exprMat is a table of external transcriptome quantification (e.g., TCGA, GTEx, …).

— Tell surf about your data

Describing your data should be easy. Simply follow the example below.

For task 1, the path to the annotation file will do.

anno_file <- "gencode.v24.annotation.filtered.gtf"

For task 2, surf needs to know where the alignment files (bam) are and the experimental condition for differential analysis (e.g., RBP “knock-down” and “wild-type” control).

rna_seq_sample <- data.frame(
  row.names = c('sample1', 'sample2', 'sample3', 'sample4'),
  bam = paste0("rna-seq/bam/sample", 1:4, ".bam"),
  condition = c('knock-down', 'knock-down', 'wild-type', 'wild-type'),
  stringsAsFactors = FALSE
)
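
Before passing a sample table to the pipeline, it can help to sanity-check it. The helper below is a hypothetical sketch, not a surf function: the name check_sample_table and its rules (a bam column, a condition column with exactly two groups, and alignment files present on disk) are assumptions drawn from the description above, not from surf's own input validation.

```r
# Hypothetical pre-flight check; not part of surf.
check_sample_table <- function(samples, check_files = TRUE) {
  stopifnot(is.data.frame(samples))
  # Columns that the examples in this README rely on.
  missing_cols <- setdiff(c("bam", "condition"), colnames(samples))
  if (length(missing_cols) > 0) {
    stop("missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: a two-group comparison needs exactly two conditions.
  n_groups <- length(unique(samples$condition))
  if (n_groups != 2) {
    stop("expected 2 conditions, found ", n_groups)
  }
  # Optionally confirm that every alignment file exists on disk.
  if (check_files) {
    absent <- samples$bam[!file.exists(samples$bam)]
    if (length(absent) > 0) {
      stop("bam file(s) not found: ", paste(absent, collapse = ", "))
    }
  }
  invisible(TRUE)
}

# Demo table in the same layout as above. The paths are placeholders,
# so the file check is switched off here.
demo_sample <- data.frame(
  row.names = paste0("sample", 1:4),
  bam = paste0("rna-seq/bam/sample", 1:4, ".bam"),
  condition = c("knock-down", "knock-down", "wild-type", "wild-type"),
  stringsAsFactors = FALSE
)
check_sample_table(demo_sample, check_files = FALSE)
```

With real data, leave check_files at its default so that a mistyped path is reported before any alignment is read.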

Similarly for task 3, surf needs to know where the alignment files (bam) are and the experimental condition (e.g., “IP” and the input control “SMI”).

clip_seq_sample <- data.frame(
  row.names = c('sample5', 'sample6', 'sample7'),
  bam = paste0('clip-seq/bam/', 5:7, '.bam'),
  condition = c('IP', 'IP', 'SMI'),
  stringsAsFactors = FALSE
)

Finally, for task 4, surf assumes that you have transcriptome quantification summarized in a table exprMat, whose rows correspond to genomic features (e.g., genes, transcripts, …) and whose columns correspond to samples. You can use whichever measure you prefer (e.g., TPM, RPKM, …). Then, tell surf which group (condition) each sample belongs to:

ext_sample <- data.frame(
  row.names = colnames(exprMat),
  condition = rep(c('TCGA', 'GTEx'), c(173, 337))
)
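
The rows of ext_sample should line up with the columns of exprMat. The sketch below shows that relationship on a small simulated matrix standing in for real quantification; the object names toy_exprMat and toy_ext_sample, the feature and sample names, and the group sizes are all made up for illustration.

```r
# Toy stand-in for a real quantification table (values are simulated).
set.seed(1)
toy_exprMat <- matrix(
  rexp(5 * 6),  # arbitrary positive "expression" values
  nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:6))
)

# Sample annotation: one row per column of the matrix.
toy_ext_sample <- data.frame(
  row.names = colnames(toy_exprMat),
  condition = rep(c("groupA", "groupB"), c(2, 4))
)

# Both objects should describe the same samples in the same order.
stopifnot(identical(rownames(toy_ext_sample), colnames(toy_exprMat)))

# Number of samples per group.
table(toy_ext_sample$condition)
```

Building row.names from colnames(exprMat), as in the example above, keeps the two objects aligned by construction. The counts passed to rep() still need to add up to the number of columns.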

Reference

Chen, F., Keleş, S. SURF: integrative analysis of a compendium of RNA-seq and CLIP-seq datasets highlights complex governing of alternative transcriptional regulation by RNA-binding proteins. Genome Biol 21, 139 (2020). doi:10.1186/s13059-020-02039-7
