---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please do not edit this file directly. -->
```{r, echo = FALSE, message=FALSE, warning=FALSE}
options(width=1000)
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
# svaRetro: R package for retrotransposed transcript detection from structural variants
<!-- badges: start -->
[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
<!-- badges: end -->
`svaRetro` contains functions for detecting retrotransposed transcripts (RTs) from structural variant calls.
It takes structural variant calls as a `GRanges` object in breakend notation and identifies RTs from exon-exon junctions and insertion sites.
Candidate RTs are reported per event and annotated with information about the inserted transcripts.
This package uses a breakend-centric event notation adopted from the
[`StructuralVariantAnnotation`](https://www.bioconductor.org/packages/release/bioc/html/StructuralVariantAnnotation.html) package.
More information about `VCF` objects and breakend-centric `GRanges` objects can be found in the vignettes of the
corresponding packages, which can be opened with `browseVignettes("VariantAnnotation")` and
`browseVignettes("StructuralVariantAnnotation")`.
# Installation
[svaRetro](https://doi.org/doi:10.18129/B9.bioc.svaRetro) is available from Bioconductor (since BioC 3.14 and R 4.1):
```{r, eval=FALSE}
# install.packages("BiocManager")
BiocManager::install("svaRetro")
```
The development version can be installed from GitHub:
```{r, eval=FALSE}
BiocManager::install("PapenfussLab/svaRetro")
```
# How to cite
If you use `svaRetro`, please cite it; see the [Bioconductor page](https://bioconductor.org/packages/svaRetro) and the BibTeX entry below.
```
@article {Dong2021.08.18.456578,
author = {Dong, Ruining and Cameron, Daniel and Bedo, Justin and Papenfuss, Anthony T},
title = {svaRetro and svaNUMT: Modular packages for annotation of retrotransposed transcripts and nuclear integration of mitochondrial DNA in genome sequencing data},
elocation-id = {2021.08.18.456578},
year = {2021},
doi = {10.1101/2021.08.18.456578},
publisher = {Cold Spring Harbor Laboratory},
abstract = {Background: The biological significance of structural variation is now more widely recognized. However, due to the lack of available tools for downstream analysis, including processing and annotating, interpretation of structural variant calls remains a challenge. Findings: Here we present svaRetro and svaNUMT, R packages that provide functions for annotating novel genomic events such as non-reference retro-copied transcripts and nuclear integration of mitochondrial DNA. We evaluate the performance of these packages to detect events using simulations and public benchmarking datasets, and annotate processed transcripts in a public structural variant database. Conclusions: svaRetro and svaNUMT provide efficient, modular tools for downstream identification and annotation of structural variant calls. Competing Interest Statement: The authors have declared no competing interest. Abbreviations: SV, structural variant; NUMT, nuclear mitochondrial integration; RT, retroposed transcript; TSD, target site duplication; mtDNA, mitochondrial DNA.},
URL = {https://www.biorxiv.org/content/early/2021/08/19/2021.08.18.456578},
eprint = {https://www.biorxiv.org/content/early/2021/08/19/2021.08.18.456578.full.pdf},
journal = {bioRxiv}
}
```
# Workflow
Below is a workflow example for detecting RTs from a human SV callset.
This example is taken from the vignette of `svaRetro`.
```{r input, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(StructuralVariantAnnotation)
library(VariantAnnotation)
library(svaRetro)
RT_vcf <- readVcf(system.file("extdata", "diploidSV.vcf", package = "svaRetro"))
```
```{r}
RT_gr <- StructuralVariantAnnotation::breakpointRanges(RT_vcf, nominalPosition=TRUE)
head(RT_gr)
```
Note that `StructuralVariantAnnotation` requires the `GRanges` object to be composed entirely of valid breakpoints. Please consult the vignette of the `StructuralVariantAnnotation` package for how to ensure breakpoint consistency.
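Assuming that consistency here means that the partner of every breakend is still present in the object, a minimal sketch of restoring it after subsetting might look like the following. The toy ranges and breakend names are hypothetical; only the `partner` column and the use of breakend names follow the breakend-centric notation.

```r
library(GenomicRanges)

# Hypothetical toy breakends: bp1 is a complete pair, bp2 has lost its partner
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges = IRanges(start = c(100, 500, 900), width = 1),
  strand = c("+", "-", "+")
)
names(gr) <- c("bp1_o", "bp1_h", "bp2_o")
gr$partner <- c("bp1_h", "bp1_o", "bp2_h")

# Keep only breakends whose partner is also present in the object
gr_paired <- gr[gr$partner %in% names(gr)]
names(gr_paired)
```

The same subsetting idiom can be applied to `RT_gr` after any filtering step that may have dropped one side of a breakpoint.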
## Identifying retrotransposed transcripts
The package provides `rtDetect` to identify RTs using the provided SV calls. This is achieved by detecting intronic deletions, which are breakpoints at exon-intron (and intron-exon) boundaries of a transcript. Fusions consisting of an exon boundary and a second genomic location are reported as potential insertion sites. Due to the complexity of RT events, insertion sites can be discovered on both left and right sides, only one side, or none at all.
```{r}
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(dplyr)
hg19.genes <- TxDb.Hsapiens.UCSC.hg19.knownGene
RT_vcf <- readVcf(system.file("extdata", "diploidSV.vcf", package = "svaRetro"))
RT_gr <- StructuralVariantAnnotation::breakpointRanges(RT_vcf, nominalPosition=TRUE)
RT <- rtDetect(RT_gr, hg19.genes, maxgap=50, minscore=0.3)
```
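To make the boundary-matching idea concrete, here is a simplified base R illustration. It is not the implementation used by `rtDetect`: the exon table, the breakpoints, and the helper `near_boundary()` are hypothetical, everything sits on a single chromosome, and strand is ignored. A breakpoint with both ends near exon boundaries is treated as an exon-exon junction, and one with exactly one end near a boundary as a candidate insertion site.

```r
# Hypothetical toy transcript with three exons (not real annotation)
exons <- data.frame(
  exon = c("E1", "E2", "E3"),
  start = c(1000, 2000, 3000),
  end = c(1200, 2300, 3400),
  stringsAsFactors = FALSE
)

# Hypothetical breakpoints, each joining position pos1 to position pos2
bps <- data.frame(
  id = c("bpA", "bpB", "bpC", "bpD"),
  pos1 = c(1203, 2295, 1500, 3398),
  pos2 = c(1998, 3010, 2600, 90000),
  stringsAsFactors = FALSE
)

maxgap <- 50  # tolerance in bp, playing the role of the `maxgap` argument

# Name of the exon whose boundary lies within maxgap of each position, else NA
near_boundary <- function(pos, boundary, maxgap) {
  d <- abs(outer(pos, boundary, "-"))
  hit <- apply(d, 1, function(x) {
    if (min(x) <= maxgap) which.min(x) else NA_integer_
  })
  exons$exon[hit]
}

bps$from_exon <- near_boundary(bps$pos1, exons$end, maxgap)
bps$to_exon <- near_boundary(bps$pos2, exons$start, maxgap)

# Both ends at exon boundaries: an intronic deletion (exon-exon junction)
junctions <- bps[!is.na(bps$from_exon) & !is.na(bps$to_exon), ]
# Exactly one end at an exon boundary: a candidate insertion site
ins_site <- bps[xor(is.na(bps$from_exon), is.na(bps$to_exon)), ]
```

In this toy data, `bpA` and `bpB` are classified as junctions, `bpD` as a candidate insertion site, and `bpC` matches neither pattern.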
`rtDetect` returns a list whose elements are accessed by gene symbol (for example `RT$SKA3`). Each element contains two `GRanges` objects, `insSite` and `junctions`, holding candidate insertion sites and exon-exon junctions respectively. Candidate insertion sites are annotated with the source transcripts and with whether exon-exon junctions were detected for those transcripts. RT junction breakends are annotated with UCSC exon IDs, the corresponding transcripts, and NCBI gene symbols.
```{r}
RT$SKA3
```
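When many genes are reported, a per-gene summary can be easier to scan than printing each element. The sketch below assumes the nested structure implied by `RT$SKA3$junctions` and `RT$SKA3$insSite`; `mock_rt` and its gene names are hypothetical stand-ins so that the example runs without the package data.

```r
library(GenomicRanges)

# Hypothetical stand-in for the list returned by rtDetect()
mock_rt <- list(
  GENE_A = list(
    junctions = GRanges("chr1", IRanges(c(100, 200), width = 1)),
    insSite = GRanges("chr5", IRanges(50, width = 1))
  ),
  GENE_B = list(
    junctions = GRanges("chr2", IRanges(c(10, 20, 30, 40), width = 1)),
    insSite = GRanges()
  )
)

# One row per gene: number of junction and insertion-site breakends
counts <- t(sapply(mock_rt, function(x) {
  c(junctions = length(x$junctions), insSite = length(x$insSite))
}))
counts
```

Replacing `mock_rt` with `RT` should give the same kind of table for real calls, provided the output has this structure.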
## Visualising breakpoint pairs via circos plots
One way of visualising RTs is with circos plots. Here we use the package
[`circlize`](https://doi.org/10.1093/bioinformatics/btu393) to demonstrate
the visualisation of insertion sites and exon-exon junctions.
To generate a simple circos plot of the RT event involving the SKA3 transcript:
```{r, include=TRUE,results="hide",message=FALSE,warning=FALSE}
library(circlize)
rt_chr_prefix <- c(RT$SKA3$junctions, RT$SKA3$insSite)
seqlevelsStyle(rt_chr_prefix) <- "UCSC"
pairs <- breakpointgr2pairs(rt_chr_prefix)
pairs
```
To see the supporting breakpoints clearly, we restrict the circos plot to the loci of the event.
```{r}
circos.initializeWithIdeogram(
  data.frame(V1 = c("chr13", "chr11"),
             V2 = c(21720000, 108585000),
             V3 = c(21755000, 108586000),
             V4 = c("q12.11", "q24.3"),
             V5 = c("gneg", "gpos50")))
circos.genomicLink(as.data.frame(S4Vectors::first(pairs)),
                   as.data.frame(S4Vectors::second(pairs)))
circos.clear()
```
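The plotting window above is hard-coded for this event. For other events, one possible way to derive a window is to take the span of the breakends on each chromosome and pad it. The positions and the `pad` value below are hypothetical, and the band name and stain columns that the ideogram data frame also carries would still need to be supplied.

```r
# Hypothetical breakend positions for one event (not taken from the package data)
bp <- data.frame(
  chr = c("chr13", "chr13", "chr11"),
  pos = c(21729000, 21750000, 108585500),
  stringsAsFactors = FALSE
)
pad <- 1000  # flanking bases shown on each side; an arbitrary choice

# One row per chromosome: padded span covering all breakends on it
region <- do.call(rbind, lapply(split(bp, bp$chr), function(d) {
  data.frame(chr = d$chr[1],
             start = min(d$pos) - pad,
             end = max(d$pos) + pad,
             stringsAsFactors = FALSE)
}))
region
```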