support for additional VEP terms #926

Open
jxchong opened this issue Mar 20, 2019 · 6 comments

Comments

@jxchong
Contributor

jxchong commented Mar 20, 2019

Based on the findings of the DDD paper, we would like to be able to filter for the following variant annotations created by the VEP SpliceRegion plugin:

splice_donor_5th_base_variant
splice_donor_region_variant
splice_polypyrimidine_tract_variant
extended_intronic_splice_region_variant_5prime
extended_intronic_splice_region_variant_3prime

Info here: http://www.ensembl.info/2018/10/26/cool-stuff-the-vep-can-do-splice-site-variant-annotation/
Plugin here: https://github.com/Ensembl/VEP_plugins/blob/release/94/SpliceRegion.pm

None of these annotations are currently listed in GEMINI's impacts column. How would we be able to access them when they don't have their own custom vep_xxx column? (My understanding is that VEP just provides them as the annotation.) Right now we just filter on impact_severity<>'LOW' in GEMINI, so I imagine we would need something like impact_severity<>'LOW' or xxxxx='yyy' or ...

@arq5x
Owner

arq5x commented Mar 21, 2019

I honestly think this is the realm of the new gemini workflow based upon vcfanno and vcf2db. Our goal is to switch over to this entirely this year.

@jxchong
Contributor Author

jxchong commented Mar 21, 2019

Thanks Aaron. If we switch to vcfanno/vcf2db right now, would these be accessible to us in queries?

@arq5x
Owner

arq5x commented Mar 21, 2019

If they are in the VCF via vcfanno or VEP, they make it into the database. @brentp - can you corroborate?

@brentp
Collaborator

brentp commented Mar 21, 2019

I think these would be impacts in the CSQ string, right? E.g. instead of splice_variant it would now be splice_donor_5th_base_variant, so we'd have to update the geneimpacts module.

An example VCF with a few variants would be helpful.

@jxchong
Contributor Author

jxchong commented Mar 27, 2019

Ok, we finally got this working in VEP and these show up in the CSQ string, but not in the Consequence field. They are instead in the SpliceRegionOutput field.

Here's an example. More examples are in the VCF available here:
https://www.dropbox.com/s/mg7u3nkxil7p4h5/spliceregionexamples.vcf.gz?dl=0

1    38272660    rs2291297    G    A    42583.1    PASS    AC=1;AF=0.224;AN=2;BaseQRankSum=-1.622;ClippingRankSum=0.271;DB;DP=3988;ExcessHet=0.4621;FS=0.528;InbreedingCoeff=0.1309;MLEAC=43;MLEAF=0.224;MQ=9.49;MQ0=0;MQRankSum=0;QD=19.89;ReadPosRankSum=0.463;SOR=0.637;CSQ=A|downstream_gene_variant|MODIFIER|MTF1|ENSG00000188786|Transcript|ENST00000373036|protein_coding||||||||||rs2291297|2579|-1||HGNC|7428|YES|CCDS30676.1|1|C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000373042|protein_coding||||||||||rs2291297|1158|1||HGNC|24789|YES|CCDS427.2||C1orf122||||||||||||,A|5_prime_UTR_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000373043|protein_coding|1/2||ENST00000373043.1:c.-1697G>A||10/2229|||||rs2291297||1||HGNC|24789||CCDS44112.1||C1orf122||||||||||||,A|intron_variant|MODIFIER|YRDC|ENSG00000196449|Transcript|ENST00000373044|protein_coding||2/4|ENST00000373044.2:c.505-12C>T|||||||rs2291297||-1||HGNC|28905|YES|CCDS30675.1||C1orf122||||||||||||splice_polypyrimidine_tract_variant,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000419397|processed_transcript||||||||||rs2291297|672|1||HGNC|24789||||C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000446260|protein_coding||||||||||rs2291297|1422|1||HGNC|24789||||C1orf122||||||||||||,A|upstream_gene_variant|MODIFIER|C1orf122|ENSG00000197982|Transcript|ENST00000468084|protein_coding||||||||||rs2291297|759|1||HGNC|24789||CCDS44112.1||C1orf122||||||||||||,A|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000004891|promoter||||||||||rs2291297|||||||||C1orf122||||||||||||    GT:AD:DP:GQ:PL    0/1:37,27:.:99:771,0,945
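
For illustration, here is a minimal Python sketch of how the SpliceRegionOutput value could be pulled out of a CSQ string like the one in the record above. It is not geneimpacts or vcf2db code. The field list in CSQ_FORMAT is a shortened, hypothetical one (a real list would be read from the "Format: ..." text of the CSQ INFO header in the VCF), the CSQ value is trimmed to match it, and the sketch assumes that multiple plugin terms would be joined with "&" the way VEP joins multiple Consequence terms.

```python
# Sketch only: CSQ_FORMAT is a shortened, hypothetical field list.
# A real one comes from the "Format: ..." part of the CSQ INFO header line.
CSQ_FORMAT = "Allele|Consequence|IMPACT|SYMBOL|Feature|SpliceRegionOutput"


def parse_csq(csq_value, csq_format=CSQ_FORMAT):
    """Split a CSQ value into one dict per transcript annotation."""
    keys = csq_format.split("|")
    records = []
    for entry in csq_value.split(","):
        values = entry.split("|")
        records.append(dict(zip(keys, values)))
    return records


def splice_region_terms(record):
    """Return the plugin terms for one annotation (empty list if none).

    Assumption: multiple terms are joined with "&".
    """
    raw = record.get("SpliceRegionOutput", "")
    return [term for term in raw.split("&") if term]


# Two annotations trimmed down from the record above to fit CSQ_FORMAT.
csq = (
    "A|intron_variant|MODIFIER|YRDC|ENST00000373044|splice_polypyrimidine_tract_variant,"
    "A|upstream_gene_variant|MODIFIER|C1orf122|ENST00000419397|"
)
records = parse_csq(csq)
hits = [
    (r["Feature"], splice_region_terms(r))
    for r in records
    if splice_region_terms(r)
]
print(hits)
```

Reading the field by name rather than by position keeps the lookup valid if the plugin column moves when other VEP options or plugins are enabled.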

@arq5x
Owner

arq5x commented Mar 28, 2019

Gotcha, looks like we would need to update the logic in geneimpacts and in vcf2db to support this.
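
As a rough sketch of what such logic might amount to on the filtering side (this is not the actual geneimpacts or vcf2db implementation), the Python below keeps a variant if its severity is not LOW or if one of the five SpliceRegion terms requested in this issue is present. The function name keep_variant and its two arguments are hypothetical, and the "&" separator for multiple terms is an assumption.

```python
# The five plugin terms requested at the top of this issue.
SPLICE_REGION_TERMS = {
    "splice_donor_5th_base_variant",
    "splice_donor_region_variant",
    "splice_polypyrimidine_tract_variant",
    "extended_intronic_splice_region_variant_5prime",
    "extended_intronic_splice_region_variant_3prime",
}


def keep_variant(impact_severity, splice_region_output):
    """Hypothetical filter: severity is not LOW, OR a requested plugin term is present.

    Assumption: multiple plugin terms are joined with "&".
    """
    terms = {term for term in splice_region_output.split("&") if term}
    return impact_severity != "LOW" or bool(terms & SPLICE_REGION_TERMS)


# A LOW-severity annotation is rescued only when a plugin term is attached.
print(keep_variant("LOW", "splice_polypyrimidine_tract_variant"))
print(keep_variant("LOW", ""))
```

This mirrors the "impact_severity<>'LOW' or ..." condition from the original request, with the plugin field treated as a second source of consequence terms.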


3 participants