Version 0.1.0 #2

Merged
merged 43 commits into from
Jan 22, 2024
Commits
6f154ce
set up the cli base
nvnieuwk Nov 30, 2023
83813f0
read plain and bgzip files
nvnieuwk Dec 5, 2023
5332a7b
add vcf structs + header conversion
nvnieuwk Dec 5, 2023
7f05aea
small optimization to bgzip reading
nvnieuwk Dec 5, 2023
7948aca
Save the samples in the header
nvnieuwk Dec 5, 2023
e9a3b36
add variant reading (no format yet)
nvnieuwk Dec 11, 2023
5c073ca
add format parsing
nvnieuwk Dec 11, 2023
79efe9f
add config reading
nvnieuwk Dec 12, 2023
7a131c6
add comments, missing config fields and start standardizing
nvnieuwk Dec 12, 2023
94f300f
change output logger
nvnieuwk Dec 12, 2023
a4f1424
print full header
nvnieuwk Dec 12, 2023
3cfa1bc
add standard info field resolving
nvnieuwk Dec 12, 2023
d187455
fix an issue with the type missing a capital letter
nvnieuwk Dec 12, 2023
b6f429f
finish VCF writing
nvnieuwk Dec 13, 2023
aba2e23
add a min function
nvnieuwk Dec 13, 2023
1b527f8
add alternate values
nvnieuwk Dec 13, 2023
a799d1b
Use SVTYPE instead of ALT for alternate values
nvnieuwk Dec 14, 2023
3e195be
add first steps of breakpoint conversion
nvnieuwk Dec 18, 2023
657f650
remove some faulty code
nvnieuwk Dec 18, 2023
50f2cd2
add breakend to breakpoint conversion
nvnieuwk Jan 5, 2024
72339dc
retain info and format + exclude missing info fields
nvnieuwk Jan 5, 2024
89d8755
add simple breakpoint to breakend conversion
nvnieuwk Jan 8, 2024
edf1627
Add alt changing
nvnieuwk Jan 8, 2024
a740f39
add ci workflows
nvnieuwk Jan 8, 2024
e8a3171
update readme
nvnieuwk Jan 8, 2024
825e215
add sum function
nvnieuwk Jan 8, 2024
6f3fcd7
refactor + add option of index fetching
nvnieuwk Jan 8, 2024
5148789
add svync to gitignore
nvnieuwk Jan 8, 2024
6f6dc28
also change alt in header
nvnieuwk Jan 8, 2024
c24d2d4
contig order remains the same
nvnieuwk Jan 8, 2024
6cc4591
add len function
nvnieuwk Jan 8, 2024
9d0b21c
refactor done, just some small issues to fix
nvnieuwk Jan 16, 2024
5f6a59d
last fixes
nvnieuwk Jan 16, 2024
ed544a0
fix error message
nvnieuwk Jan 16, 2024
d8175ff
add a mute-warnings option
nvnieuwk Jan 16, 2024
7b66e81
remove to breakend code
nvnieuwk Jan 16, 2024
9d6f247
swap mates if necessary
nvnieuwk Jan 16, 2024
2775ce5
remove --to-breakpoint for now
nvnieuwk Jan 19, 2024
8d68e86
small update to readme
nvnieuwk Jan 22, 2024
1a2aacd
add mute warnings to readme
nvnieuwk Jan 22, 2024
efd6b02
fix actions not running
nvnieuwk Jan 22, 2024
bd628f1
Merge pull request #1 from nvnieuwk/refactor
nvnieuwk Jan 22, 2024
d387aa6
bump version to 0.1.0
nvnieuwk Jan 22, 2024
31 changes: 31 additions & 0 deletions .github/workflows/go.yml
@@ -0,0 +1,31 @@
# This workflow will build a golang project
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-go

name: Go

on:
  push:
    branches: [ "main", "dev" ]
  pull_request:
    branches: [ "main", "dev" ]

jobs:

  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3

    - name: Set up Go
      uses: actions/setup-go@v4
      with:
        go-version: '1.21'

    - name: Install dependencies
      run: go get .

    - name: Build
      run: go build -v ./...

    - name: Test
      run: go test -v ./...
27 changes: 27 additions & 0 deletions .github/workflows/release.yml
@@ -0,0 +1,27 @@
on:
  release:
    types: [created]

permissions:
  contents: write
  packages: write

jobs:
  release-bedgovcf:
    name: release bedgovcf ${{ matrix.goos }}_${{ matrix.goarch }}
    runs-on: ubuntu-latest
    strategy:
      matrix:
        goos: [linux, darwin]
        goarch: [amd64, arm64]
        exclude:
          - goos: linux
            goarch: arm64

    steps:
      - uses: actions/checkout@v3
      - uses: wangyoucao577/go-release-action@v1
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          goos: ${{ matrix.goos }}
          goarch: ${{ matrix.goarch }}
4 changes: 4 additions & 0 deletions .gitignore
@@ -7,6 +7,7 @@
*.dll
*.so
*.dylib
svync

# Test binary, built with `go test -c`
*.test
@@ -19,3 +20,6 @@

# Go workspace file
go.work

# Output files
test.vcf
61 changes: 60 additions & 1 deletion README.md
@@ -1,2 +1,61 @@
# svync
⚠️ This tool is still under development, please check back in the future ⚠️
Svync is a tool designed to synchronize structural variant calls from different callers. It uses YAML configs to define how to handle the standardization.

## Usage
```bash
svync --config <config.yaml> --input <input.vcf>
```

### Arguments
#### Required
| Argument | Description |
| --- | --- |
| `--config`/`-c` | Path to the YAML config file |
| `--input`/`-i` | Path to the input VCF file |

#### Optional
| Argument | Description | Default |
| --- | --- | --- |
| `--output`/`-o` | Path to the output VCF file | `stdout` |
| `--nodate`/`--nd` | Do not add the date to the output VCF file | `false` |
| `--mute-warnings`/`--mw` | Do not output warnings | `false` |

## Configuration
The configuration file is the core of the standardization in Svync. More information can be found in the [configuration documentation](docs/configuration.md).


## Installation
### Mamba/Conda
This is the preferred way of installing BedGoVcf.

```bash
mamba install -c bioconda bedgovcf
```

or with conda:

```bash
conda install -c bioconda bedgovcf
```

### Precompiled binaries
Precompiled binaries are available for Linux and macOS on the [releases page](https://github.com/nvnieuwk/svync/releases).


### Installation from source
Make sure Go is installed on your machine ([install](https://go.dev/doc/install) it first if needed).

Then run these commands to install bedgovcf:

```bash
go get .
go build .
sudo mv bedgovcf /usr/local/bin/
```

Next run this command to check if it was correctly installed:

```bash
bedgovcf --help
```

29 changes: 29 additions & 0 deletions data/delly.yaml
@@ -0,0 +1,29 @@
# Test config for Delly SV caller
id: "delly_$INFO/SVTYPE"
alt:
  BND: TRA
info:
  CALLER:
    value: delly
    description: SV caller
    number: 1
    type: string
  TEST:
    value: $INFO/END,$INFO/CIEND/1
    description: Test info field
    number: 2
    type: integer
  SVLEN:
    value: ~sub:$INFO/END,$POS
    description: SV length
    number: 2
    type: integer
    alts:
      DEL: -~sub:$INFO/END,$POS
      INS: $INFO/INSLEN
format:
  PE:
    value: $FORMAT/DR,$FORMAT/DV
    description: Paired-read support for the ref and alt alleles in the order listed
    number: 2
    type: integer
36 changes: 36 additions & 0 deletions data/gridss.yaml
@@ -0,0 +1,36 @@
# Test config for Delly SV caller
id: "gridss_$INFO/SVTYPE"
info:
  CALLER:
    value: gridss
    description: SV caller
    number: 1
    type: string
  CIPOS:
    value: $INFO/CIPOS
    description: Confidence interval around POS for imprecise variants
    number: 2
    type: Integer
    alts:
      BND:
  CIEND:
    value: $INFO/CIRPOS
    description: Confidence interval around END position for imprecise variants
    number: 2
    type: Integer
  SVLEN:
    value: $INFO/SVLEN
    description: The length of the structural variant
    number: 1
    type: Integer
  IMPRECISE:
    value: $INFO/IMPRECISE
    description: Imprecise structural variation
    number: 0
    type: flag
format:
  GT:
    value: ./.
    description: Genotype
    number: 1
    type: string
50 changes: 50 additions & 0 deletions data/test1.delly.vcf
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20231204
##ALT=<ID=DEL,Description="Deletion">
##ALT=<ID=DUP,Description="Duplication">
##ALT=<ID=INV,Description="Inversion">
##ALT=<ID=BND,Description="Translocation">
##ALT=<ID=INS,Description="Insertion">
##FILTER=<ID=LowQual,Description="Poor quality and insufficient number of PEs and SRs.">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="PE confidence interval around END">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="PE confidence interval around POS">
##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for POS2 coordinate in case of an inter-chromosomal translocation">
##INFO=<ID=POS2,Number=1,Type=Integer,Description="Genomic position for CHR2 in case of an inter-chromosomal translocation">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
##INFO=<ID=PE,Number=1,Type=Integer,Description="Paired-end support of the structural variant">
##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
##INFO=<ID=SRMAPQ,Number=1,Type=Integer,Description="Median mapping quality of split-reads">
##INFO=<ID=SR,Number=1,Type=Integer,Description="Split-read support">
##INFO=<ID=SRQ,Number=1,Type=Float,Description="Split-read consensus alignment quality">
##INFO=<ID=SVINSSEQ,Number=1,Type=String,Description="Split-read consensus sequence">
##INFO=<ID=CE,Number=1,Type=Float,Description="Consensus sequence entropy">
##INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS.">
##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
##INFO=<ID=INSLEN,Number=1,Type=Integer,Description="Predicted length of the insertion">
##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Predicted microhomology length using a max. edit distance of 2">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GL,Number=G,Type=Float,Description="Log10-scaled genotype likelihoods for RR,RA,AA genotypes">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Per-sample genotype filter">
##FORMAT=<ID=RC,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the SV">
##FORMAT=<ID=RCL,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the left control region">
##FORMAT=<ID=RCR,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the right control region">
##FORMAT=<ID=RDCN,Number=1,Type=Integer,Description="Read-depth based copy-number estimate for autosomal sites">
##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference pairs">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant pairs">
##FORMAT=<ID=RR,Number=1,Type=Integer,Description="# high-quality reference junction reads">
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="# high-quality variant junction reads">
##reference=reference.fasta
##contig=<ID=chr14,length=2000001>
##contig=<ID=chr16,length=2000001>
##contig=<ID=chrX,length=2000001>
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT PosCon1
chr16 86933 DEL00000000 T <DEL> 120 LowQual PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.6;END=1349692;PE=0;MAPQ=0;CT=3to5;CIPOS=-9,9;CIEND=-9,9;SRMAPQ=60;INSLEN=0;HOMLEN=9;SR=2;SRQ=0.986667;SVINSSEQ=AAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAATATATATATATATATATATATATATATATATATATATACACATACATATATACGGTTGATTTTTACATATTGATCTTGTATCTTGTAACCTTGCTGAACTTGTTCATTAGTTCTAAT;CE=1.61868 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-24.3989,-2.10615,0:21:PASS:0:31692:19715:3:0:0:0:7
chr16 1077371 INV00000001 T <INV> 58 LowQual IMPRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1078502;PE=2;MAPQ=29;CT=5to5;CIPOS=-392,392;CIEND=-392,392 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 0/0:0,-0.521621,-107.701:6:LowQual:40:102:49:2:19:2:0:0
chr16 1123476 INV00000002 A <INV> 180 PASS PRECISE;SVTYPE=INV;SVMETHOD=EMBL.DELLYv1.1.6;END=1604486;PE=0;MAPQ=0;CT=5to5;CIPOS=-3,3;CIEND=-3,3;SRMAPQ=60;INSLEN=0;HOMLEN=3;SR=3;SRQ=0.98;SVINSSEQ=GAATTGCTTGAACACTGCACCACTGCACTCCAGCCTGGGTGACAGAGGAAGACTCTTTCTCCAAAAAAAAAGAATGTTTTCCTACATATATATATATATATATATATATATACACACACACACACACACACACACACACACACACAGTCT;CE=1.88447 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-38.9839,-3.59622,0:36:PASS:11456:39951:0:7:0:0:0:12
chr16 1135261 INS00000003 C <INS> 299 PASS PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.6;END=1135262;SVLEN=27;PE=0;MAPQ=0;CT=NtoN;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=60;INSLEN=27;HOMLEN=2;SR=5;SRQ=1;SVINSSEQ=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCACTGGAAACAGCCAAGAGATCCTTCAAAAAGTGAATGGATAAACCAACTGTAACTCATTCATACAGTGGAACGTTAATCAGCAATTCTAAAAATGAGCTATCAAGTCACAAAAAGACAAAGAAGAACCTTAACACAAAATAACA;CE=1.67205 GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV 1/1:-90.4974,-7.52311,0:75:PASS:11619:25087:13468:2:0:0:0:25
Binary file added data/test1.delly.vcf.gz
Binary file not shown.
Binary file added data/test1.delly.vcf.gz.tbi
Binary file not shown.
Binary file added data/test2.gridss.vcf.gz
Binary file not shown.
Binary file added data/test2.gridss.vcf.gz.tbi
Binary file not shown.
134 changes: 134 additions & 0 deletions docs/configuration.md
@@ -0,0 +1,134 @@
# Configuration
The configuration file consists of 4 main parts:
1. `id`
2. `alt`
3. `info`
4. `format`

## `id`
The `id` section is used to define the ID of the variant. The `id` section can be defined as follows:
```yaml
id: <id>
```
The value for the ID can be resolved (see [Resolvable fields](#resolvable-fields)). All IDs get a unique number appended to them to ensure that they are unique.
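
The uniqueness guarantee can be pictured as a running counter appended to each resolved ID. The following Go snippet is only a minimal sketch: the `idGenerator` type, the `_` separator, and the counter starting at 0 are assumptions made for illustration, not necessarily what svync does internally.

```go
package main

import "fmt"

// idGenerator is a hypothetical helper that appends a running counter
// to an already resolved ID so that no two records share the same ID.
type idGenerator struct {
	next int
}

// uniqueID returns the resolved ID followed by "_" and the current counter
// (separator and start value are assumptions), then increments the counter.
func (g *idGenerator) uniqueID(resolved string) string {
	id := fmt.Sprintf("%s_%d", resolved, g.next)
	g.next++
	return id
}

func main() {
	gen := &idGenerator{}
	// "delly_DEL" appears twice on purpose: the counter keeps the IDs distinct.
	for _, resolved := range []string{"delly_DEL", "delly_INV", "delly_DEL"} {
		fmt.Println(gen.uniqueID(resolved))
	}
}
```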

## `alt`
The `alt` section can be used to change the ALT field and SVTYPE info field for each variant. The `alt` section can be defined as follows:
```yaml
alt:
  <alt>: <new_alt>
```

For example, you might want to change the `BND` ALT to `TRA` (for Delly output, for instance):
```yaml
alt:
  BND: TRA
```
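
Conceptually, the `alt` section behaves like a lookup table keyed on the SV type. The Go snippet below is a hedged sketch of that idea; `renameAlt` is a hypothetical helper name, and leaving unmapped types untouched is an assumption about the intended behaviour.

```go
package main

import "fmt"

// renameAlt returns the configured replacement for an SV type, or the
// original type when the mapping has no entry for it (assumed behaviour).
func renameAlt(svtype string, mapping map[string]string) string {
	if renamed, ok := mapping[svtype]; ok {
		return renamed
	}
	return svtype
}

func main() {
	// Mirrors the `alt` section shown above.
	mapping := map[string]string{"BND": "TRA"}
	for _, svtype := range []string{"BND", "DEL"} {
		fmt.Printf("%s -> %s\n", svtype, renameAlt(svtype, mapping))
	}
}
```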

## `info`
The `info` section can be used to change the info fields for each variant. The `info` section can be defined as follows:
```yaml
info:
  <info_field>:
    value: <new_value>
    type: <new_type>
    description: <new_description>
    number: <new_number>
    alts:
      <alt>: <new_value>
      <alt>: <new_value>
```
### value
The `value` field can be used to change the default value of the info field. The value can be resolved (see [Resolvable fields](#resolvable-fields)).

### type
The `type` field sets the type of the info field (this is reflected in the header of the output VCF file).

### description
The `description` field sets the description of the info field (this is reflected in the header of the output VCF file).

### number
The `number` field sets the number of the info field (this is reflected in the header of the output VCF file).

### alts
The `alts` field can be used to set the value of the info field for a specific ALT. The value can be resolved (see [Resolvable fields](#resolvable-fields)).

For example, when all `SVLEN` info fields are positive, you may want to change the field to the negative length for all deletions:
```yaml
info:
  SVLEN:
    value: $INFO/SVLEN
    type: Integer
    description: "Structural variant length"
    number: 1
    alts:
      DEL: -$INFO/SVLEN
```
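
The relationship between `value` and `alts` can be sketched as "use the ALT-specific template when one exists, otherwise fall back to the default". The Go snippet below illustrates this under stated assumptions: the `fieldConfig` struct and the `templateFor` method are hypothetical names, not svync's actual types.

```go
package main

import "fmt"

// fieldConfig is a hypothetical, simplified view of one info or format entry.
type fieldConfig struct {
	Value string            // default template, e.g. "$INFO/SVLEN"
	Alts  map[string]string // per-ALT overrides, e.g. "DEL" -> "-$INFO/SVLEN"
}

// templateFor picks the override for the given SV type when one is
// configured and falls back to the default value otherwise.
func (f fieldConfig) templateFor(svtype string) string {
	if override, ok := f.Alts[svtype]; ok {
		return override
	}
	return f.Value
}

func main() {
	// Mirrors the SVLEN example above.
	svlen := fieldConfig{
		Value: "$INFO/SVLEN",
		Alts:  map[string]string{"DEL": "-$INFO/SVLEN"},
	}
	for _, svtype := range []string{"DEL", "INS"} {
		fmt.Printf("%s uses template %s\n", svtype, svlen.templateFor(svtype))
	}
}
```

The returned string is still an unresolved template; resolving variables and functions would be a separate step.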

## `format`
The `format` section can be used to change the format fields for each variant. The `format` section can be defined as follows:
```yaml
format:
  <format_field>:
    value: <new_value>
    type: <new_type>
    description: <new_description>
    number: <new_number>
    alts:
      <alt>: <new_value>
      <alt>: <new_value>
```

The format fields work the same as the info fields (see [Info](#info)).

## Resolvable fields

Some fields can be resolved to a value.

### Variables

A variable is referenced by prepending a `$` to the field name.

The following variables are available:
1. `$FORMAT/<format_field>` => This is only accessible to other format fields
    - An additional `/<number>` can be added to get a specific value in case of multiple values
2. `$INFO/<info_field>`
    - An additional `/<number>` can be added to get a specific value in case of multiple values
3. `$POS`
4. `$CHROM`
5. `$ALT`
6. `$QUAL`
7. `$FILTER`

For example `$INFO/SVLEN` will be resolved to the value of the `SVLEN` info field.
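
As an illustration of how such a variable could be resolved, the Go sketch below handles `$CHROM`, `$POS`, and `$INFO/<info_field>` with an optional `/<number>`. The `record` struct and `resolveVariable` function are hypothetical, and treating `<number>` as a zero-based index is an assumption, since the indexing base is not stated here.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// record is a hypothetical, simplified view of one VCF line.
type record struct {
	Chrom string
	Pos   int
	Info  map[string][]string // INFO values, already split on ","
}

// resolveVariable resolves $CHROM, $POS and $INFO/<field>[/<number>].
// The optional <number> is treated as a zero-based index (assumption).
func resolveVariable(variable string, rec record) (string, error) {
	parts := strings.Split(strings.TrimPrefix(variable, "$"), "/")
	switch parts[0] {
	case "CHROM":
		return rec.Chrom, nil
	case "POS":
		return strconv.Itoa(rec.Pos), nil
	case "INFO":
		if len(parts) < 2 {
			return "", fmt.Errorf("missing INFO field name in %q", variable)
		}
		values, ok := rec.Info[parts[1]]
		if !ok {
			return "", fmt.Errorf("INFO field %q not found", parts[1])
		}
		if len(parts) == 2 {
			return strings.Join(values, ","), nil
		}
		idx, err := strconv.Atoi(parts[2])
		if err != nil || idx < 0 || idx >= len(values) {
			return "", fmt.Errorf("invalid index in %q", variable)
		}
		return values[idx], nil
	}
	return "", fmt.Errorf("unsupported variable %q", variable)
}

func main() {
	rec := record{
		Chrom: "chr16",
		Pos:   86933,
		Info:  map[string][]string{"END": {"1349692"}, "CIEND": {"-9", "9"}},
	}
	for _, v := range []string{"$CHROM", "$POS", "$INFO/END", "$INFO/CIEND/1"} {
		value, err := resolveVariable(v, rec)
		fmt.Println(v, "=>", value, err)
	}
}
```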

### Functions

Functions are very simple calculations that can be done on the values.

More functions can be added in the future. Please open an issue to request new functions.

#### `~sub`
The `~sub` function subtracts all following values from the first value. The function can be used as follows:

```yaml
~sub:<value_start>,<value_to_subtract>,<value_to_subtract>,...
```

:warning: only integers and floats are supported for this function :warning:
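
The arithmetic behind `~sub` can be sketched in Go as follows. This is an illustrative sketch only: `subtractAll` is a hypothetical name, and parsing every argument as a float is an assumption chosen because both integers and floats are accepted.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// subtractAll starts from the first argument and subtracts every
// following argument from it. Arguments must be integers or floats.
func subtractAll(args []string) (float64, error) {
	if len(args) == 0 {
		return 0, errors.New("~sub needs at least one value")
	}
	result, err := strconv.ParseFloat(args[0], 64)
	if err != nil {
		return 0, fmt.Errorf("not a number: %q", args[0])
	}
	for _, arg := range args[1:] {
		value, err := strconv.ParseFloat(arg, 64)
		if err != nil {
			return 0, fmt.Errorf("not a number: %q", arg)
		}
		result -= value
	}
	return result, nil
}

func main() {
	// Equivalent to `~sub:$INFO/END,$POS` once the variables are resolved
	// for a record with END=1349692 and POS=86933.
	length, err := subtractAll([]string{"1349692", "86933"})
	fmt.Println(length, err)
}
```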

#### `~sum`
The `~sum` function can be used to take the sum of all values. The function can be used as follows:

```yaml
~sum:<value_start>,<value_to_add>,<value_to_add>,...
```

:warning: only integers and floats are supported for this function :warning:

#### `~len`
The `~len` function can be used to get the length of a string value. The function can be used as follows:

```yaml
~len:<value>
```
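
All functions share the `~<name>:<arguments>` shape, so evaluating one amounts to splitting on the first `:` and dispatching on the name. The Go sketch below shows this for `~sum` and `~len`; the `evaluate` function is hypothetical, it assumes variables were already resolved to literal values, and returning non-function input unchanged is likewise an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// evaluate handles expressions of the form "~<name>:<arguments>".
// Anything that does not start with "~" is returned unchanged.
func evaluate(expr string) (string, error) {
	if !strings.HasPrefix(expr, "~") {
		return expr, nil
	}
	name, rawArgs, found := strings.Cut(expr[1:], ":")
	if !found {
		return "", fmt.Errorf("missing ':' in %q", expr)
	}
	switch name {
	case "sum":
		total := 0.0
		for _, arg := range strings.Split(rawArgs, ",") {
			value, err := strconv.ParseFloat(arg, 64)
			if err != nil {
				return "", fmt.Errorf("not a number: %q", arg)
			}
			total += value
		}
		return strconv.FormatFloat(total, 'f', -1, 64), nil
	case "len":
		// Length in bytes, which equals the character count for ASCII sequences.
		return strconv.Itoa(len(rawArgs)), nil
	}
	return "", fmt.Errorf("unknown function %q", name)
}

func main() {
	for _, expr := range []string{"~sum:1,2,3.5", "~len:ACGT", "plain"} {
		result, err := evaluate(expr)
		fmt.Println(expr, "=>", result, err)
	}
}
```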