Merged
31 commits
0db0e82
init gitignore
pieetie Apr 23, 2026
60a9d1b
update gitignore
pieetie Apr 23, 2026
6e4b631
chore: svforge metadata and packaging
pieetie Apr 23, 2026
567b1fc
feat: import core SV sampling engine
pieetie Apr 23, 2026
bd56acb
feat: import VCF writers and caller plugins
pieetie Apr 23, 2026
34b96e8
feat: import validation and overlap tooling
pieetie Apr 23, 2026
009b159
data: add banks and header templates
pieetie Apr 23, 2026
7f7a69b
feat: CLI and console scripts
pieetie Apr 23, 2026
05ac4a8
feat: add I/O layer for VCF, gzip, and BCF
pieetie Apr 23, 2026
4cbfd60
feat: add gnomAD and blacklist injection catalogs
pieetie Apr 24, 2026
7acf1d1
feat: sample SV injections from bundled catalogs
pieetie Apr 24, 2026
f1b1914
feat: add CLI flags for gnomAD and blacklist injection
pieetie Apr 24, 2026
26384b3
chore: align Manta/Delly writers and header templates
pieetie Apr 24, 2026
6bb4cb0
refactor: streamline annotate and remove unused validate modules
pieetie Apr 24, 2026
f8780b2
test: add tests
pieetie Apr 24, 2026
6f71bd8
fix: ruff fixes
pieetie Apr 24, 2026
7529879
update gitignore
pieetie Apr 24, 2026
70da192
chore: add data_local README and gen-test for dev setup
pieetie Apr 24, 2026
6e30e47
docs: add brand assets and usage notice
pieetie Apr 24, 2026
25644fe
docs: add ready-to-use command sheet
pieetie Apr 24, 2026
74963f9
docs: update README
pieetie Apr 25, 2026
3f7d21b
docs: fix logo color toggle
pieetie Apr 25, 2026
a723c15
docs: fix logo width
pieetie Apr 25, 2026
3c202a7
docs: fix logo width (450)
pieetie Apr 25, 2026
4d95fa5
docs: add contributing
pieetie Apr 25, 2026
54a40de
docs: fix contributing
pieetie Apr 25, 2026
8f936eb
docs: add CHANGELOG
pieetie Apr 25, 2026
6b165ab
ci: update CI and publish workflows
pieetie Apr 25, 2026
088e861
ci: harden release process
pieetie Apr 25, 2026
92906e1
docs: fix README license icon
pieetie Apr 25, 2026
2fdc2bf
docs: document bcftools sort+index workaround
pieetie Apr 25, 2026
60 changes: 60 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,60 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  quality:
    name: Lint, type-check, test
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4

      - name: Install bcftools
        run: |
          sudo apt-get update
          sudo apt-get install -y bcftools

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: pip

      - name: Install package
        run: |
          python -m pip install -U pip
          python -m pip install -e ".[dev,test]"

      - name: Ruff lint
        run: ruff check .

      - name: Mypy
        run: mypy src

      - name: Pytest
        run: pytest

      - name: bcftools round-trip check
        shell: bash
        run: |
          set -euo pipefail
          tmp=$(mktemp -d)
          svforge gen --caller manta --out "$tmp/manta.vcf.gz" --n 50 --sample-name T01 --seed 42
          svforge gen --caller delly --out "$tmp/delly.bcf" --n 50 --sample-name T01 --seed 42
          bcftools view -h "$tmp/manta.vcf.gz" > /dev/null
          bcftools view -h "$tmp/delly.bcf" > /dev/null
          test "$(bcftools view -H "$tmp/manta.vcf.gz" | wc -l)" -gt 0
          test "$(bcftools view -H "$tmp/delly.bcf" | wc -l)" -gt 0
123 changes: 123 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,123 @@
name: Publish

on:
  push:
    tags:
      - "v*"

jobs:
  verify-tag:
    name: Verify tag matches pyproject version
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Check tag == project version
        shell: bash
        run: |
          set -euo pipefail
          tag="${GITHUB_REF_NAME}"
          tag_version="${tag#v}"
          project_version=$(python -c "import tomllib, pathlib; print(tomllib.loads(pathlib.Path('pyproject.toml').read_text())['project']['version'])")
          echo "Tag: $tag (parsed: $tag_version)"
          echo "Project: $project_version"
          if [[ "$tag_version" != "$project_version" ]]; then
            echo "::error::Tag ($tag_version) does not match pyproject.toml version ($project_version)"
            exit 1
          fi

  quality:
    name: Quality gate before publish
    needs: verify-tag
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install bcftools
        run: |
          sudo apt-get update
          sudo apt-get install -y bcftools

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip

      - name: Install package
        run: |
          python -m pip install -U pip
          python -m pip install -e ".[dev,test]"

      - name: Ruff lint
        run: ruff check .

      - name: Mypy
        run: mypy src

      - name: Pytest
        run: pytest

      - name: bcftools round-trip check
        shell: bash
        run: |
          set -euo pipefail
          tmp=$(mktemp -d)
          svforge gen --caller manta --out "$tmp/manta.vcf.gz" --n 50 --sample-name T01 --seed 42
          svforge gen --caller delly --out "$tmp/delly.bcf" --n 50 --sample-name T01 --seed 42
          bcftools view -h "$tmp/manta.vcf.gz" > /dev/null
          bcftools view -h "$tmp/delly.bcf" > /dev/null
          test "$(bcftools view -H "$tmp/manta.vcf.gz" | wc -l)" -gt 0
          test "$(bcftools view -H "$tmp/delly.bcf" | wc -l)" -gt 0

  build:
    name: Build sdist and wheel
    needs: quality
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install build and twine
        run: python -m pip install -U pip build twine

      - name: Build
        run: python -m build

      - name: Twine check
        run: twine check dist/*

      - name: Upload artefacts
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  publish:
    name: Publish to PyPI
    needs: build
    runs-on: ubuntu-latest
    environment:
      name: pypi
      url: https://pypi.org/project/svforge
    permissions:
      id-token: write

    steps:
      - name: Download artefacts
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/

      - name: Publish
        uses: pypa/gh-action-pypi-publish@release/v1
54 changes: 54 additions & 0 deletions .gitignore
@@ -0,0 +1,54 @@
data_local/*
!data_local/README.md
!data_local/gen-test/
data_local/gen-test/*
!data_local/gen-test/.gitkeep

__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg
*.egg-info/
.eggs/
dist/
build/
*.manifest
*.spec
pip-wheel-metadata/
share/python-wheels/
wheels/

.venv/
venv/
ENV/
env/
.env.venv

.env
.env.local
.env.*.local
*.env

.pytest_cache/
.coverage
.coverage.*
htmlcov/
.tox/
.nox/
coverage.xml
*.cover
.hypothesis/
.mypy_cache/
.ruff_cache/
.dmypy.json
dmypy.json

.ipynb_checkpoints/

.idea/
.vscode/
.DS_Store
Thumbs.db
*.log
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,20 @@
# Changelog

## [1.0.0] — 2026-04-25

Initial release.

- `svforge gen` and `svforge gen-pair` to generate synthetic VCFs (single-sample or coherent tumor/normal pair)
- Manta (VCFv4.1) and DELLY (VCFv4.2) writers
- Injection via `--gnomad-fraction` and `--blacklist-fraction`
- `INFO/SVFORGE_SOURCE` tag on every record (`bank` / `gnomad` / `blacklist`) for self-verifying pipelines
- `svforge validate` to confirm injected records match the bundled catalogs exactly
- Chromosome filtering via `--chromosomes`
- Custom header support via `--header-template PATH`
- Deterministic sampling via `--seed`; effective seed always logged in the output VCF
- BCF / VCF.gz / VCF output formats
- Default hg38 SV bank with weighted templates across DEL, DUP, INV, INS, BND
- Writer plugin system: third-party callers can register via `svforge.writers` entry point
- Non-configurable `##svforgeWarning=SYNTHETIC_DATA_DO_NOT_USE_FOR_CLINICAL_DIAGNOSIS` injected in every VCF header
- `sanitize_command()` strips absolute paths from the logged command line so user home directories and cluster paths never leak into generated VCFs
- Python 3.10+, hg38 only
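
The path-stripping behaviour described for `sanitize_command()` can be illustrated with a small stand-alone sketch. This is not svForge's implementation: the helper name `strip_absolute_paths` and its rule (replace any absolute POSIX path argument with its basename) are assumptions made only to show the idea.

```python
import posixpath


def strip_absolute_paths(argv: list[str]) -> str:
    """Join argv into one string, reducing absolute paths to their basename.

    Hypothetical helper for illustration only; svForge's real
    sanitize_command() may apply different rules.
    """
    cleaned = []
    for arg in argv:
        flag, sep, value = arg.partition("=")
        if sep and flag.startswith("-") and posixpath.isabs(value):
            # Handles the --flag=/abs/path form.
            cleaned.append(flag + sep + posixpath.basename(value.rstrip("/")))
        elif posixpath.isabs(arg):
            # Handles a bare /abs/path argument.
            cleaned.append(posixpath.basename(arg.rstrip("/")))
        else:
            # Flags, numbers, and relative paths are kept unchanged.
            cleaned.append(arg)
    return " ".join(cleaned)


if __name__ == "__main__":
    print(strip_absolute_paths(
        ["svforge", "gen", "--out", "/home/alice/run/manta.vcf.gz", "--seed", "42"]
    ))
```

Keeping only the basename preserves enough of the command line to reproduce a run while dropping the directory components that could identify a user or cluster.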
37 changes: 37 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,37 @@
# Contributing to svForge

## Development setup

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,test]"
```

## Contribution workflow

1. Open an issue first and describe the change (new caller, new template, fix, etc.). Wait for feedback before coding.
2. Create a dedicated branch from `main` named `{issue-number}-short-description`. Never work directly on `main`.
3. Reference the issue number in every commit message, for example `feat: XXX #12` or `fix: XXX (#15)`.
4. Update `CHANGELOG.md`.
5. Run the local quality gate before pushing. PRs that fail any of these checks will not be reviewed:

```bash
ruff check .
mypy src
pytest -q
```

6. Push and open a PR to `main`. In the PR description, add `Closes #<issue-number>` for automatic issue closing at merge.
7. Use squash merge. The squash commit message must be clean and final (not a draft branch history dump).
8. Delete the branch after merge.

## Common contribution cases

- Add a new caller (GRIDSS, SvABA, Lumpy, Smoove, ...). This is the main contribution path and a core modularity goal of the project.
- Add a new header template for an uncovered genome or scenario (for example `hg19`, germline mode, tumor-only mode).
- Enrich the bundled mini-catalogs (`gnomad_hg38_mini.tsv`, `blacklist_hg38_mini.tsv`) with more entries or more diversity.
- Improve the default bank (`default_hg38.yaml`) with more representative SV templates.
- Report a bug with a minimal command that reproduces it.
- Improve documentation.
- Add tests for an uncovered edge case (for example negative `SVLEN`, BND to alt contig, etc.).
69 changes: 69 additions & 0 deletions README.md
@@ -0,0 +1,69 @@
<h1>
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/vector-svg/logos-no-background/logo-red-and-white.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/vector-svg/logos-no-background/logo-red-and-black.svg">
<img src="docs/assets/vector-svg/logos-no-background/logo-red-and-black.svg" alt="svForge logo" width="450">
</picture>
</h1>

### Generate synthetic SV VCFs to stress-test your pipelines with confidence

---

**svForge** produces caller-specific VCFs (Manta, DELLY) in VCF / VCF.gz / BCF format with fine-grained control over variability (HOMLEN, SVLEN, VAF) and realistic artefact injection (SVs in ENCODE blacklist regions, gnomAD germline SVs).

Designed to be modular, it is easy to adapt to your own use case.
You can tune generation parameters, plug in new callers, and customize the workflow without reworking the whole tool.

## Installation

```bash
pip install svforge
```

Or from source:

```bash
git clone https://github.com/pieetie/svforge
cd svforge
pip install -e ".[dev,test]"
```

## Quick start

For ready-to-run command lines (single sample, tumor/normal pair, validation, banks, and dev checks), see [`docs/ready-to-use.md`](docs/ready-to-use.md).

## Typical use cases

- Validate downstream filters (for example, `SVFORGE_SOURCE=gnomad` records should disappear after your gnomAD filtering step).
- Validate ENCODE blacklist annotation logic (for example, `SVFORGE_SOURCE=blacklist` records should receive your expected poor-mappability label).
- Run reproducible CI regression tests with fixed seeds, without committing generated VCF files.
- Smoke-test deployments and pipeline changes in seconds instead of rerunning full variant callers on BAM files.
- Reproduce specific scenarios and edge cases (cross-chrom BNDs, contig-edge events, controlled VAF/HOMLEN ranges) for debugging and QA.
- Demo or onboard safely with realistic SV VCFs and no patient data.
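
As a rough illustration of the first two use cases, the sketch below tallies records by their `INFO/SVFORGE_SOURCE` value, so a test can assert that no `gnomad` records survive a filtering step. The function name `count_sources` and the two sample records are made up for the example; only the tag name and its values (`bank`, `gnomad`, `blacklist`) come from the changelog.

```python
from collections import Counter
from typing import Iterable


def count_sources(vcf_lines: Iterable[str]) -> Counter:
    """Count VCF records per SVFORGE_SOURCE value ("missing" if the tag is absent)."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        source = "missing"
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "SVFORGE_SOURCE":
                source = value
                break
        counts[source] += 1
    return counts


if __name__ == "__main__":
    # Illustrative records, not real svForge output.
    filtered = [
        "##fileformat=VCFv4.2",
        "chr1\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVFORGE_SOURCE=bank",
        "chr2\t5000\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;SVFORGE_SOURCE=blacklist",
    ]
    assert count_sources(filtered)["gnomad"] == 0
```

The same tally run before and after the filter shows how many injected records of each kind were removed.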

## CLI

```
svforge gen # one VCF for one sample
svforge gen-pair # tumor + normal VCFs for somatic pipelines
svforge validate # self-consistency check of injected SVs
svforge bank list # list built-in banks
svforge bank show # dump a bank as YAML
svforge callers # list registered writers
```

Run `svforge <cmd> --help` for the full flag list.
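
A fixed-seed regression check can be driven from Python as sketched below. The `svforge gen` flags are the ones used in this PR's CI workflow; the helper names are hypothetical, and the expectation that two runs with the same seed yield identical records is an assumption based on the changelog's description of `--seed`, not something verified here. Header lines are excluded from the comparison in case they carry run-specific metadata.

```python
import gzip
import subprocess
from pathlib import Path


def gen_command(caller: str, out: str, n: int, sample_name: str, seed: int) -> list[str]:
    """Build an `svforge gen` invocation using the flags seen in the CI workflow."""
    return [
        "svforge", "gen",
        "--caller", caller,
        "--out", str(out),
        "--n", str(n),
        "--sample-name", sample_name,
        "--seed", str(seed),
    ]


def record_lines(path: Path) -> list[str]:
    """Return the non-header lines of a gzip-compressed VCF."""
    with gzip.open(path, "rt") as handle:
        return [line for line in handle if not line.startswith("#")]


def same_records_for_same_seed(workdir: Path, seed: int = 42) -> bool:
    """Run svforge twice with one seed and compare the record lines.

    Requires svforge on PATH; raises CalledProcessError if a run fails.
    """
    first = workdir / "first.vcf.gz"
    second = workdir / "second.vcf.gz"
    for out in (first, second):
        subprocess.run(gen_command("manta", str(out), 50, "T01", seed), check=True)
    return record_lines(first) == record_lines(second)
```

In a pytest suite, `same_records_for_same_seed(tmp_path)` could serve as the body of a single regression test, so no generated VCF has to be committed.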

## Credits

- **Logo and visual identity** - [Elisa Perrin](https://www.linkedin.com/in/elisaperrin/)
- **Claude** (Anthropic) - assisted with tests, documentation, refactoring, and release tooling (Ruff linting/formatting, CI cleanup)

## License

<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/assets/vector-svg/icons-on-background/icon-nb-app.svg">
<source media="(prefers-color-scheme: light)" srcset="docs/assets/vector-svg/icons-on-background/icon-bn-app.svg">
<img src="docs/assets/vector-svg/icons-on-background/icon-bn-app.svg" alt="svForge icon" width="14" style="vertical-align: -1px;">
</picture> Distributed under the <a href="./LICENSE">MIT License</a>.
12 changes: 12 additions & 0 deletions data_local/README.md
@@ -0,0 +1,12 @@
# Run from the svforge project root

mkdir -p data_local/gnomad data_local/blacklist

## gnomAD SV v4.1
wget -P data_local/gnomad/ https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/genome_sv/gnomad.v4.1.sv.sites.vcf.gz

wget -P data_local/gnomad/ https://storage.googleapis.com/gcp-public-data--gnomad/release/4.1/genome_sv/gnomad.v4.1.sv.sites.vcf.gz.tbi

## ENCODE blacklist v2
wget -P data_local/blacklist/ https://raw.githubusercontent.com/Boyle-Lab/Blacklist/master/lists/hg38-blacklist.v2.bed.gz

Empty file added data_local/gen-test/.gitkeep
Empty file.
9 changes: 9 additions & 0 deletions docs/assets/README.md
@@ -0,0 +1,9 @@
# Logos and visual assets

The graphics in this directory (logos, icons, related images) were made by Elisa Perrin for svForge. © Elisa Perrin, used with permission. They're **not** covered by the code licence in `LICENSE`.

Use them freely for anything non-commercial that supports the project: papers, talks, blog posts, slides, docs, community stuff. A credit ("Logo by Elisa Perrin") is appreciated when it fits.

Ask first for: merch, paid ads, rebranding, or anything where you're making money off our name or visuals.

*Intent, not legal advice.*