svForge

Generate synthetic SV VCFs to stress-test your pipelines with confidence

svForge produces caller-shaped VCFs (Manta, DELLY) in VCF / VCF.gz / BCF format with fine-grained control over variability (HOMLEN, SVLEN, VAF) and realistic artefact injection (SVs in ENCODE blacklist regions, gnomAD germline SVs).

Designed to be modular, it is easy to adapt to your own use case. You can tune generation parameters, plug in new callers, and customize the workflow without reworking the whole tool.

Installation

pip install svforge

Or from source:

git clone https://github.com/pieetie/svforge
cd svforge
pip install -e ".[dev,test]"

Quick start

For ready-to-run command lines (single-sample gen, paired somatic gen-pair, validation, banks, and dev checks), see docs/ready-to-use.md.

Typical use cases

  • Validate downstream filters (for example, SVFORGE_SOURCE=gnomad records should disappear after your gnomAD filtering step).
  • Validate ENCODE blacklist annotation logic (for example, SVFORGE_SOURCE=blacklist records should receive your expected poor-mappability label).
  • Run reproducible CI regression tests with fixed seeds, without committing generated VCF files.
  • Smoke-test deployments and pipeline changes in seconds instead of rerunning full variant callers on BAM files.
  • Reproduce specific scenarios and edge cases (cross-chrom BNDs, contig-edge events, controlled VAF/HOMLEN ranges) for debugging and QA.
  • Demo or onboard safely with realistic SV VCFs and no patient data.
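
The first two use cases come down to counting records by their SVFORGE_SOURCE value before and after a pipeline step. Below is a minimal sketch of such a check using only the Python standard library. It assumes SVFORGE_SOURCE is written as a key=value entry in the INFO column and that the VCF is uncompressed text; the helper names (parse_info, count_sources) and the example records are hypothetical and not part of svForge.

```python
from collections import Counter


def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag-style keys map to True."""
    info = {}
    if info_field in ("", "."):
        return info
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def count_sources(vcf_lines, tag="SVFORGE_SOURCE"):
    """Count data records per value of `tag`; records without it count as 'untagged'.

    Assumption: the tag lives in the INFO column (8th tab-separated field).
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        counts[info.get(tag, "untagged")] += 1
    return counts


# Hypothetical records standing in for a generated VCF (not real svForge output).
before = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=12000;SVFORGE_SOURCE=gnomad",
    "chr2\t50000\tsv2\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=58000;SVFORGE_SOURCE=blacklist",
    "chr3\t70000\tsv3\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=71000",
]

# Stand-in for your own gnomAD filtering step.
after = [line for line in before if "SVFORGE_SOURCE=gnomad" not in line]

before_counts = count_sources(before)
after_counts = count_sources(after)

# The filter should have removed every gnomAD-sourced record and nothing else.
assert after_counts["gnomad"] == 0
assert after_counts["blacklist"] == before_counts["blacklist"]
assert after_counts["untagged"] == before_counts["untagged"]
```

In a real test you would pass the lines of the generated VCF and of your pipeline's output instead of the in-memory lists. For VCF.gz or BCF files, a proper reader such as pysam or bcftools is a better fit than line splitting.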

CLI

svforge gen          # one VCF for one sample
svforge gen-pair     # one 2-sample somatic VCF (NORMAL + TUMOR)
svforge validate     # self-consistency check of injected SVs
svforge bank list    # list built-in banks
svforge bank show    # dump a bank as YAML
svforge callers      # list registered writers

Run svforge <cmd> --help for the full flag list.

Credits

  • Logo and visual identity - Elisa Perrin
  • Claude (Anthropic) - assisted with tests, documentation, refactoring, and release tooling (Ruff linting/formatting, CI cleanup)

License

Distributed under the MIT License.

About

🛠️ Reproducible, traceable synthetic SV VCFs for testing what your pipeline does to them. Caller-shaped Manta and DELLY outputs, real gnomAD/ENCODE injection with self-verifying tags. Synthetic data only.