Rust port of the Silero Voice Activity Detector that runs the pre-trained ONNX models through the safe `ort` bindings for ONNX Runtime. The crate bundles the original ONNX weights and exposes idiomatic Rust helpers for loading audio, running the network, and post-processing speech segments.
- Universal: Silero VAD was trained on massive corpora covering 6,000+ languages, so it performs well across domains and under noisy conditions.
- Pre-trained accuracy: ships with the Silero ONNX models (opset 15 and 16) in `src/silero_vad/data`, so no extra downloads are required.
- CPU friendly: defaults to ONNX Runtime's CPU execution provider for predictable server and edge runs, while keeping other providers available if you reconfigure `LoadOptions`.
- Streaming ready: `forward_chunk` keeps internal state across calls for long-running streams, while `audio_forward` processes entire buffers offline.
- Configurable thresholds: `VadParameters` lets you tune the speech threshold, speech/silence duration windows, and the return units (samples or seconds) to adapt quickly to new setups.
- Helper APIs: ships with `read_audio`, `save_audio`, `get_speech_timestamps`, `collect_chunks`, `drop_chunks`, and `VadIterator` for quick end-to-end adoption.
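Choosing between sample and second units amounts to a single division by the sample rate. A minimal sketch of that arithmetic follows; `samples_to_seconds` and `window_ms` are hypothetical helpers, not crate APIs, and any rounding the crate may apply when `return_seconds` is set is not modeled.

```rust
/// Hypothetical helper: convert a sample index to seconds.
fn samples_to_seconds(sample: usize, sample_rate: u32) -> f64 {
    sample as f64 / f64::from(sample_rate)
}

/// Hypothetical helper: duration of one fixed-size window in milliseconds.
fn window_ms(window_samples: usize, sample_rate: u32) -> f64 {
    // Multiply before dividing so exact cases (e.g. 512 samples at 16 kHz) stay exact.
    window_samples as f64 * 1000.0 / f64::from(sample_rate)
}
```

With the 512-sample windows used for streaming at 16 kHz, each window covers 512 × 1000 / 16 000 = 32 ms of audio.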
```bash
cargo add silero-vad-rust
```

The crate expects the ONNX Runtime shared library to be discoverable at runtime (`ort` is used with its `load-dynamic` feature). See the `ort` crate docs if you need to pin a specific runtime build.
ONNX Runtime 1.22.x is strongly recommended (older releases are not supported; newer releases may be incompatible).
Official prebuilt ONNX Runtime binaries (Windows):

- `onnxruntime-win-x64-1.22.1.zip` (70.3 MB), SHA256: `855276cd4be3cda14fe636c69eb038d75bf5bcd552bda1193a5d79c51f436dfe`
- `onnxruntime-win-x64-gpu-1.22.1.zip` (300 MB), SHA256: `4e6eeb8bfe4137cf98ccdc0b01f0400928bc3f8261b90e8e0d1c28410a33cac4`
- `onnxruntime-win-x86-1.22.1.zip` (69.2 MB), SHA256: `da9fe96efacb0d1b9f4ae0b8f69f9694b6b3f673fc2a697c6b932679610ed292`
- `onnxruntime-win-arm64-1.22.1.zip` (70.1 MB), SHA256: `3c984f25de07fdbbd2be36792dabfa18810c7483262238ea241ca5a1e52a4f82`
After downloading and extracting one of the archives, place onnxruntime.dll next to your executable (or ensure it is on your library path) so the dynamic loader can find it.
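When a model fails to load because the runtime cannot be found, it can help to check the library location explicitly before calling `load_silero_vad`. The following is a minimal std-only sketch, assuming a lookup order of an explicit `ORT_DYLIB_PATH` first and then the executable's directory. `find_onnxruntime` is a hypothetical helper, not part of this crate or of `ort`, and it does not search the system library path the way the OS loader does.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return the first existing candidate for the ONNX Runtime
/// library, checking an explicit override before a file next to the executable.
fn find_onnxruntime(override_path: Option<&Path>, exe_dir: &Path, lib_name: &str) -> Option<PathBuf> {
    // 1. Explicit override (e.g. the value of ORT_DYLIB_PATH), if it points at a file.
    if let Some(p) = override_path {
        if p.is_file() {
            return Some(p.to_path_buf());
        }
    }
    // 2. A library with the given name in the executable's directory.
    let candidate = exe_dir.join(lib_name);
    if candidate.is_file() {
        Some(candidate)
    } else {
        None
    }
}

/// Convenience wrapper reading the real process environment (Windows file name assumed).
fn find_onnxruntime_for_current_exe() -> Option<PathBuf> {
    let override_path = std::env::var_os("ORT_DYLIB_PATH").map(PathBuf::from);
    let exe = std::env::current_exe().ok()?;
    let exe_dir = exe.parent()?;
    find_onnxruntime(override_path.as_deref(), exe_dir, "onnxruntime.dll")
}
```

Calling `find_onnxruntime_for_current_exe()` at startup and logging a clear message when it returns `None` turns an opaque load failure into an actionable one.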
Detect speech segments in a file:

```rust
use silero_vad_rust::{get_speech_timestamps, load_silero_vad, read_audio};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;

fn main() -> anyhow::Result<()> {
    let audio = read_audio("samples/test.wav", 16_000)?; // read at 16 kHz
    let mut model = load_silero_vad()?; // defaults to ONNX opset 16
    let params = VadParameters {
        return_seconds: true, // report segments in seconds instead of samples
        ..Default::default()
    };
    let speech = get_speech_timestamps(&audio, &mut model, &params)?;
    println!("Detected segments: {speech:?}");
    Ok(())
}
```

Stream fixed-size windows through `forward_chunk`:

```rust
use silero_vad_rust::{load_silero_vad, read_audio};

fn stream_chunks() -> anyhow::Result<()> {
    let audio = read_audio("samples/long.wav", 16_000)?;
    let mut model = load_silero_vad()?;
    let chunk_size = 512; // window size for 16 kHz audio

    for frame in audio.chunks(chunk_size) {
        // Zero-pad the last, shorter frame so every call sees a full window.
        let mut window = frame.to_vec();
        window.resize(chunk_size, 0.0);
        // forward_chunk keeps internal state between calls.
        let probs = model.forward_chunk(&window, 16_000)?;
        println!("frame prob={:.3}", probs[[0, 0]]);
    }
    Ok(())
}
```

Keep only the speech, or only the audio outside it, and save the result:

```rust
use silero_vad_rust::{
    collect_chunks, drop_chunks, get_speech_timestamps, load_silero_vad, read_audio, save_audio,
};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;

fn trim_audio() -> anyhow::Result<()> {
    let audio = read_audio("samples/raw.wav", 16_000)?;
    let mut model = load_silero_vad()?;
    let params = VadParameters {
        return_seconds: false, // keep timestamps in samples for slicing
        ..Default::default()
    };
    let speech = get_speech_timestamps(&audio, &mut model, &params)?;

    // Keep only the detected speech. `false, None`: timestamps are in samples,
    // so no sample rate is passed (presumably mirroring upstream's seconds/sampling_rate).
    let voice_only = collect_chunks(&speech, &audio, false, None)?;
    save_audio("out_voice.wav", &voice_only, 16_000)?;

    // Keep everything outside the detected speech.
    let non_speech = drop_chunks(&speech, &audio, false, None)?;
    save_audio("out_silence.wav", &non_speech, 16_000)?;
    Ok(())
}
```

React to speech start and end events with `VadIterator`:

```rust
use silero_vad_rust::{
    load_silero_vad, read_audio,
    silero_vad::utils_vad::{VadEvent, VadIterator, VadIteratorParams},
};

fn iterate_events() -> anyhow::Result<()> {
    let audio = read_audio("samples/live.wav", 16_000)?;
    let model = load_silero_vad()?;
    let params = VadIteratorParams {
        threshold: 0.55,
        ..Default::default()
    };
    let mut iterator = VadIterator::new(model, params)?;

    for frame in audio.chunks(512) {
        // Zero-pad the last frame to a full 512-sample window.
        let mut window = frame.to_vec();
        window.resize(512, 0.0);
        // `true, 1`: presumably return seconds with 1 decimal place, like upstream's
        // return_seconds/time_resolution arguments.
        match iterator.process_chunk(&window, true, 1)? {
            Some(VadEvent::Start(ts)) => println!("speech started at {ts}s"),
            Some(VadEvent::End(ts)) => println!("speech ended at {ts}s"),
            _ => {}
        }
    }
    Ok(())
}
```

Load a specific opset and allow non-CPU execution providers:

```rust
use silero_vad_rust::silero_vad::model::{load_silero_vad_with_options, LoadOptions};

fn load_gpu_model() -> anyhow::Result<()> {
    let options = LoadOptions {
        opset_version: 15,
        force_onnx_cpu: false, // allow custom providers (GPU, NNAPI, etc.)
        ..Default::default()
    };
    let _model = load_silero_vad_with_options(options)?;
    Ok(())
}
```

To actually run on GPU, enable the matching `ort` feature in your own `Cargo.toml` (for example `ort = { version = "2.0.0-rc.10", features = ["load-dynamic", "cuda"] }`) and point `ORT_DYLIB_PATH` (or your system library path) to a GPU-enabled ONNX Runtime binary. With `force_onnx_cpu = false`, the runtime uses whatever providers were compiled into that library; if you need to prioritize a specific provider (e.g. `CUDAExecutionProvider`), extend `OnnxModel::from_path` to register it explicitly.
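Thresholds and minimum durations interact in ways that are easiest to see on a toy probability sequence. The following is a simplified, frame-level sketch modeled on upstream Silero VAD's segmentation logic, not this crate's implementation: speech starts when a frame's probability reaches `threshold`; a pending end begins at the first frame below `threshold - 0.15` and closes the segment after `min_silence_frames` frames unless a frame at or above `threshold` cancels it; segments shorter than `min_speech_frames` are discarded. `FrameSegment` and `segment_frames` are hypothetical names, and durations are counted in frames rather than milliseconds.

```rust
/// A detected speech segment in frame indices; `end` is exclusive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FrameSegment {
    start: usize,
    end: usize,
}

/// Hypothetical, simplified segmentation over per-frame speech probabilities.
/// Modeled on upstream Silero VAD; NOT this crate's implementation.
fn segment_frames(
    probs: &[f32],
    threshold: f32,
    min_speech_frames: usize,
    min_silence_frames: usize,
) -> Vec<FrameSegment> {
    // Lower exit threshold (hysteresis), following upstream Silero's default offset.
    let neg_threshold = (threshold - 0.15).max(0.01);
    let mut segments = Vec::new();
    let mut start: Option<usize> = None; // first frame of the open segment
    let mut silence_start: Option<usize> = None; // first frame of a pending end

    for (i, &p) in probs.iter().enumerate() {
        match start {
            None => {
                if p >= threshold {
                    start = Some(i);
                    silence_start = None;
                }
            }
            Some(s) => {
                if p >= threshold {
                    // Speech resumed: cancel any pending end.
                    silence_start = None;
                } else if p < neg_threshold {
                    let sil = *silence_start.get_or_insert(i);
                    // Close the segment once the pending end spans enough frames.
                    if i + 1 - sil >= min_silence_frames {
                        if sil - s >= min_speech_frames {
                            segments.push(FrameSegment { start: s, end: sil });
                        }
                        start = None;
                        silence_start = None;
                    }
                }
                // Frames between neg_threshold and threshold leave the state unchanged.
            }
        }
    }

    // Close a segment that is still open when the input ends.
    if let Some(s) = start {
        if probs.len() - s >= min_speech_frames {
            segments.push(FrameSegment { start: s, end: probs.len() });
        }
    }
    segments
}
```

In this model `min_speech_frames` plays the role of `min_speech_duration_ms` divided by the window duration (32 ms for 512-sample windows at 16 kHz).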
Compare a strict and a permissive configuration on noisy audio:

```rust
use silero_vad_rust::{get_speech_timestamps, load_silero_vad, read_audio};
use silero_vad_rust::silero_vad::utils_vad::VadParameters;

fn compare_thresholds() -> anyhow::Result<()> {
    let audio = read_audio("samples/noisy.wav", 16_000)?;
    let mut model = load_silero_vad()?;

    // Stricter: higher threshold and longer minimum speech duration.
    let strict = VadParameters {
        threshold: 0.65,
        min_speech_duration_ms: 400,
        ..Default::default()
    };
    // More permissive: lower threshold and shorter minimum speech duration.
    let permissive = VadParameters {
        threshold: 0.4,
        min_speech_duration_ms: 150,
        ..Default::default()
    };

    let strict_segments = get_speech_timestamps(&audio, &mut model, &strict)?;
    model.reset_states(); // clear internal state before the second pass
    let permissive_segments = get_speech_timestamps(&audio, &mut model, &permissive)?;

    println!("strict count: {}", strict_segments.len());
    println!("permissive count: {}", permissive_segments.len());
    Ok(())
}
```

```bash
cargo fmt
cargo clippy --all-targets
cargo test
```

Integration tests use WAV fixtures in `tests/data`.
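Segment post-processing can also be unit-tested without WAV fixtures. The sketch below is a std-only model of what the trimming example expects from `collect_chunks` (keep the samples inside the speech timestamps) and `drop_chunks` (keep the samples outside them). It follows upstream Silero's Python helpers, which concatenate the kept pieces, and is not this crate's code; check the crate's implementation before relying on the exact output length. `Span`, `keep_speech`, and `drop_speech` are hypothetical names, and spans are assumed to be sorted, non-overlapping, and in samples with an exclusive end.

```rust
/// Hypothetical speech span in sample indices; `end` is exclusive.
#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize,
    end: usize,
}

/// Concatenate the samples inside each span (collect_chunks-style).
fn keep_speech(spans: &[Span], audio: &[f32]) -> Vec<f32> {
    let mut out = Vec::new();
    for s in spans {
        let end = s.end.min(audio.len());
        if s.start < end {
            out.extend_from_slice(&audio[s.start..end]);
        }
    }
    out
}

/// Concatenate the samples outside every span (drop_chunks-style).
fn drop_speech(spans: &[Span], audio: &[f32]) -> Vec<f32> {
    let mut out = Vec::new();
    let mut cursor = 0; // first sample not yet consumed
    for s in spans {
        let start = s.start.min(audio.len());
        if cursor < start {
            out.extend_from_slice(&audio[cursor..start]);
        }
        cursor = cursor.max(s.end.min(audio.len()));
    }
    // Keep the tail after the last span.
    if cursor < audio.len() {
        out.extend_from_slice(&audio[cursor..]);
    }
    out
}
```

A useful invariant for such tests is that, for sorted non-overlapping spans, the two outputs together contain every input sample exactly once.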
- `load_silero_vad_with_options` lets you pick `opset_version` 15 or 16 and toggle `force_onnx_cpu`.
- TorchScript weights are intentionally unsupported in this port; always pass `use_onnx = true`.
- When building without audio I/O, disable the default features: `cargo build --no-default-features`.
- Voice activity detection for IoT / edge / mobile deployments
- Offline data cleaning and generic voice-detection pipelines
- Telephony, call-center automation, and voice bots
- Voice interfaces and conversational UX layers
Distributed under the MIT License, matching the upstream Silero VAD project. See LICENSE for details.
