feat: speaker diarization #3100

arsenkylyshbek · 2025-10-02T23:27:49Z

Speaker Diarization with ECAPA Embeddings

adds speaker diarization to conversations: works out who spoke when from voice embeddings

What it does

extracts 192-dim ECAPA-TDNN speaker embeddings from each speech segment
uses WebRTC VAD to split audio at natural pauses, so each segment is more likely to contain a single speaker
automatically estimates the number of speakers by using BIC (Bayesian information criterion) to pick the cluster count
runs in post-processing so it doesn't slow down real-time transcription
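
A rough sketch of how BIC can pick the speaker count, for reviewers who want the idea without opening the module. This is not the code in speaker_diarization.py: it assumes hard k-means assignments and spherical clusters with one shared variance, and `kmeans`, `bic` and `estimate_num_speakers` are hypothetical names used only for illustration.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=50):
    # Farthest-point initialisation keeps this sketch deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def bic(points, labels, centers):
    # Assumption: spherical Gaussian clusters sharing one variance.
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # floor avoids log(0)
    log_lik = -0.5 * n * d * (math.log(2 * math.pi * var) + 1)
    n_params = k * d + 1  # k centroids plus the shared variance
    return -2 * log_lik + n_params * math.log(n)

def estimate_num_speakers(embeddings, max_k=5):
    # Try each candidate count and keep the one with the lowest BIC.
    best_k, best_score = 1, float("inf")
    for k in range(1, min(max_k, len(embeddings)) + 1):
        labels, centers = kmeans(embeddings, k)
        score = bic(embeddings, labels, centers)
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

The penalty term `n_params * log(n)` is what stops the selection from always preferring more clusters: an extra speaker has to reduce the within-cluster variance enough to pay for its extra centroid.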

How it works

conversation gets transcribed normally
post-processing kicks in → runs VAD to find precise speech regions
extracts voice embeddings for each segment
clusters them to identify unique speakers
merges segments from same speaker
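
The last step above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `Segment` fields, the `merge_same_speaker` name and the `max_gap` threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    # Hypothetical segment shape; the real pipeline's fields may differ.
    start: float   # seconds
    end: float     # seconds
    speaker: int   # cluster label assigned by diarization
    text: str

def merge_same_speaker(segments, max_gap=1.0):
    """Merge time-adjacent segments that share a speaker label.

    Two segments are merged when they have the same speaker and the
    silence between them is at most max_gap seconds (assumed threshold).
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        prev = merged[-1] if merged else None
        if prev and prev.speaker == seg.speaker and seg.start - prev.end <= max_gap:
            prev.end = max(prev.end, seg.end)
            prev.text = f"{prev.text} {seg.text}".strip()
        else:
            # Copy so the caller's segments are not mutated.
            merged.append(Segment(seg.start, seg.end, seg.speaker, seg.text))
    return merged
```

Keeping a gap threshold means a speaker who resumes after a long silence stays in a separate segment, which preserves the timing of the conversation.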

Files changed

backend/utils/stt/speaker_diarization.py - new module with all the diarization logic
backend/utils/conversations/postprocess_conversation.py - integration into existing pipeline

tested on conversations with 2-5 speakers; subjectively it works much better than the old approach (not benchmarked yet)

Next Steps

Accuracy improvements:

Better clustering - try HDBSCAN or spectral clustering instead of k-means for cleaner speaker separation
Dynamic K selection - add adaptive thresholds based on audio length and embedding variance
Embedding refinement - fine-tune ECAPA on omi conversation data or add a secondary pass with longer context windows
Temporal smoothing - prevent rapid speaker switches using HMM or temporal consistency constraints
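
For the temporal smoothing item, a sliding-window majority vote is a simpler stand-in for an HMM and may be enough as a first pass. The sketch below is an assumption about how that could look, not something in this PR; `smooth_labels` and the `window` size are hypothetical.

```python
from collections import Counter

def smooth_labels(labels, window=3):
    """Replace each label with the majority label in a window around it.

    Ties keep the current label, so a segment is only relabelled when
    its neighbours clearly outvote it.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        counts = Counter(labels[max(0, i - half): i + half + 1])
        top_label, top_count = counts.most_common(1)[0]
        smoothed.append(current if counts[current] == top_count else top_label)
    return smoothed
```

This only suppresses isolated one-segment flips; unlike an HMM it ignores how confident each segment's cluster assignment was, so it would be a baseline to compare against rather than the final approach.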

aaravgarg · 2025-10-03T03:39:21Z

@arsenkylyshbek what's our diarization error rate currently, and what does it drop to with your solution? curious

aaravgarg · 2025-10-03T17:22:39Z

@arsenkylyshbek bumping up

arsenkylyshbek · 2025-10-03T17:25:29Z

@aaravgarg i don't have the numbers atm. will benchmark and let you know

arsenkylyshbek added 3 commits October 3, 2025 04:01

add speaker diarization with ecapa embeddings and bic clustering

a408082

add missing diarization file

ce2a94e

add tests and docs

7021073
