arsenkylyshbek
Collaborator

Speaker Diarization with ECAPA Embeddings

Adds proper speaker identification to conversations, based on voice features.

What it does

  • extracts 192-dimensional ECAPA-TDNN speaker embeddings from each speech segment
  • uses WebRTC VAD to split audio at natural pauses, so each segment is more likely to contain a single speaker
  • automatically estimates the number of speakers using BIC (Bayesian information criterion) during clustering
  • runs in post-processing, so it doesn't slow down real-time transcription
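
To make the BIC step more concrete, here is a minimal sketch of how a speaker count could be scored and picked. It is not the code in speaker_diarization.py: the names `spherical_bic` and `pick_speaker_count` are hypothetical, the model is assumed to be a shared-variance spherical Gaussian per cluster, and the candidate clusterings are assumed to come from whatever clustering step runs beforehand.

```python
import math


def spherical_bic(points, labels):
    """BIC of a hard clustering (lower is better).

    Assumption: every cluster is a spherical Gaussian sharing one variance.
    points: list of equal-length numeric tuples (e.g. speaker embeddings).
    labels: one cluster label per point.
    """
    n, d = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    sse = 0.0          # total squared distance to cluster centroids
    log_weights = 0.0  # sum over points of log(mixing weight of its cluster)
    for members in clusters.values():
        centroid = [sum(col) / len(members) for col in zip(*members)]
        sse += sum((x - c) ** 2 for p in members for x, c in zip(p, centroid))
        log_weights += len(members) * math.log(len(members) / n)

    # Maximum-likelihood shared variance, floored to avoid log(0).
    var = max(sse / (n * d), 1e-12)
    log_lik = log_weights - 0.5 * n * d * (math.log(2 * math.pi * var) + 1)

    # Free parameters: k centroids, k - 1 mixing weights, 1 variance.
    n_params = k * d + (k - 1) + 1
    return -2.0 * log_lik + n_params * math.log(n)


def pick_speaker_count(points, candidate_labelings):
    """candidate_labelings maps a speaker count to the labels for that count."""
    return min(
        candidate_labelings,
        key=lambda k: spherical_bic(points, candidate_labelings[k]),
    )


# Toy 2-D "embeddings": two tight groups far apart.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
candidates = {
    1: [0] * 8,
    2: [0, 0, 0, 0, 1, 1, 1, 1],
    4: [0, 0, 1, 1, 2, 2, 3, 3],
}
best_k = pick_speaker_count(points, candidates)
```

On this toy data the two-cluster labeling gets the lowest score: one cluster fits poorly, and four clusters pay a larger parameter penalty than the fit improvement is worth.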

How it works

  1. conversation gets transcribed normally
  2. post-processing kicks in → runs VAD to find precise speech regions
  3. extracts voice embeddings for each segment
  4. clusters them to identify unique speakers
  5. merges segments from same speaker
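
Step 5 could look roughly like the sketch below. The segment layout (dicts with `start`, `end`, `speaker`), the function name `merge_same_speaker`, and the `max_gap` threshold are all assumptions made for illustration; the actual module may represent segments differently.

```python
def merge_same_speaker(segments, max_gap=0.5):
    """Merge consecutive segments that share a speaker label.

    segments: list of dicts with 'start' and 'end' (seconds) and 'speaker'.
    max_gap: hypothetical threshold; segments from the same speaker that are
             further apart than this stay separate.
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last["speaker"] == seg["speaker"]
            and seg["start"] - last["end"] <= max_gap
        ):
            # Same speaker and close enough: extend the previous segment.
            last["end"] = max(last["end"], seg["end"])
        else:
            # Copy so the caller's input is not mutated.
            merged.append(dict(seg))
    return merged


segments = [
    {"start": 0.0, "end": 1.0, "speaker": "A"},
    {"start": 1.2, "end": 2.0, "speaker": "A"},
    {"start": 2.1, "end": 3.0, "speaker": "B"},
    {"start": 5.0, "end": 6.0, "speaker": "B"},
]
merged = merge_same_speaker(segments)
```

Here the two "A" segments collapse into one, while the two "B" segments stay separate because the pause between them exceeds `max_gap`.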

Files changed

  • backend/utils/stt/speaker_diarization.py - new module with all the diarization logic
  • backend/utils/conversations/postprocess_conversation.py - integration into existing pipeline

Tested on conversations with 2 to 5 speakers; speaker separation is noticeably better than with the old approach (not yet benchmarked).


Next Steps

Accuracy improvements:

  • Better clustering - try HDBSCAN or spectral clustering instead of k-means for cleaner speaker separation
  • Dynamic K selection - add adaptive thresholds based on audio length and embedding variance
  • Embedding refinement - fine-tune ECAPA on omi conversation data or add a secondary pass with longer context windows
  • Temporal smoothing - prevent rapid speaker switches using HMM or temporal consistency constraints
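
As a rough idea of what temporal smoothing could mean here, the sketch below uses a sliding-window majority vote over per-segment speaker labels. This is a simpler stand-in for the HMM mentioned above, not an HMM, and the function name `smooth_labels` and the `window` parameter are hypothetical.

```python
from collections import Counter


def smooth_labels(labels, window=3):
    """Replace each label by the majority label in a window centred on it.

    Ties are resolved in favour of the original label, so the filter only
    changes a label when another speaker clearly dominates the neighbourhood.
    """
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        # Always read from the original labels, never from smoothed output.
        neighbourhood = labels[max(0, i - half): i + half + 1]
        counts = Counter(neighbourhood)
        best = max(counts, key=lambda lab: (counts[lab], lab == current))
        smoothed.append(best)
    return smoothed


raw = ["A", "A", "B", "A", "A", "B", "B", "B"]
smoothed = smooth_labels(raw, window=3)
```

The isolated "B" at index 2 is absorbed by the surrounding "A" labels, while the real speaker change near the end is preserved. A larger window removes longer spurious switches at the cost of blurring short genuine turns.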

@aaravgarg
Collaborator

@arsenkylyshbek what's our diarization error rate currently, and what does it drop to with your solution? Curious.

@aaravgarg
Collaborator

@arsenkylyshbek bumping up

@arsenkylyshbek
Collaborator Author

@aaravgarg I don't have the numbers at the moment. I'll benchmark and let you know.
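
For reference, a benchmark of this kind could compute a diarization error rate along the lines of the sketch below. It is a simplified, frame-level version under stated assumptions: both label sequences are sampled on the same frame grid, `None` marks non-speech, there is no overlapping speech and no scoring collar, and the speaker mapping is found by brute force (fine for a handful of speakers). The name `frame_der` is hypothetical.

```python
from itertools import permutations


def frame_der(reference, hypothesis):
    """Frame-level DER = (missed + false alarm + confusion) / reference speech.

    reference, hypothesis: equal-length lists of per-frame speaker labels,
    with None for non-speech frames.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")
    total_ref = sum(r is not None for r in reference)
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    pairs = list(zip(reference, hypothesis))
    missed = sum(r is not None and h is None for r, h in pairs)
    false_alarm = sum(r is None and h is not None for r, h in pairs)
    both = [(r, h) for r, h in pairs if r is not None and h is not None]

    ref_speakers = sorted({r for r in reference if r is not None})
    hyp_speakers = sorted({h for h in hypothesis if h is not None})
    # Pad with None so surplus hypothesis speakers can map to "no one".
    padding = [None] * max(0, len(hyp_speakers) - len(ref_speakers))

    # Try every one-to-one mapping and keep the one with most matched frames.
    best_correct = 0
    for perm in permutations(ref_speakers + padding, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)

    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total_ref


reference = ["A", "A", "A", "B", "B", None]
hypothesis = ["x", "x", "y", "y", "y", "y"]
der = frame_der(reference, hypothesis)
```

In this toy case there are 5 reference speech frames, 1 false-alarm frame and 1 confused frame, giving a DER of 0.4. Established scoring tools additionally handle overlapping speech and boundary collars, which this sketch leaves out.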
