Skip to content

Conversation

mdestagnol
Copy link
Contributor

@mdestagnol mdestagnol commented Sep 17, 2025

Currently whisper_full_with_state doesn't work with VAD.

The main inference method is whisper_full_with_state. This method takes a whisper_context and a whisper_state. On top of this method are built 2 convenience methods, whisper_full and whisper_full_parallel.

The VAD logic is currently only implemented in those convenience methods, but not in whisper_full_with_state itself. This is an issue whenever we need to deal with more than one state (when processing multiple real time streams for example).

This pull request moves the logic into whisper_full_with_state and fix a few incorrect calls to ctx->state instead of state directly.

@danbev
Copy link
Member

danbev commented Sep 21, 2025

@mdestagnol Thanks for the pull request.

I've been thinking about this and if I recall correctly, the original pr for VAD actually did something like this. I recall getting feedback about the state should be passed in and not be updated inside of this function (I've not been able to find the comment though I'm afraid).

So I think exposing whisper_vad might be a better option here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants