Some SRT segments are way too big - Silero VAD problem #470
Comments
I found an old subtitle file that I had generated, I think with large-v3, back on 2024-11-25. This is what it looked like back then:
And that's today:
Same File |
It seems like this is a According to #396 (comment), whisperX's VAD implementation gives better results, so I'm planning to add it later. Any PRs or suggestions about the VAD would be welcome. |
All right, but I'm using |
Yeah, regardless of the implementation, the result should be the same if you use the same model (the main difference between implementations is speed and VRAM efficiency). I hope using whisperX's VAD implementation will improve this kind of problem. And if there is any hallucination, I would recommend using |
Ok, but I just tried
|
Hmm. A difference in parameters probably caused that. Like |
No, that is not the case. I tried some beam size settings: Of course, my Docker updates did not happen at the same time as the updates in the repository, but I can say that the problem became visible to me around the 17th of January 2025. Since the Docker container has no versioning, I cannot identify an exact version. |
I just meant that as an example. Since If this suddenly happened when using |
Must have happened between 01/01/2025 and 17/01/2025 |
I actually just did a repository checkout of ad418ca and built the Docker image myself. So there must be some kind of external dependency that causes the problem. Maybe it would make sense to switch to GHCR and use tags / image versioning. |
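To narrow down which external dependency changed between two image builds, one option is to print the installed package versions when the container starts and compare the output across builds. This is a minimal sketch using only the standard library; the package list is an assumption for illustration and is not taken from the project's requirements file.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed package list for illustration; adjust it to the project's requirements.
PACKAGES = ["transformers", "torch", "faster-whisper"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so two builds can be compared with a plain diff.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Saving this output next to each image build would make it possible to tell a working build from a broken one even without image tags.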
Yeah, that's it. Shall I open a PR for that, @jhj0517? |
@ei23fxg Yeah I'd appreciate it if you do that. |
Fix for issue with insanely-fast-whisper see jhj0517#470
Fixed with #484 |
fixed transformers version for issue #470
Which OS are you using?
When I create a timestamp file, I sometimes get very large segments that are more or less useless as subtitles.
I'm not entirely sure if this has always been the case or if it came with an update. I suspect it has something to do with the SILERO VAD filter.
If the SILERO VAD filter is disabled, the problem definitely occurs frequently.
I have already tested some settings on the SILERO VAD filter. This changes the results to some extent. However, I have not been able to achieve the desired result.
I'm sure I'm doing something wrong, but others will surely run into this issue too. It would be useful to mention some best practices about this in the documentation.
large-v3 has the same issue.
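One way to quantify the oversized segments described above is to scan a generated SRT file for cues that last longer than a chosen limit. This is a minimal sketch that assumes standard SRT timing lines; the function names, the 7-second limit, and the sample text are made up for illustration and are not part of the project.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:07,250".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def _to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to seconds as a float."""
    return timedelta(
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds),
        milliseconds=int(millis),
    ).total_seconds()


def find_long_cues(srt_text, max_seconds=7.0):
    """Return (start, end, duration) in seconds for each cue longer than max_seconds."""
    long_cues = []
    for match in TIMING.finditer(srt_text):
        fields = match.groups()
        start = _to_seconds(*fields[:4])
        end = _to_seconds(*fields[4:])
        duration = end - start
        if duration > max_seconds:
            long_cues.append((start, end, duration))
    return long_cues


# Made-up sample, not the file from this issue.
sample = """1
00:00:00,000 --> 00:00:04,000
A short cue.

2
00:00:04,000 --> 00:00:45,500
A cue that is far too long to be useful as a subtitle.
"""

for start, end, duration in find_long_cues(sample):
    print(f"{start:.1f}s -> {end:.1f}s lasts {duration:.1f}s")
```

Running this against the output of two different builds or VAD settings gives a simple count of over-long cues to compare.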
Below is an SRT example for illustrative purposes.