python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte #1279
Comments
Are you using the original gigaspeech model, or did you modify it? I can't see how the original model could return a non-UTF-8 character.
Original, unmodified.
We need to reproduce it somehow. The 0xa0 output is very strange, to be honest; it feels more like memory corruption. How often do you see this issue?
I've only seen it once. If it happens again, I'll let you know.
OK, let's keep it open. I'll think about how to catch it better.
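One way to catch a rare failure like this is to record the offending bytes at the moment strict decoding fails. The following is only a minimal sketch, not Vosk code: `decode_with_diagnostics`, its `context` parameter, and the logger name are hypothetical, and it assumes the caller has access to the raw bytes before they are decoded.

```python
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("utf8_diagnostics")


def decode_with_diagnostics(raw: bytes, context: int = 16) -> str:
    """Decode UTF-8 strictly; on failure, log the bytes near the bad offset and re-raise."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start and exc.end delimit the undecodable bytes within `raw`.
        start = max(exc.start - context, 0)
        window = raw[start:exc.end + context]
        logger.error(
            "invalid UTF-8 at byte %d (%s); nearby bytes (hex): %s; total length: %d",
            exc.start, exc.reason, window.hex(), len(raw),
        )
        raise
```

Logging a hex window rather than the decoded text keeps the evidence intact, which may help distinguish a single stray byte from a larger corrupted buffer.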
There is a possibility that this was triggered because the Vosk object was reset ( This is only speculation, but I wanted to point it out in case the problem is being caused outside your API library. In terms of troubleshooting, are there any
So far I've not been able to reproduce this problem, but while using nerd-dictation, we have hit a Vosk decoding issue that appears to be rooted in the Vosk Python API code. I am running Python 3.6 on CentOS 7 (which gets updates from Red Hat until 2024) while using the vosk-model-en-us-0.42-gigaspeech model.
You can see the backtrace below. Notice that the last line triggers an error within the Vosk API at "vosk/__init__.py", line 194, in FinalResult.
@ideasman42, the developer of nerd-dictation, suggests that this could be fixed in Vosk by adding errors=ignore. For example:
There are 4 different locations where text is decoded to UTF-8, so perhaps they need to be fixed up as well:
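To illustrate what the suggested change would do, here is a small standalone sketch using only Python's built-in `bytes.decode`. It does not reproduce the Vosk source; the sample byte string is made up to match the 0xa0 byte from the error message.

```python
# Made-up sample: a stray 0xa0 byte followed by valid ASCII text.
raw = b"\xa0hello world"

# Strict decoding (the default) raises the error reported in this issue.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = str(exc)

# errors="ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD, leaving a visible marker in the text.
replaced = raw.decode("utf-8", errors="replace")

print(strict_error)
print(repr(ignored))
print(repr(replaced))
```

Note that `errors="ignore"` would hide the symptom without explaining where the bad byte came from. If memory corruption is suspected, `errors="replace"` may be the safer choice, since the replacement character shows that something was lost.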