Ignore unicode error within Vosk #91

KJ7LNW · 2023-02-21T21:27:35Z

During dictation, Vosk returned the error below. The cause is unclear and I cannot reproduce it, but a simple solution is to ignore this type of decoding error with a warning.

Traceback (most recent call last):
  File "./nerd-dictation", line 1962, in <module>
    main()
  File "./nerd-dictation", line 1958, in main
    args.func(args)
  File "./nerd-dictation", line 1845, in <lambda>
    vosk_grammar_file=args.vosk_grammar_file,
  File "./nerd-dictation", line 1440, in main_begin
    vosk_grammar_file=vosk_grammar_file,
  File "./nerd-dictation", line 1215, in text_from_vosk_pipe
    json_text = rec_handle_fn_wrapper_from_final_result()
  File "./nerd-dictation", line 1054, in rec_handle_fn_wrapper_from_final_result
    json_text = rec.FinalResult()
  File "/usr/src/nerd-dictation/lib64/python3.6/site-packages/vosk/__init__.py", line 194, in FinalResult
    return _ffi.string(_c.vosk_recognizer_final_result(self._handle)).decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
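A minimal sketch of the "ignore with a warning" idea, not the actual patch: `FakeRecognizer` and `final_result_or_empty` are hypothetical names, and the fake class only imitates the failing `.decode("utf-8")` call from the traceback so the example runs without Vosk installed.

```python
import json
import sys


class FakeRecognizer:
    """Hypothetical stand-in for a Vosk recognizer (assumption, not the real API)."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def FinalResult(self) -> str:
        # Mirrors the strict decode seen in the traceback.
        return self._raw.decode("utf-8")


def final_result_or_empty(rec) -> str:
    """Return the final JSON text, or an empty result if decoding fails."""
    try:
        return rec.FinalResult()
    except UnicodeDecodeError as ex:
        # Warn and carry on instead of aborting the dictation session.
        sys.stderr.write(f"Warning: ignoring decode error from Vosk: {ex}\n")
        return json.dumps({"text": ""})


good = final_result_or_empty(FakeRecognizer(b'{"text": "hello"}'))
bad = final_result_or_empty(FakeRecognizer(b'\xa0{"text": "hello"}'))
```

Returning an empty `"text"` field is one possible choice here; it drops the affected utterance but keeps the caller's JSON handling unchanged.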

Signed-off-by: Eric Wheeler <[email protected]>

KJ7LNW · 2023-02-22T23:55:26Z

Force-pushed a change to print the exception using format(e) instead of str(e), which I think is more correct...
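For reference, a small check of how the two spellings behave for this exception. As far as I know, `format(e)` with no format spec falls back to `str(e)` for exception objects, so the printed message should be the same either way; `repr(e)` is the one that differs.

```python
try:
    b"\xa0".decode("utf-8")
except UnicodeDecodeError as e:
    as_str = str(e)        # human-readable message
    as_format = format(e)  # empty format spec, falls back to str(e)
    as_repr = repr(e)      # includes the exception class name and arguments
```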

ideasman42 · 2023-02-23T00:07:36Z

As a workaround this may be OK, but couldn't this be handled on VOSK's side? Users of the VOSK API should really not have to work around unicode-decoding errors.

Errors could be ignored e.g.

>>> b'A\xaeB'.decode('utf-8', errors='ignore')
'AB'
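Besides `'ignore'`, Python's built-in error handlers include `'replace'` and `'backslashreplace'`, which keep a visible trace of the bad byte. A quick comparison on the same input, as a sketch only (which handler would suit VOSK is an open question):

```python
raw = b"A\xaeB"

# Silently drop the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# Substitute U+FFFD (the replacement character) for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# Keep the bad byte visible as a literal escape sequence.
escaped = raw.decode("utf-8", errors="backslashreplace")
```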

KJ7LNW · 2023-02-23T20:31:07Z

You have a good point that this should be addressed in their API, not sure why I did not think of that first. I opened an issue in their repository:

python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte alphacep/vosk-api#1279

For now, would you like to accept this pull request as a workaround?

KJ7LNW force-pushed the ignore-vosk-unicode-error branch from a9f5628 to 50426d2 on February 22, 2023 23:55

KJ7LNW mentioned this pull request Feb 23, 2023

python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte alphacep/vosk-api#1279 (Open)
