Ignore unicode error within Vosk #91

Open: KJ7LNW wants to merge 1 commit into main.

KJ7LNW (Contributor) commented Feb 21, 2023

During dictation, Vosk raised the error below. The cause is unclear and I cannot reproduce it, but a simple solution is to ignore this type of decoding error and print a warning.

Traceback (most recent call last):
  File "./nerd-dictation", line 1962, in <module>
    main()
  File "./nerd-dictation", line 1958, in main
    args.func(args)
  File "./nerd-dictation", line 1845, in <lambda>
    vosk_grammar_file=args.vosk_grammar_file,
  File "./nerd-dictation", line 1440, in main_begin
    vosk_grammar_file=vosk_grammar_file,
  File "./nerd-dictation", line 1215, in text_from_vosk_pipe
    json_text = rec_handle_fn_wrapper_from_final_result()
  File "./nerd-dictation", line 1054, in rec_handle_fn_wrapper_from_final_result
    json_text = rec.FinalResult()
  File "/usr/src/nerd-dictation/lib64/python3.6/site-packages/vosk/__init__.py", line 194, in FinalResult
    return _ffi.string(_c.vosk_recognizer_final_result(self._handle)).decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
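The workaround described above could look roughly like the following. This is a sketch, not the actual diff in this PR: `final_result_or_empty` and `FakeRecognizer` are hypothetical names, and returning an empty `{"text": ""}` result is an assumption about what the caller can tolerate.

```python
import json
import sys


def final_result_or_empty(rec):
    """Return rec.FinalResult(), or an empty result if it fails to decode.

    Hypothetical helper: `rec` is any object with a FinalResult() method,
    such as a vosk.KaldiRecognizer.
    """
    try:
        return rec.FinalResult()
    except UnicodeDecodeError as e:
        # Warn on stderr instead of crashing the dictation loop; whatever
        # text belonged to this utterance is lost.
        print("Warning: ignoring Vosk decoding error: {}".format(e), file=sys.stderr)
        # Assumed fallback: an empty result in the JSON shape Vosk returns.
        return json.dumps({"text": ""})


class FakeRecognizer:
    """Hypothetical stand-in whose result starts with a stray 0xa0 byte."""

    def FinalResult(self):
        # Mirrors the failing decode in the traceback; raises UnicodeDecodeError.
        return b"\xa0".decode("utf-8")


if __name__ == "__main__":
    result = json.loads(final_result_or_empty(FakeRecognizer()))
    print(repr(result["text"]))  # '' (plus a warning on stderr)
```

Returning a well-formed empty result keeps the caller's existing JSON handling unchanged, at the cost of silently dropping that utterance's text.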

Signed-off-by: Eric Wheeler <[email protected]>
KJ7LNW force-pushed the ignore-vosk-unicode-error branch from a9f5628 to 50426d2 on February 22, 2023 23:55
KJ7LNW (Contributor, Author) commented Feb 22, 2023

Force-pushed a change to print the exception with format(e) instead of str(e), which I think is more correct.
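For reference, a small check (not part of the PR) of how the two spellings behave on the exception from the traceback: exceptions do not override `__format__`, and `object.__format__` with an empty format spec returns `str(self)`, so both produce the same message here. `decode_error_message` is a hypothetical helper used only to reproduce the error.

```python
def decode_error_message():
    """Reproduce the UnicodeDecodeError from the traceback and return it."""
    try:
        b"\xa0".decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    raise AssertionError("0xa0 should not decode as UTF-8")


e = decode_error_message()
# With an empty spec, format(e) and "{}".format(e) fall back to str(e).
print(format(e))
print(str(e) == format(e) == "{}".format(e))
```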

ideasman42 (Owner)

As a workaround this may be OK, but couldn't this be handled on VOSK's side? Every user of the VOSK API shouldn't have to work around Unicode decoding errors.

Errors could be ignored, e.g.:

>>> b'A\xaeB'.decode('utf-8', errors='ignore')
'AB'
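A sketch of what decoding with an error handler could look like in the Python binding, where `raw` stands for the bytes the C call returns (the `_ffi.string(...)` value in the traceback). `decode_result` is a hypothetical name and the sample bytes are made up to mirror the stray `0xa0` at position 0. With `errors="ignore"` the bad byte is dropped and the JSON still parses; with `errors="replace"` a U+FFFD marker is kept, which at this position makes `json.loads` fail.

```python
import json


def decode_result(raw, errors="ignore"):
    """Decode a recognizer result, tolerating invalid UTF-8.

    Hypothetical helper; `errors` is passed straight to bytes.decode().
    """
    return raw.decode("utf-8", errors=errors)


# Made-up sample: a valid JSON result preceded by a stray 0xa0 byte.
raw = b'\xa0{"text" : "hello world"}'

print(json.loads(decode_result(raw))["text"])  # hello world

replaced = decode_result(raw, errors="replace")
print(replaced.startswith("\ufffd"))  # True
try:
    json.loads(replaced)
except json.JSONDecodeError as err:
    print("replace keeps the marker, so parsing fails:", err.msg)
```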

KJ7LNW (Contributor, Author) commented Feb 23, 2023

You have a good point that this should be addressed in their API; I'm not sure why I didn't think of that first. I opened an issue in their repository:

For now, would you like to accept this pull request as a workaround?
