audio/wave .wav files not supported #603
Comments
Yup, that's a bug - thanks. You can work around it with:
llm -m gemini-1.5-flash-latest --at output.wav audio/wav transcribe
Thanks for the tip about sox:
brew install sox
sox -d output.wav
# Hit Ctrl+C when done
It looks like |
puremagic uses data from https://www.garykessler.net/library/file_sigs.html - it lists two byte sequences for WAV. The first of those matches the puremagic definition of |
Interesting, the
Which is BOTH of the lines in the In which case, why does |
This file in the That's one of four audio files in the tests https://github.com/cdgriffith/puremagic/tree/master/test/resources/audio - and the only assertion it runs is that the file extension |
Filed an issue here: But seeing as IANA doesn't list either |
Also relevant:
python -c 'import puremagic, pprint, sys; pprint.pprint(puremagic.magic_stream(open(sys.argv[-1], "rb")))' output.wav
[PureMagicWithConfidence(byte_match=b'RIFFH\xe0\x02\x00WAVE', offset=8, extension='.wav', mime_type='audio/wave', name='Waveform Audio File Format', confidence=0.8),
PureMagicWithConfidence(byte_match=b'WAVEfmt ', offset=8, extension='.wav', mime_type='audio/x-wav', name='Windows audio file ', confidence=0.8),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.4xm', mime_type='', name='4X Movie video', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cdr', mime_type='', name='CorelDraw document', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.avi', mime_type='video/avi', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cda', mime_type='', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.qcp', mime_type='audio/vnd.qcelp', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.rmi', mime_type='audio/mid', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.wav', mime_type='audio/wav', name='Resource Interchange File Format', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ds4', mime_type='', name='Micrografx Designer graphic', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.ani', mime_type='application/x-navi-animation', name='Windows animated cursor', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.dat', mime_type='video/mpeg', name='Video CD MPEG movie', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.cmx', mime_type='', name='Corel Presentation Exchange metadata', confidence=0.4),
PureMagicWithConfidence(byte_match=b'RIFF', offset=0, extension='.webp', mime_type='image/webp', name='RIFF WebP', confidence=0.4),
PureMagicWithConfidence(byte_match=b'WAVE', offset=8, extension='.wav', mime_type='audio/x-wav', name='WAV audio', confidence=0.4)] |
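For anyone wondering why the bare RIFF matches above only score 0.4: RIFF is a container shared by WAV, AVI, WebP and others, and it is the four-byte form type at offset 8 that says which one you have. Here is a minimal sketch of that check using only the standard library. The helper names looks_like_wav and make_silent_wav are made up for illustration, and this is not how puremagic implements its matching.

```python
import io
import wave


def looks_like_wav(header: bytes) -> bool:
    """Return True if the bytes start with a RIFF container whose form type is WAVE."""
    # Bytes 0-3: b"RIFF", bytes 4-7: little-endian chunk size, bytes 8-11: form type.
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def make_silent_wav(num_frames: int = 8000) -> bytes:
    """Build a small mono 16-bit WAV in memory so the check has real data to inspect."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(8000)   # 8 kHz
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


wav_bytes = make_silent_wav()
webp_like = b"RIFF\x00\x00\x00\x00WEBPVP8 "  # RIFF container, but not WAVE

print(looks_like_wav(wav_bytes))
print(looks_like_wav(webp_like))
```

The first call should report a WAV and the second should not, even though both start with RIFF.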
For the moment I'm going to take the opinion that |
This works:
llm -m gemini-1.5-flash-latest -a output.wav transcribe
Thanks! ❤️ So |
Each plugin defines its list of accepted MIME types like this: llm/llm/default_plugins/openai_models.py Lines 315 to 333 in 5d1d723
Full docs here: https://llm.datasette.io/en/stable/plugins/advanced-model-plugins.html#attachments-for-multi-modal-models |
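Since each plugin compares the detected MIME type against its own accepted set, one way to avoid this class of mismatch is to normalize aliases before the comparison. The following is a rough sketch of that idea and not llm's actual code. MIME_TYPE_ALIASES, normalize_mime_type, is_accepted and the ACCEPTED set are all hypothetical names and values chosen for illustration.

```python
from typing import Optional, Set

# Assumed alias table: variant spellings that detection libraries may report
# for WAV, mapped to the one spelling a plugin is assumed to list.
MIME_TYPE_ALIASES = {
    "audio/wave": "audio/wav",
    "audio/x-wav": "audio/wav",
}


def normalize_mime_type(mime_type: Optional[str]) -> Optional[str]:
    """Lower-case the type and replace known aliases with their canonical form."""
    if mime_type is None:
        return None
    cleaned = mime_type.strip().lower()
    return MIME_TYPE_ALIASES.get(cleaned, cleaned)


def is_accepted(mime_type: Optional[str], accepted_types: Set[str]) -> bool:
    """Check a detected type against a plugin's accepted set after normalizing."""
    return normalize_mime_type(mime_type) in accepted_types


# Hypothetical accepted set for an imaginary audio-capable model.
ACCEPTED = {"audio/wav", "audio/mp3", "image/png"}

for detected in ("audio/wave", "audio/x-wav", "audio/wav", "video/avi"):
    print(detected, is_accepted(detected, ACCEPTED))
```

With this in place the plugin only has to list audio/wav, whichever of the three spellings the detector returns.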
…els (#613)
- #507 (comment)
* register_model is now async aware. Refs #507 (comment)
* Refactor Chat and AsyncChat to use _Shared base class. Refs #507 (comment)
* fixed function name
* Fix for infinite loop
* Applied Black
* Ran cog
* Applied Black
* Add Response.from_row() classmethod back again. It does not matter that this is a blocking call, since it is a classmethod
* Made mypy happy with llm/models.py
* mypy fixes for openai_models.py. I am unhappy with this, had to duplicate some code.
* First test for AsyncModel
* Still have not quite got this working
* Fix for not loading plugins during tests, refs #626
* audio/wav not audio/wave, refs #603
* Black and mypy and ruff all happy
* Refactor to avoid generics
* Removed obsolete response() method
* Support text = await async_mock_model.prompt("hello")
* Initial docs for llm.get_async_model() and await model.prompt(). Refs #507
* Initial async model plugin creation docs
* duration_ms ANY to pass test
* llm models --async option. Refs #613 (comment)
* Removed obsolete TypeVars
* Expanded register_models() docs for async
* await model.prompt() now returns AsyncResponse. Refs #613 (comment)
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
I'm recording audio from my microphone using sox and saving the recordings as .wav files. When I try to attach these files to the gemini-1.5-flash-8b-latest model, I receive this error:
I suspect the issue is simply that llm doesn't recognize that audio/wave and audio/wav are actually the same MIME type. Is this correct?