🚀 The feature, motivation and pitch
As an AI engineer, I want to use vLLM to host speech-to-text models and produce audio transcriptions with timestamps (audio "segments"). My goal is to serve a Whisper model on vLLM behind the OpenAI-compatible transcription API and receive output in the `srt` format with word-level timestamp granularity.
The following OpenAI client call is executed against a Whisper model hosted via vLLM:

```python
from openai import OpenAI

# Point the client at the vLLM OpenAI-compatible server (default port shown).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

transcript = client.audio.transcriptions.create(
    file=audio_file,    # an audio file opened in binary mode
    model=model_name,   # the name of the served Whisper model
    response_format="srt",
    timestamp_granularities=["word"],
)
```

Expected Behavior:
vLLM returns a valid SRT output according to the OpenAI specification.
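For illustration, a valid SRT document is a sequence of numbered cues, each with a `start --> end` timestamp line followed by the transcribed text. The content below is hypothetical, not actual model output; whether each cue maps to a single word or a longer segment would depend on the requested granularity:

```
1
00:00:00,000 --> 00:00:01,200
Hello

2
00:00:01,200 --> 00:00:02,100
world
```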
Current Output:

vLLM returns the following message:

```
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Currently only support response_format `text` or `json`', 'type': 'BadRequestError', 'param': None, 'code': 400}
```
Alternatives
No response
Additional context
More detail about SRT can be found here: openai/whisper#98
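In the meantime, a possible client-side workaround is to obtain timestamped segments some other way (e.g. OpenAI's spec exposes them via `response_format="verbose_json"`, if the server supports it) and render the SRT locally. Below is a minimal sketch, assuming each segment is a dict with `start`, `end`, and `text` keys (this shape is an assumption, not confirmed vLLM output):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT 'HH:MM:SS,mmm' timestamp."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render an iterable of segments (assumed dicts with 'start', 'end',
    and 'text' keys) as a single SRT document."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)
```

For example, `segments_to_srt([{"start": 0.0, "end": 1.2, "text": "Hello"}])` produces the first cue shown under "Expected Behavior" above.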
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.