[Feature]: Implement SRT generation for audio transcription in vLLM #24302

@tarukumar

Description

🚀 The feature, motivation and pitch

As an AI engineer, I want to use vLLM to host speech-to-text models and produce audio transcriptions with timestamps (audio "segments"). My goal is to serve a Whisper model on vLLM through the OpenAI-compatible transcription API and receive the output in SRT format, with word-level timestamp granularity.

The following OpenAI client call is executed against a Whisper model hosted via vLLM:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # example local vLLM endpoint
model_name = "openai/whisper-large-v3"  # example; any Whisper model served by vLLM

with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        file=audio_file,
        model=model_name,
        response_format="srt",
        timestamp_granularities=["word"],
    )

Expected Behavior:
vLLM returns valid SRT output, per the OpenAI transcription API specification.
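For reference, a valid SRT body is a sequence of numbered cues, each consisting of a cue index, a start --> end line in HH:MM:SS,mmm form (note the comma before the milliseconds), the caption text, and a blank line, for example:

1
00:00:00,000 --> 00:00:02,400
Hello and welcome.

2
00:00:02,400 --> 00:00:05,100
This is an example caption.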

Current Output:
vLLM returns the following message:

BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Currently only support response_format `text` or `json`', 'type': 'BadRequestError', 'param': None, 'code': 400}

Alternatives

No response

Additional context

More detail about the SRT format can be found here: openai/whisper#98
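For illustration, here is a minimal sketch of the serialization this feature would need, assuming decoded segments are available as dicts with start and end times in seconds plus a text field (the segment shape and function names are hypothetical, not vLLM's internal API):

def srt_timestamp(seconds: float) -> str:
    # SRT timestamps use HH:MM:SS,mmm with a comma before the milliseconds.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    # Each cue: index, "start --> end" line, caption text, then a blank line.
    cues = []
    for i, seg in enumerate(segments, start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{i}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)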

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
