
Conversation

@BloodAxe
Contributor

@BloodAxe BloodAxe commented Sep 30, 2025

Allow benchmarking models using random-mm dataset with video inputs

Purpose

With this change, you can now run:

vllm bench serve \
  --backend openai-chat --endpoint /v1/chat/completions \
  --dataset-name random-mm --num-prompts 256 \
  --model nvidia/Cosmos-Reason1-7B \
  --max-concurrency 32 \
  --random-prefix-len 0 \
  --random-input-len 30 \
  --random-output-len 128 \
  --random-mm-base-items-per-request 1 \
  --random-mm-num-mm-items-range-ratio 0 \
  --random-mm-bucket-config '{(512, 512, 16): 1.0}' \
  --request-rate inf \
  --ignore-eos \
  --seed 42
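As a minimal sketch of how the `--random-mm-bucket-config` value could be parsed (assuming, as the example above suggests, that keys are `(height, width, num_frames)` tuples mapped to sampling probabilities, with `num_frames > 1` denoting a video bucket; the helper name here is hypothetical, not the actual vLLM code):

```python
import ast


def parse_bucket_config(raw: str) -> dict[tuple[int, int, int], float]:
    """Parse a Python-literal dict string such as '{(512, 512, 16): 1.0}'.

    Each key is assumed to be (height, width, num_frames) and each value the
    probability of sampling that bucket for a multimodal item.
    """
    buckets = ast.literal_eval(raw)
    total = sum(buckets.values())
    # Probabilities across buckets should sum to 1.
    assert abs(total - 1.0) < 1e-6, "bucket probabilities should sum to 1"
    return buckets


cfg = parse_bucket_config("{(512, 512, 16): 1.0}")
# {(512, 512, 16): 1.0} -> every request gets one 512x512, 16-frame video item
```

With the flags above (`--random-mm-base-items-per-request 1` and a ratio of 0), every request would carry exactly one item drawn from this single bucket.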

@mergify mergify bot added the performance Performance-related issues label Sep 30, 2025
…hen generating random inputs (This is to avoid inserting mm-related tokens which may confuse VLM models)

Signed-off-by: Eugene Khvedchenia <[email protected]>
@mergify mergify bot added the ci/build label Oct 1, 2025
Signed-off-by: Eugene Khvedchenia <[email protected]>
@BloodAxe BloodAxe marked this pull request as ready for review October 3, 2025 18:46

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Signed-off-by: Eugene Khvedchenia <[email protected]>
Signed-off-by: Eugene Khvedchenia <[email protected]>
Signed-off-by: Eugene Khvedchenia <[email protected]>
Signed-off-by: Eugene Khvedchenia <[email protected]>
@tomasruizt
Contributor

I see you generate a temporary mp4 file, dump the video into it, read it back into bytes, and then send it base64-encoded. I suspect that passing a reference to the temporary file in the payload, rather than base64-encoding it, would speed up inference. By building on top of your code, we could easily test this hypothesis with hard facts 👍
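The flow described above could be sketched roughly like this (an assumed illustration of the base64 path, not the exact PR code; the `video_url` content type matches the OpenAI-style chat payloads vLLM accepts, but the function name and fake bytes are placeholders):

```python
import base64


def video_bytes_to_chat_item(video_bytes: bytes) -> dict:
    """Embed raw mp4 bytes as a base64 data URL in a chat-completions content item."""
    encoded = base64.b64encode(video_bytes).decode("utf-8")
    return {
        "type": "video_url",
        "video_url": {"url": f"data:video/mp4;base64,{encoded}"},
    }


# Placeholder bytes standing in for a rendered temporary .mp4 file.
item = video_bytes_to_chat_item(b"\x00\x00\x00\x18ftypmp42")
```

The comparison proposed above would replace the `data:` URL with a file reference (e.g. a `file://` path to the temporary mp4), avoiding the base64 inflation of roughly 4/3 in payload size.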

@BloodAxe
Contributor Author

BloodAxe commented Oct 6, 2025

I see you generate a temporary mp4 file, dump the video into it, read it back into bytes, and then send it base64-encoded. I suspect that passing a reference to the temporary file in the payload, rather than base64-encoding it, would speed up inference. By building on top of your code, we could easily test this hypothesis with hard facts 👍

What should my action points be in that regard?

@tomasruizt
Contributor

I don't think you need to change this PR to enable the comparison I mentioned. It's only a potential follow-up.

@ywang96
Member

ywang96 commented Oct 16, 2025

I'm going to add the ready label so that you can see whether the benchmark tests pass.

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 16, 2025


4 participants