
Conversation

BloodAxe
Contributor

@BloodAxe BloodAxe commented Sep 30, 2025

Allow benchmarking models using the random-mm dataset with video inputs

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the performance Performance-related issues label Sep 30, 2025
…hen generating random inputs (This is to avoid inserting mm-related tokens which may confuse VLM models)

Signed-off-by: Eugene Khvedchenia <[email protected]>
@mergify mergify bot added the ci/build label Oct 1, 2025
Signed-off-by: Eugene Khvedchenia <[email protected]>
@BloodAxe BloodAxe marked this pull request as ready for review October 3, 2025 18:46
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines +633 to +637
# Build the inner sequence by sampling
# sequentially from the allowed tokens
inner_seq = allowed_tokens[
    (offset + index + np.arange(prefix_len)) % len(allowed_tokens)
].tolist()


P1: Sample prompts using prefix length instead of input length

In RandomDataset.generate_token_sequence, the inner sequence is built with np.arange(prefix_len) and indexed into allowed_tokens. In the common case where prefix_len is 0, this yields an empty token_sequence, causing gen_prompt_decode_to_target_len to backfill almost the entire prompt with random tokens drawn from the full vocabulary, reintroducing special tokens and ignoring the supplied input_len and offsets. This regresses determinism and defeats the intent of filtering out special IDs for every request with input_len > 0.
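A minimal sketch of the fix the comment suggests, assuming the surrounding variables keep their current names and that input_len is in scope at this point (the helper name build_inner_seq is hypothetical, used here only to make the snippet self-contained):

```python
import numpy as np

def build_inner_seq(allowed_tokens: np.ndarray, offset: int,
                    index: int, input_len: int) -> list[int]:
    # Sample sequentially from the allowed (special-ID-free) tokens,
    # wrapping around the filtered vocabulary. Sizing the window by
    # input_len instead of prefix_len keeps the sequence non-empty
    # even when prefix_len == 0, so the decode-to-target-len step has
    # nothing left to backfill from the full vocabulary.
    return allowed_tokens[
        (offset + index + np.arange(input_len)) % len(allowed_tokens)
    ].tolist()
```

With this sizing, identical (offset, index, input_len) triples always reproduce the same token sequence, preserving determinism.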


Comment on lines 57 to +60
except ImportError:
librosa = PlaceholderModule("librosa")

import cv2


P1: Unconditionally importing OpenCV makes benchmarks fail without it

vllm/benchmarks/datasets.py now imports cv2 at module import time, unlike other optional dependencies, which are wrapped in try/except placeholders. Any environment that runs text-only benchmarks without OpenCV installed will now hit an ImportError simply by importing vllm.benchmarks.datasets, even though video sampling may never be used. This adds a hard dependency that will break existing benchmark workflows unless OpenCV is installed everywhere.
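One way to apply the same deferred-import pattern the module already uses for librosa. The PlaceholderModule class below is a simplified stand-in written for illustration; the real helper lives in vLLM's utilities and has more behavior:

```python
class PlaceholderModule:
    """Simplified stand-in for vLLM's placeholder helper: defers the
    ImportError until an attribute of the missing module is accessed."""

    def __init__(self, name: str):
        self._name = name

    def __getattr__(self, attr: str):
        # Only reached when normal attribute lookup fails, i.e. on any
        # use of the "module" -- importing it stays error-free.
        raise ImportError(
            f"'{self._name}' is required for this feature but is not "
            f"installed (tried to access '{attr}')."
        )

try:
    import cv2  # only needed when synthetic video generation is used
except ImportError:
    cv2 = PlaceholderModule("cv2")
```

With this wrapping, text-only benchmark runs import cleanly, and the error surfaces only when video sampling actually touches cv2.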


Comment on lines +28 to +36
@pytest.mark.benchmark
def test_generate_synthetic_video_different_seeds(self):
"""Test that different seeds produce different videos."""
dataset1 = RandomMultiModalDataset(random_seed=123)
dataset2 = RandomMultiModalDataset(random_seed=456)

width, height, num_frames = 64, 48, 8

video1 = dataset1.generate_synthetic_video(width, height, num_frames)


P1: Video test declares an unused self parameter and fails at setup

The new test test_generate_synthetic_video_different_seeds is a module-level function but declares a self parameter. Pytest treats self as a fixture and fails the test with a FixtureLookupError ("fixture 'self' not found"), so its assertions never run. Removing the self argument fixes this.
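A sketch of the corrected test. The generate_synthetic_video function here is a hypothetical seeded stand-in for RandomMultiModalDataset.generate_synthetic_video, included only so the snippet runs on its own; the pytest.mark.benchmark decorator from the original is omitted for the same reason:

```python
import numpy as np

def generate_synthetic_video(seed: int, width: int, height: int,
                             num_frames: int) -> np.ndarray:
    # Hypothetical stand-in: a seeded RNG makes frames reproducible,
    # so different seeds yield different pixel data.
    rng = np.random.default_rng(seed)
    return rng.integers(0, 256, size=(num_frames, height, width, 3),
                        dtype=np.uint8)

def test_generate_synthetic_video_different_seeds():  # no self: module-level
    """Test that different seeds produce different videos."""
    video1 = generate_synthetic_video(123, 64, 48, 8)
    video2 = generate_synthetic_video(456, 64, 48, 8)
    assert video1.shape == video2.shape == (8, 48, 64, 3)
    assert not np.array_equal(video1, video2)
```

Alternatively, if the self parameter was intentional, the function can be moved into a Test* class, which pytest also collects.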

