feat(media-mix): add support for weighted multimodal request archetypes#938

Open
matthewkotila wants to merge 18 commits into main from mkotila/media-mix

@matthewkotila matthewkotila commented May 14, 2026

Media Mix — weighted multimodal request archetypes

Summary

This branch adds media mix: a way to specify weighted request archetypes (image-only, audio-only, text-only, multi-modal, …) inside a single AIPerf run, with per-archetype metrics broken out alongside the aggregate. Each archetype defines which modalities appear, with what dimensional profiles, and (optionally) with overridden ISL/OSL. Output flows through the existing exporters with a new per-archetype section in the JSON and a per-archetype table in the console.

It is the v1 cut described in the AIP-814 design doc — synthetic-only, no image-pool / reuse-rate, no per-archetype × per-timeslice cross-product.

Why

Today's synthetic generator produces one kind of request per run, so realistic multimodal workloads ("70% text chats + 20% small images + 10% large images") need multiple runs and manual stitching. The per-modality latency story is also lost in the aggregate. Media mix gives users one config that:

  1. samples requests across archetypes by weight,
  2. reports per-archetype metrics so contention is visible per shape,
  3. composes with the existing goodput / streaming / fixed-schedule machinery without special-casing.

Configuration shape

input:
  media_mix:
    - weight: 0.7
      name: text-heavy
      text:
        input_tokens: { mean: 2000, stddev: 100 }
        output_tokens: { mean: 200 }
      modalities: []

    - weight: 0.3
      name: image-heavy
      modalities:
        - modality: image
          batch_size: 1
          profiles:
            - weight: 1.0
              width:  { mean: 1024, stddev: 128 }
              height: { mean: 768,  stddev: 96  }
              format: jpeg

Each archetype carries an optional text: block — false to disable text, a TextOverrideConfig to override ISL/OSL just for this archetype, or None/True to inherit the global prompt config. modalities is a list of weighted profiles per modality; batch_size controls items-per-request.
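
The three states of the text: block can be sketched as a small resolution function. This is an illustration only: TextOverride, GlobalPrompt, and effective_text are hypothetical stand-ins, not the classes in this branch, and the stddev fields are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass
class TextOverride:
    """Hypothetical stand-in for TextOverrideConfig; None means 'not overridden'."""
    input_tokens_mean: Optional[int] = None
    output_tokens_mean: Optional[int] = None


@dataclass
class GlobalPrompt:
    """Hypothetical stand-in for the global prompt config."""
    input_tokens_mean: int
    output_tokens_mean: int


def effective_text(
    text: Union[bool, TextOverride, None], prompt: GlobalPrompt
) -> Optional[Tuple[int, int]]:
    """Return (isl, osl) for an archetype, or None when text is disabled."""
    if text is False:
        return None  # the archetype sends no text at all
    if text is None or text is True:
        return (prompt.input_tokens_mean, prompt.output_tokens_mean)
    # Override object: each unset field falls back to the global value.
    isl = text.input_tokens_mean
    osl = text.output_tokens_mean
    return (
        isl if isl is not None else prompt.input_tokens_mean,
        osl if osl is not None else prompt.output_tokens_mean,
    )


prompt = GlobalPrompt(input_tokens_mean=550, output_tokens_mean=128)
print(effective_text(TextOverride(input_tokens_mean=2000), prompt))  # (2000, 128)
```

The per-field fallback is what lets an archetype override only ISL while still inheriting the global OSL.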

Architecture

flowchart LR
    A[YAML media_mix] --> B[InputConfig.validate_*]
    B --> C[SyntheticDatasetComposer]
    C --> D[MediaMixResolver.resolve_turn]
    D --> E[Turn with archetype_name]
    E --> F[Worker → Records]
    F --> G[MetricRecordMetadata.archetype_name]
    G --> H[MetricResultsProcessor]
    G --> I[ArchetypeMetricResultsProcessor]
    H --> J[ProfileResults.records]
    I --> K[ProfileResults.archetype_metric_results]
    J --> L[Aggregate console / JSON / CSV]
    K --> M[Per-archetype console / JSON / CSV]

Three planes

  1. Dataset plane. MediaMixResolver samples an archetype by weight, then samples one profile per modality entry. Per-archetype text overrides are surfaced as a ResolvedTurn so the existing prompt generator path picks them up.
  2. Records plane. MetricRecordMetadata gains an archetype_name field. The single field is the only cross-plane plumbing; everything else falls out of it.
  3. Metrics plane. A new ArchetypeMetricResultsProcessor mirrors TimesliceMetricResultsProcessor — same template, same goodput handling, just keyed on archetype name instead of timeslice index.
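
A minimal sketch of the dataset-plane sampling, assuming a plain cumulative-weight draw. The MIX table, sample_weighted, and resolve_turn here are illustrative names and shapes, not the actual MediaMixResolver code.

```python
import random
from collections import Counter


def sample_weighted(rng, items, weights):
    """Pick one item with probability weight / sum(weights)."""
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for item, weight in zip(items, weights):
        cumulative += weight
        if threshold < cumulative:
            return item
    return items[-1]  # guard against floating-point round-off


# Hypothetical mix: archetype name -> (weight, {modality: [(profile, weight), ...]})
MIX = {
    "text-heavy": (0.7, {}),
    "image-heavy": (0.3, {"image": [("1024x768-jpeg", 1.0)]}),
}


def resolve_turn(rng):
    """Sample an archetype by weight, then one profile per modality entry."""
    names = list(MIX)
    name = sample_weighted(rng, names, [MIX[n][0] for n in names])
    profiles = {
        modality: sample_weighted(
            rng, [p for p, _ in options], [w for _, w in options]
        )
        for modality, options in MIX[name][1].items()
    }
    return name, profiles


rng = random.Random(0)
counts = Counter(resolve_turn(rng)[0] for _ in range(10_000))
print(counts)
```

Dividing by the sum of the weights at draw time means the configured weights never have to add up to 1.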

Dispatch by result_kind

The records manager previously dispatched processor outputs by Python type — list → records, dict → timeslice. That collides the moment a second dict-returning processor exists (archetype's dict[str, ...] vs timeslice's dict[int, ...]). Refactored to a result_kind: ClassVar[str] discriminator on the processor base class. Each subclass declares its kind ("records", "timeslice", "archetype"); the manager routes by string match. Future processors slot in by setting one field.
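
The routing idea can be sketched in a few lines. The class names and the dispatch helper below are simplified stand-ins for the real processors and records manager; only the result_kind discriminator and its three values are taken from the description above.

```python
from typing import Any, ClassVar


class BaseProcessor:
    """Stand-in for the processor base class; subclasses declare their kind."""
    result_kind: ClassVar[str] = "records"

    def summarize(self) -> Any:
        return ["aggregate-metric"]


class TimesliceProcessor(BaseProcessor):
    result_kind: ClassVar[str] = "timeslice"

    def summarize(self) -> Any:
        return {0: ["slice-metric"]}


class ArchetypeProcessor(BaseProcessor):
    result_kind: ClassVar[str] = "archetype"

    def summarize(self) -> Any:
        return {"image-heavy": ["archetype-metric"]}


def dispatch(processors):
    """Route each summarize() payload by declared kind, never by Python type."""
    buckets = {"records": None, "timeslice": None, "archetype": None}
    dropped = []
    for processor in processors:
        kind = processor.result_kind
        if kind in buckets:
            buckets[kind] = processor.summarize()
        else:
            dropped.append(kind)  # unknown kinds are dropped, not merged
    return buckets, dropped


buckets, dropped = dispatch(
    [BaseProcessor(), TimesliceProcessor(), ArchetypeProcessor()]
)
print(buckets["timeslice"], buckets["archetype"], dropped)
```

Both dict-returning processors now land in separate buckets even though an isinstance check could not tell them apart.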

Schema 1.2

profile_export_aiperf.json bumps to version: 1.2, additive only. The new archetypes array appears alongside the existing top-level metric block; consumers that ignore unknown fields are unaffected. ArchetypeData uses extra="allow" to carry dynamic per-metric fields the same way JsonExportData does.
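
For consumers, a hedged sketch of reading the new section. The embedded JSON is a made-up, trimmed export: only archetype_name, archetype_weight, and the input_config.input.media_mix path come from this PR, while the request_latency block and the join_archetypes helper are assumptions for illustration.

```python
import json

# Hypothetical, trimmed export used only to exercise the reader below.
EXPORT_JSON = """
{
  "input_config": {"input": {"media_mix": [
    {"name": "text-heavy", "weight": 0.7},
    {"name": "image-heavy", "weight": 0.3}
  ]}},
  "archetypes": [
    {"archetype_name": "image-heavy", "archetype_weight": 0.3,
     "request_latency": {"avg": 812.5}},
    {"archetype_name": "text-heavy", "archetype_weight": 0.7,
     "request_latency": {"avg": 240.0}}
  ]
}
"""


def join_archetypes(export: dict) -> list:
    """Pair each per-archetype metric block with its configured archetype."""
    input_section = export.get("input_config", {}).get("input", {})
    configs = {a["name"]: a for a in input_section.get("media_mix") or []}
    # A non-media-mix export has no "archetypes" key, so default to empty.
    return [
        (entry["archetype_name"], configs.get(entry["archetype_name"]), entry)
        for entry in export.get("archetypes") or []
    ]


export = json.loads(EXPORT_JSON)
for name, config, metrics in join_archetypes(export):
    print(name, config["weight"], metrics["request_latency"]["avg"])
```

Because the reader treats a missing archetypes key as an empty list, the same code handles exports from runs without media mix.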

CLI surface

Media mix is YAML-only. There is no --media-mix flag; the nested shape (archetypes → modalities → profiles, with optional text overrides) does not fit flat CLI ergonomics, and an image:0.6,audio:0.4-style shorthand would re-invent half the schema without expressing per-archetype overrides at all.

The entry point is aiperf profile --user-config-file <path>, which loads the full UserConfig from YAML or JSON. Individual CLI flags can be combined with the file:

  • CLI flags override global YAML fields (same scope: --url, --isl, --concurrency, etc.).
  • CLI flags do not override per-archetype YAML overrides — media_mix[i].text.input_tokens.mean is a strictly finer scope than --isl, and no CLI flag targets it. This is implemented by a recursive model_fields_set walk: only fields the user explicitly typed propagate from the CLI-built UserConfig onto the YAML-built one.
# Pure CLI
aiperf profile --model X --url localhost:8000 --concurrency 4

# Pure YAML
aiperf profile --user-config-file media-mix.yaml

# YAML baseline + CLI overrides for global fields
aiperf profile --user-config-file media-mix.yaml --url localhost:8001
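
The precedence rule above can be sketched without Pydantic. Node is a toy class that records which fields were passed explicitly, standing in for a model's model_fields_set; overlay is an illustrative name and not necessarily how the branch's helper is written. The sketch does not re-run any validation after merging.

```python
class Node:
    """Toy stand-in for a config model that remembers explicitly set fields."""

    def __init__(self, **explicit):
        self.fields_set = set(explicit)
        for name, value in explicit.items():
            setattr(self, name, value)


def overlay(base, cli):
    """Copy only explicitly set CLI fields onto base, recursing into sub-configs."""
    for name in cli.fields_set:
        value = getattr(cli, name)
        current = getattr(base, name, None)
        if isinstance(value, Node) and isinstance(current, Node):
            overlay(current, value)  # merge field by field, do not replace wholesale
        else:
            setattr(base, name, value)
            base.fields_set.add(name)
    return base


# YAML-built config: a global ISL plus a per-archetype override.
yaml_cfg = Node(
    url="localhost:8000",
    input=Node(isl=550, media_mix=[{"name": "text-heavy", "isl": 2000}]),
)
# CLI-built config: the user typed only --url and --isl.
cli_cfg = Node(url="localhost:8001", input=Node(isl=100))

merged = overlay(yaml_cfg, cli_cfg)
print(merged.url, merged.input.isl, merged.input.media_mix)
```

The global ISL changes to 100, while the archetype's own value of 2000 survives because the CLI side never set media_mix.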

Output

Aggregate output is unchanged — the existing console table, CSV, and JSON top-level metric block all render byte-for-byte identical when media_mix is absent or contains a single archetype.

When media mix is configured:

  • Console: one extra Rich table per archetype, sorted by name, with a ({normalized_share}% of traffic) title suffix.
  • JSON: an archetypes array sorted by name, each entry carrying archetype_name, archetype_weight (raw, as configured), and the same dynamic per-metric fields the top-level block uses.
  • CSV: per-archetype rows alongside the aggregate, sorted by name.

All three exporters self-disable when archetype_metric_results is missing, so non-media-mix runs see zero behavioral change.
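
A small sketch of the title suffix and ordering described above, assuming whole-percent rounding (the exact format string used by the exporter is not shown in this PR, and archetype_titles is a hypothetical helper).

```python
def archetype_titles(weights: dict) -> list:
    """Build per-archetype title suffixes, sorted by name, from raw weights."""
    total = sum(weights.values())
    return [
        f"{name} ({weights[name] / total * 100:.0f}% of traffic)"
        for name in sorted(weights)
    ]


# Raw weights as a user might write them; they do not need to sum to 1.
print(archetype_titles({"text-heavy": 7, "image-heavy": 3}))
# ['image-heavy (30% of traffic)', 'text-heavy (70% of traffic)']
```

Normalizing only at display time keeps the raw configured weight available for the JSON archetype_weight field.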

Validation

  • weight: gt=0 on every weighted field (archetype, profile).
  • min_length=1 on profiles.
  • Profile types must match the declared modality (image → ImageProfileConfig, etc.).
  • Archetype must produce something: text=False plus empty modalities is rejected.
  • Unnamed archetypes auto-assigned _archetype_0, _archetype_1, …; duplicate names rejected (would silently merge metric buckets).
  • media_mix combined with --public-dataset or --custom-dataset-type is rejected at config load. Without this guard, custom-loaded conversations (no archetype_name) reach the archetype processor and the benchmark hangs forever at PROFILING.
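
A few of these rules can be sketched with plain dataclasses. The Archetype class and validate_media_mix function are simplified stand-ins (text is reduced to a boolean flag, profiles are opaque), not the Pydantic validators in this branch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Archetype:
    """Simplified stand-in for MediaMixArchetype."""
    weight: float
    name: Optional[str] = None
    text: Optional[bool] = None  # None/True inherit global text, False disables it
    modalities: List[str] = field(default_factory=list)


def validate_media_mix(mix: List[Archetype]) -> List[Archetype]:
    """Apply the weight, non-empty, and naming rules in order."""
    for index, archetype in enumerate(mix):
        if archetype.weight <= 0:
            raise ValueError(f"archetype {index}: weight must be > 0")
        if archetype.text is False and not archetype.modalities:
            raise ValueError(f"archetype {index}: produces neither text nor media")
        if archetype.name is None:
            archetype.name = f"_archetype_{index}"  # auto-assign by position
    seen = set()
    for archetype in mix:
        if archetype.name in seen:
            # Duplicate names would merge two metric buckets into one.
            raise ValueError(f"duplicate archetype name: {archetype.name}")
        seen.add(archetype.name)
    return mix


mix = validate_media_mix([
    Archetype(weight=0.7),
    Archetype(weight=0.3, modalities=["image"]),
])
print([a.name for a in mix])  # ['_archetype_0', '_archetype_1']
```

Assigning names before the duplicate check guarantees every archetype has a unique, non-None key by the time metrics are grouped.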

Test coverage

  • tests/unit/common/config/test_media_mix_config.py — config validation (profile/modality/archetype shape, weights, name uniqueness).
  • tests/unit/dataset/composer/test_media_mix.py — resolver weighting, batch-size preservation, text-override propagation, integration through SyntheticDatasetComposer.
  • tests/unit/post_processors/test_archetype_metric_results_processor.py — per-archetype grouping, goodput, error pass-through.
  • tests/unit/records/test_records_manager.py::TestRecordsManagerDispatchByResultKind — guard against future processors collapsing into the wrong bucket.
  • tests/unit/exporters/*_archetype_* — sort order, weight normalization, self-disable on absent results.

Full suite: 9832 unit tests passing locally (one pre-existing OTel-fanout pickling failure unrelated to this branch).

Tutorial

docs/tutorials/media-mix.md walks through the YAML shape, the precedence model (CLI vs YAML for global vs per-archetype fields), and how to read the per-archetype output.


Notes / learnings along the way

A few things I would have planned differently from the start, captured here so the next iteration doesn't redo the same work:

The --media-mix shorthand was wishful. The original plan included an image:0.6,audio:0.4 CLI shorthand "for quick experimentation." In practice Cyclopts destructures --media-mix "image:0.6" into media_mix[0] = "image:0.6", and Pydantic rejects the string per-element before any model validator could parse it. The shorthand parser was wired into a model_validator that never fires. Removed entirely (~200 lines of dead code): there's no version of the shorthand that expresses per-archetype text overrides, so it would have been a permanent half-feature even if Cyclopts cooperated.

--user-config-file was missing on aiperf profile. The plan assumed YAML was the entry point, but no one wired up the flag to load it. aiperf service --user-config-file existed; aiperf profile did not. The tutorial pointed at a non-existent flag. Fixed by mirroring the service pattern plus a model_fields_set-based merge so CLI flags can override global YAML fields without trampling per-archetype overrides.

"CLI overrides config" needs a scope qualifier. The conventional rule is "CLI wins, config loses." For media mix that framing is wrong: media_mix[i].text.input_tokens.mean: 2000 is a finer scope than --isl 100, not a different source for the same value. The user wrote the per-archetype override because they wanted to deviate from the global. The correct rule is "more specific scope wins, regardless of source" — and the implementation gets it for free because no CLI flag targets media_mix[] at all.

Type-based dispatch doesn't scale to two dict-returning processors. The records manager originally routed summarize() results by Python type (list → records, dict → timeslice). Adding archetype results (also a dict) silently collided with timeslice output. Refactored to a result_kind: ClassVar[str] discriminator on the processor base class. Cheap to add, future-proof.

Weights need normalization at the display layer, not the config layer. Users naturally write weight: 3 / weight: 7 and expect "30% / 70%". The resolver was doing this correctly (_sample_weighted divides by sum(weights)), but the console exporter was displaying weight * 100 and printing "300% / 700% of traffic." The resolver doesn't need to normalize the weights — that would just hide the user's input — but anything that displays a percentage does.

Insertion-order iteration is non-deterministic across runs. The JSON exporter iterated archetype_results.items(), which is the order the first record per archetype arrived at the processor — different across runs, and different from the CSV exporter's sorted(keys()). Same record set, different output ordering. Fixed by sorting in both places.

Validation that prevents a hang is worth more than its line count. media_mix plus custom_dataset_type was silently accepted at config load. Custom-loaded turns have no archetype_name, so the first record reaching the archetype processor raised ValueError from a ZMQ pull-client task. The exception was logged as "Task exception was never retrieved" but never propagated. The benchmark logged PROFILING forever and required SIGKILL. Eight lines in validate_dataset_type turn that into a clean config-load error.
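
The failure mode is a general asyncio property rather than anything specific to AIPerf: an exception raised inside a background task stays inside the task object until something awaits it or reads it. The sketch below uses made-up names (pull_records, run) and no ZMQ; it only shows that the surrounding coroutine keeps going unless the error is surfaced explicitly, here through a done callback.

```python
import asyncio


async def pull_records(records):
    """Stand-in for a background pull task that fails on an untagged record."""
    for record in records:
        if record.get("archetype_name") is None:
            raise ValueError("record has no archetype_name")


async def run(surface_errors: bool):
    surfaced = []
    task = asyncio.create_task(pull_records([{"archetype_name": None}]))
    if surface_errors:
        # Done callbacks fire even when nobody awaits the task.
        task.add_done_callback(
            lambda t: surfaced.append(t.exception()) if not t.cancelled() else None
        )
    # The "benchmark" keeps running; nothing here observes the failure.
    await asyncio.sleep(0.01)
    return task.done(), surfaced


print(asyncio.run(run(surface_errors=True)))
```

With surface_errors=False the task still finishes with an error, but the caller never sees it; asyncio only logs a warning when the task object is garbage collected, which is why rejecting the configuration at load time is the more robust guard.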

What's deliberately out of scope

  • Image pool / reuse rate (tracked separately).
  • Per-archetype × per-timeslice cross-product grouping.
  • Real VLM server validation. The mock server does not differentiate latency by modality, so the diagnostic-value claim of per-archetype metrics under contention is taken on faith for v1 and will need a real VLM benchmark before the next iteration.
  • Cancellation-path behavior under media mix.

…types

AIPerf's existing multimodal benchmarking is all-or-nothing: if images are
enabled, every request gets images. This commit adds a media_mix config
that defines weighted request archetypes, so a single benchmark can model
realistic mixed-modality traffic (e.g., 60% image-and-audio, 30%
video-analysis, 10% text-only) with per-modality dimensional variation.

New src/aiperf/common/config/media_mix_config.py:
- MediaMixArchetype, ModalityEntry, and per-modality profile configs
  (ImageProfileConfig, AudioProfileConfig, VideoProfileConfig)
- TextOverrideConfig for per-archetype ISL/OSL overrides; unspecified
  fields fall back to the global PromptConfig
- parse_media_mix() for CLI shorthand like "image:0.6,video:0.4"

New src/aiperf/dataset/composer/media_mix_resolver.py:
- ResolvedTurn dataclass carrying per-turn generator selections
- MediaMixResolver pre-creates per-(archetype, profile) generators with
  unique RNG namespaces, then on each turn samples an archetype by
  weight, a profile per modality by weight, and returns the resolved
  generators for SyntheticDatasetComposer to invoke

Generator changes:
- ImageGenerator/AudioGenerator/VideoGenerator gain optional
  rng_namespace param to keep per-profile RNG streams independent
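
One way to keep per-profile streams independent is to derive each generator's seed from the global seed plus a namespace string. This is a sketch of the general idea only; namespaced_rng and the namespace format are assumptions, and the actual rng_namespace mechanism in the generators may differ.

```python
import hashlib
import random


def namespaced_rng(global_seed: int, namespace: str) -> random.Random:
    """Derive an independent, reproducible RNG stream for one namespace."""
    digest = hashlib.sha256(f"{global_seed}:{namespace}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


# Hypothetical namespaces, one per (archetype, modality, profile index).
small = namespaced_rng(42, "image-heavy/image/0")
large = namespaced_rng(42, "image-heavy/image/1")
small_again = namespaced_rng(42, "image-heavy/image/0")

a = [small.random() for _ in range(5)]
b = [large.random() for _ in range(5)]
c = [small_again.random() for _ in range(5)]
print(a == c, a == b)
```

Because each stream depends only on the seed and its own namespace, adding or reordering profiles does not shift the values drawn for the others.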

InputConfig:
- New media_mix field with model_validator(mode="before") that parses
  the shorthand string and inflates {modality, weight} sentinels into
  full archetype dicts using the sibling image/audio/video config

SyntheticDatasetComposer._create_turn dispatches to a new
_create_media_mix_turn when the resolver is present, with helper
methods for populating per-modality payloads and applying turn delay
and resolved sequence-length overrides.

Tests cover config validation, shorthand parsing, resolver sampling
distribution, profile-modality matching, per-archetype text overrides,
and the composer integration end to end.

Signed-off-by: Matthew Kotila <[email protected]>
…lumbing

- Replace stale `_CLI_GROUP` reference (removed from InputConfig in
  cyclopts 3.14 fix #878/#879) with `Groups.INPUT` for the new
  --media-mix field. Was causing import-time NameError.
- Extract shorthand inflation helpers from input_config.py to
  media_mix_config.py to satisfy check-ergonomics (file size <500 lines)
  and check-ruff-baselined (function complexity <=10).
- Extract per-modality population + delay + sequence-length caching from
  _create_media_mix_turn into helper methods to satisfy
  check-ruff-baselined (complexity <=10).
- Add archetype_name to ResolvedTurn, Turn, and MetricRecordMetadata
  so per-archetype metrics can be grouped during reporting (Step 7a
  foundation).

All 8961 unit tests pass, all pre-commit hooks pass.

Signed-off-by: Matthew Kotila <[email protected]>
Per-archetype metrics processing uses MediaMixArchetype.name as the dict
key for grouping records. Two issues that this validator fixes:

1. Multiple unnamed archetypes (name: None) would all collide under the
   same key, conflating distinct request types.
2. Two archetypes intentionally given the same name would silently merge,
   producing meaningless per-archetype output.

Add an InputConfig model_validator(mode="after") that runs after media_mix
shorthand inflation:
- Auto-assigns _archetype_{i} to any archetype with name=None
- Rejects remaining duplicate names with a clear error

This guarantees every archetype has a unique non-None name by the time
the resolver and the upcoming archetype results processor see it.

Tests in tests/unit/common/config/test_media_mix_config.py.

Signed-off-by: Matthew Kotila <[email protected]>
Two data-model additions for upcoming per-archetype metrics (media mix):

ProfileResults: new archetype_metric_results field carrying
dict[str, list[MetricResult]] alongside the existing records and
timeslice_metric_results. Each key is a MediaMixArchetype.name; each
value is the list-of-MetricResult shape used by the aggregate records.

JsonExportData: new ArchetypeData class with extra=allow so dynamic
metric fields can be populated at runtime via setattr (same pattern as
JsonExportData itself). archetypes: list[ArchetypeData] | None added to
JsonExportData. SCHEMA_VERSION bumped 1.1 -> 1.2.

Cross-referencing the full archetype config from output JSON is done by
joining archetypes[i].archetype_name against input_config.input.media_mix[]
(which is already serialized in the export today).

Update test_metrics_json_exporter test assertion for the new version.

Signed-off-by: Matthew Kotila <[email protected]>
…ricRecordMetadata

The base MetricResultsProcessor.get_instances_map / get_results methods
previously took request_start_ns, which implicitly assumed the only
grouping dimension was timeslice. Generalizing the parameter to the
full MetricRecordMetadata lets subclasses extract whatever grouping
key they need (timeslice index today, archetype name in an upcoming
commit, others in future) without further base-class changes.

TimesliceMetricResultsProcessor pulls request_start_ns out of the
metadata as before; behavior is identical.

Updates the corresponding tests to construct MetricRecordMetadata
instead of bare ints when calling get_instances_map / get_results
directly.

Signed-off-by: Matthew Kotila <[email protected]>
The existing RecordsManager._process_results dispatch used isinstance
on the summarize() return value: list -> records, dict -> timeslice.
That collides as soon as a second dict-returning processor (like the
upcoming ArchetypeMetricResultsProcessor with dict[str, list]) is
plugged in, since both subclasses' returns are Python dicts.

Replace with a class-attribute discriminator:
- MetricResultsProcessor.result_kind = 'records'
- TimesliceMetricResultsProcessor.result_kind = 'timeslice'
- (next commit will add 'archetype' kind)

RecordsManager wraps each summarize() call to return (kind, payload)
and routes the payload by kind. Unknown kinds are logged and dropped,
not silently merged into an existing bucket.

Tests in tests/unit/records/test_records_manager.py cover the kind
declarations and confirm subclasses must explicitly override.

Signed-off-by: Matthew Kotila <[email protected]>
Per-archetype metric aggregation for media mix benchmarks. Groups
incoming MetricRecordsData by metadata.archetype_name (set by the
SyntheticDatasetComposer during dataset generation) and computes
metrics independently per archetype, mirroring how the timeslice
processor groups by time-window index.

Architecture mirrors TimesliceMetricResultsProcessor:
- defaultdict[str, dict[MetricTagT, BaseMetric]] for per-archetype
  metric instances (auto-allocated on first record)
- defaultdict[str, MetricResultsDict] for per-archetype results
- Overrides get_instances_map/get_results to route by archetype_name
- summarize() returns dict[str, list[MetricResult]]; the
  RecordsManager dispatches it via result_kind='archetype'

Self-disables when InputConfig.media_mix is unconfigured, so users
running non-media-mix benchmarks see no behavioral change.
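
The grouping pattern can be sketched with a defaultdict keyed on archetype name. ArchetypeGrouper and its single latency metric are illustrative simplifications; the real processor keeps full metric instances per archetype.

```python
from collections import defaultdict
from statistics import mean


class ArchetypeGrouper:
    """Toy per-archetype aggregator; buckets are allocated on first record."""

    def __init__(self, enabled: bool):
        self.enabled = enabled  # False when media_mix is unconfigured
        self._latencies = defaultdict(list)

    def process(self, archetype_name, latency_ms: float) -> None:
        if not self.enabled:
            return  # self-disabled: non-media-mix runs see no change
        if archetype_name is None:
            raise ValueError("record has no archetype_name")
        self._latencies[archetype_name].append(latency_ms)

    def summarize(self) -> dict:
        """Return {archetype_name: [metric results]}, sorted by name."""
        return {
            name: [{"tag": "request_latency", "avg": mean(values), "count": len(values)}]
            for name, values in sorted(self._latencies.items())
        }


grouper = ArchetypeGrouper(enabled=True)
for name, latency in [("text-heavy", 100.0), ("image-heavy", 900.0), ("text-heavy", 300.0)]:
    grouper.process(name, latency)
print(grouper.summarize())
```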

Registered in plugins.yaml as results_processor.archetype.

Baseline regenerated to include the BLE001 entry mirroring the
existing pattern in TimesliceMetricResultsProcessor.update_derived_metrics.

Tests in tests/unit/post_processors/test_archetype_metric_results_processor.py
cover the self-disable, the per-archetype separation, the summarize
shape, and synthetic _archetype_{i} naming.

Signed-off-by: Matthew Kotila <[email protected]>
Extend the RecordsManager dispatch loop to route 'archetype' kind
payloads into a new archetype_metric_results bucket, then pass it
into ProfileResults at the end of the run.

Falls back to None (rather than an empty dict) when no archetype
processor ran, keeping the JSON output free of an empty 'archetypes'
section for non-media-mix benchmarks.

The dispatch loop got extracted into _dispatch_processor_outcomes
to keep _process_results within the branch budget enforced by
check-ruff-baselined.

Signed-off-by: Matthew Kotila <[email protected]>
When ProfileResults.archetype_metric_results is populated (media mix
mode), MetricsJsonExporter emits an additional 'archetypes' array
alongside the existing top-level aggregate metrics. Each entry carries
the archetype's identity (archetype_name + archetype_weight) plus the
same dynamic metric fields the top level uses, via extra=allow.

Non-media-mix benchmarks see no change: getattr+exclude_none keep the
array out of the output entirely.

Cross-referencing the full archetype config (profiles, dimensions,
formats) is done by joining archetypes[i].archetype_name against
input_config.input.media_mix[] which is already in the export.

Tests added: archetypes array populated correctly, and archetypes
field is absent when media mix is unconfigured.

Signed-off-by: Matthew Kotila <[email protected]>
Tidy/long-format CSV export of per-archetype metrics for media mix
benchmarks. Schema: Archetype,Metric,Unit,Stat,Value, one row per
(archetype, metric, stat) tuple. Optimal input format for downstream
pandas/Tableau/ggplot analysis.
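
A sketch of the long-format layout using the standard csv module. The column names come from the schema above; the input dictionary shape, the metric name, and write_archetype_csv are assumptions made for the example.

```python
import csv
import io


def write_archetype_csv(results: dict, out) -> None:
    """Write one row per (archetype, metric, stat) tuple."""
    writer = csv.writer(out)
    writer.writerow(["Archetype", "Metric", "Unit", "Stat", "Value"])
    for archetype in sorted(results):
        for metric in results[archetype]:
            for stat, value in metric["stats"].items():
                writer.writerow(
                    [archetype, metric["name"], metric["unit"], stat, value]
                )


# Hypothetical per-archetype results.
results = {
    "text-heavy": [
        {"name": "request_latency", "unit": "ms", "stats": {"avg": 240.0, "p99": 410.0}}
    ],
    "image-heavy": [
        {"name": "request_latency", "unit": "ms", "stats": {"avg": 812.5, "p99": 1500.0}}
    ],
}

buffer = io.StringIO()
write_archetype_csv(results, buffer)
print(buffer.getvalue())
```

Keeping one value per row means downstream tools can filter or pivot on Archetype, Metric, and Stat without reshaping.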

Mirrors TimesliceMetricsCsvExporter's shape and conventions exactly.

Self-disables when ProfileResults.archetype_metric_results is None,
so non-media-mix benchmarks don't get an empty CSV file.

New default file path: profile_export_aiperf_archetypes.csv,
matching the _timeslices.csv naming convention.

Registered in plugins.yaml as data_exporter.archetype_csv. The
existing --profile-export-prefix suffix-stripping list now recognizes
_archetypes.csv so custom prefixes work cleanly.

Signed-off-by: Matthew Kotila <[email protected]>
Renders one Rich table per archetype for media mix benchmarks. Sits
alongside (not replacing) the existing ConsoleMetricsExporter which
still renders the across-archetype aggregate table.

Each archetype's table uses the same column set, metric ordering, and
formatting as the aggregate so users learn one table layout and see
it N+1 times. Table title carries the archetype name and its
configured traffic share, e.g.:

  NVIDIA AIPerf | LLM Metrics: image-only (40% of traffic)

Inherits from ConsoleMetricsExporter to reuse the table-building,
flag-filtering, sorting, and row-formatting logic. The export()
method drives the per-archetype loop directly.

Self-disables when ProfileResults.archetype_metric_results is missing
so users running non-media-mix benchmarks see no behavioral change.

Registered in plugins.yaml as console_exporter.archetype_metrics.

Signed-off-by: Matthew Kotila <[email protected]>
- New docs/tutorials/media-mix.md walks through YAML config, weighted
  archetypes, profile/batch_size distributions, per-archetype text
  overrides, archetype naming rules, and how to read the per-archetype
  output in console/JSON/CSV. Linked from README's Endpoint Types
  tutorial index.

- docs/reference/json-export-schema.md documents schema 1.2 (added
  archetypes array) and shows the join pattern for cross-referencing
  per-archetype metric blocks against input_config.input.media_mix.

Signed-off-by: Matthew Kotila <[email protected]>
…inery

The --media-mix CLI flag was non-functional: Cyclopts treats the string value
as a list element to be coerced to MediaMixArchetype, never invoking
parse_media_mix(). Per team direction, the shorthand was a 'future enhancement'
in the original plan; a broken flag is worse than no flag.

Remove:
- CLIParameter on the media_mix field
- parse_media_mix(), normalize_media_mix_input(), _as_dict()
- _build_image/audio/video_modality_entry() and _MODALITY_BUILDERS
- inflate_shorthand_archetypes(), _inflate_shorthand_entry(), is_shorthand_list()
- VALID_MODALITIES constant
- InputConfig.inflate_media_mix_shorthand model_validator
- TestParseMediaMix class
- test_media_mix_shorthand_inflation in the composer tests
- 'CLI shorthand: ...' wording from the field description

The media_mix field stays — YAML continues to populate it directly via Pydantic.
The name-uniqueness validator and per-archetype text override logic are untouched.

Follow-up needed: aiperf profile doesn't currently expose --user-config-file, so
this commit leaves YAML configs reachable only via the verbose 'aiperf service
--type system_controller' path. Adding --user-config-file to profile is the next
commit.

Signed-off-by: Matthew Kotila <[email protected]>
@coderabbitai

coderabbitai Bot commented May 14, 2026

Warning

Rate limit exceeded

@matthewkotila has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 33 seconds before requesting another review.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a34ad591-e5ff-45cf-b69b-4f0dfd978278

📥 Commits

Reviewing files that changed from the base of the PR and between 837bda0 and b6d2a8d.

📒 Files selected for processing (37)
  • README.md
  • docs/cli-options.md
  • docs/reference/json-export-schema.md
  • docs/tutorials/media-mix.md
  • src/aiperf/cli_commands/profile.py
  • src/aiperf/common/config/__init__.py
  • src/aiperf/common/config/config_defaults.py
  • src/aiperf/common/config/input_config.py
  • src/aiperf/common/config/media_mix_config.py
  • src/aiperf/common/config/output_config.py
  • src/aiperf/common/models/dataset_models.py
  • src/aiperf/common/models/export_models.py
  • src/aiperf/common/models/record_models.py
  • src/aiperf/dataset/composer/media_mix_resolver.py
  • src/aiperf/dataset/composer/synthetic.py
  • src/aiperf/dataset/generator/audio.py
  • src/aiperf/dataset/generator/image.py
  • src/aiperf/dataset/generator/video.py
  • src/aiperf/exporters/archetype_metrics_csv_exporter.py
  • src/aiperf/exporters/console_archetype_metrics_exporter.py
  • src/aiperf/exporters/metrics_json_exporter.py
  • src/aiperf/plugin/enums.py
  • src/aiperf/plugin/plugins.yaml
  • src/aiperf/post_processors/archetype_metric_results_processor.py
  • src/aiperf/post_processors/metric_results_processor.py
  • src/aiperf/post_processors/timeslice_metric_results_processor.py
  • src/aiperf/records/record_processor_service.py
  • src/aiperf/records/records_manager.py
  • tests/unit/common/config/test_media_mix_config.py
  • tests/unit/dataset/composer/test_media_mix.py
  • tests/unit/exporters/test_archetype_metrics_csv_exporter.py
  • tests/unit/exporters/test_console_archetype_metrics_exporter.py
  • tests/unit/exporters/test_metrics_json_exporter.py
  • tests/unit/post_processors/test_archetype_metric_results_processor.py
  • tests/unit/post_processors/test_timeslice_metric_results_processor.py
  • tests/unit/records/test_records_manager.py
  • tools/ruff_baseline.json


@github-actions

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@b6d2a8deea9a04afe77d234be3ea01d941890bca

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@b6d2a8deea9a04afe77d234be3ea01d941890bca

Last updated for commit: b6d2a8d


@matthewkotila matthewkotila changed the title from "Mkotila/media mix" to "(feat): add support for media mix" May 14, 2026
CLI never touched stay as the YAML loaded them — most importantly the
`media_mix` array, since no CLI flag targets it.
"""
_overlay(base, cli)

The CLI/YAML merge mutates an already-validated UserConfig and returns it without re-running model validators, so individually valid file and CLI options can combine into an invalid final config. Fix: Rebuild and validate a fresh UserConfig from the merged data before running the controller, or merge raw dictionaries before constructing UserConfig.

raise ValueError(
"The --public-dataset and --custom-dataset-type options cannot be set together"
)
if self.media_mix and (

media_mix still allows input.file, so DatasetManager will use the custom composer and produce records without archetype_name, causing ArchetypeMetricResultsProcessor to fail at runtime. Fix: Treat self.file as a non-synthetic dataset source and reject it whenever media_mix is configured.

if resolved.input_tokens_mean is not None
else self.config.input.prompt.input_tokens.mean
)
self._turn_sequence_cache[id(turn)] = (isl, resolved.output_tokens_mean)

Per-archetype output_tokens overrides are cached here, but _set_max_tokens only reads that cache when sequence_distribution is active, so normal media_mix runs ignore the documented OSL override. Fix: Have _set_max_tokens consult the per-turn cache before falling back to global output tokens, or set turn.max_tokens directly from the resolved override.

modality: Literal["image", "audio", "video"] = Field(
description="Media type: image, audio, or video.",
)
batch_size: int = Field(

batch_size is declared as a fixed int with ge=1, so the documented distribution form and min: 0 optional-media case fail validation. Fix: Add a batch-size distribution config, sample it in MediaMixResolver, and allow zero when configured.

@matthewkotila matthewkotila changed the title from "(feat): add support for media mix" to "feat(media-mix): add support for weighted multimodal request archetypes" May 14, 2026
@github-actions github-actions Bot added the feat label May 14, 2026

2 participants