feat(media-mix): add support for weighted multimodal request archetypes by matthewkotila · Pull Request #938 · ai-dynamo/aiperf

matthewkotila · 2026-05-14T17:56:11Z

Media Mix — weighted multimodal request archetypes

Summary

This branch adds media mix: a way to specify weighted request archetypes (image-only, audio-only, text-only, multimodal, …) inside a single AIPerf run, with per-archetype metrics broken out alongside the aggregate. Each archetype defines which modalities appear, with what dimensional profiles, and (optionally) with overridden input/output sequence lengths (ISL/OSL). Output flows through the existing exporters, with a new per-archetype section in the JSON and a per-archetype table in the console.

It is the v1 cut described in the AIP-814 design doc — synthetic-only, no image-pool / reuse-rate, no per-archetype × per-timeslice cross-product.

Why

Today's synthetic generator produces one kind of request per run, so realistic multimodal workloads ("70% text chats + 20% small images + 10% large images") need multiple runs and manual stitching. The per-modality latency story is also lost in the aggregate. Media mix gives users one config that:

samples requests across archetypes by weight,
reports per-archetype metrics so contention is visible per shape,
composes with the existing goodput / streaming / fixed-schedule machinery without special-casing.

Configuration shape

input:
  media_mix:
    - weight: 0.7
      name: text-heavy
      text:
        input_tokens: { mean: 2000, stddev: 100 }
        output_tokens: { mean: 200 }
      modalities: []

    - weight: 0.3
      name: image-heavy
      modalities:
        - modality: image
          batch_size: 1
          profiles:
            - weight: 1.0
              width:  { mean: 1024, stddev: 128 }
              height: { mean: 768,  stddev: 96  }
              format: jpeg

Each archetype carries an optional text: block: false disables text, a TextOverrideConfig overrides ISL/OSL for this archetype only, and None/True inherits the global prompt config. modalities is a list of modality entries, each holding its own list of weighted profiles; batch_size sets the number of items of that modality per request.
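The three states of the text: block can be sketched as a small resolver. This is an illustration only: TextOverride is a hypothetical stand-in for TextOverrideConfig that carries just the two means, and resolve_text is not a function in the branch.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class TextOverride:
    """Hypothetical stand-in for TextOverrideConfig (means only)."""

    input_tokens_mean: Optional[int] = None
    output_tokens_mean: Optional[int] = None


def resolve_text(
    text: Union[bool, TextOverride, None],
    global_isl: int,
    global_osl: int,
) -> Optional[tuple[int, int]]:
    """Return (ISL, OSL) for one archetype, or None when text is disabled."""
    if text is False:
        return None  # the archetype sends no text at all
    if text is None or text is True:
        return (global_isl, global_osl)  # inherit the global prompt config
    # Override only the fields that were set. `is not None` keeps an explicit
    # 0 from being mistaken for "unset".
    isl = text.input_tokens_mean if text.input_tokens_mean is not None else global_isl
    osl = text.output_tokens_mean if text.output_tokens_mean is not None else global_osl
    return (isl, osl)
```

Unset override fields fall back to the global values one by one, which matches the "unspecified fields fall back to the global PromptConfig" behavior described in the commit message.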

Architecture

flowchart LR
    A[YAML media_mix] --> B[InputConfig.validate_*]
    B --> C[SyntheticDatasetComposer]
    C --> D[MediaMixResolver.resolve_turn]
    D --> E[Turn with archetype_name]
    E --> F[Worker → Records]
    F --> G[MetricRecordMetadata.archetype_name]
    G --> H[MetricResultsProcessor]
    G --> I[ArchetypeMetricResultsProcessor]
    H --> J[ProfileResults.records]
    I --> K[ProfileResults.archetype_metric_results]
    J --> L[Aggregate console / JSON / CSV]
    K --> M[Per-archetype console / JSON / CSV]

Three planes

Dataset plane. MediaMixResolver samples an archetype by weight, then samples one profile per modality entry. Per-archetype text overrides are surfaced as a ResolvedTurn so the existing prompt generator path picks them up.
Records plane. MetricRecordMetadata gains an archetype_name field. This single field is the only cross-plane plumbing; everything else falls out of it.
Metrics plane. A new ArchetypeMetricResultsProcessor mirrors TimesliceMetricResultsProcessor — same template, same goodput handling, just keyed on archetype name instead of timeslice index.
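The two-level sampling in the dataset plane can be sketched as follows. This is a minimal sketch over plain dicts shaped like the YAML above, not the actual MediaMixResolver; sample_weighted and resolve_turn here are illustrative names, and the real resolver also pre-creates generators per (archetype, profile).

```python
import random


def sample_weighted(items, weights, rng):
    """Pick one item with probability weight / sum(weights)."""
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for item, weight in zip(items, weights):
        cumulative += weight
        if threshold < cumulative:
            return item
    return items[-1]  # guard against floating-point round-off


def resolve_turn(media_mix, rng):
    """Sample an archetype by weight, then one profile per modality entry."""
    archetype = sample_weighted(media_mix, [a["weight"] for a in media_mix], rng)
    picks = []
    for entry in archetype.get("modalities", []):
        candidates = entry["profiles"]
        profile = sample_weighted(candidates, [p["weight"] for p in candidates], rng)
        picks.append((entry["modality"], profile))
    return archetype["name"], picks
```

Because the draw divides by the sum of the weights, weights such as 3 and 7 behave the same as 0.3 and 0.7.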

Dispatch by `result_kind`

The records manager previously dispatched processor outputs by Python type — list → records, dict → timeslice. That collides the moment a second dict-returning processor exists (archetype's dict[str, ...] vs timeslice's dict[int, ...]). Refactored to a result_kind: ClassVar[str] discriminator on the processor base class. Each subclass declares its kind ("records", "timeslice", "archetype"); the manager routes by string match. Future processors slot in by setting one field.
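A minimal sketch of the discriminator idea, assuming simplified stand-in classes (BaseProcessor, TimesliceProcessor, ArchetypeProcessor and dispatch are illustrative names, not the classes in the branch):

```python
from typing import Any, ClassVar


class BaseProcessor:
    """Hypothetical stand-in for the results-processor base class."""

    result_kind: ClassVar[str] = "records"

    def summarize(self) -> Any:
        return []


class TimesliceProcessor(BaseProcessor):
    result_kind: ClassVar[str] = "timeslice"

    def summarize(self) -> Any:
        return {0: ["ttft"]}  # dict keyed by timeslice index


class ArchetypeProcessor(BaseProcessor):
    result_kind: ClassVar[str] = "archetype"

    def summarize(self) -> Any:
        return {"image-heavy": ["ttft"]}  # dict keyed by archetype name


def dispatch(processors):
    """Route each summarize() payload by declared kind, not by Python type."""
    buckets = {"records": None, "timeslice": None, "archetype": None}
    dropped = []
    for proc in processors:
        kind = proc.result_kind
        if kind in buckets:
            buckets[kind] = proc.summarize()
        else:
            dropped.append(kind)  # unknown kinds are dropped, never merged
    return buckets, dropped
```

Both dict-returning processors land in separate buckets even though an isinstance check could not tell their payloads apart.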

Schema 1.2

profile_export_aiperf.json bumps to version: 1.2, additive only. The new archetypes array appears alongside the existing top-level metric block; consumers that ignore unknown fields are unaffected. ArchetypeData uses extra="allow" to carry dynamic per-metric fields the same way JsonExportData does.
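For consumers, a tolerant reader might look like the sketch below. It assumes only the field names stated in this PR (archetypes, archetype_name, archetype_weight); read_archetypes is a hypothetical helper, and the per-metric fields inside each entry are left opaque.

```python
import json


def read_archetypes(export_text: str) -> dict[str, dict]:
    """Map archetype_name -> metric block; empty when the array is absent."""
    data = json.loads(export_text)
    # Exports from non-media-mix runs (and pre-1.2 exports) carry no
    # "archetypes" key, so treat a missing or null value as an empty list.
    entries = data.get("archetypes") or []
    return {entry["archetype_name"]: entry for entry in entries}
```

The same name key can be used to join each block against input_config.input.media_mix[] in the export, as the commit message for the JSON exporter describes.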

CLI surface

Media mix is YAML-only. There is no --media-mix flag; the nested shape (archetypes → modalities → profiles, with optional text overrides) does not fit a flat CLI ergonomic, and an image:0.6,audio:0.4-style shorthand would re-invent half the schema without expressing per-archetype overrides at all.

The entry point is aiperf profile --user-config-file <path>, which loads the full UserConfig from YAML or JSON. Individual CLI flags can be combined with the file:

CLI flags override global YAML fields (same scope: --url, --isl, --concurrency, etc.).
CLI flags do not override per-archetype YAML overrides — media_mix[i].text.input_tokens.mean is a strictly finer scope than --isl, and no CLI flag targets it. This is implemented by a recursive model_fields_set walk: only fields the user explicitly typed propagate from the CLI-built UserConfig onto the YAML-built one.

# Pure CLI
aiperf profile --model X --url localhost:8000 --concurrency 4

# Pure YAML
aiperf profile --user-config-file media-mix.yaml

# YAML baseline + CLI overrides for global fields
aiperf profile --user-config-file media-mix.yaml --url localhost:8001
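The scope-aware merge can be illustrated without Pydantic. This is only a sketch: Node and its fields_set attribute are hypothetical stand-ins for a Pydantic model and its model_fields_set, and overlay is not the branch's actual helper.

```python
class Node:
    """Hypothetical stand-in for a model that tracks explicitly set fields."""

    def __init__(self, **explicit):
        self.fields_set = set(explicit)
        for name, value in explicit.items():
            setattr(self, name, value)


def overlay(base, cli):
    """Copy onto `base` only the fields that were explicitly set on `cli`."""
    for name in cli.fields_set:
        value = getattr(cli, name)
        current = getattr(base, name, None)
        if isinstance(value, Node) and isinstance(current, Node):
            overlay(current, value)  # recurse so sibling YAML fields survive
        else:
            setattr(base, name, value)
            base.fields_set.add(name)
    return base
```

Fields the CLI never set are not visited at all, which is why a YAML media_mix array passes through untouched: nothing on the CLI side names it.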

Output

Aggregate output is unchanged: the existing console table, CSV, and JSON top-level metric block all render byte-for-byte identically when media_mix is absent or contains a single archetype.

When media mix is configured:

Console: one extra Rich table per archetype, sorted by name, with a ({normalized_share}% of traffic) title suffix.
JSON: an archetypes array sorted by name, each entry carrying archetype_name, archetype_weight (raw, as configured), and the same dynamic per-metric fields the top-level block uses.
CSV: per-archetype rows alongside the aggregate, sorted by name.

All three exporters self-disable when archetype_metric_results is missing, so non-media-mix runs see zero behavioral change.

Validation

weight: gt=0 on every weighted field (archetype, profile).
min_length=1 on profiles.
Profile types must match the declared modality (image ↔ ImageProfileConfig, etc.).
Archetype must produce something: text=False plus empty modalities is rejected.
Unnamed archetypes are auto-assigned _archetype_0, _archetype_1, …; duplicate names are rejected (they would otherwise silently merge metric buckets).
media_mix combined with --public-dataset or --custom-dataset-type is rejected at config load. Without this guard, custom-loaded conversations (no archetype_name) reach the archetype processor and the benchmark hangs forever at PROFILING.
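The naming rule can be sketched in a few lines. This assumes the index in _archetype_{i} is the archetype's position in the media_mix list, which the PR text suggests but does not state outright; assign_archetype_names is a hypothetical helper operating on bare names.

```python
def assign_archetype_names(names):
    """Fill in _archetype_{i} for unnamed entries, then reject duplicates."""
    resolved = [
        name if name is not None else f"_archetype_{i}"
        for i, name in enumerate(names)
    ]
    seen = set()
    for name in resolved:
        if name in seen:
            raise ValueError(f"duplicate media_mix archetype name: {name!r}")
        seen.add(name)
    return resolved
```

Running the duplicate check after auto-assignment also catches a user-chosen name that happens to equal a generated one.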

Test coverage

tests/unit/common/config/test_media_mix_config.py — config validation (profile/modality/archetype shape, weights, name uniqueness).
tests/unit/dataset/composer/test_media_mix.py — resolver weighting, batch-size preservation, text-override propagation, integration through SyntheticDatasetComposer.
tests/unit/post_processors/test_archetype_metric_results_processor.py — per-archetype grouping, goodput, error pass-through.
tests/unit/records/test_records_manager.py::TestRecordsManagerDispatchByResultKind — guard against future processors collapsing into the wrong bucket.
tests/unit/exporters/*_archetype_* — sort order, weight normalization, self-disable on absent results.

Full suite: 9832 unit tests passing locally (one pre-existing OTel-fanout pickling failure unrelated to this branch).

Tutorial

docs/tutorials/media-mix.md walks through the YAML shape, the precedence model (CLI vs YAML for global vs per-archetype fields), and how to read the per-archetype output.

Notes / learnings along the way

A few things I would have planned differently from the start, captured here so the next iteration doesn't redo the same work:

The --media-mix shorthand was wishful. The original plan included an image:0.6,audio:0.4 CLI shorthand "for quick experimentation." In practice Cyclopts destructures --media-mix "image:0.6" into media_mix[0] = "image:0.6", and Pydantic rejects the string per-element before any model-validator can parse it. The shorthand parser was wired into a model_validator that never fires. Removed entirely (~200 lines of dead code): no version of the shorthand can express per-archetype text overrides, so it would have been a permanent half-feature even if Cyclopts cooperated.

--user-config-file was missing on aiperf profile. The plan assumed YAML was the entry point, but no one wired up the flag to load it. aiperf service --user-config-file existed; aiperf profile did not. The tutorial pointed at a non-existent flag. Fixed by mirroring the service pattern plus a model_fields_set-based merge so CLI flags can override global YAML fields without trampling per-archetype overrides.

"CLI overrides config" needs a scope qualifier. The conventional rule is "CLI wins, config loses." For media mix that framing is wrong: media_mix[i].text.input_tokens.mean: 2000 is a finer scope than --isl 100, not a different source for the same value. The user wrote the per-archetype override because they wanted to deviate from the global. The correct rule is "more specific scope wins, regardless of source" — and the implementation gets it for free because no CLI flag targets media_mix[] at all.

Type-based dispatch doesn't scale to two dict-returning processors. The records manager originally routed summarize() results by Python type (list → records, dict → timeslice). Adding archetype results (also a dict) silently collided with timeslice output. Refactored to a result_kind: ClassVar[str] discriminator on the processor base class. Cheap to add, future-proof.

Weights need normalization at the display layer, not the config layer. Users naturally write weight: 3 / weight: 7 and expect "30% / 70%". The resolver was doing this correctly (_sample_weighted divides by sum(weights)), but the console exporter was displaying weight * 100 and printing "300% / 700% of traffic." The resolver doesn't need to normalize the weights — that would just hide the user's input — but anything that displays a percentage does.
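A display-side normalization might look like this sketch (traffic_shares is a hypothetical helper, not the exporter's code). The raw weights stay as configured; only the percentage shown to the user is derived.

```python
def traffic_shares(weights: dict[str, float]) -> dict[str, float]:
    """Convert raw weights to percentages of total traffic, sorted by name."""
    total = sum(weights.values())
    return {name: 100.0 * weights[name] / total for name in sorted(weights)}
```

With weights 3 and 7 this yields 30% and 70% rather than the 300% and 700% the console originally printed.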

Insertion-order iteration is non-deterministic across runs. The JSON exporter iterated archetype_results.items(), which is the order the first record per archetype arrived at the processor — different across runs, and different from the CSV exporter's sorted(keys()). Same record set, different output ordering. Fixed by sorting in both places.

Validation that prevents a hang is worth more than its line count. media_mix plus custom_dataset_type was silently accepted at config load. Custom-loaded turns have no archetype_name, so the first record reaching the archetype processor raised ValueError from a ZMQ pull-client task. The exception was logged as "Task exception was never retrieved" but never propagated. The benchmark logged PROFILING forever and required SIGKILL. Eight lines in validate_dataset_type turn that into a clean config-load error.

What's deliberately out of scope

Image pool / reuse rate (tracked separately).
Per-archetype × per-timeslice cross-product grouping.
Real VLM server validation. The mock server does not differentiate latency by modality, so the diagnostic-value claim of per-archetype metrics under contention is taken on faith for v1 and will need a real VLM benchmark before the next iteration.
Cancellation-path behavior under media mix.

…types

AIPerf's existing multimodal benchmarking is all-or-nothing: if images are enabled, every request gets images. This commit adds a media_mix config that defines weighted request archetypes, so a single benchmark can model realistic mixed-modality traffic (e.g., 60% image-and-audio, 30% video-analysis, 10% text-only) with per-modality dimensional variation.

New src/aiperf/common/config/media_mix_config.py:
- MediaMixArchetype, ModalityEntry, and per-modality profile configs (ImageProfileConfig, AudioProfileConfig, VideoProfileConfig)
- TextOverrideConfig for per-archetype ISL/OSL overrides; unspecified fields fall back to the global PromptConfig
- parse_media_mix() for CLI shorthand like "image:0.6,video:0.4"

New src/aiperf/dataset/composer/media_mix_resolver.py:
- ResolvedTurn dataclass carrying per-turn generator selections
- MediaMixResolver pre-creates per-(archetype, profile) generators with unique RNG namespaces, then on each turn samples an archetype by weight, a profile per modality by weight, and returns the resolved generators for SyntheticDatasetComposer to invoke

Generator changes:
- ImageGenerator/AudioGenerator/VideoGenerator gain optional rng_namespace param to keep per-profile RNG streams independent

InputConfig:
- New media_mix field with model_validator(mode="before") that parses the shorthand string and inflates {modality, weight} sentinels into full archetype dicts using the sibling image/audio/video config

SyntheticDatasetComposer._create_turn dispatches to a new _create_media_mix_turn when the resolver is present, with helper methods for populating per-modality payloads and applying turn delay and resolved sequence-length overrides.

Tests cover config validation, shorthand parsing, resolver sampling distribution, profile-modality matching, per-archetype text overrides, and the composer integration end to end.

Signed-off-by: Matthew Kotila <[email protected]>
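One way to keep per-profile RNG streams independent is to derive each generator's seed from the run seed plus a namespace string. The sketch below is an assumption about the general technique, not AIPerf's implementation: derive_rng and the namespace format are hypothetical, and the PR does not show how rng_namespace is consumed.

```python
import hashlib
import random


def derive_rng(seed: int, namespace: str) -> random.Random:
    """Build an RNG whose stream depends on both the seed and the namespace."""
    digest = hashlib.sha256(f"{seed}:{namespace}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))
```

Two generators with the same seed and namespace reproduce the same stream, while adding a new profile does not shift the draws of existing ones.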

…lumbing

- Replace stale `_CLI_GROUP` reference (removed from InputConfig in cyclopts 3.14 fix #878/#879) with `Groups.INPUT` for the new --media-mix field. Was causing import-time NameError.
- Extract shorthand inflation helpers from input_config.py to media_mix_config.py to satisfy check-ergonomics (file size <500 lines) and check-ruff-baselined (function complexity <=10).
- Extract per-modality population + delay + sequence-length caching from _create_media_mix_turn into helper methods to satisfy check-ruff-baselined (complexity <=10).
- Add archetype_name to ResolvedTurn, Turn, and MetricRecordMetadata so per-archetype metrics can be grouped during reporting (Step 7a foundation).

All 8961 unit tests pass, all pre-commit hooks pass.

Signed-off-by: Matthew Kotila <[email protected]>

Per-archetype metrics processing uses MediaMixArchetype.name as the dict key for grouping records. Two issues that this validator fixes:
1. Multiple unnamed archetypes (name: None) would all collide under the same key, conflating distinct request types.
2. Two archetypes intentionally given the same name would silently merge, producing meaningless per-archetype output.

Add an InputConfig model_validator(mode="after") that runs after media_mix shorthand inflation:
- Auto-assigns _archetype_{i} to any archetype with name=None
- Rejects remaining duplicate names with a clear error

This guarantees every archetype has a unique non-None name by the time the resolver and the upcoming archetype results processor see it. Tests in tests/unit/common/config/test_media_mix_config.py.

Signed-off-by: Matthew Kotila <[email protected]>

Two data-model additions for upcoming per-archetype metrics (media mix): ProfileResults: new archetype_metric_results field carrying dict[str, list[MetricResult]] alongside the existing records and timeslice_metric_results. Each key is a MediaMixArchetype.name; each value is the list-of-MetricResult shape used by the aggregate records. JsonExportData: new ArchetypeData class with extra=allow so dynamic metric fields can be populated at runtime via setattr (same pattern as JsonExportData itself). archetypes: list[ArchetypeData] | None added to JsonExportData. SCHEMA_VERSION bumped 1.1 -> 1.2. Cross-referencing the full archetype config from output JSON is done by joining archetypes[i].archetype_name against input_config.input.media_mix[] (which is already serialized in the export today). Update test_metrics_json_exporter test assertion for the new version. Signed-off-by: Matthew Kotila <[email protected]>

…ricRecordMetadata The base MetricResultsProcessor.get_instances_map / get_results methods previously took request_start_ns, which implicitly assumed the only grouping dimension was timeslice. Generalizing the parameter to the full MetricRecordMetadata lets subclasses extract whatever grouping key they need (timeslice index today, archetype name in an upcoming commit, others in future) without further base-class changes. TimesliceMetricResultsProcessor pulls request_start_ns out of the metadata as before; behavior is identical. Updates the corresponding tests to construct MetricRecordMetadata instead of bare ints when calling get_instances_map / get_results directly. Signed-off-by: Matthew Kotila <[email protected]>

The existing RecordsManager._process_results dispatch used isinstance on the summarize() return value: list -> records, dict -> timeslice. That collides as soon as a second dict-returning processor (like the upcoming ArchetypeMetricResultsProcessor with dict[str, list]) is plugged in, since both subclasses' returns are Python dicts.

Replace with a class-attribute discriminator:
- MetricResultsProcessor.result_kind = 'records'
- TimesliceMetricResultsProcessor.result_kind = 'timeslice'
- (next commit will add 'archetype' kind)

RecordsManager wraps each summarize() call to return (kind, payload) and routes the payload by kind. Unknown kinds are logged and dropped, not silently merged into an existing bucket.

Tests in tests/unit/records/test_records_manager.py cover the kind declarations and confirm subclasses must explicitly override.

Signed-off-by: Matthew Kotila <[email protected]>

Per-archetype metric aggregation for media mix benchmarks. Groups incoming MetricRecordsData by metadata.archetype_name (set by the SyntheticDatasetComposer during dataset generation) and computes metrics independently per archetype, mirroring how the timeslice processor groups by time-window index.

Architecture mirrors TimesliceMetricResultsProcessor:
- defaultdict[str, dict[MetricTagT, BaseMetric]] for per-archetype metric instances (auto-allocated on first record)
- defaultdict[str, MetricResultsDict] for per-archetype results
- Overrides get_instances_map/get_results to route by archetype_name
- summarize() returns dict[str, list[MetricResult]]; the RecordsManager dispatches it via result_kind='archetype'

Self-disables when InputConfig.media_mix is unconfigured, so users running non-media-mix benchmarks see no behavioral change. Registered in plugins.yaml as results_processor.archetype. Baseline regenerated to include the BLE001 entry mirroring the existing pattern in TimesliceMetricResultsProcessor.update_derived_metrics.

Tests in tests/unit/post_processors/test_archetype_metric_results_processor.py cover the self-disable, the per-archetype separation, the summarize shape, and synthetic _archetype_{i} naming.

Signed-off-by: Matthew Kotila <[email protected]>
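The grouping pattern reduces to a defaultdict keyed on archetype name. The sketch below is heavily simplified and hypothetical: records are (archetype_name, latency_ms) tuples and the only "metric" is a mean, whereas the real processor keeps full metric instances per archetype.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_archetype(records):
    """Group (archetype_name, latency_ms) records and average per archetype."""
    buckets = defaultdict(list)  # a bucket is auto-allocated on first record
    for archetype_name, latency_ms in records:
        if archetype_name is None:
            # Records without a name cannot be grouped; fail loudly.
            raise ValueError("record has no archetype_name")
        buckets[archetype_name].append(latency_ms)
    # Sort by name so the output order does not depend on arrival order.
    return {name: mean(values) for name, values in sorted(buckets.items())}
```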

Extend the RecordsManager dispatch loop to route 'archetype' kind payloads into a new archetype_metric_results bucket, then pass it into ProfileResults at the end of the run. Falls back to None (rather than an empty dict) when no archetype processor ran, keeping the JSON output free of an empty 'archetypes' section for non-media-mix benchmarks. The dispatch loop got extracted into _dispatch_processor_outcomes to keep _process_results within the branch budget enforced by check-ruff-baselined. Signed-off-by: Matthew Kotila <[email protected]>

When ProfileResults.archetype_metric_results is populated (media mix mode), MetricsJsonExporter emits an additional 'archetypes' array alongside the existing top-level aggregate metrics. Each entry carries the archetype's identity (archetype_name + archetype_weight) plus the same dynamic metric fields the top level uses, via extra=allow. Non-media-mix benchmarks see no change: getattr+exclude_none keep the array out of the output entirely. Cross-referencing the full archetype config (profiles, dimensions, formats) is done by joining archetypes[i].archetype_name against input_config.input.media_mix[] which is already in the export. Tests added: archetypes array populated correctly, and archetypes field is absent when media mix is unconfigured. Signed-off-by: Matthew Kotila <[email protected]>

Tidy/long-format CSV export of per-archetype metrics for media mix benchmarks. Schema: Archetype,Metric,Unit,Stat,Value, one row per (archetype, metric, stat) tuple. Optimal input format for downstream pandas/Tableau/ggplot analysis. Mirrors TimesliceMetricsCsvExporter's shape and conventions exactly. Self-disables when ProfileResults.archetype_metric_results is None, so non-media-mix benchmarks don't get an empty CSV file. New default file path: profile_export_aiperf_archetypes.csv, matching the _timeslices.csv naming convention. Registered in plugins.yaml as data_exporter.archetype_csv. The existing --profile-export-prefix suffix-stripping list now recognizes _archetypes.csv so custom prefixes work cleanly. Signed-off-by: Matthew Kotila <[email protected]>
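The tidy layout can be sketched with the standard csv module. The header matches the schema named in the commit message; the input shape, the metric name, and archetype_rows_to_csv itself are hypothetical.

```python
import csv
import io


def archetype_rows_to_csv(results):
    """Write one row per (archetype, metric, stat) in the tidy/long layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Archetype", "Metric", "Unit", "Stat", "Value"])
    for archetype in sorted(results):  # deterministic order, sorted by name
        for metric, unit, stats in results[archetype]:
            for stat, value in stats.items():
                writer.writerow([archetype, metric, unit, stat, value])
    return buffer.getvalue()
```

One row per (archetype, metric, stat) means downstream tools can filter or pivot on any column without reshaping first.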

Renders one Rich table per archetype for media mix benchmarks. Sits alongside (not replacing) the existing ConsoleMetricsExporter which still renders the across-archetype aggregate table. Each archetype's table uses the same column set, metric ordering, and formatting as the aggregate so users learn one table layout and see it N+1 times. Table title carries the archetype name and its configured traffic share, e.g.: NVIDIA AIPerf | LLM Metrics: image-only (40% of traffic) Inherits from ConsoleMetricsExporter to reuse the table-building, flag-filtering, sorting, and row-formatting logic. The export() method drives the per-archetype loop directly. Self-disables when ProfileResults.archetype_metric_results is missing so users running non-media-mix benchmarks see no behavioral change. Registered in plugins.yaml as console_exporter.archetype_metrics. Signed-off-by: Matthew Kotila <[email protected]>

- New docs/tutorials/media-mix.md walks through YAML config, weighted archetypes, profile/batch_size distributions, per-archetype text overrides, archetype naming rules, and how to read the per-archetype output in console/JSON/CSV. Linked from README's Endpoint Types tutorial index.
- docs/reference/json-export-schema.md documents schema 1.2 (added archetypes array) and shows the join pattern for cross-referencing per-archetype metric blocks against input_config.input.media_mix.

Signed-off-by: Matthew Kotila <[email protected]>

…inery

The --media-mix CLI flag was non-functional: Cyclopts treats the string value as a list element to be coerced to MediaMixArchetype, never invoking parse_media_mix(). Per team direction, the shorthand was a 'future enhancement' in the original plan; a broken flag is worse than no flag.

Remove:
- CLIParameter on the media_mix field
- parse_media_mix(), normalize_media_mix_input(), _as_dict()
- _build_image/audio/video_modality_entry() and _MODALITY_BUILDERS
- inflate_shorthand_archetypes(), _inflate_shorthand_entry(), is_shorthand_list()
- VALID_MODALITIES constant
- InputConfig.inflate_media_mix_shorthand model_validator
- TestParseMediaMix class
- test_media_mix_shorthand_inflation in the composer tests
- 'CLI shorthand: ...' wording from the field description

The media_mix field stays; YAML continues to populate it directly via Pydantic. The name-uniqueness validator and per-archetype text override logic are untouched.

Follow-up needed: aiperf profile doesn't currently expose --user-config-file, so this commit leaves YAML configs reachable only via the verbose 'aiperf service --type system_controller' path. Adding --user-config-file to profile is the next commit.

Signed-off-by: Matthew Kotila <[email protected]>


coderabbitai · 2026-05-14T17:56:19Z

Warning

Rate limit exceeded

@matthewkotila has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 36 minutes and 33 seconds before requesting another review.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a34ad591-e5ff-45cf-b69b-4f0dfd978278

📥 Commits

Reviewing files that changed from the base of the PR and between 837bda0 and b6d2a8d.

📒 Files selected for processing (37)

README.md
docs/cli-options.md
docs/reference/json-export-schema.md
docs/tutorials/media-mix.md
src/aiperf/cli_commands/profile.py
src/aiperf/common/config/__init__.py
src/aiperf/common/config/config_defaults.py
src/aiperf/common/config/input_config.py
src/aiperf/common/config/media_mix_config.py
src/aiperf/common/config/output_config.py
src/aiperf/common/models/dataset_models.py
src/aiperf/common/models/export_models.py
src/aiperf/common/models/record_models.py
src/aiperf/dataset/composer/media_mix_resolver.py
src/aiperf/dataset/composer/synthetic.py
src/aiperf/dataset/generator/audio.py
src/aiperf/dataset/generator/image.py
src/aiperf/dataset/generator/video.py
src/aiperf/exporters/archetype_metrics_csv_exporter.py
src/aiperf/exporters/console_archetype_metrics_exporter.py
src/aiperf/exporters/metrics_json_exporter.py
src/aiperf/plugin/enums.py
src/aiperf/plugin/plugins.yaml
src/aiperf/post_processors/archetype_metric_results_processor.py
src/aiperf/post_processors/metric_results_processor.py
src/aiperf/post_processors/timeslice_metric_results_processor.py
src/aiperf/records/record_processor_service.py
src/aiperf/records/records_manager.py
tests/unit/common/config/test_media_mix_config.py
tests/unit/dataset/composer/test_media_mix.py
tests/unit/exporters/test_archetype_metrics_csv_exporter.py
tests/unit/exporters/test_console_archetype_metrics_exporter.py
tests/unit/exporters/test_metrics_json_exporter.py
tests/unit/post_processors/test_archetype_metric_results_processor.py
tests/unit/post_processors/test_timeslice_metric_results_processor.py
tests/unit/records/test_records_manager.py
tools/ruff_baseline.json


github-actions · 2026-05-14T17:56:23Z

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@b6d2a8deea9a04afe77d234be3ea01d941890bca

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@b6d2a8deea9a04afe77d234be3ea01d941890bca

Last updated for commit: b6d2a8d • Browse code

github-actions · 2026-05-14T17:56:52Z

Fern Docs Preview: https://nvidia-preview-2b43c0cb-1ebc-408c-a92b-bb731c023374.docs.buildwithfern.com/aiperf/dev

dynamo-ops · 2026-05-14T18:22:28Z

+    CLI never touched stay as the YAML loaded them — most importantly the
+    `media_mix` array, since no CLI flag targets it.
+    """
+    _overlay(base, cli)


The CLI/YAML merge mutates an already-validated UserConfig and returns it without re-running model validators, so individually valid file and CLI options can combine into an invalid final config. Fix: Rebuild and validate a fresh UserConfig from the merged data before running the controller, or merge raw dictionaries before constructing UserConfig.
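The second option in this suggestion, merging raw dictionaries and validating once on the result, can be sketched as below. Everything here is illustrative: validate is a hypothetical cross-field check standing in for the UserConfig model validators, and the dataset-type value used in the usage is made up.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` wins key by key, recursing into dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def validate(config: dict) -> dict:
    """Hypothetical cross-field check standing in for the model validators."""
    inp = config.get("input", {})
    if inp.get("media_mix") and inp.get("custom_dataset_type"):
        raise ValueError("media_mix cannot be combined with custom_dataset_type")
    return config


def load_config(yaml_data: dict, cli_data: dict) -> dict:
    """Merge first, then validate exactly once on the final result."""
    return validate(deep_merge(yaml_data, cli_data))
```

Each source can pass validation on its own while the combination fails, which is the case the in-place overlay would miss.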

dynamo-ops · 2026-05-14T18:22:28Z

            raise ValueError(
                "The --public-dataset and --custom-dataset-type options cannot be set together"
            )
+        if self.media_mix and (


media_mix still allows input.file, so DatasetManager will use the custom composer and produce records without archetype_name, causing ArchetypeMetricResultsProcessor to fail at runtime. Fix: Treat self.file as a non-synthetic dataset source and reject it whenever media_mix is configured.

dynamo-ops · 2026-05-14T18:22:28Z

+            if resolved.input_tokens_mean is not None
+            else self.config.input.prompt.input_tokens.mean
+        )
+        self._turn_sequence_cache[id(turn)] = (isl, resolved.output_tokens_mean)


Per-archetype output_tokens overrides are cached here, but _set_max_tokens only reads that cache when sequence_distribution is active, so normal media_mix runs ignore the documented OSL override. Fix: Have _set_max_tokens consult the per-turn cache before falling back to global output tokens, or set turn.max_tokens directly from the resolved override.
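The first suggested fix amounts to checking the per-turn cache before falling back. A sketch under stated assumptions: resolve_max_tokens is a hypothetical free function, and the cache is assumed to map a turn key to the (isl, osl) tuple written in the diff above.

```python
def resolve_max_tokens(turn_id, turn_sequence_cache, global_osl_mean):
    """Prefer the cached per-turn OSL override; fall back to the global mean."""
    cached = turn_sequence_cache.get(turn_id)
    if cached is not None:
        _isl, osl = cached
        if osl is not None:  # None means the archetype did not override OSL
            return osl
    return global_osl_mean
```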

dynamo-ops · 2026-05-14T18:22:28Z

+    modality: Literal["image", "audio", "video"] = Field(
+        description="Media type: image, audio, or video.",
+    )
+    batch_size: int = Field(


batch_size is declared as a fixed int with ge=1, so the documented distribution form and min: 0 optional-media case fail validation. Fix: Add a batch-size distribution config, sample it in MediaMixResolver, and allow zero when configured.

matthewkotila added 18 commits May 14, 2026 10:19

feat(cli): add --user-config-file to aiperf profile with scope-aware CLI merge (0469aab)
fix(media-mix): reject media_mix combined with public/custom dataset at config load (5e56808)
fix(exporters): normalize archetype weights for console traffic-share display (5b833e2)
fix(exporters): sort JSON archetype blocks by name for deterministic ordering (ad29e3f)
fix(synthetic): use is-not-None fall-through for per-archetype input_tokens_mean (b6d2a8d)

matthewkotila requested a review from ajcasagrande May 14, 2026 17:56

matthewkotila changed the title ~~Mkotila/media mix~~ (feat): add support for media mix May 14, 2026

dynamo-ops reviewed May 14, 2026


matthewkotila changed the title ~~(feat): add support for media mix~~ feat(media-mix): add support for weighted multimodal request archetypes May 14, 2026

github-actions Bot added the feat label May 14, 2026

matthewkotila wants to merge 18 commits into main from mkotila/media-mix
