Skip to content

test(mem/dram): merge differential validation into one scenario set#391

Open
syifan wants to merge 1 commit into
mainfrom
claude/dram-unified-differential
Open

test(mem/dram): merge differential validation into one scenario set#391
syifan wants to merge 1 commit into
mainfrom
claude/dram-unified-differential

Conversation

@syifan

@syifan syifan commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Collapses the two trace-driven differential tiers (command counts vs. read latency) into one differential suite over a single scenario set. Each scenario is now run once per oracle and once through Akita and compared on both metrics — instead of having a count-only scenario list and a separate latency-only list.

The in-process unit tests (formerly "Tiers 1–4") are unchanged.

What changed

run_oracles.py — runs both oracles for every scenario:

  • DRAMSim3 → command counts + read latency.
  • Ramulator2 → command counts via tail-subtraction (now honoring each scenario's page policy, not hard-coded close-page).
  • reference.csv carries both oracles for all 8 scenarios.
  • Scenario schema simplified to a per-scenario read_latency flag (enforced / known_gap / off) + gap_reason.

validation_tier5_test.go — one Differential validation suite:

  • Command counts asserted exact wherever the two oracles agree on a count — reads/writes always agree (one column command per access); activates agree wherever the count is map-independent (they agree on every current scenario). Where the oracles disagree, it's a documented reference divergence, not asserted. This replaces the earlier per-scenario activates_exact guess with a data-driven oracle-consensus rule.
  • Read latency compared vs DRAMSim3 (15% tolerance); known_gap scenarios characterized (currently 54–63%, the address-mapping gap).

README — coverage table reflects the merged suite; adds the read-coalescing feature gap to Findings (DRAMSim3 coalesces pending same-address reads; Akita doesn't — so saturated latency is only fair on distinct-address traces until modeled).

Testing

  • Full mem/dram suite, go vet, golangci-lint (0 issues) clean.
  • Both oracles agree on all counts for all 8 scenarios; Akita matches; enforced latency within tolerance; the two known gaps stay characterized.

Follow-ups (separate)


Generated by Claude Code

Collapse the two trace-driven differential tiers (counts vs latency) into a
single differential suite over one scenario set. Each scenario is now run once
per oracle and once through Akita and compared on BOTH command counts and read
latency, instead of having separate count-only and latency-only scenario lists.

- run_oracles.py: run BOTH oracles for every scenario (DRAMSim3 gives counts +
  latency; Ramulator2 gives counts via tail-subtraction, now honoring each
  scenario's page policy). reference.csv carries both oracles for all 8
  scenarios. Scenario schema simplified to a per-scenario read_latency flag
  (enforced/known_gap/off) + gap_reason.
- validation_tier5_test.go: one "Differential validation" suite. Command counts
  are asserted exact wherever the TWO oracles AGREE on a count (reads/writes
  always; activates where map-independent — they agree on all current
  scenarios); read latency is compared vs DRAMSim3 (15% tol, known-gaps
  characterized). This removes the activates_exact guesswork in favor of a
  data-driven oracle-consensus rule.
- README: coverage table reflects the merged suite; adds the read-coalescing
  feature gap to Findings.

Tiers 1-4 (the in-process unit tests) are unchanged. Full mem/dram suite, vet,
golangci-lint clean; both oracles agree on all counts and Akita matches.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 27e1e04f36

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +354 to +357
for k in ("reads", "writes"):
if got[k] != exp[k]:
print(f"WARN {sim}/{scn['name']}: {k} {got[k]} != expected "
f"{exp[k]}", file=sys.stderr)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail on read/write count mismatches

When regenerating references, if either oracle reports a read/write count different from the trace count (for example because Ramulator2 tail-subtraction fails on an open-page scenario), this only prints a warning and still writes reference.csv. The Go test added here asserts counts only when the two oracle values agree, so a mismatched read/write row silently disables that metric even though reads/writes are supposed to be trace-determined; return non-zero before writing the CSV instead.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants