test(mem/dram): merge differential validation into one scenario set#391
test(mem/dram): merge differential validation into one scenario set#391syifan wants to merge 1 commit into
Conversation
Collapse the two trace-driven differential tiers (counts vs latency) into a single differential suite over one scenario set. Each scenario is now run once per oracle and once through Akita and compared on BOTH command counts and read latency, instead of having separate count-only and latency-only scenario lists. - run_oracles.py: run BOTH oracles for every scenario (DRAMSim3 gives counts + latency; Ramulator2 gives counts via tail-subtraction, now honoring each scenario's page policy). reference.csv carries both oracles for all 8 scenarios. Scenario schema simplified to a per-scenario read_latency flag (enforced/known_gap/off) + gap_reason. - validation_tier5_test.go: one "Differential validation" suite. Command counts are asserted exact wherever the TWO oracles AGREE on a count (reads/writes always; activates where map-independent — they agree on all current scenarios); read latency is compared vs DRAMSim3 (15% tol, known-gaps characterized). This removes the activates_exact guesswork in favor of a data-driven oracle-consensus rule. - README: coverage table reflects the merged suite; adds the read-coalescing feature gap to Findings. Tiers 1-4 (the in-process unit tests) are unchanged. Full mem/dram suite, vet, golangci-lint clean; both oracles agree on all counts and Akita matches.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 27e1e04f36
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for k in ("reads", "writes"): | ||
| if got[k] != exp[k]: | ||
| print(f"WARN {sim}/{scn['name']}: {k} {got[k]} != expected " | ||
| f"{exp[k]}", file=sys.stderr) |
There was a problem hiding this comment.
Fail on read/write count mismatches
When regenerating references, if either oracle reports a read/write count different from the trace count (for example because Ramulator2 tail-subtraction fails on an open-page scenario), this only prints a warning and still writes reference.csv. The Go test added here asserts counts only when the two oracle values agree, so a mismatched read/write row silently disables that metric even though reads/writes are supposed to be trace-determined; return non-zero before writing the CSV instead.
Useful? React with 👍 / 👎.
Summary
Collapses the two trace-driven differential tiers (command counts vs. read latency) into one differential suite over a single scenario set. Each scenario is now run once per oracle and once through Akita and compared on both metrics — instead of having a count-only scenario list and a separate latency-only list.
The in-process unit tests (formerly "Tiers 1–4") are unchanged.
What changed
run_oracles.py— runs both oracles for every scenario:reference.csvcarries both oracles for all 8 scenarios.read_latencyflag (enforced/known_gap/off) +gap_reason.validation_tier5_test.go— oneDifferential validationsuite:activates_exactguess with a data-driven oracle-consensus rule.known_gapscenarios characterized (currently 54–63%, the address-mapping gap).README — coverage table reflects the merged suite; adds the read-coalescing feature gap to Findings (DRAMSim3 coalesces pending same-address reads; Akita doesn't — so saturated latency is only fair on distinct-address traces until modeled).
Testing
mem/dramsuite,go vet,golangci-lint(0 issues) clean.Follow-ups (separate)
Generated by Claude Code