Skip to content

[HIPDNN] Tolerance lookup followup#8752

Draft
bghimireamd wants to merge 2 commits into
users/bghimire/ALMIOPEN-1969/tolerance_lookupfrom
users/bghimire/ALMIOPEN-1969/tolerance_lookup_followup
Draft

[HIPDNN] Tolerance lookup followup#8752
bghimireamd wants to merge 2 commits into
users/bghimire/ALMIOPEN-1969/tolerance_lookupfrom
users/bghimire/ALMIOPEN-1969/tolerance_lookup_followup

Conversation

@bghimireamd

Copy link
Copy Markdown
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

bghimireamd and others added 2 commits June 23, 2026 22:04
Harness unification (ALMIOPEN-1969 follow-up):

- Share input synthesis between both harnesses: move SynthesisTracker
  and SynthesizeInputs to harness/input_init/ (namespace
  hipdnn_integration_tests). The non-golden harness now drives its
  initializeBundle() through the same synthesis switch, with a random
  [-1,1] fallback. Remove 4 of 7 non-golden initializeBundle overrides
  whose ranges/seeds now match the shared fill functions; 3 remain
  (fused-graph range conflicts and a large-values stress test).

- Rename harness/golden/ -> harness/bundle/ and namespace
  hipdnn_integration_tests::golden -> ::bundle. "golden" was inaccurate:
  the harness verifies bundles via golden data OR a GPU/CPU reference,
  and most bundles ship no golden output. Rename the class
  IntegrationGraphGoldenReferenceVerificationHarness ->
  IntegrationBundleVerificationHarness (parallel to its sibling
  IntegrationGraphVerificationHarness). Fix stale "golden" log/report
  strings that mislabeled all bundles or shared comparison paths. The
  golden verification *mode* and golden output *data* vocabulary, and
  the --golden-data-dir CLI flag, are retained (accurate / external).

- Drop redundant hipInit/hipGetDevice/_deviceId from the graph harness:
  HIP initializes lazily and getSharedHandle()->hipdnnCreate() already
  does it before any graph runs; _deviceId was write-only.

- Split bundle metadata guards: the VRAM check applies to every bundle
  the engine runs, but the GPU-arch lock only matters when comparing
  against golden output values (arch-specific). Gate the arch check on
  hasGoldenOutputs so an inputs-only bundle verified against a local
  reference is not wrongly skipped on a different arch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both harnesses resolved comparison tolerance with duplicated, divergent
code. Replace both with one shared resolver
(harness/tolerance/ToleranceResolver.hpp) where the aggregation policy is a
plain function (GraphWrapper, dtype) -> float, selectable per caller.

What changed:
- New resolveTolerance(wrapper, dtype, testName, atol, rtol, policy=max):
  derives a default via the chosen aggregation policy, then applies the TOML
  per-test override (highest priority) in ONE place. Single tolerance entry
  point for both harnesses; the override is no longer applied separately in
  either, so the layering order lives here alone.
- Two policies, each a free function (no enum/switch; the policy IS the
  function, C++17 function pointer):
    * maxAcrossNodes  — loosest per-node tolerance; conservative envelope,
      never tighter than any node so it cannot cause a false failure.
    * outputOpTolerance — tolerance of the last non-Pointwise op (the
      output-producing op); reproduces the graph harness's historical
      getTolerance() behavior.
- Wiring preserves existing behavior on both sides:
    * Graph harness getTolerance() -> outputOpTolerance (its prior policy),
      now expressed through the shared resolver instead of a private
      dynamic_cast switch. It also serializes via to_binary() and reads the
      output dtype from the flatbuffer, so it shares the resolver keyed on the
      same representation the bundle harness uses.
    * Bundle harness -> maxAcrossNodes (its prior policy), unchanged.
  The two policies agree for the common one-real-op + activation case
  (activation is Pointwise -> skipped; the single real op is both loosest and
  last), so they differ only on multi-real-op fusions, which carry explicit
  overrides today.
- Deletes the duplicate per-op tolerance switch that existed in each harness
  (one on dynamic_cast of live nodes, one on the flatbuffer NodeAttributes
  enum); both now share one flatbuffer-keyed lookup.
- warnIfMultipleOutputs() guards the single-output assumption: every current
  policy reduces over the whole graph, not the per-output subgraph, so
  multi-output graphs are flagged loudly (deferred fix, ALMIOPEN-2216).

Why output-op for the graph harness (not max): a mechanical migration of the
C++ graph tests should not change their tolerances. outputOpTolerance keeps the
exact prior numbers. Note this is a heuristic, not a principled tight bound
(it attributes the whole output tolerance to one op); it is kept for migration
parity. The principled tighten path is the future DynamicTolerances upgrade
(see TODO in ToleranceResolver.hpp and ALMIOPEN-2216); FP4 also needs sub-bf16
entries in the dtype switch (currently falls through to 1e-3).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bghimireamd bghimireamd changed the base branch from develop to users/bghimire/ALMIOPEN-1969/tolerance_lookup June 24, 2026 05:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant