[HIPDNN] Tolerance lookup followup#8752
Draft
bghimireamd wants to merge 2 commits into
Draft
Conversation
Harness unification (ALMIOPEN-1969 follow-up): - Share input synthesis between both harnesses: move SynthesisTracker and SynthesizeInputs to harness/input_init/ (namespace hipdnn_integration_tests). The non-golden harness now drives its initializeBundle() through the same synthesis switch, with a random [-1,1] fallback. Remove 4 of 7 non-golden initializeBundle overrides whose ranges/seeds now match the shared fill functions; 3 remain (fused-graph range conflicts and a large-values stress test). - Rename harness/golden/ -> harness/bundle/ and namespace hipdnn_integration_tests::golden -> ::bundle. "golden" was inaccurate: the harness verifies bundles via golden data OR a GPU/CPU reference, and most bundles ship no golden output. Rename the class IntegrationGraphGoldenReferenceVerificationHarness -> IntegrationBundleVerificationHarness (parallel to its sibling IntegrationGraphVerificationHarness). Fix stale "golden" log/report strings that mislabeled all bundles or shared comparison paths. The golden verification *mode* and golden output *data* vocabulary, and the --golden-data-dir CLI flag, are retained (accurate / external). - Drop redundant hipInit/hipGetDevice/_deviceId from the graph harness: HIP initializes lazily and getSharedHandle()->hipdnnCreate() already does it before any graph runs; _deviceId was write-only. - Split bundle metadata guards: the VRAM check applies to every bundle the engine runs, but the GPU-arch lock only matters when comparing against golden output values (arch-specific). Gate the arch check on hasGoldenOutputs so an inputs-only bundle verified against a local reference is not wrongly skipped on a different arch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both harnesses resolved comparison tolerance with duplicated, divergent
code. Replace both with one shared resolver
(harness/tolerance/ToleranceResolver.hpp) where the aggregation policy is a
plain function (GraphWrapper, dtype) -> float, selectable per caller.
What changed:
- New resolveTolerance(wrapper, dtype, testName, atol, rtol, policy=max):
derives a default via the chosen aggregation policy, then applies the TOML
per-test override (highest priority) in ONE place. Single tolerance entry
point for both harnesses; the override is no longer applied separately in
either, so the layering order lives here alone.
- Two policies, each a free function (no enum/switch; the policy IS the
function, C++17 function pointer):
* maxAcrossNodes — loosest per-node tolerance; conservative envelope,
never tighter than any node so it cannot cause a false failure.
* outputOpTolerance — tolerance of the last non-Pointwise op (the
output-producing op); reproduces the graph harness's historical
getTolerance() behavior.
- Wiring preserves existing behavior on both sides:
* Graph harness getTolerance() -> outputOpTolerance (its prior policy),
now expressed through the shared resolver instead of a private
dynamic_cast switch. It also serializes via to_binary() and reads the
output dtype from the flatbuffer, so it shares the resolver keyed on the
same representation the bundle harness uses.
* Bundle harness -> maxAcrossNodes (its prior policy), unchanged.
The two policies agree for the common one-real-op + activation case
(activation is Pointwise -> skipped; the single real op is both loosest and
last), so they differ only on multi-real-op fusions, which carry explicit
overrides today.
- Deletes the duplicate per-op tolerance switch that existed in each harness
(one on dynamic_cast of live nodes, one on the flatbuffer NodeAttributes
enum); both now share one flatbuffer-keyed lookup.
- warnIfMultipleOutputs() guards the single-output assumption: every current
policy reduces over the whole graph, not the per-output subgraph, so
multi-output graphs are flagged loudly (deferred fix, ALMIOPEN-2216).
Why output-op for the graph harness (not max): a mechanical migration of the
C++ graph tests should not change their tolerances. outputOpTolerance keeps the
exact prior numbers. Note this is a heuristic, not a principled tight bound
(it attributes the whole output tolerance to one op); it is kept for migration
parity. The principled tighten path is the future DynamicTolerances upgrade
(see TODO in ToleranceResolver.hpp and ALMIOPEN-2216); FP4 also needs sub-bf16
entries in the dtype switch (currently falls through to 1e-3).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist