[HipDNN] Layernorm bwd frontend and CPU reference by brentmaas · Pull Request #6566 · ROCm/rocm-libraries

brentmaas · 2026-04-20T09:56:47Z

Motivation

Generate the frontend code and implement the CPU reference for backward layernorm in HipDNN.

Technical Details

Generate the backward layernorm frontend code.
Fix issues in generated code.
Implement a backward layernorm CPU reference.
Add tests for frontend and CPU reference.
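
For readers unfamiliar with the operation, the following is a minimal sketch of what a backward layernorm CPU reference computes. It is not the code from this PR: it assumes row-major float data of shape [rows, cols] normalized over the last dimension, recomputes the statistics instead of reading mean and inverse variance, and the names LayernormGradsSketch and layernormBackwardSketch are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result bundle for this sketch; not a hipDNN type.
struct LayernormGradsSketch
{
    std::vector<float> dx;     // same shape as x: [rows, cols]
    std::vector<float> dscale; // [cols]
    std::vector<float> dbias;  // [cols]
};

// Assumption: row-major [rows, cols] float data, normalized over the last dimension.
LayernormGradsSketch layernormBackwardSketch(const std::vector<float>& dy,
                                             const std::vector<float>& x,
                                             const std::vector<float>& scale,
                                             std::size_t rows,
                                             std::size_t cols,
                                             float epsilon)
{
    assert(rows > 0 && cols > 0);
    assert(dy.size() == rows * cols && x.size() == rows * cols && scale.size() == cols);

    LayernormGradsSketch grads{std::vector<float>(rows * cols, 0.0f),
                               std::vector<float>(cols, 0.0f),
                               std::vector<float>(cols, 0.0f)};
    const float invCols = 1.0f / static_cast<float>(cols);

    for(std::size_t r = 0; r < rows; ++r)
    {
        const float* xRow = x.data() + r * cols;
        const float* dyRow = dy.data() + r * cols;
        float* dxRow = grads.dx.data() + r * cols;

        // Recompute the row statistics (the case where mean and
        // inverse variance are not supplied).
        float mean = 0.0f;
        for(std::size_t c = 0; c < cols; ++c)
        {
            mean += xRow[c];
        }
        mean *= invCols;

        float variance = 0.0f;
        for(std::size_t c = 0; c < cols; ++c)
        {
            const float centered = xRow[c] - mean;
            variance += centered * centered;
        }
        variance *= invCols;
        const float invStd = 1.0f / std::sqrt(variance + epsilon);

        // First pass: parameter gradients and the two row sums needed for dx.
        float sumDxhat = 0.0f;
        float sumDxhatXhat = 0.0f;
        for(std::size_t c = 0; c < cols; ++c)
        {
            const float xhat = (xRow[c] - mean) * invStd;
            const float dxhat = dyRow[c] * scale[c];
            grads.dscale[c] += dyRow[c] * xhat;
            grads.dbias[c] += dyRow[c];
            sumDxhat += dxhat;
            sumDxhatXhat += dxhat * xhat;
        }

        // Second pass: dx = invStd * (dxhat - mean(dxhat) - xhat * mean(dxhat * xhat)).
        for(std::size_t c = 0; c < cols; ++c)
        {
            const float xhat = (xRow[c] - mean) * invStd;
            const float dxhat = dyRow[c] * scale[c];
            dxRow[c] = invStd * (dxhat - sumDxhat * invCols - xhat * sumDxhatXhat * invCols);
        }
    }
    return grads;
}
```

A quick sanity property of this formulation: with a unit scale and a constant dy, every dx entry is (numerically) zero, because a uniform upstream gradient is removed by the mean subtraction.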

Test Plan

Build and run the check target.

Test Result

All new and existing tests pass.

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

codecov-commenter · 2026-05-28T08:25:47Z

Codecov Report

❌ Patch coverage is 92.09402% with 111 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...s/cpu_graph_executor/detail/LayernormBpropPlan.hpp	83.33%	3 Missing and 22 partials ⚠️
...hipdnn_frontend/detail/LayernormBackwardPacker.hpp	75.00%	4 Missing and 15 partials ⚠️
...pdnn_frontend/detail/LayernormBackwardUnpacker.hpp	81.25%	3 Missing and 15 partials ⚠️
...ude/hipdnn_frontend/node/LayernormBackwardNode.hpp	91.63%	6 Missing and 12 partials ⚠️
...dnn_test_sdk/utilities/CpuFpReferenceLayernorm.hpp	91.72%	9 Missing and 5 partials ⚠️
...scriptors/LayernormBackwardOperationDescriptor.cpp	98.59%	0 Missing and 5 partials ⚠️
.../hipdnn/frontend/include/hipdnn_frontend/Graph.hpp	88.89%	3 Missing and 2 partials ⚠️
...n/data_sdk/include/hipdnn_data_sdk/types/Int32.hpp	0.00%	3 Missing ⚠️
...ects/hipdnn/backend/src/BackendEnumStringUtils.hpp	92.31%	1 Missing and 1 partial ⚠️
...aph_executor/detail/LayernormBpropSignatureKey.hpp	98.40%	0 Missing and 2 partials ⚠️

❌ Your project status has failed because the head coverage (76.92%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #6566      +/-   ##
===========================================
+ Coverage    71.33%   71.40%   +0.07%     
===========================================
  Files         2628     2636       +8     
  Lines       413045   414443    +1398     
  Branches     61875    62073     +198     
===========================================
+ Hits        294615   295905    +1290     
- Misses       96656    96686      +30     
- Partials     21774    21852      +78

Flag	Coverage Δ		*Carryforward flag
TensileLite	`76.65% <ø> (ø)`		Carriedforward from 0fd8b2c
hipBLAS	`90.81% <ø> (ø)`		Carriedforward from 0fd8b2c
hipBLASLt	`41.35% <ø> (ø)`		Carriedforward from 0fd8b2c
hipCUB	`82.68% <ø> (ø)`		Carriedforward from 0fd8b2c
hipDNN	`86.08% <92.09%> (+0.17%)`	⬆️
hipFFT	`50.17% <ø> (ø)`		Carriedforward from 0fd8b2c
hipRAND	`76.12% <ø> (ø)`		Carriedforward from 0fd8b2c
hipSOLVER	`69.18% <ø> (ø)`		Carriedforward from 0fd8b2c
hipSPARSE	`86.55% <ø> (ø)`		Carriedforward from 0fd8b2c
rocBLAS	`48.06% <ø> (ø)`		Carriedforward from 0fd8b2c
rocFFT	`46.30% <ø> (ø)`		Carriedforward from 0fd8b2c
rocRAND	`57.07% <ø> (ø)`		Carriedforward from 0fd8b2c
rocSOLVER	`76.92% <ø> (ø)`		Carriedforward from 0fd8b2c
rocSPARSE	`72.37% <ø> (ø)`		Carriedforward from 0fd8b2c
rocThrust	`91.36% <ø> (ø)`		Carriedforward from 0fd8b2c

*This pull request uses carry forward flags.

Files with missing lines	Coverage Δ
...pdnn/backend/src/descriptors/DescriptorFactory.cpp	`96.30% <100.00%> (+0.08%)`	⬆️
...scriptors/LayernormBackwardOperationDescriptor.hpp	`100.00% <100.00%> (ø)`
...cts/hipdnn/backend/src/descriptors/NodeFactory.cpp	`100.00% <100.00%> (ø)`
...rontend/attributes/LayernormBackwardAttributes.hpp	`100.00% <100.00%> (ø)`
...clude/hipdnn_frontend/detail/OperationUnpacker.hpp	`100.00% <100.00%> (ø)`
...s/cpu_graph_executor/CpuReferenceGraphExecutor.hpp	`62.61% <100.00%> (+0.66%)`	⬆️
...aph_executor/detail/LayernormFpropSignatureKey.hpp	`86.99% <100.00%> (-0.10%)`	⬇️
.../cpu_graph_executor/detail/PlanBuilderRegistry.hpp	`100.00% <ø> (ø)`
...graph_executor/detail/PlanRegistrySignatureKey.hpp	`100.00% <ø> (ø)`
...ects/hipdnn/backend/src/BackendEnumStringUtils.hpp	`98.69% <92.31%> (-0.16%)`	⬇️
... and 9 more

... and 3 files with indirect coverage changes

sbalint98 · 2026-06-05T09:41:02Z

-               ^ (static_cast<std::size_t>(static_cast<int>(dxDataType)) << 16)
+               ^ (static_cast<std::size_t>(static_cast<int>(scaleBiasDataType)) << 8)
+               ^ (static_cast<std::size_t>(static_cast<int>(meanInvVarianceDataType)) << 12)
+               ^ (static_cast<std::size_t>(static_cast<int>(outputDataType)) << 16)
               ^ (static_cast<std::size_t>(static_cast<int>(computeDataType)) << 20);


Nit: I think here we can drop the cast to int.

I must've copied that from the forward pass. I've removed it in both the forward and backward pass now.
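
As an illustration of the nit above, here is a small sketch of a shift-and-XOR signature hash that casts a scoped enum straight to std::size_t. The enum, the key struct, and the unshifted first term are assumptions made for the example; only the shift amounts 8, 12, 16 and 20 are taken from the diff.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the data type enum used by the signature key.
enum class DataTypeSketch : std::uint8_t
{
    Float = 0,
    Half = 1,
    BFloat16 = 2
};

// Hypothetical key holding the fields named in the diff.
struct BpropKeySketch
{
    DataTypeSketch inputDataType;
    DataTypeSketch scaleBiasDataType;
    DataTypeSketch meanInvVarianceDataType;
    DataTypeSketch outputDataType;
    DataTypeSketch computeDataType;
};

constexpr std::size_t hashBpropKeySketch(const BpropKeySketch& key)
{
    // A scoped enum converts to std::size_t with a single static_cast;
    // no intermediate cast to int is needed.
    return static_cast<std::size_t>(key.inputDataType) // unshifted first term is an assumption
           ^ (static_cast<std::size_t>(key.scaleBiasDataType) << 8)
           ^ (static_cast<std::size_t>(key.meanInvVarianceDataType) << 12)
           ^ (static_cast<std::size_t>(key.outputDataType) << 16)
           ^ (static_cast<std::size_t>(key.computeDataType) << 20);
}
```

Because each field lands in its own bit range here, keys that differ in a single small-valued field hash differently, which is the property the signature lookup relies on.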

EwanC

Looking good, couple more minor comments

SamuelReeder · 2026-06-11T17:10:58Z

+        CHECK_TENSOR_TYPE(tensorMap, nodeAttributes->dscale_tensor_uid(), OutputDataTypeEnum);
+        CHECK_TENSOR_TYPE(tensorMap, nodeAttributes->dbias_tensor_uid(), OutputDataTypeEnum);


These type checks don't match the execution parameter types of bprop in projects/hipdnn/test_sdk/include/hipdnn_test_sdk/utilities/CpuFpReferenceLayernorm.hpp.

I think they should be verified against ScaleBiasDataTypeEnum.

Minor mistake, should be fixed now.

SamuelReeder · 2026-06-11T17:13:59Z

+        utilities::CpuFpReferenceLayernorm::bprop(*shallowDyTensor,
+                                                  *shallowXTensor,
+                                                  *shallowScaleTensor,
+                                                  *shallowDxTensor,
+                                                  *shallowDscaleTensor,
+                                                  *shallowDbiasTensor,
+                                                  epsilon,
+                                                  shallowMeanTensor.get(),
+                                                  shallowInvVarianceTensor.get(),
+                                                  _params.normalizedDimCount);


ComputeDataType template param isn't passed explicitly so it will default to float. You will need to explicitly pass the template params.

I must've misjudged that ComputeDataType would be used by the arguments. Fixed now.
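
The pitfall discussed here can be reproduced in isolation. The sketch below uses a hypothetical function, sumAsSketch, whose ComputeType template parameter appears in no function argument, so the compiler cannot deduce it and silently uses the default. It is not the signature of CpuFpReferenceLayernorm::bprop.

```cpp
#include <type_traits>
#include <vector>

// ComputeType is not used by any function parameter, so it is never deduced.
// Unless the caller names it explicitly, it falls back to float.
template <typename IoType, typename ComputeType = float>
ComputeType sumAsSketch(const std::vector<IoType>& values)
{
    ComputeType accumulator{};
    for(const IoType& value : values)
    {
        accumulator += static_cast<ComputeType>(value);
    }
    return accumulator;
}
```

Calling sumAsSketch(values) with a std::vector<double> accumulates in float, while sumAsSketch<double, double>(values) accumulates in double. The same reasoning is why the call in the diff needed its template arguments spelled out.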

SamuelReeder · 2026-06-11T17:16:59Z

+        // Mean/inv_variance type: use mean if present, otherwise default to IO type (dy type)
+        if(nodeAttributes->mean_tensor_uid().has_value())
+        {
+            auto meanTensorAttr = tensorMap.at(nodeAttributes->mean_tensor_uid().value());
+            meanInvVarianceDataType = meanTensorAttr->data_type();
+        }
+        else
+        {
+            meanInvVarianceDataType = dyDataType;
+        }


Do we need to set meanInvVarianceDataType if the tensors are omitted?

It was necessary to set meanInvVarianceDataType to dyDataType when mean and inverse variance were omitted, so that the signature key lookup still works. I tried setting it to UNSET instead and adding plan builders for the omitted case, but that led to a bunch of compilation issues everywhere with void being the type, so I think this is the cleaner solution for now.

SamuelReeder · 2026-06-11T17:19:50Z

I think more tests should be added here to match the coverage of similar lowering integration test files.

SamuelReeder · 2026-06-11T17:22:15Z

+// Standard LayernormBackward constants for testing get/set of valid operations.
+// These represent "any valid layernormbackward" — specific values are not significant.


Specific values are insignificant, but we at least try to keep UIDs unique across all these constant files to avoid mixing up constants. Please check that these UID values are unique, and if not, find a unique range.

I've changed the UIDs to unique values.

SamuelReeder · 2026-06-11T17:26:30Z

+    /** @brief Output gradient tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_DY = 3600,
+
+    /** @brief Input tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_X = 3601,
+
+    /** @brief Scale tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_SCALE = 3602,
+
+    /** @brief Mean tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_MEAN = 3603,
+
+    /** @brief Inverse variance tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_INV_VARIANCE = 3604,
+
+    /** @brief Epsilon tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_EPSILON = 3605,
+
+    /** @brief Input gradient tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_DX = 3606,
+
+    /** @brief Scale gradient tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_DSCALE = 3607,
+
+    /** @brief Bias gradient tensor for backward layernorm */
+    HIPDNN_ATTR_OPERATION_LAYERNORM_BACKWARD_DBIAS = 3608,
+
+    /** @brief Number of normalized dimensions for backward layernorm */
+    HIPDNN_ATTR_LAYERNORM_BACKWARD_NORMALIZED_DIM_COUNT = 3609,


These need the _EXT suffix because there is no descriptor API parity with cuDNN for them.

See:

https://github.com/ROCm/rocm-libraries/blob/develop/projects/hipdnn/docs/AddingNewOperations.md#review-implementation

https://github.com/NVIDIA/cudnn-frontend/blob/develop/include/cudnn_frontend/node/dln.h#L101-L203 (their descriptor naming)

I've fixed it now (including the strings). I don't think some of these documents existed when I did this.

SamuelReeder · 2026-06-11T17:47:42Z

+        // Infer output shape and strides if not set
+        if(attributes.get_dx()->get_dim().empty())
+        {
+            attributes.get_dx()->set_dim(attributes.get_x()->get_dim());
+        }
+        if(attributes.get_dx()->get_stride().empty())
+        {
+            attributes.get_dx()->set_stride(attributes.get_x()->get_stride());
+        }


We should probably infer dy too.

Missed that, probably because I thought that the input vectors would be known. The minimum set of known input shapes should now be x and scale.

SamuelReeder · 2026-06-11T17:52:37Z

+        return _wrapper->asDescriptor<LayernormBackwardOperationDescriptor>();
+    }
+
+    void setTensors() const


Should set normalized_dim_count here too.

Done, including the relevant tests.

SamuelReeder · 2026-06-11T17:54:40Z

+namespace hipdnn_backend
+{
+
+void LayernormBackwardOperationDescriptor::finalize()


I think this should also check that mean and inv_variance are only dually present.

Added to finalize and its tests.
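
A minimal sketch of the "only dually present" rule, assuming the two tensors are tracked as optional UIDs. The helper name and the use of std::optional<std::int64_t> are assumptions for illustration; the real check lives in LayernormBackwardOperationDescriptor::finalize() and may look different.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: returns true when mean and inv_variance are
// either both present or both absent, and false when only one is set.
constexpr bool meanAndInvVariancePairedSketch(const std::optional<std::int64_t>& meanUid,
                                              const std::optional<std::int64_t>& invVarianceUid)
{
    return meanUid.has_value() == invVarianceUid.has_value();
}
```

A finalize step could reject the descriptor whenever this returns false, so a graph can never carry a mean without its inverse variance or the reverse.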

SamuelReeder · 2026-06-11T17:58:37Z

Missing the addition to this function.

I can't see which function you mean, but I assume it was hipdnnGetOperationTypeString. I've fixed a missing case there.

SamuelReeder · 2026-06-11T18:23:47Z

+            hipdnn_data_sdk::utilities::iterateAlongDimensions(
+                normalizedDims, [&](const std::vector<int64_t>& normIndices) {
+                    auto fullIndices
+                        = buildFullIndices(batchIndices, normIndices, ndim, normalizedDimCount);


buildFullIndices is broken for mixed convention tensors, where only some are 1-padded.

Suppose:

scale.shape = [C, H, W] // reduced-rank
mean.shape = [N, 1, 1, 1] // one-padded

And we have:

batchIndices.size() == 4
normIndices.size() == 3

The helper will assume both are reduced-rank, and will create:

fullIndices = [n, 0, 0, 0, c, h, w]

We could pass reduced rank indices here always, but I think it's better to modify the helper to compute batchRank = ndim - normalizedDimCount; and take batchRank indices from batchIndices, and take the last normalizedDimCount entries from normIndices.

We should also make sure these cases are tested.

I've rewritten buildFullIndices to just take normalizedDimCount or ndim - normalizedDimCount dimensions and added some extra tests to cover odd cases.
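
The reviewer's suggestion can be sketched as follows: take the leading ndim - normalizedDimCount entries from batchIndices and the trailing normalizedDimCount entries from normIndices, regardless of whether either vector is reduced-rank or one-padded. This is an illustration of that idea under the stated assumptions, with a hypothetical name, and not the rewritten helper from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch. Works when batchIndices is reduced-rank ([n]) or
// one-padded ([n, 0, 0, 0]), and when normIndices is reduced-rank ([c, h, w])
// or one-padded ([0, c, h, w]).
std::vector<std::int64_t> buildFullIndicesSketch(const std::vector<std::int64_t>& batchIndices,
                                                 const std::vector<std::int64_t>& normIndices,
                                                 std::size_t ndim,
                                                 std::size_t normalizedDimCount)
{
    assert(normalizedDimCount <= ndim);
    const std::size_t batchRank = ndim - normalizedDimCount;
    assert(batchIndices.size() >= batchRank);
    assert(normIndices.size() >= normalizedDimCount);

    std::vector<std::int64_t> fullIndices;
    fullIndices.reserve(ndim);

    // Leading batchRank entries come from the front of batchIndices.
    fullIndices.insert(fullIndices.end(),
                       batchIndices.begin(),
                       batchIndices.begin() + static_cast<std::ptrdiff_t>(batchRank));

    // Trailing normalizedDimCount entries come from the back of normIndices.
    fullIndices.insert(fullIndices.end(),
                       normIndices.end() - static_cast<std::ptrdiff_t>(normalizedDimCount),
                       normIndices.end());

    return fullIndices;
}
```

For the mixed case described above (batchIndices of size 4, normIndices of size 3, ndim 4, normalizedDimCount 3), this yields a four-element index [n, c, h, w] rather than a seven-element one.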

SamuelReeder · 2026-06-11T18:28:45Z

+        const auto& dims = dy.dims();
+        auto ndim = static_cast<int64_t>(dims.size());
+
+        if(ndim < 1)


For optional improved handling, consider also checking:

dy.dims() == x.dims()

dx.dims() == x.dims()

scale shape is compatible with normalizedDimCount

mean/rstd shape is compatible with batch dims

one-padded and reduced-rank combinations are valid

Some of these may also be appropriate for pre_validate_node.

I've added a bunch of checks to pre_validate_node and added tests for them.
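
One of the listed checks, "scale shape is compatible with normalizedDimCount", could look roughly like the sketch below. The exact meaning of "compatible" is an assumption here: the scale either has exactly the trailing normalized dimensions of x (reduced-rank) or has the full rank of x with all batch dimensions equal to 1 (one-padded). The function name is hypothetical and this is not the code added to pre_validate_node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check, assuming a normalizedDimCount of 0 is invalid.
bool scaleShapeCompatibleSketch(const std::vector<std::int64_t>& xDims,
                                const std::vector<std::int64_t>& scaleDims,
                                std::size_t normalizedDimCount)
{
    if(normalizedDimCount == 0 || normalizedDimCount > xDims.size())
    {
        return false;
    }
    const auto batchRank = static_cast<std::ptrdiff_t>(xDims.size() - normalizedDimCount);

    if(scaleDims.size() == normalizedDimCount)
    {
        // Reduced-rank: scale must equal the trailing dimensions of x.
        return std::equal(scaleDims.begin(), scaleDims.end(), xDims.begin() + batchRank);
    }

    if(scaleDims.size() == xDims.size())
    {
        // One-padded: batch dimensions must be 1, trailing dimensions must match x.
        const bool batchIsOnes = std::all_of(scaleDims.begin(),
                                             scaleDims.begin() + batchRank,
                                             [](std::int64_t d) { return d == 1; });
        return batchIsOnes
               && std::equal(scaleDims.begin() + batchRank,
                             scaleDims.end(),
                             xDims.begin() + batchRank);
    }

    return false;
}
```

The mean and inverse variance checks would be the mirror image under the same assumptions: batch dimensions must match x and any padded normalized dimensions must be 1.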

SamuelReeder · 2026-06-11T18:34:26Z

+// Builds a standard LayernormBackward graph, lowers via build_operation_graph(handle),
+// lifts back with fromBackendDescriptor(), and performs comprehensive field-by-field
+// validation of graph data types, tensor attributes, and operation parameters.
+TEST_F(IntegrationLayernormBackwardDescriptorLifting, BasicLayernormBackwardRoundTrip)


Verify optional tensors too.

SamuelReeder · 2026-06-11T18:35:36Z

+// After lifting, verifies tensor objects in the node attributes are the same
+// shared_ptr instances as in the tensor map (pointer equality).
+TEST_F(IntegrationLayernormBackwardDescriptorLifting, LayernormBackwardTensorSharingPreserved)


Probably worth verifying optional tensors again. I think LayernormBackwardLiftWithoutFinalization is the same too.

Done for all relevant tests in this file.

SamuelReeder · 2026-06-11T18:37:14Z

I think we're missing any verification that normalized_dim_count is preserved through lifting.

I think you're right, so I've added checks to this file where relevant.

therock-pr-bot · 2026-07-01T14:00:23Z

❌ PR Check — Action Required

Check	Status	Details
🌿 Branch Name	✅ Pass	—
📝 PR Title/Description	❌ Fail	Error: Title does not follow Conventional Commits style. Expected: start with a valid type (feat, fix, docs, …). Desired format: `type(optional-scope): short description` ─── Error: PR description must reference a JIRA ID, ISSUE ID, or a GitHub closing keyword. Expected: include a `JIRA ID` / `ISSUE ID` line (separator `:` or `-`, or omitted; value may be a JIRA key, a number with/without `#`, or a link), OR a closing keyword + issue reference. Accepted examples: • `JIRA ID : TESTAUTO-6039` • `JIRA ID - #330` • `JIRA ID #330` • `ISSUE ID : TESTUTO-3334` • `ISSUE ID #3334` • `ISSUE ID - TESTAUTO-3433` • `ISSUE ID : https://github.com/<org_name>/<repo_name>/issues/1234` • `Closes #10` • `Fixes octo-org/octo-repo#100` • `Resolves: #123` • `#123` • `https://github.com/<org_name>/<repo_name>/issues/123` Current: no valid JIRA/ISSUE/closing-keyword reference found
⛔ Forbidden Files	✅ Pass	—
🧪 Unit Test	❌ Fail	Error: Source/code files changed without an accompanying unit test. Expected: add at least one test file named like `test_<name>.py` / `test_<name>.cpp` (or `<name>_test.`). Current:* code file(s) changed: `projects/hipdnn/backend/include/HipdnnBackendAttributeName.h`, `projects/hipdnn/backend/include/HipdnnBackendDescriptorType.h`, `projects/hipdnn/backend/include/HipdnnOperationType.h`, `projects/hipdnn/backend/src/BackendEnumStringUtils.hpp`, `projects/hipdnn/backend/src/descriptors/DescriptorFactory.cpp` (+38 more); no test file found
🔎 pre-commit	❌ Fail	Error: Check concluded with `failure`.
🚫 Draft PR	🔜 To Be Enabled	—
🚩 Feature Flag	🔜 To Be Enabled	—
📊 Code Coverage	🔜 To Be Enabled	—

⚠️ 3 policy check(s) failed. Please address the issues above before this PR can be Reviewed.

🚫 Please fix the failed policies

❌ PR Title/Description

❌ Unit Test

❌ pre-commit

The Not ready to Review label was added to this PR. Once all policies pass, the label is removed automatically.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

therock-pr-bot · 2026-07-01T14:00:25Z

🚫 Please fix the failed policies before requesting reviews.

The following policy checks failed:

❌ PR Title/Description
❌ Unit Test

The Not ready to Review label has been added to this PR.
Once all policies pass, the label will be removed automatically.

brentmaas added organization: streamhpc contributors from streamhpc project: hipdnn labels Apr 20, 2026

brentmaas force-pushed the users/brentmaas/layernorm-bwd-schema branch from 8452381 to 2497b3d Compare April 22, 2026 12:50

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch 2 times, most recently from 3df0f50 to 9af89c1 Compare April 29, 2026 15:42

brentmaas force-pushed the users/brentmaas/layernorm-bwd-schema branch from ca6e262 to 97be077 Compare April 30, 2026 07:49

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch from 9af89c1 to af9142d Compare April 30, 2026 07:49

EwanC force-pushed the users/brentmaas/layernorm-bwd-schema branch 2 times, most recently from df3c852 to 963e9de Compare May 22, 2026 07:23

Base automatically changed from users/brentmaas/layernorm-bwd-schema to develop May 22, 2026 11:48

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch 4 times, most recently from 5892360 to 6a69aa7 Compare May 28, 2026 07:42

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch 2 times, most recently from 4a09af0 to c253cda Compare June 1, 2026 15:48

brentmaas marked this pull request as ready for review June 2, 2026 11:55

brentmaas requested a review from a team as a code owner June 2, 2026 11:55

brentmaas requested review from EwanC and sbalint98 June 2, 2026 12:06

EwanC reviewed Jun 2, 2026

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch from 2c0b8c5 to 328b560 Compare June 5, 2026 10:09

sbalint98 approved these changes Jun 5, 2026

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch 2 times, most recently from bdb7495 to 8b51606 Compare June 5, 2026 12:20

EwanC reviewed Jun 5, 2026

Comment thread projects/hipdnn/backend/src/BackendEnumStringUtils.hpp

Comment thread projects/hipdnn/backend/tests/TestBackendEnumStringUtils.cpp Outdated

Comment thread projects/hipdnn/frontend/include/hipdnn_frontend/attributes/LayernormBackwardAttributes.hpp

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch from e85af0b to 519d1d2 Compare June 5, 2026 16:42

EwanC approved these changes Jun 11, 2026

SamuelReeder reviewed Jun 11, 2026

brentmaas added 6 commits July 1, 2026 13:09

Generated frontend code after fixing code generation mistakes

2e734a8

Layernorm backward CPU reference

3b7a5a8

Support for absent mean/rstd for layernorm bwd reference

de8418f

Additional tests and remove some unnecessary checks

19a66e9

Fix layernorm backward descriptor name

ddb96b7

Various fixes, checks and changes to address review comments

53fb612

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch from 519d1d2 to 53fb612 Compare July 1, 2026 13:55

therock-pr-bot Bot added the Not ready to Review label Jul 1, 2026

A simpler yet more general reimplementation of buildFullIndices with some extra tests

832ec00

brentmaas force-pushed the users/brentmaas/layernorm-bwd-frontend branch from a24b304 to 832ec00 Compare July 1, 2026 17:17
