- Prerequisites: First install hipDNN following the Building documentation.
- Build Samples: From this `samples` directory:

      mkdir build && cd build
      cmake -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ ..
      ninja

  The sample executables will be created in the build directory.
All samples are templated for mixed-precision execution with Fp32, Fp16, and Bfp16 input/output data types, with Fp32 intermediate accumulation.
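
To illustrate what 16-bit input/output with Fp32 accumulation means in practice, here is a minimal standalone sketch. It does not use any hipDNN types or APIs: `Bf16`, `toBf16`, `toFloat`, and `sumWithFp32Accumulation` are hypothetical names introduced only for this example, and NaN handling is deliberately omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 16-bit storage type: the upper 16 bits of an IEEE-754 binary32
// value (the bfloat16 layout). This is an illustration, not a hipDNN type.
struct Bf16 {
    std::uint16_t bits;
};

// float -> bfloat16 with round-to-nearest-even. NaN inputs are not handled.
inline Bf16 toBf16(float value) {
    std::uint32_t u;
    std::memcpy(&u, &value, sizeof(u));
    const std::uint32_t roundingBias = 0x7FFFu + ((u >> 16) & 1u);
    return Bf16{static_cast<std::uint16_t>((u + roundingBias) >> 16)};
}

// bfloat16 -> float: place the stored bits in the upper half of a binary32.
inline float toFloat(Bf16 value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value.bits) << 16;
    float result;
    std::memcpy(&result, &u, sizeof(result));
    return result;
}

// Reads 16-bit inputs, keeps the running total in fp32, and rounds to 16 bits
// only once when writing the output.
inline Bf16 sumWithFp32Accumulation(const std::vector<Bf16>& values) {
    assert(!values.empty());
    float acc = 0.0f;
    for (std::size_t i = 0; i < values.size(); ++i) {
        acc += toFloat(values[i]);
    }
    return toBf16(acc);
}
```

Under these assumptions, summing 512 copies of `1.0` yields `512.0`. If the running total were instead rounded to bfloat16 after every addition, it would stall at `256.0`, because `257` is not representable with an 8-bit significand and rounds back to `256`.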
💡 Tip: Set `HIPDNN_LOG_LEVEL=info` to observe detailed logs from the samples.
The current samples include:
Executes a single-node batch normalization inference graph on a 4D input tensor.
- It normalizes each channel of the input tensor `x` of shape `(N, C, H, W)` using pre-calculated population statistics. The result is then transformed by the learned parameters `scale` and `bias`, each with shape `(1, C, 1, 1)`. At a high level, the following element-wise linear transformation is broadcast over the batch and spatial dimensions `(N, H, W)`:

      y = scale * (x - mean) * inv_variance + bias

  where `y` would then be propagated as input to the subsequent layer.
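
The inference transformation can be sketched as a plain CPU reference loop. This is not the sample's code and does not call hipDNN: the function name `batchnormInference`, the all-`float` buffers, and the densely packed NCHW layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference batch normalization inference over a packed NCHW buffer.
// mean, invVariance, scale, and bias each hold C values, i.e. shape (1, C, 1, 1).
std::vector<float> batchnormInference(const std::vector<float>& x,
                                      const std::vector<float>& mean,
                                      const std::vector<float>& invVariance,
                                      const std::vector<float>& scale,
                                      const std::vector<float>& bias,
                                      std::size_t n, std::size_t c,
                                      std::size_t h, std::size_t w) {
    assert(x.size() == n * c * h * w);
    assert(mean.size() == c && invVariance.size() == c);
    assert(scale.size() == c && bias.size() == c);

    std::vector<float> y(x.size());
    const std::size_t hw = h * w;
    for (std::size_t ni = 0; ni < n; ++ni) {
        for (std::size_t ci = 0; ci < c; ++ci) {
            // Per-channel parameters are broadcast over N, H, and W.
            const std::size_t base = (ni * c + ci) * hw;
            for (std::size_t i = 0; i < hw; ++i) {
                y[base + i] =
                    scale[ci] * (x[base + i] - mean[ci]) * invVariance[ci] + bias[ci];
            }
        }
    }
    return y;
}
```

The per-channel vectors are indexed only by `ci`, which is what broadcasting a `(1, C, 1, 1)` tensor over `(N, H, W)` amounts to.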
Executes the forward pass of a batch normalization training graph on a 4D input tensor.
- For an input `x` of shape `(N, C, H, W)`, the mean and variance of the mini-batch are calculated over the `N`, `H`, and `W` dimensions for each of the `C` channels, resulting in a `mean` and `inv_variance` of shape `(1, C, 1, 1)`. It then transforms the input and updates the running statistics:

      y = scale * (x - mean) * inv_variance + bias
      next_running_mean = (1 - momentum) * prev_running_mean + momentum * batch_mean
      next_running_variance = (1 - momentum) * prev_running_variance + momentum * batch_variance

- The graph outputs the normalized tensor `y`, along with the mini-batch statistics (`mean`, `inv_variance`) required for the backward pass, and the updated population statistics (`next_running_mean`, `next_running_variance`) required for inference.
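
A CPU reference for the training forward pass might look like the sketch below. It is not taken from the sample and does not call hipDNN. Several details are assumptions that the text above does not state: the `epsilon` term added to the variance before the inverse square root, the use of the biased (divide by `N * H * W`) mini-batch variance for both normalization and the running update, the packed NCHW layout, and all names (`BatchnormTrainingResult`, `batchnormTrainingForward`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Outputs of the training forward pass; per-channel vectors hold C values.
struct BatchnormTrainingResult {
    std::vector<float> y;
    std::vector<float> mean;
    std::vector<float> invVariance;
    std::vector<float> nextRunningMean;
    std::vector<float> nextRunningVariance;
};

BatchnormTrainingResult batchnormTrainingForward(
    const std::vector<float>& x, const std::vector<float>& scale,
    const std::vector<float>& bias, const std::vector<float>& prevRunningMean,
    const std::vector<float>& prevRunningVariance, std::size_t n, std::size_t c,
    std::size_t h, std::size_t w, float momentum, float epsilon) {
    assert(x.size() == n * c * h * w);
    assert(scale.size() == c && bias.size() == c);
    assert(prevRunningMean.size() == c && prevRunningVariance.size() == c);

    const std::size_t hw = h * w;
    const float nhw = static_cast<float>(n * hw);
    BatchnormTrainingResult r{std::vector<float>(x.size()), std::vector<float>(c),
                              std::vector<float>(c), std::vector<float>(c),
                              std::vector<float>(c)};

    for (std::size_t ci = 0; ci < c; ++ci) {
        // Pass 1: mini-batch mean over N, H, W for this channel.
        float sum = 0.0f;
        for (std::size_t ni = 0; ni < n; ++ni)
            for (std::size_t i = 0; i < hw; ++i) sum += x[(ni * c + ci) * hw + i];
        const float batchMean = sum / nhw;

        // Pass 2: biased mini-batch variance (assumption: no Bessel correction).
        float sqSum = 0.0f;
        for (std::size_t ni = 0; ni < n; ++ni)
            for (std::size_t i = 0; i < hw; ++i) {
                const float d = x[(ni * c + ci) * hw + i] - batchMean;
                sqSum += d * d;
            }
        const float batchVariance = sqSum / nhw;
        const float invVar = 1.0f / std::sqrt(batchVariance + epsilon);

        r.mean[ci] = batchMean;
        r.invVariance[ci] = invVar;
        r.nextRunningMean[ci] =
            (1.0f - momentum) * prevRunningMean[ci] + momentum * batchMean;
        r.nextRunningVariance[ci] =
            (1.0f - momentum) * prevRunningVariance[ci] + momentum * batchVariance;

        // Pass 3: normalize, then apply the learned scale and bias.
        for (std::size_t ni = 0; ni < n; ++ni)
            for (std::size_t i = 0; i < hw; ++i) {
                const std::size_t idx = (ni * c + ci) * hw + i;
                r.y[idx] = scale[ci] * (x[idx] - batchMean) * invVar + bias[ci];
            }
    }
    return r;
}
```

The three passes per channel keep the sketch easy to compare against the formulas; a production kernel would typically fuse them.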
Executes the backward pass of a batch normalization graph to compute gradients of the loss function.
- Given the upstream gradient `dy` of shape `(N, C, H, W)`, the downstream gradients are computed with the chain rule over the batch and spatial dimensions `(N, H, W)`, using the saved mini-batch statistics:

      dbias = sum(dy)
      x_hat = (x - mean) * inv_variance
      dscale = sum(dy * x_hat)
      d_x = scale * inv_variance * (dy - (dbias / nhw) - (x_hat * dscale / nhw))

  where `nhw = N * H * W`.
- For training, `d_x` would subsequently be passed to the preceding layer, and `dscale` and `dbias` can be used by an optimizer to update the learnable parameters `scale` and `bias`.
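
The backward formulas can be checked against a small CPU reference such as the sketch below. It is not the sample's implementation and does not call hipDNN: the names `BatchnormBackwardResult` and `batchnormBackward`, the all-`float` buffers, and the packed NCHW layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gradients produced by the backward pass; dscale and dbias hold C values.
struct BatchnormBackwardResult {
    std::vector<float> dx;
    std::vector<float> dscale;
    std::vector<float> dbias;
};

BatchnormBackwardResult batchnormBackward(
    const std::vector<float>& dy, const std::vector<float>& x,
    const std::vector<float>& mean, const std::vector<float>& invVariance,
    const std::vector<float>& scale, std::size_t n, std::size_t c,
    std::size_t h, std::size_t w) {
    assert(dy.size() == n * c * h * w && x.size() == dy.size());
    assert(mean.size() == c && invVariance.size() == c && scale.size() == c);

    const std::size_t hw = h * w;
    const float nhw = static_cast<float>(n * hw);
    BatchnormBackwardResult r{std::vector<float>(x.size()), std::vector<float>(c),
                              std::vector<float>(c)};

    for (std::size_t ci = 0; ci < c; ++ci) {
        // Reductions over N, H, W: dbias = sum(dy), dscale = sum(dy * x_hat).
        float dbias = 0.0f;
        float dscale = 0.0f;
        for (std::size_t ni = 0; ni < n; ++ni)
            for (std::size_t i = 0; i < hw; ++i) {
                const std::size_t idx = (ni * c + ci) * hw + i;
                const float xHat = (x[idx] - mean[ci]) * invVariance[ci];
                dbias += dy[idx];
                dscale += dy[idx] * xHat;
            }
        r.dbias[ci] = dbias;
        r.dscale[ci] = dscale;

        // Element-wise input gradient using the two per-channel reductions.
        for (std::size_t ni = 0; ni < n; ++ni)
            for (std::size_t i = 0; i < hw; ++i) {
                const std::size_t idx = (ni * c + ci) * hw + i;
                const float xHat = (x[idx] - mean[ci]) * invVariance[ci];
                r.dx[idx] = scale[ci] * invVariance[ci] *
                            (dy[idx] - dbias / nhw - xHat * dscale / nhw);
            }
    }
    return r;
}
```

A quick sanity check for any implementation of these formulas: within each channel, the entries of `dx` sum to zero whenever `mean` is the mean of `x` over that channel, because the `dbias / nhw` term removes the mean of `dy` and `x_hat` itself has zero mean.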