feat: use torch.cdist to compute Gaussian kernel for performance improvement #1358
+1
−13
Here, `gaussian_kernel` calculates $K(x, y) = \exp(-\gamma \| x - y \|^2)$. By replacing `my_cdist` with `torch.cdist`, we can significantly improve performance without affecting the original logic.

In my environment tests (RTX 2080 Super, with `AdvancedProfiler` in `pl.trainer`), the total execution time within the `gaussian_kernel` function was reduced by half. Here is a partial comparison from `perf.log`:

new:

old:
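For readers who want to see the shape of the change, here is a minimal sketch of a Gaussian kernel built on `torch.cdist`. It is not the code from this PR: the function name `gaussian_kernel_cdist`, its signature, and the 2-D input shapes are assumptions made for illustration only.

```python
import torch


def gaussian_kernel_cdist(x: torch.Tensor, y: torch.Tensor, gamma: float) -> torch.Tensor:
    """Return K with K[i, j] = exp(-gamma * ||x_i - y_j||^2).

    Hypothetical helper: x has shape (n, d), y has shape (m, d).
    """
    # torch.cdist returns Euclidean (p=2) distances, so square them
    # to obtain the squared distance used in the kernel.
    sq_dist = torch.cdist(x, y, p=2.0).pow(2)
    return torch.exp(-gamma * sq_dist)


x = torch.randn(4, 8)
y = torch.randn(5, 8)
K = gaussian_kernel_cdist(x, y, gamma=0.5)
print(K.shape)  # torch.Size([4, 5])
```

Every entry of `K` lies in (0, 1], since the squared distance is non-negative and `gamma` is positive here.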
As for the calculation error, I tested with the following code:
In terms of numerical stability, only 2 assertions failed across more than 87k calculations, and the calculation error stayed below 2e-7, which should be acceptable for FP32 calculations.
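A comparison of this kind could look roughly like the sketch below. It is not the test code used for this PR: `sq_dist_expanded` is a hypothetical stand-in for a hand-written distance routine (the real `my_cdist` may differ), and the tensor shapes, `gamma`, and seed are arbitrary assumptions.

```python
import torch


def sq_dist_expanded(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Hypothetical baseline: ||x||^2 - 2 x.y + ||y||^2, clamped at zero
    # because rounding can produce tiny negative values.
    x_norm = x.pow(2).sum(dim=-1, keepdim=True)    # shape (n, 1)
    y_norm = y.pow(2).sum(dim=-1, keepdim=True).T  # shape (1, m)
    return (x_norm + y_norm - 2.0 * x @ y.T).clamp_min(0.0)


def kernel_baseline(x: torch.Tensor, y: torch.Tensor, gamma: float) -> torch.Tensor:
    return torch.exp(-gamma * sq_dist_expanded(x, y))


def kernel_cdist(x: torch.Tensor, y: torch.Tensor, gamma: float) -> torch.Tensor:
    return torch.exp(-gamma * torch.cdist(x, y).pow(2))


torch.manual_seed(0)
x = torch.randn(64, 16)
y = torch.randn(48, 16)
gamma = 0.1

# Largest element-wise difference between the two kernel matrices (FP32).
max_abs_err = (kernel_baseline(x, y, gamma) - kernel_cdist(x, y, gamma)).abs().max().item()
print(f"max abs error: {max_abs_err:.3e}")
```

The error observed will depend on the input scale, the feature dimension, and the device, so any threshold should be chosen with the actual training data in mind.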
Looking forward to any suggestions for further performance improvements!