gpu: nvidia: ip: respect acc_mode for sum post-op #2479

sgeor255 · 2025-01-22T10:46:14Z

Description

Currently, the inner product error threshold in benchdnn is set to 0. For some large shapes on the NVIDIA backend, this exposes precision issues (e.g. the cases reported in MFDNN-12610). This PR adjusts the error threshold so that such cases are not reported as failures.

Fixes MFDNN-12610.
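
To illustrate why a threshold of 0 is fragile for large shapes, here is a minimal sketch of a relative-error check. It is not benchdnn's actual comparison code: the helper name `within_threshold`, the denominator guard, and the sample values are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of benchdnn): returns true when `got`
// is within relative error `threshold` of `expected`.
bool within_threshold(float expected, float got, float threshold) {
    const float diff = std::fabs(expected - got);
    // Guard the denominator so that expected == 0 does not divide by zero.
    const float denom = std::max(std::fabs(expected), 1e-8f);
    return diff / denom <= threshold;
}
```

With a threshold of 0, only bit-identical results pass, so `within_threshold(1024.0f, 1024.25f, 0.0f)` is false, while the same pair passes with an assumed threshold of `1e-3f`. Long reductions, as in large inner products, tend to accumulate this kind of small rounding drift.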

tests/benchdnn/ip/ip.cpp

src/gpu/nvidia/README.md

sgeor255 · 2025-02-06T11:14:26Z

I think this PR still needs a review from @oneapi-src/onednn-doc.

ranukund

Edits suggested, please incorporate as you see fit! Thanks!

src/gpu/nvidia/README.md

sgeor255 · 2025-02-06T15:39:32Z

Thanks @ranukund, I added the suggested changes.

ranukund

Thank you!

dzarukin

Thanks for addressing the feedback. One relatively minor comment and it's good to go.

src/gpu/nvidia/cudnn_gemm_inner_product_impl.hpp

sgeor255 · 2025-02-24T14:29:17Z

It looks like the precommit check failures are in files unrelated to this PR.

sgeor255 requested a review from a team as a code owner January 22, 2025 10:46

github-actions bot added the component:tests (Codeowner: @oneapi-src/onednn-arch) label Jan 22, 2025

sgeor255 force-pushed the svet/nvdia-ip-precision branch from 0f0897a to 383b1a2 Compare January 22, 2025 11:59

mgouicem reviewed Jan 22, 2025

tests/benchdnn/ip/ip.cpp

Rbiessy reviewed Jan 22, 2025

tests/benchdnn/ip/ip.cpp (outdated)

AD2605 approved these changes Jan 22, 2025

sgeor255 force-pushed the svet/nvdia-ip-precision branch from 383b1a2 to 441a27d Compare January 31, 2025 16:37

sgeor255 requested review from a team as code owners January 31, 2025 16:37

github-actions bot added the documentation (A request to change/fix/improve the documentation. Codeowner: @oneapi-src/onednn-doc) and platform:gpu-nvidia (Codeowner: @oneapi-src/onednn-gpu-nvidia) labels Jan 31, 2025

sgeor255 force-pushed the svet/nvdia-ip-precision branch from 441a27d to 7841ac5 Compare February 4, 2025 15:04

Rbiessy approved these changes Feb 4, 2025

src/gpu/nvidia/README.md (outdated)

src/gpu/nvidia/README.md (outdated)

ShanoToni approved these changes Feb 5, 2025

sgeor255 force-pushed the svet/nvdia-ip-precision branch from 7841ac5 to d1255da Compare February 6, 2025 08:37

Rbiessy approved these changes Feb 6, 2025

ranukund reviewed Feb 6, 2025

sgeor255 force-pushed the svet/nvdia-ip-precision branch from d1255da to a8d99a7 Compare February 6, 2025 15:37

sgeor255 requested a review from ranukund February 6, 2025 15:39

ranukund approved these changes Feb 6, 2025

sgeor255 requested review from dzarukin and mgouicem February 19, 2025 10:36

sgeor255 changed the title ~~gpu: nvidia: ip: adjust benchdnn error threshold~~ gpu: nvidia: ip: respect fp_math_mode for sum post-op Feb 19, 2025

sgeor255 changed the title ~~gpu: nvidia: ip: respect fp_math_mode for sum post-op~~ gpu: nvidia: ip: respect acc_mode for sum post-op Feb 19, 2025

dzarukin reviewed Feb 19, 2025

src/gpu/nvidia/cudnn_gemm_inner_product_impl.hpp (outdated)

gpu: nvidia: ip: adjust benchdnn error threshold

90190e7

sgeor255 force-pushed the svet/nvdia-ip-precision branch from a8d99a7 to 90190e7 Compare February 24, 2025 12:52

sgeor255 requested a review from dzarukin February 24, 2025 12:53

dzarukin approved these changes Feb 24, 2025

sgeor255 merged commit bbf8399 into uxlfoundation:main Feb 25, 2025
21 of 22 checks passed

manaalmj pushed a commit to manaalmj/oneDNN that referenced this pull request Mar 4, 2025

gpu: nvidia: ip: respect acc_mode for sum post-op (uxlfoundation#2479)

348a8bb
