Fix excess rounding for f8e4m3 subnormal values #30057

Open
wants to merge 1 commit into
base: master

Conversation

xuchen-intel
Contributor

Details:

  • Fix excess rounding for f8e4m3 subnormal values, following up on "Implement 2-step conversion from fp32 to fp8" (#28501). In the function f16_to_f8e4m3_bits, rounding should be performed exactly once on each path: once for normal values and, separately, once for subnormal values.
  • Add a unit test that reproduces the issue when the fix is not applied.
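
To illustrate why rounding twice on the subnormal path is harmful, here is a minimal sketch. It is not the code in f16_to_f8e4m3_bits; the helper names `rne_shift`, `subnormal_single_rounding`, and `subnormal_double_rounding` are hypothetical. It assumes f8e4m3 uses an exponent bias of 7, so subnormals are multiples of 2^-9, and it only models an f16 input whose unbiased exponent is -7.

```cpp
#include <cassert>
#include <cstdint>

// Shift `value` right by `shift` bits (1..31), rounding to nearest, ties to even.
inline uint32_t rne_shift(uint32_t value, unsigned shift) {
    const uint32_t half = 1u << (shift - 1);
    const uint32_t remainder = value & ((1u << shift) - 1u);
    uint32_t result = value >> shift;
    if (remainder > half || (remainder == half && (result & 1u))) {
        ++result;
    }
    return result;
}

// Hypothetical helper. `sig11` is the f16 significand (implicit 1 plus 10
// mantissa bits) of a value sig11 * 2^-17, i.e. unbiased exponent -7.
// Assuming f8e4m3 subnormals are multiples of 2^-9, the subnormal mantissa
// is sig11 / 2^8, rounded once.
inline uint32_t subnormal_single_rounding(uint32_t sig11) {
    return rne_shift(sig11, 8);
}

// Hypothetical helper showing the excess rounding: first round to the 4-bit
// normal significand (drop 7 bits), then round again while denormalizing
// (drop 1 more bit).
inline uint32_t subnormal_double_rounding(uint32_t sig11) {
    return rne_shift(rne_shift(sig11, 7), 1);
}
```

For the f16 bit pattern 0x217F (significand 1407, unbiased exponent -7), the value lies just below the midpoint between 5 * 2^-9 and 6 * 2^-9. Rounding once gives a subnormal mantissa of 5, while rounding twice first bumps the value up to an exact tie and then rounds it to 6.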

Tickets:

@xuchen-intel xuchen-intel added this to the 2025.2 milestone Apr 10, 2025
@xuchen-intel xuchen-intel requested a review from a team as a code owner April 10, 2025 06:44
@github-actions github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Apr 10, 2025
@xuchen-intel xuchen-intel changed the title [Draft] Fix excess rounding for f8e4m3 subnormal values Fix excess rounding for f8e4m3 subnormal values Apr 11, 2025
@xuchen-intel
Contributor Author

@praasz Thanks Pawel for the review!
@maxnick Hi Maksim, please take a look. Thanks!

@wenjiew wenjiew requested a review from maxnick April 17, 2025 01:35