Float8 Conversion: Forced Correction for -0.0 #2387

yucai-intel · 2025-11-21T08:33:41Z

To fix FP16 -0.0 being erroneously converted to NaN on XPU, this PR adds a forced correction to every Half-to-Float8 conversion functor.
The correction inspects the raw bit pattern of the input: when it matches the FP16 negative-zero signature 0x8000, the functor explicitly converts it to a correct negative zero, so the downstream Float8 constructor receives a valid input.
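
As an illustration of the bit-pattern check described above, here is a minimal standalone sketch. It is not the code from CopyKernel.cpp: the helper names `half_bits_to_float` and `half_to_float8_input` are hypothetical, the input is modeled as raw `uint16_t` bits rather than the actual Half type, and the generic path is a plain software IEEE 754 binary16 decode standing in for whatever conversion the functors really use.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode IEEE 754 binary16 bits into a float in software.
inline float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t mant = h & 0x3FFu;

  if (exp == 0 && mant != 0) {
    // Subnormal half: value is mant * 2^-24.
    const float v = std::ldexp(static_cast<float>(mant), -24);
    return sign ? -v : v;
  }

  std::uint32_t bits;
  if (exp == 0) {
    bits = sign;                                   // signed zero
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);      // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // rebias 15 -> 127
  }
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Hypothetical wrapper: produce the float handed to a Float8 constructor.
// The FP16 negative-zero pattern 0x8000 is matched first and mapped to -0.0f
// directly, bypassing the generic conversion path.
inline float half_to_float8_input(std::uint16_t half_bits) {
  constexpr std::uint16_t kHalfNegZero = 0x8000u;
  if (half_bits == kHalfNegZero) {
    return -0.0f;
  }
  return half_bits_to_float(half_bits);
}
```

Comparing raw bits rather than the floating-point value matters here because `-0.0 == 0.0` is true under IEEE 754 comparison, so a value comparison could not tell the two zeros apart.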

yucai-intel added 2 commits November 21, 2025 16:28

Update CopyKernel.cpp

ac60ff0

Update CopyKernel.cpp

3bb86ae
