@yucai-intel (Contributor)

This PR fixes an issue on XPU where an FP16 negative zero (-0.0) is erroneously converted to NaN. It adds an explicit correction step to every Half-to-Float8 conversion functor.
The correction works on the raw hardware bit pattern: if the FP16 input equals 0x8000 (sign bit set, all exponent and mantissa bits clear), the functor replaces it with a properly formed negative zero, so the downstream Float8 constructor receives a valid input instead of NaN.
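The guard described above can be sketched in portable C++. This is a minimal illustration under stated assumptions, not the PR's actual code: `half_bits_to_float` and `HalfToFloat8InputFunctor` are hypothetical names, FP16 values are passed as raw `uint16_t` bits rather than the real Half type, and the functor stops at the `float` that would be handed to the Float8 constructor. With a correct decoder like the one below, the 0x8000 branch returns the same value the decoder would; the point of the branch is to bypass whichever conversion path on XPU mishandles that bit pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical portable decoder: IEEE 754 binary16 bits -> float32.
// Stands in for the real Half -> float conversion (assumption for illustration).
inline float half_bits_to_float(uint16_t h) {
  const uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0) {
    if (mant == 0) {
      bits = sign;  // signed zero: keep only the sign bit
    } else {
      // Subnormal: shift the mantissa until the implicit bit appears,
      // adjusting the (rebiased) exponent for each shift.
      exp = 127 - 15 + 1;
      while ((mant & 0x400u) == 0) {
        mant <<= 1;
        --exp;
      }
      mant &= 0x3FFu;
      bits = sign | (exp << 23) | (mant << 13);
    }
  } else if (exp == 0x1F) {
    bits = sign | 0x7F800000u | (mant << 13);  // Inf or NaN
  } else {
    bits = sign | ((exp + 127 - 15) << 23) | (mant << 13);  // normal number
  }
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical functor mirroring the PR's idea: recognize the FP16
// negative-zero signature before producing the float that would be
// passed to the Float8 constructor.
struct HalfToFloat8InputFunctor {
  float operator()(uint16_t half_bits) const {
    if (half_bits == 0x8000u) {
      return -0.0f;  // force a well-formed negative zero
    }
    return half_bits_to_float(half_bits);
  }
};
```

Checking the bit pattern, rather than comparing the value with `== 0.0f`, matters here: `-0.0f == 0.0f` is true in IEEE 754, so a value comparison cannot tell negative zero from positive zero.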
