Commit 69a8ee1

doc: clarify conversions can be impacted by double-rounding
1 parent dc830ad commit 69a8ee1

1 file changed: +10 -2 lines changed

doc/programming_model/data_types.md (+10 -2)
@@ -78,7 +78,7 @@ post-ops). The following formula governs the datatypes dynamic during
 a primitive computation:
 
 \f[
-\operatorname{convert_{dst\_dt}} ( \operatorname{dst\_zero\_point_{f32}} + \operatorname{postops_{f32}} (\operatorname{oscale_{f32}} * \operatorname{convert_{f32}} (\operatorname{Op}(\operatorname{src_{src\_dt}}, \operatorname{weights_{wei\_dt}}, ...))))
+\operatorname{convert_{dst\_dt}} ( \operatorname{zp_{dst}} + 1/\operatorname{scale_{dst}} * \operatorname{postops_{f32}} (\operatorname{convert_{f32}} (\operatorname{Op}(\operatorname{src_{src\_dt}}, \operatorname{weights_{wei\_dt}}, ...))))
 \f]
 
 The `Op` output datatype depends on the datatype of its inputs:
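
As a reading aid for the updated formula in the hunk above, here is a minimal sketch of the destination-side computation. It is not the library's implementation: `quantize_dst`, its parameters, the s8 bounds, and the saturating round-to-nearest-even conversion are all assumptions made for illustration, and Python floats are f64 rather than f32.

```python
def quantize_dst(acc, zp_dst, scale_dst, postops=lambda v: v, lo=-128, hi=127):
    """Hypothetical helper mirroring
    convert_dst_dt(zp_dst + 1/scale_dst * postops_f32(convert_f32(Op(...)))).

    `acc` stands in for convert_f32(Op(src, weights, ...)).
    The final conversion is assumed to round half-to-even and saturate
    to an s8-like range [lo, hi]; both are illustrative assumptions.
    """
    value = zp_dst + (1.0 / scale_dst) * postops(acc)
    rounded = round(value)  # Python's round() is round-half-to-even
    return max(lo, min(hi, rounded))


def relu(v):
    # Example post-op (hypothetical): clamp negatives to zero.
    return max(v, 0.0)
```

For example, `quantize_dst(10.0, zp_dst=3, scale_dst=0.5)` evaluates `3 + 2 * 10`, and an out-of-range result is clamped to the assumed destination bounds.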
@@ -99,7 +99,15 @@ No downconversions are allowed by default, but can be enabled using
 the floating-point math controls described in @ref
 dev_guide_attributes_fpmath_mode.
 
-
+The \f$convert_{dst\_dt}\f$ conversion is guaranteed to be faithfully
+rounded but not guaranteed to be correctly rounded (the returned value
+is not always the closest one but one of the two closest representable
+value). In particular, some hardware platforms have no direct
+conversion instructions from f32 data type to low-precision data types
+such as fp8 or fp4, and will perform conversion through an
+intermediate data type (for example f16 or bf16), which may result in
+[double
+rounding](https://en.wikipedia.org/wiki/Rounding#Double_rounding).
 
 ### Rounding mode and denormal handling
 
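
The double-rounding effect described in the added paragraph can be reproduced with exact rational arithmetic. The sketch below is illustrative only: `round_to_bits` is a hypothetical helper that rounds a positive value to a given number of significant bits with round-to-nearest-even and ignores exponent range, subnormals, and saturation. It assumes 8 significant bits for bf16 and 4 for an e4m3-style fp8 format.

```python
from fractions import Fraction


def round_to_bits(x, p):
    """Round positive Fraction x to p significant bits (nearest, ties to even).

    Hypothetical helper: exponent range and subnormals are ignored.
    """
    if x <= 0:
        raise ValueError("only positive values are handled in this sketch")
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    # round() on a Fraction rounds half to even and returns an int.
    return round(x / ulp) * ulp


# 1 + 2^-4 + 2^-10 is exactly representable in f32.
x = 1 + Fraction(1, 16) + Fraction(1, 1024)

direct = round_to_bits(x, 4)                     # f32 -> fp8 in one step
via_bf16 = round_to_bits(round_to_bits(x, 8), 4)  # f32 -> bf16 -> fp8

print(float(direct), float(via_bf16))  # 1.125 1.0
```

The direct conversion returns 1.125 because `x` lies just above the midpoint 1.0625. Rounding to bf16 first lands exactly on that midpoint, and the tie then resolves to the even neighbor 1.0. Both results are among the two closest representable values, so each is faithfully rounded, but only the direct one is correctly rounded.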
