ULP Difference of Float Numbers
The Unit in the Last Place (ULP) is the smallest difference between two adjacent floating-point numbers at a specific precision. It measures the granularity of floating-point representations and is essential for understanding rounding errors and numerical stability. For example, a ULP near 1.0 is typically smaller than a ULP near 10.0, as the step size depends on the exponent of the number.
ULP differences are often used to evaluate the accuracy of floating-point operations. When comparing two floating-point numbers, a difference of 1 ULP indicates that the numbers are as close as they can be while still being distinct. Smaller ULP differences imply better precision and numerical accuracy. When implementing floating-point operations or algorithms (e.g., the tanhf function), the smallest possible ULP difference is 0.5 indicating that the exact result is correctly rounded to the nearest floating-point number.
Calculating ULP
To calculate the ULP of a floating-point number, you can use the following C function:
#include <math.h>
float flt_ulp_size(float x) {
x = fabs(x);
return nextafterf(x, INFINITY) - x;
}
This function calculates the ULP size of a given floating-point number x
. It first takes the absolute value of x
using fabs(x)
, then uses nextafterf(x, INFINITY)
to find the next representable float value after x
in the direction of positive infinity. The difference between these two values gives the ULP size for x
.
Floating-Point Number | ULP (Macro) | ULP (Float) | ULP (Float Hex Representation) | Description |
---|---|---|---|---|
1.0 | FLT_EPSILON | 1.1920929e-07 | 0x1.0p-23 | Smallest step near 1.0 |
0.0 | FLT_TRUE_MIN | 1.4012985e-45 | 0x1.0p-149 | Smallest positive float |
0.5 | FLT_EPSILON/2 | 5.9604645e-08 | 0x1.0p-24 | Half the step size near 1.0 |
10.0 | FLT_EPSILON*8 | 9.536743e-07 | 0x1.0p-20 | Larger step size near 10.0 |
Calculating ULP Difference
Simple ULP Difference
It is relative hard to correctly calculate the ULP difference between two floating-point numbers. The following C function is a simple way to calculate the ULP difference between a reference value and a given value:
float flt_ulp_diff(float ref, float val) {
return fabs(ref - val) / flt_ulp_size(ref);
}
This function simple divides the absolute difference by the ULP. The problem with this approach is which ULP to use. You can use the ULP of the reference value, approximated value or maximum/minimum/average or both values. In the given implementation the ULP of the reference value is used following the idea that the value of an potentially inaccurate approximation should not influence the precision for comparison. However, whatever ULP you choose, it will not be perfect. E.g. in the provided function flt_ulp_diff(0.9999, 2.0)
will be around factor 2 greater than flt_ulp_diff(1.0, 2.0)
which is not very intuitive.
Correct ULP Difference
To correctly calculate the ULP difference between two floating-point numbers, you can use the following C function:
uint32_t ulp_intdiff_float(float f1, float f2) {
if (signbit(f1) != signbit(f2)) {
return ulp_intdiff_float(0.0f, fabs(f1)) + ulp_intdiff_float(0.0f, fabs(f2)) + 1;
}
uint32_t i1 = *(uint32_t*)&f1;
uint32_t i2 = *(uint32_t*)&f2;
return i1 > i2 ? i1 - i2 : i2 - i1;
}
This function calculates the integer difference in ULP between two floating-point numbers f1
and f2
. The trick is to reinterpret the floating-point numbers as 32-bit unsigned integers and then calculate the difference between them. The floating-point IEEE 754 representation is designed in such a way that the integer difference between two floating-point numbers (with the same sign) is the ULP difference between them.
To also consider floating-point numbers with different signs, the function calculates the ULP difference towards zero for both numbers and adds the differences together. Additionally, it adds 1 to the result that ULP difference between negative and positive zero (ulp_intdiff_float(-0.0f, 0.0f)
) is 1. Adding 1 to the result is not necessary if you do not care about the difference between negative and positive zero.
This implementation satisfies both the properties of triangular additivity (ulp_intdiff_float(x, y) + ulp_intdiff_float(y, z) == ulp_intdiff_float(x, z) for x < y < z
) and commutativity (ulp_intdiff_float(x, y) == ulp_intdiff_float(y, x)
) of ULP differences, ensuring that the ULP difference between any two floating-point numbers is consistent and order-independent when summed across intermediates.
Examples
Input 1 | Input 2 | ULP Difference | Explanation |
---|---|---|---|
1.0 | 1.0 + FLT_EPSILON | 1 | Adjacent floating-point values |
1.0 | 1.0 - FLT_EPSILON | 2 | One step away in the opposite direction |
FLT_TRUE_MIN | 0.0 | 1 | Smallest positive float to zero |
-FLT_TRUE_MIN | FLT_TRUE_MIN | 3 | Crossing zero, including sign change |
-1.0 | -1.0 - FLT_EPSILON | 1 | Adjacent floating-point values on the negative side |
-1.0 | -1.0 + FLT_EPSILON | 2 | One step away in the opposite direction on the negative side |
-0.0 | 0.0 | 1 | Difference between signed zero values |
0.0 | 0.0 | 0 | No difference between identical values |
FLT_MAX | INFINITY | 1 | Transition from the largest finite float to infinity |
-FLT_MAX | -INFINITY | 1 | Transition from the largest negative finite float to negative infinity |
1.0 | 0.5 | 8388608 | Number of representable floats between 1.0 and 0.5 (2^23) |
1.0 | 2.0 | 8388608 | Number of representable floats between 1.0 and 2.0 (2^23) |
Calculating Rational ULP Differences
When comparing a reference value provided as a double
and an approximated value as a float
, rational ULP differences can provide a finer granularity of comparison. The function below implements this computation by combining integer ULP differences with a fractional component derived from the difference between the double
reference value and its float
representation.
The following function calculates rational ULP differences:
double ulp_diff(double ref, float b) {
// Handle NaN cases: returns 0 if both are NaN, INFINITY otherwise
if (isnan(b) || isnan(ref)) {
return isnan(b) == isnan(ref) ? 0.0 : INFINITY;
}
// Convert reference value to float for integer ULP difference calculation
float fref = (float)ref;
// Calculate integer ULP difference between the float reference and test value
double iulp = (double) ulp_intdiff_float(fref, b);
// Ensure positive values for further calculations
ref = fabs(ref);
fref = fabsf(fref);
b = fabsf(b);
// Calculate the ULP size of the rounded-down reference value
float ulpRef = flt_ulp_size(double_to_float_rounddown(ref));
// Calculate the fractional difference in ULPs
double diff = (ref - (double)fref);
double fulp = diff / ulpRef;
if (b > ref)
fulp = -fulp;
// Return the sum of integer ULPs and fractional ULPs
return iulp + fulp;
}
The function first handles special cases like NaN values, then calculates the integer ULP difference between the float-converted reference value and the approximated value. It then computes the fractional ULP difference based on the difference between the original double reference value and its float representation. We can use the method from the simple ULP difference to calculate the fractional ULP difference. That is working here because the ULP size for the fractional part is clearly defined by the location of the reference value.
One problem is the rounding when converting the double to float. E.g. 1.0 - FLT_EPSILON/4 require the ULP size of the range [0.5, 1.0) but would be converted to 1.0 in a normal double to float conversion. This is why the function uses double_to_float_rounddown
to convert the double to float. This function converts the double to float and rounds down to the next smaller float value:
float double_to_float_rounddown(double value) {
float float_value = (float)value;
if (float_value > value) {
return nextafterf(float_value, -INFINITY);
}
return float_value;
}
Examples
The following table provides examples of the results obtained using the ulp_diff
function:
Reference Value | Test Value | ULP Difference | Explanation |
---|---|---|---|
1.0 + FLT_EPSILON/2 | 1.0 | 0.5 | The test value is halfway between two floats. |
1.0 + FLT_EPSILON/4 | 1.0 | 0.25 | The test value is a quarter ULP away. |
1.0 - FLT_EPSILON/2 | 1.0 | 1.0 | The test value is 1 ULP lower. |
1.0 - FLT_EPSILON/4 | 1.0 | 0.5 | The test value is halfway to the next float. |
1.0 + FLT_EPSILON/2 | 1.0 + FLT_EPSILON*10 | 9.5 | Integer difference with fractional part. |
1.0 + FLT_EPSILON/4 | 1.0 + FLT_EPSILON*10 | 9.75 | Includes both integer and fractional differences. |
1.0 - FLT_EPSILON/2 | 1.0 - FLT_EPSILON*10 | 19.0 | The difference spans multiple ULPs. |
1.0 - FLT_EPSILON/4 | 1.0 - FLT_EPSILON*10 | 19.5 | Combined fractional and integer ULP difference. |
0.0 + FLT_TRUE_MIN/100 | 0.0 | 0.01 | A very small fractional ULP difference. |
Accuracy of Mathematical Functions in Math Libraries
The following table gives an overview of the ULP errors in widely used math libraries for single-precision floating-point math functions. The ULP error is the largest error in ULP between the exact result and the result of the function. The provided ULP error is exact for math functions with one parameter and the largest known error for two-parameter functions like atan2.
library
version |
GNU libc
2.40 |
IML
2024.0.2 |
AMD
4.2 |
Newlib
4.4.0 |
OpenLibm
0.8.3 |
Musl
1.2.5 |
Apple
14.5 |
LLVM
18.1.8 |
MSVC
2022 |
FreeBSD
14.1 |
ArmPL
24.04 |
CUDA
12.2.1 |
ROCm
5.7.0 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
acos | 0.899 | 0.528 | 0.897 | 0.899 | 0.918 | 0.918 | 0.634 | 0.500 | 0.669 | 0.918 | 1.32 | 1.34 | 1.47 |
acosh | 2.01 | 0.501 | 0.504 | 2.01 | 2.01 | 2.01 | 0.502 | 0.500 | 2.89 | 2.01 | 2.79 | 2.18 | 0.564 |
asin | 0.898 | 0.528 | 0.781 | 0.926 | 0.743 | 0.743 | 0.634 | 0.500 | 0.861 | 0.743 | 2.41 | 1.36 | 2.54 |
asinh | 1.78 | 0.527 | 0.518 | 1.78 | 1.78 | 1.78 | 0.515 | 0.500 | 1.99 | 1.78 | 3.57 | 1.78 | 0.573 |
atan | 0.853 | 0.541 | 0.501 | 0.853 | 0.853 | 0.853 | 0.722 | 0.500 | 0.501 | 0.853 | 2.88 | 1.21 | 2.10 |
atanh | 1.73 | 0.507 | 0.547 | 1.73 | 1.73 | 1.73 | 0.511 | 0.500 | 2.35 | 1.73 | 3.09 | 3.16 | 0.574 |
cbrt | 0.969 | 0.520 | 0.548 | 3.56 | 0.500 | 0.500 | 0.500 | 1.83 | 0.500 | 1.53 | 1.17 | 1.14 | |
cos | 0.561 | 0.548 | 0.729 | 2.91 | 0.501 | 0.501 | 0.846 | 0.500 | 0.530 | 0.501 | 0.561 | 1.52 | 1.61 |
cosh | 1.89 | 0.506 | 1.03 | 2.51 | 1.36 | 1.03 | 0.579 | 0.500 | 0.500 | 1.36 | 1.89 | 2.34 | 0.567 |
erf | 0.968 | 0.780 | 0.531 | 0.968 | 0.943 | 0.968 | 0.501 | 0.500 | 3.99 | 0.890 | 1.93 | 1.04 | 1.51 |
erfc | 3.13 | 0.934 | 63.9 | 3.17 | 3.13 | 0.750 | 6.66 | 3.18 | 1.64 | 4.49 | 3.33 | ||
exp | 0.502 | 0.506 | 0.501 | 0.911 | 0.911 | 0.502 | 0.514 | 0.500 | 0.501 | 0.911 | 0.502 | 1.94 | 1.00 |
exp10 | 0.502 | 0.507 | 0.501 | 1.06 | 3.88 | 0.514 | 0.500 | 2.07 | 1.00 | ||||
exp2 | 0.502 | 0.519 | 0.501 | 1.02 | 0.501 | 0.502 | 0.514 | 0.500 | 2.14 | 0.501 | 0.502 | 2.39 | 0.871 |
expm1 | 0.813 | 0.544 | 0.536 | 0.813 | 0.813 | 0.813 | 0.687 | 0.500 | 3.02 | 0.813 | 1.51 | 1.45 | 1.45 |
j0 | 9.00 | 0.678 | 6.18e6 | 3.66e6 | 3.66e6 | 3.66e6 | 3.78e10 | 7.60e7 | |||||
j1 | 9.00 | 1.69 | 1.68e7 | 2.25e6 | 2.25e6 | 2.25e6 | 7.48e9 | 7.53e7 | |||||
lgamma | 6.78 | 0.510 | 7.50e6 | 7.50e6 | 7.50e6 | 0.501 | 2.92e5 | 7.50e6 | 1.35e7 | 7.50e6 | |||
log | 0.818 | 0.519 | 0.577 | 0.888 | 0.888 | 0.818 | 0.511 | 0.500 | 0.562 | 0.888 | 0.818 | 0.865 | 1.89 |
log10 | 2.07 | 0.516 | 1.40 | 2.10 | 0.832 | 0.832 | 0.502 | 0.500 | 0.626 | 0.832 | 0.82 | 2.09 | 1.71 |
log1p | 1.30 | 0.525 | 0.501 | 1.30 | 0.839 | 0.835 | 0.513 | 0.500 | 1.44 | 0.839 | 2.02 | 0.887 | 0.579 |
log2 | 0.752 | 0.508 | 0.766 | 1.65 | 0.865 | 0.752 | 0.502 | 0.500 | 2.04 | 0.865 | 0.752 | 0.919 | 1.00 |
sin | 0.561 | 0.546 | 0.530 | 1.37 | 0.501 | 0.501 | 0.846 | 0.500 | 0.530 | 0.501 | 0.561 | 1.50 | 1.61 |
sinh | 1.89 | 0.538 | 0.500 | 2.51 | 1.83 | 1.83 | 0.579 | 0.500 | 0.501 | 1.83 | 2.26 | 2.94 | 0.922 |
sqrt | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | 0.500 | |
tan | 1.48 | 0.520 | 0.509 | 3.48 | 0.800 | 0.800 | 0.746 | 0.500 | 0.502 | 0.800 | 3.30 | 3.10 | 2.33 |
tanh | 2.19 | 0.514 | 0.500 | 2.19 | 2.19 | 2.19 | 0.817 | 0.500 | 1.27 | 2.19 | 2.59 | 1.82 | 1.41 |
tgamma | 7.91 | 0.510 | 239 | 0.501 | 0.501 | 0.501 | 3.58e5 | 0.501 | 4.34 | 1.68e7 | |||
y0 | 8.98 | 3.40 | 4.84e6 | 4.84e6 | 4.84e6 | 4.84e6 | 2.36e10 | 7.53e7 | |||||
y1 | 9.00 | 2.07 | 6.18e6 | 4.17e6 | 3.66e6 | 4.17e6 | 4.96e10 | 9.35e7 | |||||
atan2 | 1.52 | 0.550 | 0.584 | 1.52 | 1.55 | 1.55 | 0.722 | 0.584 | 1.55 | 2.93 | 2.18 | 2.01 | |
atan2pi | 0.841 | ||||||||||||
compound | 0.501 | ||||||||||||
hypot | 0.501 | 0.501 | 0.501 | 1.21 | 1.21 | 0.927 | 0.501 | 0.500 | 0.501 | 1.21 | 1.03 | 1.57 | |
pow | 0.817 | 0.515 | 1.56 | 169. | 0.970 | 0.817 | 0.515 | 0.500 | 0.568 | 0.970 | 0.817 | 2.60 | 1.40 |
Further Reading
- Unit in the last place article on Wikipedia
- On the definition of ulp (x) [2]
- Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision [1]
- Official glibc documentation about their math function errors
References
- ↑ 1.0 1.1 Brian Gladman, Vincenzo Innocente, John Mather, Paul Zimmermann. Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. 2024. ⟨hal-03141101v7⟩
- ↑ Jean-Michel Muller. On the definition of ulp(x). [Research Report] RR-5504, LIP RR-2005-09, INRIA, LIP. 2005, pp.16. ⟨inria-00070503⟩