ULP Difference of Float Numbers

From emmtrix Wiki
Jump to navigation Jump to search

The Unit in the Last Place (ULP) is the smallest difference between two adjacent floating-point numbers at a specific precision. It measures the granularity of floating-point representations and is essential for understanding rounding errors and numerical stability. For example, a ULP near 1.0 is typically smaller than a ULP near 10.0, as the step size depends on the exponent of the number.

ULP differences are often used to evaluate the accuracy of floating-point operations. When comparing two floating-point numbers, a difference of 1 ULP indicates that the numbers are as close as they can be while still being distinct. Smaller ULP differences imply better precision and numerical accuracy. When implementing floating-point operations or algorithms (e.g., the tanhf function), the smallest possible ULP difference is 0.5 indicating that the exact result is correctly rounded to the nearest floating-point number.

Calculating ULP

To calculate the ULP of a floating-point number, you can use the following C function:

#include <math.h>

float flt_ulp_size(float x) {
    x = fabs(x);

    return nextafterf(x, INFINITY) - x;
}

This function calculates the ULP size of a given floating-point number x. It first takes the absolute value of x using fabs(x), then uses nextafterf(x, INFINITY) to find the next representable float value after x in the direction of positive infinity. The difference between these two values gives the ULP size for x.

ULP Example Values 32-bit Float
Floating-Point Number ULP (Macro) ULP (Float) ULP (Float Hex Representation) Description
1.0 FLT_EPSILON 1.1920929e-07 0x1.0p-23 Smallest step near 1.0
0.0 FLT_TRUE_MIN 1.4012985e-45 0x1.0p-149 Smallest positive float
0.5 FLT_EPSILON/2 5.9604645e-08 0x1.0p-24 Half the step size near 1.0
10.0 FLT_EPSILON*8 9.536743e-07 0x1.0p-20 Larger step size near 10.0

Calculating ULP Difference

Simple ULP Difference

It is relative hard to correctly calculate the ULP difference between two floating-point numbers. The following C function is a simple way to calculate the ULP difference between a reference value and a given value:

float flt_ulp_diff(float ref, float val) {
    return fabs(ref - val) / flt_ulp_size(ref);
}

This function simple divides the absolute difference by the ULP. The problem with this approach is which ULP to use. You can use the ULP of the reference value, approximated value or maximum/minimum/average or both values. In the given implementation the ULP of the reference value is used following the idea that the value of an potentially inaccurate approximation should not influence the precision for comparison. However, whatever ULP you choose, it will not be perfect. E.g. in the provided function flt_ulp_diff(0.9999, 2.0) will be around factor 2 greater than flt_ulp_diff(1.0, 2.0) which is not very intuitive.

Correct ULP Difference

To correctly calculate the ULP difference between two floating-point numbers, you can use the following C function:

uint32_t ulp_intdiff_float(float f1, float f2) {
    if (signbit(f1) != signbit(f2)) {
        return ulp_intdiff_float(0.0f, fabs(f1)) + ulp_intdiff_float(0.0f, fabs(f2)) + 1;
    }

    uint32_t i1 = *(uint32_t*)&f1;
    uint32_t i2 = *(uint32_t*)&f2;

    return i1 > i2 ? i1 - i2 : i2 - i1;
}

This function calculates the integer difference in ULP between two floating-point numbers f1 and f2. The trick is to reinterpret the floating-point numbers as 32-bit unsigned integers and then calculate the difference between them. The floating-point IEEE 754 representation is designed in such a way that the integer difference between two floating-point numbers (with the same sign) is the ULP difference between them.

To also consider floating-point numbers with different signs, the function calculates the ULP difference towards zero for both numbers and adds the differences together. Additionally, it adds 1 to the result that ULP difference between negative and positive zero (ulp_intdiff_float(-0.0f, 0.0f)) is 1. Adding 1 to the result is not necessary if you do not care about the difference between negative and positive zero.

This implementation satisfies both the properties of triangular additivity (ulp_intdiff_float(x, y) + ulp_intdiff_float(y, z) == ulp_intdiff_float(x, z) for x < y < z) and commutativity (ulp_intdiff_float(x, y) == ulp_intdiff_float(y, x)) of ULP differences, ensuring that the ULP difference between any two floating-point numbers is consistent and order-independent when summed across intermediates.

Examples

Example Results for `ulp_intdiff_float` Function
Input 1 Input 2 ULP Difference Explanation
1.0 1.0 + FLT_EPSILON 1 Adjacent floating-point values
1.0 1.0 - FLT_EPSILON 2 One step away in the opposite direction
FLT_TRUE_MIN 0.0 1 Smallest positive float to zero
-FLT_TRUE_MIN FLT_TRUE_MIN 3 Crossing zero, including sign change
-1.0 -1.0 - FLT_EPSILON 1 Adjacent floating-point values on the negative side
-1.0 -1.0 + FLT_EPSILON 2 One step away in the opposite direction on the negative side
-0.0 0.0 1 Difference between signed zero values
0.0 0.0 0 No difference between identical values
FLT_MAX INFINITY 1 Transition from the largest finite float to infinity
-FLT_MAX -INFINITY 1 Transition from the largest negative finite float to negative infinity
1.0 0.5 8388608 Number of representable floats between 1.0 and 0.5 (2^23)
1.0 2.0 8388608 Number of representable floats between 1.0 and 2.0 (2^23)

Calculating Rational ULP Differences

When comparing a reference value provided as a double and an approximated value as a float, rational ULP differences can provide a finer granularity of comparison. The function below implements this computation by combining integer ULP differences with a fractional component derived from the difference between the double reference value and its float representation.

The following function calculates rational ULP differences:

double ulp_diff(double ref, float b) {
    // Handle NaN cases: returns 0 if both are NaN, INFINITY otherwise
    if (isnan(b) || isnan(ref)) {
        return isnan(b) == isnan(ref) ? 0.0 : INFINITY;
    }

    // Convert reference value to float for integer ULP difference calculation
    float fref = (float)ref;

    // Calculate integer ULP difference between the float reference and test value
    double iulp = (double) ulp_intdiff_float(fref, b);

    // Ensure positive values for further calculations
    ref = fabs(ref);
    fref = fabsf(fref);
    b = fabsf(b);

    // Calculate the ULP size of the rounded-down reference value
    float ulpRef = flt_ulp_size(double_to_float_rounddown(ref));

    // Calculate the fractional difference in ULPs
    double diff = (ref - (double)fref);
    double fulp = diff / ulpRef;
    if (b > ref)
        fulp = -fulp;

    // Return the sum of integer ULPs and fractional ULPs
    return iulp + fulp;
}

The function first handles special cases like NaN values, then calculates the integer ULP difference between the float-converted reference value and the approximated value. It then computes the fractional ULP difference based on the difference between the original double reference value and its float representation. We can use the method from the simple ULP difference to calculate the fractional ULP difference. That is working here because the ULP size for the fractional part is clearly defined by the location of the reference value.

One problem is the rounding when converting the double to float. E.g. 1.0 - FLT_EPSILON/4 require the ULP size of the range [0.5, 1.0) but would be converted to 1.0 in a normal double to float conversion. This is why the function uses double_to_float_rounddown to convert the double to float. This function converts the double to float and rounds down to the next smaller float value:

float double_to_float_rounddown(double value) {
    float float_value = (float)value;

    if (float_value > value) {
        return nextafterf(float_value, -INFINITY);
    }

    return float_value;
}

Examples

The following table provides examples of the results obtained using the ulp_diff function:

Example Results for ulp_diff Function
Reference Value Test Value ULP Difference Explanation
1.0 + FLT_EPSILON/2 1.0 0.5 The test value is halfway between two floats.
1.0 + FLT_EPSILON/4 1.0 0.25 The test value is a quarter ULP away.
1.0 - FLT_EPSILON/2 1.0 1.0 The test value is 1 ULP lower.
1.0 - FLT_EPSILON/4 1.0 0.5 The test value is halfway to the next float.
1.0 + FLT_EPSILON/2 1.0 + FLT_EPSILON*10 9.5 Integer difference with fractional part.
1.0 + FLT_EPSILON/4 1.0 + FLT_EPSILON*10 9.75 Includes both integer and fractional differences.
1.0 - FLT_EPSILON/2 1.0 - FLT_EPSILON*10 19.0 The difference spans multiple ULPs.
1.0 - FLT_EPSILON/4 1.0 - FLT_EPSILON*10 19.5 Combined fractional and integer ULP difference.
0.0 + FLT_TRUE_MIN/100 0.0 0.01 A very small fractional ULP difference.

Accuracy of Mathematical Functions in Math Libraries

The following table gives an overview of the ULP errors in widely used math libraries for single-precision floating-point math functions. The ULP error is the largest error in ULP between the exact result and the result of the function. The provided ULP error is exact for math functions with one parameter and the largest known error for two-parameter functions like atan2.

Largest (known) error in ULP for single precision math functions[1]
library

version

GNU libc

2.40

IML

2024.0.2

AMD

4.2

Newlib

4.4.0

OpenLibm

0.8.3

Musl

1.2.5

Apple

14.5

LLVM

18.1.8

MSVC

2022

FreeBSD

14.1

ArmPL

24.04

CUDA

12.2.1

ROCm

5.7.0

acos 0.899 0.528 0.897 0.899 0.918 0.918 0.634 0.500 0.669 0.918 1.32 1.34 1.47
acosh 2.01 0.501 0.504 2.01 2.01 2.01 0.502 0.500 2.89 2.01 2.79 2.18 0.564
asin 0.898 0.528 0.781 0.926 0.743 0.743 0.634 0.500 0.861 0.743 2.41 1.36 2.54
asinh 1.78 0.527 0.518 1.78 1.78 1.78 0.515 0.500 1.99 1.78 3.57 1.78 0.573
atan 0.853 0.541 0.501 0.853 0.853 0.853 0.722 0.500 0.501 0.853 2.88 1.21 2.10
atanh 1.73 0.507 0.547 1.73 1.73 1.73 0.511 0.500 2.35 1.73 3.09 3.16 0.574
cbrt 0.969 0.520 0.548 3.56 0.500 0.500 0.500 1.83 0.500 1.53 1.17 1.14
cos 0.561 0.548 0.729 2.91 0.501 0.501 0.846 0.500 0.530 0.501 0.561 1.52 1.61
cosh 1.89 0.506 1.03 2.51 1.36 1.03 0.579 0.500 0.500 1.36 1.89 2.34 0.567
erf 0.968 0.780 0.531 0.968 0.943 0.968 0.501 0.500 3.99 0.890 1.93 1.04 1.51
erfc 3.13 0.934 63.9 3.17 3.13 0.750 6.66 3.18 1.64 4.49 3.33
exp 0.502 0.506 0.501 0.911 0.911 0.502 0.514 0.500 0.501 0.911 0.502 1.94 1.00
exp10 0.502 0.507 0.501 1.06 3.88 0.514 0.500 2.07 1.00
exp2 0.502 0.519 0.501 1.02 0.501 0.502 0.514 0.500 2.14 0.501 0.502 2.39 0.871
expm1 0.813 0.544 0.536 0.813 0.813 0.813 0.687 0.500 3.02 0.813 1.51 1.45 1.45
j0 9.00 0.678 6.18e6 3.66e6 3.66e6 3.66e6 3.78e10 7.60e7
j1 9.00 1.69 1.68e7 2.25e6 2.25e6 2.25e6 7.48e9 7.53e7
lgamma 6.78 0.510 7.50e6 7.50e6 7.50e6 0.501 2.92e5 7.50e6 1.35e7 7.50e6
log 0.818 0.519 0.577 0.888 0.888 0.818 0.511 0.500 0.562 0.888 0.818 0.865 1.89
log10 2.07 0.516 1.40 2.10 0.832 0.832 0.502 0.500 0.626 0.832 0.82 2.09 1.71
log1p 1.30 0.525 0.501 1.30 0.839 0.835 0.513 0.500 1.44 0.839 2.02 0.887 0.579
log2 0.752 0.508 0.766 1.65 0.865 0.752 0.502 0.500 2.04 0.865 0.752 0.919 1.00
sin 0.561 0.546 0.530 1.37 0.501 0.501 0.846 0.500 0.530 0.501 0.561 1.50 1.61
sinh 1.89 0.538 0.500 2.51 1.83 1.83 0.579 0.500 0.501 1.83 2.26 2.94 0.922
sqrt 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500 0.500
tan 1.48 0.520 0.509 3.48 0.800 0.800 0.746 0.500 0.502 0.800 3.30 3.10 2.33
tanh 2.19 0.514 0.500 2.19 2.19 2.19 0.817 0.500 1.27 2.19 2.59 1.82 1.41
tgamma 7.91 0.510 239 0.501 0.501 0.501 3.58e5 0.501 4.34 1.68e7
y0 8.98 3.40 4.84e6 4.84e6 4.84e6 4.84e6 2.36e10 7.53e7
y1 9.00 2.07 6.18e6 4.17e6 3.66e6 4.17e6 4.96e10 9.35e7
atan2 1.52 0.550 0.584 1.52 1.55 1.55 0.722 0.584 1.55 2.93 2.18 2.01
atan2pi 0.841
compound 0.501
hypot 0.501 0.501 0.501 1.21 1.21 0.927 0.501 0.500 0.501 1.21 1.03 1.57
pow 0.817 0.515 1.56 169. 0.970 0.817 0.515 0.500 0.568 0.970 0.817 2.60 1.40

Further Reading

References

  1. 1.0 1.1 Brian Gladman, Vincenzo Innocente, John Mather, Paul Zimmermann. Accuracy of Mathematical Functions in Single, Double, Double Extended, and Quadruple Precision. 2024. ⟨hal-03141101v7⟩
  2. Jean-Michel Muller. On the definition of ulp(x). [Research Report] RR-5504, LIP RR-2005-09, INRIA, LIP. 2005, pp.16. ⟨inria-00070503⟩