Finding Highest ULP of Math Functions


This article discusses methods for measuring and finding the maximum Unit in the Last Place (ULP) error in floating-point math functions. It covers both single-precision (binary32) and double-precision (binary64) IEEE 754 formats, highlighting the impracticality of exhaustive search for large input spaces and presenting a universal search algorithm that focuses on identifying inputs likely to produce large errors.

Background and Motivation

In the IEEE 754 single-precision format (binary32), there are about 2^{32} finite representable values (excluding ±∞ and NaN). For a function with one input parameter, it is still feasible to exhaustively test all possible values. However, difficulties arise with:

  • Single-precision bivariate functions: The input space could be as large as 2^{64} pairs of 32-bit floats, making an exhaustive search prohibitively expensive with academic resources.
  • Double-precision (binary64) functions: The input space grows even larger (2^{64} possible 64-bit patterns for a univariate function, and 2^{128} for a bivariate function), making a full enumeration infeasible.

Motivation for a Universal Algorithm

Exhaustive search for single-precision univariate functions is sometimes practical. However, for higher-dimensional or double-precision functions, the sheer number of inputs makes exhaustive testing nearly impossible with standard computational resources.

A “black-box” approach to testing:

  • Avoids manual analysis of library code.
  • Eliminates the need for highly specialized tests for each function or library.
  • Allows the detection of errors such as incorrect argument reduction for large inputs (e.g., near 2^{1024} for sin or cos) without manually coding those edge-case checks.

Universal Search Algorithm

This section outlines the subdivision-based search algorithm for univariate double-precision functions, though the same approach can be adapted to:

  • Any IEEE 754 format, provided there is a corresponding integer type of the same bit width.
  • Bivariate or higher-dimensional functions (with appropriate modifications).

Algorithm Overview

Suppose we have a univariate function f in double precision. Each possible input can be associated with a 64-bit integer, thanks to the binary64 format. Let:

  • to_double(uint64_t) be a function that converts a 64-bit unsigned integer to its corresponding double-precision floating-point number.
  • check(double x) be a function that evaluates f at x and returns the ULP error, i.e., the distance in units in the last place between the computed result and a correctly rounded reference result.
  • [a, b] be an inclusive 64-bit integer range representing a subrange of possible inputs.
  • t be a threshold indicating when to switch from sampling to exhaustive checking.
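
Concretely, the first two primitives might look as follows in C. This is a minimal sketch: sin stands in for the function under test, and long double sinl serves as the higher-precision reference; both choices are illustrative assumptions, and a rigorous harness would compare against a correctly rounded reference such as MPFR.

    #include <math.h>
    #include <stdint.h>
    #include <string.h>

    /* Reinterpret a 64-bit pattern as a binary64 double. */
    static double to_double(uint64_t i) {
        double x;
        memcpy(&x, &i, sizeof x);
        return x;
    }

    /* ULP error of sin(x), measured against a long double reference.
       Assumes long double is wider than double (e.g., the x87 80-bit type). */
    static double check(double x) {
        double got = sin(x);                     /* implementation under test */
        long double ref = sinl((long double)x);  /* higher-precision reference */
        if (isnan(got) || isnan(ref))
            return 0.0;                          /* NaN cases are ignored here */
        int e;
        frexpl(ref, &e);                         /* ref = m * 2^e, 0.5 <= |m| < 1 */
        long double ulp = ldexpl(1.0L, e - 53);  /* one binary64 ULP near ref */
        return (double)(fabsl((long double)got - ref) / ulp);
    }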

The algorithm proceeds as follows:

  1. Base Case (Exhaustive Check): If the interval size b - a + 1 ≤ t, then test all integers i in [a, b]. For each i:
    • convert i to double: x = to_double(i),
    • check f at x: err = check(x),
    • and keep track of the largest observed error.
  2. Recursive Subdivision: If the interval size b - a + 1 > t:
    • Subdivide [a, b] into two (almost equal) sub-intervals.
    • In each subinterval, randomly sample t values.
    • Convert each sampled integer to double and compute the ULP error as above.
    • Identify the subrange producing the largest observed error.
    • Recurse on that subrange.

This subdivision continues until the interval width becomes less than the total number of function evaluations already performed. At that point, the algorithm can afford an exhaustive search on the remaining values without excessively prolonging the total runtime.
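
Putting the pieces together, the recursion might be sketched in C as follows. It reuses the to_double and check sketches above, adds a hypothetical rand64 helper, and uses a fixed switch-over threshold t rather than the adaptive one just described; modulo bias in the sampling is ignored for brevity.

    #include <stdlib.h>

    /* Hypothetical uniform 64-bit PRNG built from rand(); any reasonable
       64-bit generator would do in practice. */
    static uint64_t rand64(void) {
        uint64_t r = 0;
        for (int i = 0; i < 4; i++)
            r = (r << 16) ^ (uint64_t)(rand() & 0xFFFF);
        return r;
    }

    /* Recursively search the inclusive range [a, b] for the largest ULP
       error, sampling t random points per half before descending. */
    static double search(uint64_t a, uint64_t b, uint64_t t) {
        uint64_t size = b - a + 1;     /* wraps to 0 for the full 2^64 range */
        double worst = 0.0;
        if (size != 0 && size <= t) {  /* base case: exhaustive check */
            for (uint64_t i = a; ; i++) {
                double err = check(to_double(i));
                if (err > worst) worst = err;
                if (i == b) break;     /* safe even when b == UINT64_MAX */
            }
            return worst;
        }
        uint64_t mid = a + (b - a) / 2;    /* halves: [a, mid], [mid+1, b] */
        double left = 0.0, right = 0.0;
        for (uint64_t k = 0; k < t; k++) { /* sample t points in each half */
            double e1 = check(to_double(a + rand64() % (mid - a + 1)));
            double e2 = check(to_double(mid + 1 + rand64() % (b - mid)));
            if (e1 > left)  left  = e1;
            if (e2 > right) right = e2;
        }
        /* recurse into the half whose samples showed the larger error */
        double deeper = (left >= right) ? search(a, mid, t)
                                        : search(mid + 1, b, t);
        double sampled = (left > right) ? left : right;
        return (deeper > sampled) ? deeper : sampled;
    }

Calling search(0, UINT64_MAX, 1000) then sweeps the entire binary64 input space, including subnormals, negative zero, and the huge arguments where argument reduction typically fails.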

Practical Optimization: “Top-K” Subintervals

A more refined strategy, suggested by Eric Schneider, modifies the subdivision step to keep multiple promising subintervals at each level instead of focusing on just one:

  • Maintain a list of, for example, 20 subintervals (instead of one) with the largest observed errors.
  • Subdivide each selected subinterval, yielding 40 (or 80 for bivariate) new subintervals in total.
  • Retain the 20 most “promising” intervals out of these newly generated ones.
  • Continue until the intervals are small enough to be exhaustively checked or until resources are exhausted.
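
One refinement level of this strategy might be sketched in C as follows, reusing the check, to_double, and rand64 sketches above; K = 20 and the per-interval sampling count are illustrative choices.

    #include <stdlib.h>

    #define K 20   /* number of subintervals kept per level */

    typedef struct { uint64_t a, b; double score; } Interval;

    /* qsort comparator: sort intervals by observed error, descending */
    static int by_score_desc(const void *p, const void *q) {
        const Interval *u = p, *v = q;
        return (v->score > u->score) - (v->score < u->score);
    }

    /* Largest ULP error seen among n random samples in [a, b]. */
    static double sample_max(uint64_t a, uint64_t b, int n) {
        double worst = 0.0;
        for (int i = 0; i < n; i++) {
            double e = check(to_double(a + rand64() % (b - a + 1)));
            if (e > worst) worst = e;
        }
        return worst;
    }

    /* One level of top-K refinement: split each kept interval in two,
       score all 2K children by sampling, and keep the best K of them. */
    static void refine(Interval kept[K], int n_samples) {
        Interval children[2 * K];
        for (int i = 0; i < K; i++) {
            uint64_t a = kept[i].a, b = kept[i].b, mid = a + (b - a) / 2;
            children[2 * i]     = (Interval){ a, mid,
                                              sample_max(a, mid, n_samples) };
            children[2 * i + 1] = (Interval){ mid + 1, b,
                                              sample_max(mid + 1, b, n_samples) };
        }
        qsort(children, 2 * K, sizeof children[0], by_score_desc);
        for (int i = 0; i < K; i++)
            kept[i] = children[i];   /* retain the K most promising */
    }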

Additional Strategies for Selecting Subintervals

Several heuristics can be used to choose the “best” subintervals at each recursion step:

  1. Maximal ULP error: Keep the subinterval with the highest single-sample error.
  2. Maximal average ULP error: Keep the subinterval whose sampled points give the highest average ULP error (ignoring cases that produce NaN or ±∞).
  3. Statistical estimate: Model the error distribution on each subinterval and estimate the maximum ULP error for the number of points in that interval.

In practice, the first strategy often finds large errors most quickly, while the second and third strategies can occasionally discover extreme corner cases. A useful approach on multi-core systems is to run strategies 2 and 3 on one core each and strategy 1 on the remaining cores.
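
For the second heuristic, the maximum-based sample_max above would be swapped for an average-based score along these lines (again a sketch built on the earlier helpers):

    /* Strategy 2 scoring: mean ULP error over n random samples in [a, b],
       ignoring samples whose error is NaN or infinite. */
    static double sample_mean(uint64_t a, uint64_t b, int n) {
        double sum = 0.0;
        int counted = 0;
        for (int i = 0; i < n; i++) {
            double err = check(to_double(a + rand64() % (b - a + 1)));
            if (!isfinite(err)) continue;   /* skip NaN / ±infinity cases */
            sum += err;
            counted++;
        }
        return counted > 0 ? sum / counted : 0.0;
    }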

Extension to Bivariate Functions

For bivariate functions, there are two approaches to adapting the algorithm. Suppose we have a bivariate function in single precision and a to_single(uint32_t) function that converts a 32-bit unsigned integer to its corresponding single-precision float.

  • We can use two 32-bit integers i1 and i2 to represent each input pair; both are converted to single-precision floats using to_single. The intervals become rectangles, and subdividing a rectangle yields four sub-rectangles (each axis is halved).
  • Alternatively, we can use a single 64-bit integer to represent the input, so the intervals remain one-dimensional. The 64-bit integer is split into two 32-bit integers using its odd and even bits before they are converted to single-precision floats (see the sketch below). The algorithm then proceeds as before.
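
A minimal C sketch of the index mapping used by the second approach (the naive bit loop is written for clarity; table-based or PDEP-based de-interleaving would be faster):

    #include <stdint.h>
    #include <string.h>

    /* Extract every second bit of v, starting at bit `offset` (0 or 1),
       and pack the 32 extracted bits into the result. */
    static uint32_t deinterleave(uint64_t v, int offset) {
        uint32_t r = 0;
        for (int i = 0; i < 32; i++)
            r |= (uint32_t)((v >> (2 * i + offset)) & 1u) << i;
        return r;
    }

    /* Map one 64-bit search index to a pair of binary32 inputs:
       even bits form the first float, odd bits the second. */
    static void to_single_pair(uint64_t v, float *x, float *y) {
        uint32_t i1 = deinterleave(v, 0);
        uint32_t i2 = deinterleave(v, 1);
        memcpy(x, &i1, sizeof *x);
        memcpy(y, &i2, sizeof *y);
    }

Because this is a Z-order (Morton) mapping, nearby 64-bit indices stay close in both coordinates, so a one-dimensional subinterval corresponds to a compact two-dimensional region of the input space.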

Conclusions

Finding the highest ULP error for floating-point math functions is an essential part of ensuring numerical accuracy. While exhaustive methods are still possible for some single-precision use cases, more complex scenarios require search-based or sampling-based algorithms. The general-purpose algorithm outlined here:

  • Does not rely on inspecting the internal implementations of math libraries.
  • Does not require function-specific logic.
  • Subdivides large input spaces, refining searches where large errors are found.

The approach yields a practical, largely automated tool for pinpointing problematic inputs—even in extreme domains—offering valuable insights into the accuracy and robustness of floating-point math libraries.