TY - JOUR

T1 - On the definition of unit roundoff

AU - Rump, Siegfried M.

AU - Lange, Marko

PY - 2016/3/1

Y1 - 2016/3/1

AB - The result of a floating-point operation is usually defined to be the floating-point number nearest to the exact real result together with a tie-breaking rule. This is called the first standard model of floating-point arithmetic, and the analysis of numerical algorithms is often solely based on that. In addition, a second standard model is used specifying the maximum relative error with respect to the computed result. In this note we take a more general perspective. For an arbitrary finite set of real numbers we identify the rounding to minimize the relative error in the first or the second standard model. The optimal “switching points” are the arithmetic or the harmonic means of adjacent floating-point numbers. Moreover, the maximum relative error of both models is minimized by taking the geometric mean. If the maximum relative error in one model is (Formula presented.), then (Formula presented.) is the maximum relative error in the other model. Those maximal errors, that is the unit roundoff, are characteristic constants of a given finite set of reals: The floating-point model to be optimized identifies the rounding and the unit roundoff.

KW - Floating-point number

KW - IEEE 754

KW - Rounding

KW - Tie

UR - http://www.scopus.com/inward/record.url?scp=84924973222&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924973222&partnerID=8YFLogxK

U2 - 10.1007/s10543-015-0554-0

DO - 10.1007/s10543-015-0554-0

M3 - Article

AN - SCOPUS:84924973222

VL - 56

SP - 309

EP - 317

JO - BIT

JF - BIT

SN - 0006-3835

IS - 1

ER -