### Abstract

Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s, i.e., the result is one of the immediate floating-point neighbors of s. If the sum s is a floating-point number, we prove that this is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e., it is fast for mildly conditioned sums, with computing time increasing slowly in proportion to the logarithm of the condition number. All statements are also true in the presence of underflow. The algorithm does not depend on the exponent range. Our algorithm is fast in terms of measured computing time because it allows good instruction-level parallelism, requires no special operations such as access to mantissa or exponent, contains no branch in the inner loop, and requires no extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithm are proved to be optimal.
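The terms used in the abstract can be made concrete with a small sketch. This is not the algorithm from the paper. It shows Knuth's classical TwoSum as one example of a branch-free error-free transformation built only from additions and subtractions, plus checks of the faithful-rounding definition and the condition number of a sum against exact rational arithmetic. All helper names (`two_sum`, `naive_sum`, `exact_sum`, `condition_number`, `is_faithful`) are hypothetical, and the code assumes finite IEEE 754 doubles and Python 3.9 or later (for `math.nextafter`).

```python
import math
from fractions import Fraction


def two_sum(a, b):
    """Knuth's branch-free TwoSum: x = fl(a + b) and a + b == x + y exactly."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y


def naive_sum(values):
    """Plain left-to-right floating-point summation (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def exact_sum(values):
    """Exact sum s as a rational number (every finite float is a rational)."""
    return sum((Fraction(v) for v in values), Fraction(0))


def condition_number(values):
    """sum(|p_i|) / |sum(p_i)|, rounded to a float for display."""
    s = exact_sum(values)
    if s == 0:
        return math.inf
    return float(sum(abs(Fraction(v)) for v in values) / abs(s))


def is_faithful(result, values):
    """True if no floating-point number lies strictly between result and s.

    If s itself is a float, only result == s passes.
    Assumes result and its neighbors are finite.
    """
    s = exact_sum(values)
    r = Fraction(result)
    if r == s:
        return True
    if r < s:
        return Fraction(math.nextafter(result, math.inf)) > s
    return Fraction(math.nextafter(result, -math.inf)) < s


# An ill-conditioned example: the exact sum is 1, but cancellation hides it.
values = [1e16, 1.0, -1e16]
naive = naive_sum(values)       # 0.0, because 1e16 + 1.0 rounds back to 1e16
reference = math.fsum(values)   # 1.0, the standard library's accurately rounded sum

print(condition_number(values))        # roughly 2e16
print(is_faithful(naive, values))      # False
print(is_faithful(reference, values))  # True
print(two_sum(1e16, 1.0))              # (1e16, 1.0): the rounding error is recovered in y
```

`math.fsum` serves here only as a reference result. It is a different method from the one the abstract describes, and the `Fraction` checks are far slower than any floating-point summation; they are meant for testing, not for production use.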

Original language | English |
---|---|
Pages (from-to) | 189-224 |
Number of pages | 36 |
Journal | SIAM Journal on Scientific Computing |
Volume | 31 |
Issue number | 1 |
DOIs | https://doi.org/10.1137/050645671 |
Publication status | Published - 2008 |

### Keywords

- Distillation
- Error analysis
- Error-free transformation
- Extended and mixed precision basic linear algebra subprograms
- Faithful rounding
- High accuracy
- Maximally accurate summation
- XBLAS

### ASJC Scopus subject areas

- Applied Mathematics
- Computational Mathematics

### Cite this

Rump, S. M., Ogita, T., & Oishi, S. (2008). Accurate floating-point summation part I: Faithful rounding. *SIAM Journal on Scientific Computing*, *31*(1), 189-224. https://doi.org/10.1137/050645671

Research output: Contribution to journal › Article

TY - JOUR

T1 - Accurate floating-point summation part I

T2 - Faithful rounding

AU - Rump, Siegfried M.

AU - Ogita, Takeshi

AU - Oishi, Shinichi

PY - 2008

Y1 - 2008

KW - Distillation

KW - Error analysis

KW - Error-free transformation

KW - Extended and mixed precision basic linear algebra subprograms

KW - Faithful rounding

KW - High accuracy

KW - Maximally accurate summation

KW - XBLAS

UR - http://www.scopus.com/inward/record.url?scp=55049129860&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=55049129860&partnerID=8YFLogxK

U2 - 10.1137/050645671

DO - 10.1137/050645671

M3 - Article

VL - 31

SP - 189

EP - 224

JO - SIAM Journal on Scientific Computing

JF - SIAM Journal on Scientific Computing

SN - 1064-8275

IS - 1

ER -