### Abstract

In Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the rounded-to-nearest result of s := ∑ p_{i} for a given vector of floating-point numbers p_{i}, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions. Assume a floating-point working precision with relative rounding error unit eps. We define and investigate a K-fold faithful rounding of a real number r. Basically the result is stored in a vector Res_{ν} of K nonoverlapping floating-point numbers such that ∑ Res_{ν} approximates r with relative accuracy eps^{K}, and replacing Res_{K} by its floating-point neighbors in ∑ Res_{ν} forms a lower and upper bound for r. For a given vector of floating-point numbers with exact sum s, we present an algorithm for calculating a K-fold faithful rounding of s using solely the working precision. Furthermore, an algorithm for calculating a faithfully rounded result of the sum of a vector of huge dimension is presented. Our algorithms are fast in terms of measured computing time because they allow good instruction-level parallelism, they neither require special operations such as access to mantissa or exponent, they contain no branch in the inner loop, nor do they require extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithms are proved to be optimal.
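As a rough illustration of what "error-free transformation" means here, the Python sketch below shows the classical TwoSum operation and one cascaded pass over a vector. It is not the algorithm of the paper; the function names `two_sum` and `distill_once` and the sample data are assumptions chosen for this example. It only demonstrates the building block the abstract refers to: a floating-point sum and its rounding error obtained with additions and subtractions in working precision, without branches.

```python
from fractions import Fraction

def two_sum(a, b):
    """Error-free transformation of a + b (Knuth's TwoSum).

    Returns (x, y) where x is the rounded floating-point sum and y is the
    rounding error, so that x + y equals a + b exactly (barring overflow).
    """
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y

def distill_once(p):
    """One cascaded TwoSum pass over p (illustrative helper, not from the paper).

    The entries change, but their exact sum is preserved.
    """
    q = list(p)
    for i in range(1, len(q)):
        q[i], q[i - 1] = two_sum(q[i], q[i - 1])
    return q

if __name__ == "__main__":
    p = [1e16, 1.0, -1e16, 1e-8]   # hypothetical ill-conditioned input
    q = distill_once(p)
    exact_before = sum(Fraction(v) for v in p)
    exact_after = sum(Fraction(v) for v in q)
    print("transformed vector:", q)
    print("exact sum preserved:", exact_before == exact_after)
```

The check uses `fractions.Fraction` only to verify exactness; the transformation itself stays in double precision.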

| Original language | English |
|---|---|
| Pages (from-to) | 1269-1302 |
| Number of pages | 34 |
| Journal | SIAM Journal on Scientific Computing |
| Volume | 31 |
| Issue number | 2 |
| DOIs | https://doi.org/10.1137/07068816X |
| Publication status | Published - 2008 |

### Keywords

- K-fold accuracy
- Directed rounding
- Distillation
- Error analysis
- Error-free transformations
- Faithful rounding
- High accuracy
- Maximally accurate summation
- Rounding to nearest
- Sign
- XBLAS

### ASJC Scopus subject areas

- Applied Mathematics
- Computational Mathematics

### Cite this

Rump, Siegfried M.; Ogita, Takeshi; Oishi, Shinichi. **Accurate floating-point summation part II: Sign, K-fold faithful and rounding to nearest.** *SIAM Journal on Scientific Computing*, vol. 31, no. 2, pp. 1269-1302, 2008. https://doi.org/10.1137/07068816X

Research output: Contribution to journal › Article

TY - JOUR

T1 - Accurate floating-point summation part II

T2 - Sign, K-fold faithful and rounding to nearest

AU - Rump, Siegfried M.

AU - Ogita, Takeshi

AU - Oishi, Shinichi

PY - 2008

Y1 - 2008

AB - In Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the rounded-to-nearest result of s := ∑ p_{i} for a given vector of floating-point numbers p_{i}, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions. Assume a floating-point working precision with relative rounding error unit eps. We define and investigate a K-fold faithful rounding of a real number r. Basically the result is stored in a vector Res_{ν} of K nonoverlapping floating-point numbers such that ∑ Res_{ν} approximates r with relative accuracy eps^{K}, and replacing Res_{K} by its floating-point neighbors in ∑ Res_{ν} forms a lower and upper bound for r. For a given vector of floating-point numbers with exact sum s, we present an algorithm for calculating a K-fold faithful rounding of s using solely the working precision. Furthermore, an algorithm for calculating a faithfully rounded result of the sum of a vector of huge dimension is presented. Our algorithms are fast in terms of measured computing time because they allow good instruction-level parallelism, they neither require special operations such as access to mantissa or exponent, they contain no branch in the inner loop, nor do they require extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithms are proved to be optimal.

KW - K-fold accuracy

KW - Directed rounding

KW - Distillation

KW - Error analysis

KW - Error-free transformations

KW - Faithful rounding

KW - High accuracy

KW - Maximally accurate summation

KW - Rounding to nearest

KW - Sign

KW - XBLAS

UR - http://www.scopus.com/inward/record.url?scp=67649610606&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67649610606&partnerID=8YFLogxK

U2 - 10.1137/07068816X

DO - 10.1137/07068816X

M3 - Article

VL - 31

SP - 1269

EP - 1302

JO - SIAM Journal on Scientific Computing

JF - SIAM Journal on Scientific Computing

SN - 1064-8275

IS - 2

ER -