Error estimates for the summation of real numbers with application to floating-point summation

Marko Lange, Siegfried M. Rump

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Standard Wilkinson-type error estimates of floating-point algorithms involve a factor $\gamma_k := ku/(1-ku)$ for $u$ denoting the relative rounding error unit of a floating-point number system. Recently, it was shown that, for many standard algorithms such as matrix multiplication, LU- or Cholesky decomposition, $\gamma_k$ can be replaced by $ku$, and the restriction on $k$ can be removed. However, the arguments make heavy use of specific properties of both the underlying set of floating-point numbers and the corresponding arithmetic. In this paper, we derive error estimates for the summation of real numbers where each sum is afflicted with some perturbation. Recent results on floating-point summation follow as a corollary, in particular error estimates for rounding to nearest and for directed rounding. Our new estimates are sharp and unveil the necessary properties of floating-point schemes to allow for a priori estimates of summation with a factor omitting higher order terms.
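As a minimal sketch (not the authors' code), the Python below contrasts the classical factor $\gamma_k$ with $ku$ and empirically checks a linear summation bound for left-to-right summation in IEEE 754 binary64 with rounding to nearest. It assumes $u = 2^{-53}$ and assumes the bound takes the form $|\mathrm{fl}(\sum x_i) - \sum x_i| \le (n-1)u\sum|x_i|$, i.e. $\gamma_{n-1}$ replaced by $(n-1)u$; the abstract does not spell out this exact form. The helper names `gamma`, `recursive_sum`, and `check_linear_bound` are hypothetical. Random tests can illustrate a bound but never prove it.

```python
from fractions import Fraction
import random

# Assumption: unit roundoff of IEEE 754 binary64 with rounding to nearest.
U = Fraction(1, 2**53)


def gamma(k: int, u: Fraction = U) -> Fraction:
    """Classical Wilkinson factor gamma_k = k*u / (1 - k*u); needs k*u < 1."""
    if k * u >= 1:
        raise ValueError("gamma_k is only defined for k*u < 1")
    return k * u / (1 - k * u)


def recursive_sum(xs: list[float]) -> float:
    """Left-to-right summation; each += on Python floats rounds to nearest."""
    s = 0.0
    for x in xs:
        s += x
    return s


def check_linear_bound(xs: list[float], u: Fraction = U) -> tuple[Fraction, Fraction]:
    """Return (error, bound): the exact error of recursive_sum and the assumed
    linear bound (n - 1) * u * sum(|x_i|), which has no higher-order terms."""
    n = len(xs)
    exact = sum(Fraction(x) for x in xs)            # exact rational sum
    err = abs(Fraction(recursive_sum(xs)) - exact)  # exact error of the float result
    bound = (n - 1) * u * sum(abs(Fraction(x)) for x in xs)
    return err, bound


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
    err, bound = check_linear_bound(xs)
    print(err <= bound)          # True
    print(gamma(999) > 999 * U)  # True: gamma_k exceeds k*u whenever 0 < k*u < 1

    # Both additions are exact ties that round to even, so fl(sum) == 1.0.
    err, bound = check_linear_bound([1.0, 2.0**-53, 2.0**-53])
    # err == 2**-52 and err / bound == 1 / (1 + 2**-52): the bound is nearly attained.
    print(err == Fraction(1, 2**52), err / bound)
```

Exact rational arithmetic via `fractions.Fraction` keeps the measured error free of further rounding. The three-term example, where each addition is a tie rounded to even, comes within a factor $1 + 2^{-52}$ of the assumed bound, which is consistent with (but does not demonstrate) the sharpness claimed in the abstract.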

Original language: English
Pages (from-to): 927-941
Number of pages: 15
Journal: BIT Numerical Mathematics
Volume: 57
Issue number: 3
DOI: 10.1007/s10543-017-0658-9
Publication status: Published - 2017 Sep 1

Fingerprint

Floating point
Summation
Error Estimates
Rounding
Numbering systems
Cholesky Decomposition
LU decomposition
Number system
Matrix multiplication
Rounding error
Relative Error
A Priori Estimates
Decomposition
Corollary
Higher Order
Restriction
Perturbation
Unit
Necessary
Term

Keywords

  • Error analysis
  • Floating-point
  • Real numbers
  • Summation
  • Wilkinson-type error estimates

ASJC Scopus subject areas

  • Software
  • Computer Networks and Communications
  • Computational Mathematics
  • Applied Mathematics

Cite this

Error estimates for the summation of real numbers with application to floating-point summation. / Lange, Marko; Rump, Siegfried M.

In: BIT Numerical Mathematics, Vol. 57, No. 3, 01.09.2017, p. 927-941.


@article{7c5cc30859a149ad9d2917e41adc15c9,
title = "Error estimates for the summation of real numbers with application to floating-point summation",
abstract = "Standard Wilkinson-type error estimates of floating-point algorithms involve a factor $\gamma_k := ku/(1-ku)$ for $u$ denoting the relative rounding error unit of a floating-point number system. Recently, it was shown that, for many standard algorithms such as matrix multiplication, LU- or Cholesky decomposition, $\gamma_k$ can be replaced by $ku$, and the restriction on $k$ can be removed. However, the arguments make heavy use of specific properties of both the underlying set of floating-point numbers and the corresponding arithmetic. In this paper, we derive error estimates for the summation of real numbers where each sum is afflicted with some perturbation. Recent results on floating-point summation follow as a corollary, in particular error estimates for rounding to nearest and for directed rounding. Our new estimates are sharp and unveil the necessary properties of floating-point schemes to allow for a priori estimates of summation with a factor omitting higher order terms.",
keywords = "Error analysis, Floating-point, Real numbers, Summation, Wilkinson-type error estimates",
author = "Marko Lange and Rump, {Siegfried M.}",
year = "2017",
month = sep,
day = "1",
doi = "10.1007/s10543-017-0658-9",
language = "English",
volume = "57",
pages = "927--941",
journal = "BIT Numerical Mathematics",
issn = "0006-3835",
publisher = "Springer Netherlands",
number = "3",

}

TY  - JOUR
T1  - Error estimates for the summation of real numbers with application to floating-point summation
AU  - Lange, Marko
AU  - Rump, Siegfried M.
PY  - 2017/9/1
Y1  - 2017/9/1
N2  - Standard Wilkinson-type error estimates of floating-point algorithms involve a factor $\gamma_k := ku/(1-ku)$ for $u$ denoting the relative rounding error unit of a floating-point number system. Recently, it was shown that, for many standard algorithms such as matrix multiplication, LU- or Cholesky decomposition, $\gamma_k$ can be replaced by $ku$, and the restriction on $k$ can be removed. However, the arguments make heavy use of specific properties of both the underlying set of floating-point numbers and the corresponding arithmetic. In this paper, we derive error estimates for the summation of real numbers where each sum is afflicted with some perturbation. Recent results on floating-point summation follow as a corollary, in particular error estimates for rounding to nearest and for directed rounding. Our new estimates are sharp and unveil the necessary properties of floating-point schemes to allow for a priori estimates of summation with a factor omitting higher order terms.
AB  - Standard Wilkinson-type error estimates of floating-point algorithms involve a factor $\gamma_k := ku/(1-ku)$ for $u$ denoting the relative rounding error unit of a floating-point number system. Recently, it was shown that, for many standard algorithms such as matrix multiplication, LU- or Cholesky decomposition, $\gamma_k$ can be replaced by $ku$, and the restriction on $k$ can be removed. However, the arguments make heavy use of specific properties of both the underlying set of floating-point numbers and the corresponding arithmetic. In this paper, we derive error estimates for the summation of real numbers where each sum is afflicted with some perturbation. Recent results on floating-point summation follow as a corollary, in particular error estimates for rounding to nearest and for directed rounding. Our new estimates are sharp and unveil the necessary properties of floating-point schemes to allow for a priori estimates of summation with a factor omitting higher order terms.
KW  - Error analysis
KW  - Floating-point
KW  - Real numbers
KW  - Summation
KW  - Wilkinson-type error estimates
UR  - http://www.scopus.com/inward/record.url?scp=85018980233&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85018980233&partnerID=8YFLogxK
U2  - 10.1007/s10543-017-0658-9
DO  - 10.1007/s10543-017-0658-9
M3  - Article
AN  - SCOPUS:85018980233
VL  - 57
SP  - 927
EP  - 941
JO  - BIT
JF  - BIT Numerical Mathematics
SN  - 0006-3835
IS  - 3
ER  - 