Accurate floating-point summation part I

Faithful rounding

Siegfried M. Rump, Takeshi Ogita, Shinichi Oishi

    Research output: Contribution to journal › Article

    101 Citations (Scopus)

    Abstract

    Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s, i.e., the result is one of the immediate floating-point neighbors of s. If the sum s is a floating-point number, we prove that this is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e., it is fast for mildly conditioned sums, with computing time increasing slowly in proportion to the logarithm of the condition number. All statements are also true in the presence of underflow. The algorithm does not depend on the exponent range. Our algorithm is fast in terms of measured computing time because it allows good instruction-level parallelism, requires no special operations such as access to mantissa or exponent, contains no branch in the inner loop, and requires no extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithm are proved to be optimal.
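
The notions in the abstract can be illustrated with a small Python sketch. It is not the algorithm of the paper: it only shows Knuth's TwoSum, a standard error-free transformation that uses nothing but floating-point addition and subtraction, together with a check of the faithful-rounding property against exact rational arithmetic. The function names (`two_sum`, `naive_sum`, `compensated_sum`, `is_faithful`) and the test vector are assumptions made for illustration, and `compensated_sum` does not guarantee a faithful result for arbitrarily ill-conditioned input. The sketch assumes Python 3.9 or later (for `math.nextafter`) and finite inputs.

```python
import math
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: x = fl(a + b) and x + y == a + b exactly (barring overflow)."""
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y


def naive_sum(values):
    """Plain left-to-right recursive summation in working precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def compensated_sum(values):
    """Illustrative compensated sum: accumulate the rounding errors from two_sum."""
    total = 0.0
    error = 0.0
    for v in values:
        total, e = two_sum(total, v)
        error += e
    return total + error


def is_faithful(result: float, values) -> bool:
    """True if no float lies strictly between result and the exact sum s,
    and result == s whenever s itself is a float."""
    exact = sum(Fraction(v) for v in values)  # exact rational sum
    r = Fraction(result)
    if r == exact:
        return True
    if r < exact:
        return Fraction(math.nextafter(result, math.inf)) > exact
    return Fraction(math.nextafter(result, -math.inf)) < exact


# Hypothetical ill-conditioned input: the exact sum is 2.0.
values = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(values), is_faithful(naive_sum(values), values))
print(compensated_sum(values), is_faithful(compensated_sum(values), values))
```

On this input the naive loop loses both small summands to cancellation, while the compensated variant recovers them from the error terms. An explicit loop is used for the naive sum because the built-in `sum` may itself apply compensation to floats in recent Python versions.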

    Original language: English
    Pages (from-to): 189-224
    Number of pages: 36
    Journal: SIAM Journal on Scientific Computing
    Volume: 31
    Issue number: 1
    DOI: 10.1137/050645671
    Publication status: Published - 2008

    Fingerprint

    Rounding
    Floating point
    Faithful
    Summation
    Condition number
    Mantissa
    Exponent
    Computing
    Subtraction
    Logarithm
    Parallelism
    Multiplication
    Branch
    Directly proportional
    Range of data

    Keywords

    • Distillation
    • Error analysis
    • Error-free transformation
    • Extended and mixed precision basic linear algebra subprograms
    • Faithful rounding
    • High accuracy
    • Maximally accurate summation
    • XBLAS

    ASJC Scopus subject areas

    • Applied Mathematics
    • Computational Mathematics

    Cite this

    Accurate floating-point summation part I : Faithful rounding. / Rump, Siegfried M.; Ogita, Takeshi; Oishi, Shinichi.

    In: SIAM Journal on Scientific Computing, Vol. 31, No. 1, 2008, p. 189-224.


    @article{247f6b38e27f4708b235afbb1332b42b,
    title = "Accurate floating-point summation part I: Faithful rounding",
    abstract = "Given a vector of floating-point numbers with exact sum s, we present an algorithm for calculating a faithful rounding of s, i.e., the result is one of the immediate floating-point neighbors of s. If the sum s is a floating-point number, we prove that this is the result of our algorithm. The algorithm adapts to the condition number of the sum, i.e., it is fast for mildly conditioned sums, with computing time increasing slowly in proportion to the logarithm of the condition number. All statements are also true in the presence of underflow. The algorithm does not depend on the exponent range. Our algorithm is fast in terms of measured computing time because it allows good instruction-level parallelism, requires no special operations such as access to mantissa or exponent, contains no branch in the inner loop, and requires no extra precision: the only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithm are proved to be optimal.",
    keywords = "Distillation, Error analysis, Error-free transformation, Extended and mixed precision basic linear algebra subprograms, Faithful rounding, High accuracy, Maximally accurate summation, XBLAS",
    author = "Rump, {Siegfried M.} and Takeshi Ogita and Shinichi Oishi",
    year = "2008",
    doi = "10.1137/050645671",
    language = "English",
    volume = "31",
    pages = "189--224",
    journal = "SIAM Journal on Scientific Computing",
    issn = "1064-8275",
    publisher = "Society for Industrial and Applied Mathematics Publications",
    number = "1",

    }
