Accurate floating-point summation part II

Sign, K-fold faithful and rounding to nearest

Siegfried M. Rump, Takeshi Ogita, Shinichi Oishi

    Research output: Contribution to journal (Article)

    42 Citations (Scopus)

    Abstract

    In Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the rounded-to-nearest result of s := ∑ p_i for a given vector of floating-point numbers p_i, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions. Assume a floating-point working precision with relative rounding error unit eps. We define and investigate a K-fold faithful rounding of a real number r. Basically the result is stored in a vector Res_ν of K nonoverlapping floating-point numbers such that ∑ Res_ν approximates r with relative accuracy eps^K, and replacing Res_K by its floating-point neighbors in ∑ Res_ν forms a lower and upper bound for r. For a given vector of floating-point numbers with exact sum s, we present an algorithm for calculating a K-fold faithful rounding of s using solely the working precision. Furthermore, an algorithm for calculating a faithfully rounded result of the sum of a vector of huge dimension is presented. Our algorithms are fast in terms of measured computing time because they allow good instruction-level parallelism, they neither require special operations such as access to mantissa or exponent, they contain no branch in the inner loop, nor do they require some extra precision. The only operations used are standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision. Certain constants used in the algorithms are proved to be optimal.
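To make two terms from the abstract concrete, here is a minimal Python sketch. It is not one of the paper's algorithms. It shows Knuth's classical TwoSum, a well-known error-free transformation that uses only additions and subtractions without a branch, and a small checker for the faithful-rounding property (no floating-point number lies strictly between the result and the exact value). The names `two_sum` and `is_faithful` are illustrative, the checker assumes finite double-precision inputs, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: return (x, y) with x = fl(a + b) and a + b = x + y exactly.

    Only floating-point additions and subtractions are used, with no branch.
    (Exactness assumes no overflow occurs.)
    """
    x = a + b
    z = x - a
    y = (a - (x - z)) + (b - z)
    return x, y


def is_faithful(res: float, exact: Fraction) -> bool:
    """True if no floating-point number lies strictly between res and exact.

    If exact is itself a floating-point number, res must equal it.
    Assumes res and its neighbors are finite.
    """
    r = Fraction(res)
    if r == exact:
        return True
    if r < exact:
        # The next float above res must already exceed the exact value.
        return Fraction(math.nextafter(res, math.inf)) > exact
    # The next float below res must already be below the exact value.
    return Fraction(math.nextafter(res, -math.inf)) < exact


# Illustrative data with heavy cancellation.
p = [1e16, 1.0, -1e16, 1e-8]
exact = sum(Fraction(v) for v in p)  # exact rational sum of the inputs

# Plain left-to-right recursive summation in working precision.
naive = 0.0
for v in p:
    naive += v

print(two_sum(1e16, 1.0))               # rounded sum and its exact error term
print(is_faithful(naive, exact))        # recursive summation loses the 1.0
print(is_faithful(math.fsum(p), exact)) # math.fsum is documented as accurately rounded
```

The exact rational arithmetic from `fractions` serves only as a reference for checking; the algorithms described in the abstract work solely in the working precision.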

    Original language: English
    Pages (from-to): 1269-1302
    Number of pages: 34
    Journal: SIAM Journal on Scientific Computing
    Volume: 31
    Issue number: 2
    DOI: 10.1137/07068816X
    Publication status: Published - 2008

    Keywords

    • K-fold accuracy
    • Directed rounding
    • Distillation
    • Error analysis
    • Error-free transformations
    • Faithful rounding
    • High accuracy
    • Maximally accurate summation
    • Rounding to nearest
    • Sign
    • XBLAS

    ASJC Scopus subject areas

    • Applied Mathematics
    • Computational Mathematics

    Cite this

    Accurate floating-point summation part II: Sign, K-fold faithful and rounding to nearest. / Rump, Siegfried M.; Ogita, Takeshi; Oishi, Shinichi.

    In: SIAM Journal on Scientific Computing, Vol. 31, No. 2, 2008, p. 1269-1302.

    @article{2bc3b61adf664eb1a0d93c07d4553643,
    title = "Accurate floating-point summation part II: Sign, K-fold faithful and rounding to nearest",
    keywords = "K-fold accuracy, Directed rounding, Distillation, Error analysis, Error-free transformations, Faithful rounding, High accuracy, Maximally accurate summation, Rounding to nearest, Sign, XBLAS",
    author = "Rump, {Siegfried M.} and Takeshi Ogita and Shinichi Oishi",
    year = "2008",
    doi = "10.1137/07068816X",
    language = "English",
    volume = "31",
    pages = "1269--1302",
    journal = "SIAM Journal on Scientific Computing",
    issn = "1064-8275",
    publisher = "Society for Industrial and Applied Mathematics Publications",
    number = "2",

    }

    TY - JOUR

    T1 - Accurate floating-point summation part II

    T2 - Sign, K-fold faithful and rounding to nearest

    AU - Rump, Siegfried M.

    AU - Ogita, Takeshi

    AU - Oishi, Shinichi

    PY - 2008

    Y1 - 2008

    KW - K-fold accuracy

    KW - Directed rounding

    KW - Distillation

    KW - Error analysis

    KW - Error-free transformations

    KW - Faithful rounding

    KW - High accuracy

    KW - Maximally accurate summation

    KW - Rounding to nearest

    KW - Sign

    KW - XBLAS

    UR - http://www.scopus.com/inward/record.url?scp=67649610606&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=67649610606&partnerID=8YFLogxK

    U2 - 10.1137/07068816X

    DO - 10.1137/07068816X

    M3 - Article

    VL - 31

    SP - 1269

    EP - 1302

    JO - SIAM Journal on Scientific Computing

    JF - SIAM Journal on Scientific Computing

    SN - 1064-8275

    IS - 2

    ER -