In this article, we present an efficient algorithm to compute the faithful rounding of the l2-norm of a floatingpoint vector. This means that the result is accurate to within 1 bit of the underlying floating-point type. This algorithm does not generate overflows or underflows spuriously, but does so when the final result calls for such a numerical exception to be raised. Moreover, the algorithm is well suited for parallel implementation and vectorization. The implementation runs up to 3 times faster than the netlib version on current processors.
ASJC Scopus subject areas