CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System

Takeshi Fukaya, Yuji Nakatsukasa, Yuka Yanagisawa, Yusaku Yamamoto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Designing communication-avoiding algorithms is crucial for high-performance computing on a large-scale parallel system. The TSQR algorithm is a communication-avoiding algorithm for computing a tall-skinny QR factorization, and it is known to be much faster than, and as stable as, the classical Householder QR algorithm. The Cholesky QR algorithm is another very simple and fast communication-avoiding algorithm, but it is rarely used in practice because of its numerical instability. Our recent work points out that an algorithm that simply repeats Cholesky QR twice, which we call CholeskyQR2, gives excellent accuracy for a wide range of matrices arising in practice. Although the communication cost of CholeskyQR2 is twice that of TSQR, it has the advantage that its reduction operation is addition, whereas that of TSQR is a QR factorization, whose high-performance implementation is more difficult. Thus, CholeskyQR2 can potentially be significantly faster than TSQR. Indeed, in our experiments using 16384 nodes of the K computer, CholeskyQR2 ran about three times faster than TSQR for a 4194304 × 64 matrix.
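The idea described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: the function names (`gram_by_row_blocks`, `cholesky_qr`, `cholesky_qr2`) and the `num_blocks` parameter are hypothetical, NumPy is assumed, and row blocks of the matrix merely stand in for the data held by separate processes. The sketch forms the Gram matrix as a sum of per-block contributions (the "reduction by addition"), takes its Cholesky factor as R, and repeats the pass once on the resulting Q.

```python
import numpy as np


def gram_by_row_blocks(A, num_blocks):
    """Form A^T A as a sum of per-block Gram matrices.

    Each row block stands in for the rows owned by one process
    (an assumption of this sketch); combining the per-block results
    only requires matrix addition.
    """
    blocks = np.array_split(A, num_blocks, axis=0)
    return sum(block.T @ block for block in blocks)


def cholesky_qr(A, num_blocks=4):
    """One Cholesky QR pass: returns Q, R with A = Q R, R upper triangular."""
    G = gram_by_row_blocks(A, num_blocks)
    # np.linalg.cholesky returns lower-triangular L with G = L L^T, so R = L^T.
    R = np.linalg.cholesky(G).T
    # Q = A R^{-1}, obtained here by solving R^T Q^T = A^T with a general
    # solver; a dedicated triangular solver would be the usual choice.
    Q = np.linalg.solve(R.T, A.T).T
    return Q, R


def cholesky_qr2(A, num_blocks=4):
    """Repeat Cholesky QR twice and combine the triangular factors."""
    Q1, R1 = cholesky_qr(A, num_blocks)
    Q, R2 = cholesky_qr(Q1, num_blocks)
    return Q, R2 @ R1
```

Calling `Q, R = cholesky_qr2(A)` on a tall-skinny, reasonably well-conditioned array `A` returns factors whose quality can be checked with `np.linalg.norm(Q.T @ Q - np.eye(A.shape[1]))` and `np.linalg.norm(Q @ R - A)`. In a distributed setting the sum inside `gram_by_row_blocks` would presumably be carried out by a sum all-reduce across processes, one per Cholesky QR pass; that mapping is an assumption of this sketch rather than a detail given in the abstract.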

Original language: English
Title of host publication: Proceedings of ScalA 2014
Subtitle of host publication: 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 31-38
Number of pages: 8
ISBN (Electronic): 9781479975624
DOIs: https://doi.org/10.1109/ScalA.2014.11
Publication status: Published - 2014
Event: 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2014 - New Orleans, United States
Duration: 2014 Nov 17 → …

Other

Other: 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA 2014
Country: United States
City: New Orleans
Period: 14/11/17 → …

Fingerprint

Factorization
Communication
Costs
Experiments

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Computer Science Applications
  • Software
  • Electrical and Electronic Engineering

Cite this

Fukaya, T., Nakatsukasa, Y., Yanagisawa, Y., & Yamamoto, Y. (2014). CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System. In Proceedings of ScalA 2014: 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - held in conjunction with SC 2014: The International Conference for High Performance Computing, Networking, Storage and Analysis (pp. 31-38). [7016731] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ScalA.2014.11

@inproceedings{45edb4c5a1b64e9c9b5eae68a4dba34c,
title = "CholeskyQR2: A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System",
author = "Takeshi Fukaya and Yuji Nakatsukasa and Yuka Yanagisawa and Yusaku Yamamoto",
year = "2014",
doi = "10.1109/ScalA.2014.11",
language = "English",
pages = "31--38",
booktitle = "Proceedings of ScalA 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - CholeskyQR2

T2 - A Simple and Communication-Avoiding Algorithm for Computing a Tall-Skinny QR Factorization on a Large-Scale Parallel System

AU - Fukaya, Takeshi

AU - Nakatsukasa, Yuji

AU - Yanagisawa, Yuka

AU - Yamamoto, Yusaku

PY - 2014

Y1 - 2014

UR - http://www.scopus.com/inward/record.url?scp=84988268780&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84988268780&partnerID=8YFLogxK

U2 - 10.1109/ScalA.2014.11

DO - 10.1109/ScalA.2014.11

M3 - Conference contribution

AN - SCOPUS:84988268780

SP - 31

EP - 38

BT - Proceedings of ScalA 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -