TY - JOUR
T1 - Experience with fine-grain communication in EM-X multiprocessor for parallel sparse matrix computation
AU - Sato, Mitsuhisa
AU - Kodama, Yuetsu
AU - Sakane, Hirofumi
AU - Yamana, Hayato
AU - Sakai, Shuichi
AU - Yamaguchi, Yoshinori
PY - 1997/1/1
Y1 - 1997/1/1
N2 - Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. We present in this paper how fine-grain communication can help obtain high performance on EM-X, an experimental distributed-memory multiprocessor developed at ETL that handles fine-grain communication very efficiently. The sparse matrix kernel Conjugate Gradient (CG) is selected for the experiments. Among the steps in CG, we focus on the sparse matrix-vector multiplication. Several communication methods, including coarse-grain and fine-grain implementations, are developed for performance comparison. Fine-grain communication allows exact data access in an unstructured problem, reducing the amount of communication. While CG presents a bottleneck in the form of a large number of fine-grain remote reads, the multithreaded execution principles of EM-X are designed to tolerate such latency. Experimental results indicate that the performance of the fine-grain read implementation is comparable to that of the coarse-grain implementation on 64 processors. The results demonstrate that fine-grain communication can be a viable and efficient approach to unstructured sparse matrix problems on large-scale distributed-memory multiprocessors.
AB - Sparse matrix problems require a communication paradigm different from those used in conventional distributed-memory multiprocessors. We present in this paper how fine-grain communication can help obtain high performance on EM-X, an experimental distributed-memory multiprocessor developed at ETL that handles fine-grain communication very efficiently. The sparse matrix kernel Conjugate Gradient (CG) is selected for the experiments. Among the steps in CG, we focus on the sparse matrix-vector multiplication. Several communication methods, including coarse-grain and fine-grain implementations, are developed for performance comparison. Fine-grain communication allows exact data access in an unstructured problem, reducing the amount of communication. While CG presents a bottleneck in the form of a large number of fine-grain remote reads, the multithreaded execution principles of EM-X are designed to tolerate such latency. Experimental results indicate that the performance of the fine-grain read implementation is comparable to that of the coarse-grain implementation on 64 processors. The results demonstrate that fine-grain communication can be a viable and efficient approach to unstructured sparse matrix problems on large-scale distributed-memory multiprocessors.
UR - http://www.scopus.com/inward/record.url?scp=0030646771&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0030646771&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:0030646771
SN - 1063-7133
SP - 242
EP - 248
JO - Proceedings of the International Parallel Processing Symposium, IPPS
JF - Proceedings of the International Parallel Processing Symposium, IPPS
T2 - Proceedings of the 1997 11th International Parallel Processing Symposium, IPPS 97
Y2 - 1 April 1997 through 5 April 1997
ER -