Fine-grain multithreading with the EM-X multiprocessor

Andrew Sohn, Yuetsu Kodama, Jui Ku, Mitsuhisa Sato, Hirofumi Sakane, Hayato Yamana, Shuichi Sakai, Yoshinori Yamaguchi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Multithreading aims to tolerate latency by overlapping communication with computation. This report examines the multithreading capabilities of the EM-X distributed-memory multiprocessor through empirical studies. The EM-X provides hardware support for fine-grain multithreading, including a bypassing mechanism for direct remote reads and writes, hardware FIFO thread scheduling, and dedicated instructions for generating fixed-size communication packets. Bitonic sorting and the Fast Fourier Transform (FFT) are used as the experimental workloads. The parameters investigated to characterize multithreading performance are the number of threads, the number of thread switches, the run length, and the number of remote reads. Experimental results indicate that the best communication performance occurs with two to four threads. FFT achieved over 95% overlap owing to a large amount of computation and communication parallelism across threads. Even in the absence of thread computation parallelism, multithreading overlaps over 35% of the communication time for bitonic sorting.
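For readers unfamiliar with bitonic sorting, one of the two workloads named in the abstract, the following is a minimal sequential sketch of the standard bitonic sorting network. It is not the EM-X implementation, and the function name `bitonic_sort` is an illustrative assumption. It shows only the access pattern: every element is compared and exchanged with a partner at index `i ^ j`. In a distributed-memory setting that partner may reside on another processor, in which case the compare-exchange would involve a remote read.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network.

    Sequential illustration only (hypothetical helper, not EM-X code).
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance to the compare-exchange partner
            for i in range(n):
                partner = i ^ j
                if partner > i:  # handle each pair once
                    ascending = (i & k) == 0
                    # Swap when the pair is out of order for its direction.
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

The inner `for` loop has no dependences between distinct pairs within one `(k, j)` step, which is why the algorithm exposes communication that can, in principle, be overlapped across threads.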

Original language: English
Title of host publication: Annual ACM Symposium on Parallel Algorithms and Architectures
Editors: Anon
Place of Publication: New York, NY, United States
Publisher: ACM
Pages: 189-198
Number of pages: 10
Publication status: Published - 1997
Externally published: Yes
Event: Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA - Newport, RI, USA
Duration: 1997 Jun 22 to 1997 Jun 25

Fingerprint

Communication
Sorting
Fast Fourier transforms
Hardware
Scheduling
Switches
Data storage equipment
Experiments

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality

Cite this

Sohn, A., Kodama, Y., Ku, J., Sato, M., Sakane, H., Yamana, H., ... Yamaguchi, Y. (1997). Fine-grain multithreading with the EM-X multiprocessor. In Anon (Ed.), Annual ACM Symposium on Parallel Algorithms and Architectures (pp. 189-198). New York, NY, United States: ACM.

@inproceedings{5bdd4396463148948dc829840d64f571,
title = "Fine-grain multithreading with the EM-X multiprocessor",
author = "Andrew Sohn and Yuetsu Kodama and Jui Ku and Mitsuhisa Sato and Hirofumi Sakane and Hayato Yamana and Shuichi Sakai and Yoshinori Yamaguchi",
year = "1997",
language = "English",
pages = "189--198",
editor = "Anon",
booktitle = "Annual ACM Symposium on Parallel Algorithms and Architectures",
publisher = "ACM",

}

TY - GEN

T1 - Fine-grain multithreading with the EM-X multiprocessor

AU - Sohn, Andrew

AU - Kodama, Yuetsu

AU - Ku, Jui

AU - Sato, Mitsuhisa

AU - Sakane, Hirofumi

AU - Yamana, Hayato

AU - Sakai, Shuichi

AU - Yamaguchi, Yoshinori

PY - 1997

Y1 - 1997

UR - http://www.scopus.com/inward/record.url?scp=0030661014&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030661014&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0030661014

SP - 189

EP - 198

BT - Annual ACM Symposium on Parallel Algorithms and Architectures

A2 - Anon

PB - ACM

CY - New York, NY, United States

ER -