Communication latency is central to multiprocessor design. This study presents the design principles of the EM-X distributed-memory multiprocessor towards tolerating communication latency. The EM-X overlaps computation with communication for latency tolerance by multithreading. In particular, we present two types of hardware support for remote memory access: (1) priority-based packet scheduling for thread invocation, and (2) direct remote memory access. The prioritybased scheduling policy extends a FIFO ordered thread invocation policy to adopt to different computational needs. The direct remote memory access is designed to overlap remote memory operations with thread execution. The 80-processor prototype of EM-X is developed and is operational since December 1995. We execute several programs on the machine and evaluate how the EM-X effectively overlaps computation with communication toward tolerating communication latency for high performance parallel computing.
|ジャーナル||IEICE Transactions on Information and Systems|
|出版ステータス||Published - 1996 1月 1|
ASJC Scopus subject areas
- コンピュータ ビジョンおよびパターン認識