Coarse grain task parallel processing with cache optimization on shared memory multiprocessor

Kazuhisa Ishizaka, Motoki Obata, Hironori Kasahara

Research output: Chapter

3 Citations (Scopus)

Abstract

In multiprocessor systems, the gap between peak and effective performance has been getting larger. To cope with this performance gap, it is important to exploit multigrain parallelism in addition to ordinary loop-level parallelism. Effective use of the memory hierarchy is also important for improving the performance of multiprocessor systems, because the speed gap between processors and memories keeps widening. This paper describes coarse grain task parallel processing that exploits parallelism among macro-tasks, such as loops and subroutines, together with cache optimization based on a data localization scheme. The proposed scheme is implemented in the OSCAR automatic multigrain parallelizing compiler, which generates an OpenMP FORTRAN program realizing the scheme from a sequential FORTRAN77 program. Its performance is evaluated on an IBM RS6000 SP 604e High Node, an 8-processor SMP machine, using the SPEC95fp benchmarks tomcatv, swim, and mgrid. In the evaluation, the proposed coarse grain task parallel processing scheme with cache optimization achieves speedups of up to 1.3 times on 1PE, 4.7 times on 4PE, and 8.8 times on 8PE relative to sequential processing time.
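The abstract names data localization as the cache optimization but does not spell out how it works. The sketch below is a toy Python model, not the OSCAR compiler's implementation, of the general idea as we understand it. A chain of loops that touch the same arrays element by element is split into aligned partial loops (in the style of loop aligned decomposition), and the number of blocks is chosen so one block's working set fits in cache. All partial loops of a block then run back to back, so that block's data stays resident. Blocks that touch disjoint data can be handed to different processors as coarse-grain tasks. The function names `num_blocks`, `aligned_blocks`, `run` and `assign`, the three example loops, the 8-byte element size, the toy cache size, and rounding the block count up to a multiple of the processor count are all illustrative assumptions.

```python
# Toy model of cache-oriented data localization (hypothetical sketch,
# NOT the OSCAR compiler's implementation). Assumes the loops have only
# element-wise dependences (iteration i of each loop touches index i only).


def num_blocks(n_iters: int, bytes_per_iter: int, cache_bytes: int, n_pes: int = 1) -> int:
    """Fewest blocks whose working set fits in cache, rounded up to a
    multiple of n_pes (an assumption) so each processor gets the same count."""
    needed = max(1, -(-(n_iters * bytes_per_iter) // cache_bytes))  # ceiling division
    return -(-needed // n_pes) * n_pes


def aligned_blocks(n_iters: int, n_blocks: int) -> list[tuple[int, int]]:
    """Split the shared iteration space [0, n_iters) into contiguous,
    nearly equal ranges; every loop in the chain uses the same ranges."""
    base, extra = divmod(n_iters, n_blocks)
    ranges, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def run(n: int, ranges: list[tuple[int, int]]) -> tuple[list[float], list[float]]:
    """Execute three element-wise loops (hypothetical L1..L3) block by block.
    With ranges == [(0, n)] this is the original loop order; with several
    ranges, all partial loops of one block run back to back, so a, b, c
    for that block are reused while they would still be in cache."""
    a = [float(i) for i in range(n)]
    b = [0.0] * n
    c = [0.0] * n
    for lo, hi in ranges:
        for i in range(lo, hi):  # partial loop of L1
            b[i] = 2.0 * a[i]
        for i in range(lo, hi):  # partial loop of L2
            c[i] = b[i] + a[i]
        for i in range(lo, hi):  # partial loop of L3
            b[i] = c[i] * b[i]
    return b, c


def assign(blocks: list[tuple[int, int]], n_pes: int) -> dict[int, list[tuple[int, int]]]:
    """Round-robin blocks to processors. Blocks touch disjoint indices,
    so they are independent coarse-grain tasks in this toy model."""
    return {pe: blocks[pe::n_pes] for pe in range(n_pes)}


if __name__ == "__main__":
    n = 1000
    # 3 arrays x 8-byte floats = 24 bytes per iteration; 8 KiB toy cache.
    k = num_blocks(n, bytes_per_iter=24, cache_bytes=8192, n_pes=4)
    blocks = aligned_blocks(n, k)
    # The localized order gives the same results as the original loop order.
    assert run(n, blocks) == run(n, [(0, n)])
    print(k, assign(blocks, 4)[0])  # prints: 4 [(0, 250)]
```

The model only reorders work; Python executes it sequentially, so it says nothing about actual speedups. A real compiler must also handle loops whose iterations read neighboring elements (stencils), where aligned blocks need overlapping or shifted ranges, which this toy deliberately omits.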

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Editor: Henry G. Dietz
Publisher: Springer Verlag
Pages: 352-365
Number of pages: 14
ISBN (Print): 3540040293
Publication status: Published - 2003
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2624
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
