Cache optimization for coarse grain task parallel processing using inter-array padding

Kazuhisa Ishizaka*, Motoki Obata, Hironori Kasahara

*この研究の対応する著者

研究成果: Article査読

7 被引用数 (Scopus)

抄録

The wide use of multiprocessor system has been making automatic parallelizing compilers more important. To improve the performance of multiprocessor system more by compiler, multigrain parallelization is important. In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to the traditional loop parallelism. In addition, locality optimization to use cache effectively is also important for the performance improvement. This paper describes inter-array padding to minimize cache conflict misses among macro-tasks with data localization scheme which decomposes loops sharing the same arrays to fit cache size and executes the decomposed loops consecutively on the same processor. In the performance evaluation on Sun Ultra 80(4pe), OSCAR compiler on which the proposed scheme is implemented gave us 2.5 times speedup against the maximum performance of Sun Forte compiler automatic loop parallelization at the average of SPEC CFP95 tomcatv, swim hydro2d and turb3d programs. Also, OSCAR compiler showed 2.1 times speedup on IBM RS/6000 44p-270(4pe) against XLF compiler.

本文言語English
ページ(範囲)64-76
ページ数13
ジャーナルLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
2958
出版ステータスPublished - 2004 12 1

ASJC Scopus subject areas

  • 理論的コンピュータサイエンス
  • コンピュータ サイエンス(全般)

フィンガープリント

「Cache optimization for coarse grain task parallel processing using inter-array padding」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル