TY - JOUR
T1 - Local memory mapping of multicore processors on an automatic parallelizing compiler
AU - Oki, Yoshitake
AU - Abe, Yuto
AU - Yamamoto, Kazuki
AU - Yamamoto, Kohei
AU - Shirakawa, Tomoya
AU - Yoshida, Akimasa
AU - Kimura, Keiji
AU - Kasahara, Hironori
N1 - Funding Information:
The Renesas RP2 multicore processor is an embedded processor based on the OSCAR multicore architecture [19]. The chip was developed by Renesas Electronics, Hitachi, and Waseda University and was supported by the NEDO project on multicore processors for real-time consumer electronics. An overview of the RP2 architecture is shown in Fig. 9. The RP2 processor has two SMP clusters, each with four SH4A cores running at 600 MHz. Each processor core has its own private local memory (LM) with an access latency of 1 clock cycle. Each core can also access a common, processor-wide 128 MB off-chip centralized shared memory (CSM), which has a latency of 55 clock cycles.
Publisher Copyright:
© 2020 The Institute of Electronics, Information and Communication Engineers
PY - 2020/3/1
Y1 - 2020/3/1
N2 - Utilization of local memory, from real-time embedded systems to high-performance systems with multicore processors, has become an important factor for satisfying hard deadline constraints. However, challenges remain in efficiently managing the memory hierarchy, such as decomposing large data into small blocks that fit in local memory and transferring blocks for reuse and replacement. To address this issue, this paper presents a compiler optimization method that automatically manages the local memory of multicore processors. The method selects and maps multidimensional data onto software-specified memory blocks called Adjustable Blocks. These blocks are hierarchically divisible, with varying sizes defined by the features of the input application. Moreover, the method introduces mapping structures called Template Arrays to maintain the indices of the decomposed multidimensional data. The proposed method is implemented in the OSCAR automatic parallelizing compiler and evaluated on the Renesas RP2 8-core processor. Experimental results from the NAS Parallel Benchmarks, SPEC benchmarks, and multimedia applications show the effectiveness of the method, with a maximum speed-up of 20.44 times on 8 cores using local memory relative to single-core sequential versions that use off-chip memory.
KW - Data decomposition
KW - Global address space
KW - Local memory management
KW - Multicore processor
KW - Parallelization compiler
UR - http://www.scopus.com/inward/record.url?scp=85081976134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081976134&partnerID=8YFLogxK
U2 - 10.1587/transele.2019LHP0010
DO - 10.1587/transele.2019LHP0010
M3 - Article
AN - SCOPUS:85081976134
SN - 0916-8524
VL - E103.C
SP - 98
EP - 109
JO - IEICE Transactions on Electronics
JF - IEICE Transactions on Electronics
IS - 3
ER -