Compiler software coherent control for embedded high performance multicore

Boma A. Adhi*, Tomoya Kashimata, Ken Takahashi, Keiji Kimura, Hironori Kasahara

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

1 Citation (Scopus)


The advancement of multicore technology has made hundreds or even thousands of cores processor on a single chip possible. However, on a larger scale multicore, a hardware-based cache coherency mechanism becomes overwhelmingly complicated, hot, and expensive. Therefore, we propose a software coherence scheme managed by a parallelizing compiler for shared-memory multicore systems without a hardware cache coherence mechanism. Our proposed method is simple and efficient. It is built into OSCAR automatic parallelizing compiler. The OSCAR compiler parallelizes the coarse grain task, analyzes stale data and line sharing in the program, then solves those problems by simple program restructuring and data synchronization. Using our proposed method, we compiled 10 benchmark programs from SPEC2000, SPEC2006, NAS Parallel Benchmark (NPB), and MediaBench II. The compiled binaries then are run on Renesas RP2, an 8 cores SH-4A processor, and a custom 8-core Altera Nios II system on Altera Arria 10 FPGA. The cache coherence hardware on the RP2 processor is only available for up to 4 cores. The RP2's cache coherence hardware can also be turned off for non-coherence cache mode. The Nios II multicore system does not have any hardware cache coherence mechanism; therefore, running a parallel program is difficult without any compiler support. The proposed method performed as good as or better than the hardware cache coherence scheme while still provided the correct result as the hardware coherence mechanism. This method allows a massive array of shared memory CPU cores in an HPC setting or a simple non-coherent multicore embedded CPU to be easily programmed. For example, on the RP2 processor, the proposed software-controlled non-coherent-cache (NCC) method gave us 2.6 times speedup for SPEC 2000 “equake” with 4 cores against sequential execution while got only 2.5 times speedup for 4 cores MESI hardware coherent control. Also, the software coherence control gave us 4.4 times speedup for 8 cores with no hardware coherence mechanism available.

Original languageEnglish
Pages (from-to)85-97
Number of pages13
JournalIEICE Transactions on Electronics
Issue number3
Publication statusPublished - 2020 Mar 1


  • Cache
  • Multicore
  • Parallelizing compiler
  • Shared memory
  • Soft core
  • Software coherence control

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering


Dive into the research topics of 'Compiler software coherent control for embedded high performance multicore'. Together they form a unique fingerprint.

Cite this