Multicore Cache Coherence Control by a Parallelizing Compiler

Hironori Kasahara, Keiji Kimura, Boma A. Adhi, Yuhei Hosokawa, Yohei Kishimoto, Masayoshi Mase

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Recent developments in multicore technology have enabled processors with hundreds or thousands of cores. However, on such multicore processors, an efficient hardware cache coherence scheme becomes very complex and expensive to develop. This paper proposes a parallelizing-compiler-directed software coherence scheme for shared-memory multicore systems without hardware cache coherence control. The general idea of the proposed method is that an automatic parallelizing compiler analyzes the control dependencies and data dependencies among coarse-grain tasks in the program. Then, based on the obtained information, it performs task parallelization, false sharing detection, and data restructuring to prevent false sharing. Next, the compiler inserts cache control code to handle the stale data problem. The proposed method is built on the OSCAR automatic parallelizing compiler and evaluated on the Renesas RP2, a processor with 8 SH-4A cores. The hardware cache coherence scheme on the RP2 processor is available only for up to 4 cores, and it can be turned off completely for a non-coherent cache mode. Performance is evaluated using 10 benchmark programs from SPEC2000, SPEC2006, NAS Parallel Benchmarks (NPB), and MediaBench II. The proposed method performs as well as or better than the hardware cache coherence scheme. For example, 4 cores with the hardware coherence mechanism gave speedups over 1 core of 2.52 times for SPEC2000 'equake', 2.9 times for SPEC2006 'lbm', 3.34 times for NPB 'cg', and 3.17 times for MediaBench II 'MPEG2 Encoder'. The proposed software cache coherence control gave 2.63 times for 4 cores and 4.37 times for 8 cores for 'equake', 3.28 times for 4 cores and 4.76 times for 8 cores for 'lbm', and 3.71 times for 4 cores and 4.92 times for 8 cores for 'MPEG2 Encoder'.
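The two problems named in the abstract, false sharing and stale data, can be illustrated with a toy model. The sketch below is not the OSCAR compiler's implementation: the 32-byte line size, the `Core` class, and the helper names are all hypothetical, chosen only to show why padding data to cache-line boundaries removes false sharing and why a writeback followed by a self-invalidate lets a consumer see fresh data on a non-coherent cache.

```python
LINE_SIZE = 32  # assumed cache line size in bytes (illustrative, not from the paper)


def line_of(addr):
    """Index of the cache line that holds byte address `addr`."""
    return addr // LINE_SIZE


def has_false_sharing(ranges_by_core):
    """True if two cores write to the same cache line.

    ranges_by_core maps a core id to the (start, end_exclusive) byte
    range that core writes.
    """
    owners = {}
    for core, (start, end) in ranges_by_core.items():
        for line in range(line_of(start), line_of(end - 1) + 1):
            owners.setdefault(line, set()).add(core)
    return any(len(cores) > 1 for cores in owners.values())


def pad_to_line(size):
    """Round a partition size up to a whole number of cache lines."""
    return -(-size // LINE_SIZE) * LINE_SIZE


class Core:
    """Hypothetical core with a private, non-coherent write-back cache."""

    def __init__(self, memory):
        self.memory = memory  # shared main memory: line index -> value
        self.cache = {}       # private copies: line index -> value
        self.dirty = set()    # lines modified but not yet written back

    def read(self, line):
        if line not in self.cache:
            self.cache[line] = self.memory.get(line, 0)
        return self.cache[line]

    def write(self, line, value):
        self.cache[line] = value
        self.dirty.add(line)

    def writeback(self, line):
        """Copy a dirty line to main memory (the producer-side step)."""
        if line in self.dirty:
            self.memory[line] = self.cache[line]
            self.dirty.discard(line)

    def invalidate(self, line):
        """Drop the private copy so the next read refetches from memory."""
        self.cache.pop(line, None)
        self.dirty.discard(line)


# Stale data: the consumer keeps reading its old private copy.
memory = {}
producer, consumer = Core(memory), Core(memory)
consumer.read(0)            # consumer caches the initial value
producer.write(0, 42)       # update lives only in the producer's cache
stale = consumer.read(0)    # still the old value

# Software coherence: writeback on the producer, invalidate on the consumer.
producer.writeback(0)
consumer.invalidate(0)
fresh = consumer.read(0)    # now refetched from main memory
```

In this model two cores writing 16-byte partitions at offsets 0 and 16 share one line, while padding each partition with `pad_to_line(16)` places them on separate lines. A compiler that knows the task-level data dependencies can, in the same spirit, place the writeback and invalidate steps only where a consumer task actually depends on a producer task.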

    Original language: English
    Title of host publication: Proceedings - 2017 IEEE 41st Annual Computer Software and Applications Conference, COMPSAC 2017
    Publisher: IEEE Computer Society
    Pages: 492-497
    Number of pages: 6
    Volume: 1
    ISBN (Electronic): 9781538603673
    DOIs: https://doi.org/10.1109/COMPSAC.2017.174
    Publication status: Published - 2017 Sep 7
    Event: 41st IEEE Annual Computer Software and Applications Conference, COMPSAC 2017 - Torino, Italy
    Duration: 2017 Jul 4 → 2017 Jul 8

    Other

    Other: 41st IEEE Annual Computer Software and Applications Conference, COMPSAC 2017
    Country: Italy
    City: Torino
    Period: 17/7/4 → 17/7/8

    Fingerprint

    Hardware
    Computer hardware
    Computer systems
    Data storage equipment

    Keywords

    • Cache
    • Multicore
    • Parallelizing Compiler
    • Shared Memory
    • Software Coherence Control

    ASJC Scopus subject areas

    • Software
    • Computer Science Applications

    Cite this

    Kasahara, H., Kimura, K., Adhi, B. A., Hosokawa, Y., Kishimoto, Y., & Mase, M. (2017). Multicore Cache Coherence Control by a Parallelizing Compiler. In Proceedings - 2017 IEEE 41st Annual Computer Software and Applications Conference, COMPSAC 2017 (Vol. 1, pp. 492-497). [8029648] IEEE Computer Society. https://doi.org/10.1109/COMPSAC.2017.174

    @inproceedings{10e5123e0fc74c1aab16925e9af011a6,
    title = "Multicore Cache Coherence Control by a Parallelizing Compiler",
    keywords = "Cache, Multicore, Parallelizing Compiler, Shared Memory, Software Coherence Control",
    author = "Hironori Kasahara and Keiji Kimura and Adhi, {Boma A.} and Yuhei Hosokawa and Yohei Kishimoto and Masayoshi Mase",
    year = "2017",
    month = "9",
    day = "7",
    doi = "10.1109/COMPSAC.2017.174",
    language = "English",
    volume = "1",
    pages = "492--497",
    booktitle = "Proceedings - 2017 IEEE 41st Annual Computer Software and Applications Conference, COMPSAC 2017",
    publisher = "IEEE Computer Society",
    address = "United States",

    }

    TY - GEN

    T1 - Multicore Cache Coherence Control by a Parallelizing Compiler

    AU - Kasahara, Hironori

    AU - Kimura, Keiji

    AU - Adhi, Boma A.

    AU - Hosokawa, Yuhei

    AU - Kishimoto, Yohei

    AU - Mase, Masayoshi

    PY - 2017/9/7

    Y1 - 2017/9/7

    KW - Cache

    KW - Multicore

    KW - Parallelizing Compiler

    KW - Shared Memory

    KW - Software Coherence Control

    UR - http://www.scopus.com/inward/record.url?scp=85031909144&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85031909144&partnerID=8YFLogxK

    U2 - 10.1109/COMPSAC.2017.174

    DO - 10.1109/COMPSAC.2017.174

    M3 - Conference contribution

    VL - 1

    SP - 492

    EP - 497

    BT - Proceedings - 2017 IEEE 41st Annual Computer Software and Applications Conference, COMPSAC 2017

    PB - IEEE Computer Society

    ER -