Automatic local memory management for multicores having global address space

Kouhei Yamamoto, Tomoya Shirakawa, Yoshitake Oki, Akimasa Yoshida, Keiji Kimura, Hironori Kasahara

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Embedded multicore processors for hard real-time applications such as automobile engine control require local memory on each processor core to meet real-time deadline constraints precisely, since cache memory cannot guarantee deadlines because of cache misses. To use local memory, programmers or compilers must explicitly manage data movement and data replacement within its limited capacity. However, such management is extremely difficult and time-consuming for programmers. This paper proposes an automatic local memory management method performed by the compiler through (i) multi-dimensional data decomposition techniques to fit working sets into limited-size local memory, (ii) suitable block management structures, called Adjustable Blocks, to create application-specific fixed-size data transfer blocks, (iii) multi-dimensional templates to preserve the original multi-dimensional representations of the decomposed multi-dimensional data that are mapped onto one-dimensional Adjustable Blocks, (iv) block replacement policies derived from liveness analysis of the decomposed data, and (v) code size reduction schemes to generate shorter code. The proposed method is implemented in the OSCAR multigrain and multi-platform compiler and evaluated on the Renesas RP2 8-core embedded homogeneous multicore processor equipped with local and shared memory. Evaluations on 5 programs, including multimedia and scientific applications, show promising results. For instance, speedups on 8 cores relative to single-core execution using off-chip shared memory improve from 7.14 to 20.12 for an AAC encoder program, from 1.97 to 7.59 for an MPEG2 encoder program, from 5.73 to 7.38 for Tomcatv, and from 7.40 to 11.30 for Swim when local memory is used with the proposed method. These evaluations indicate the usefulness and validity of the proposed local memory management method on real embedded multicore processors.
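
    To put the quoted speedups in perspective, the short sketch below computes how much each 8-core result improves when local memory is used. It relies only on the four number pairs stated in the abstract and assumes that both values in each pair are measured against the same single-core, off-chip shared memory baseline. The names `speedups` and `improvement_factor` are illustrative and do not come from the paper.

```python
# Speedup pairs quoted in the abstract, as (before, after):
# before = 8-core speedup using shared memory,
# after  = 8-core speedup using local memory with the proposed method.
# Assumption: both are relative to the same single-core shared-memory run.
speedups = {
    "AAC encoder": (7.14, 20.12),
    "MPEG2 encoder": (1.97, 7.59),
    "Tomcatv": (5.73, 7.38),
    "Swim": (7.40, 11.30),
}


def improvement_factor(before: float, after: float) -> float:
    """Ratio of the local-memory speedup to the shared-memory speedup."""
    return after / before


factors = {
    name: improvement_factor(before, after)
    for name, (before, after) in speedups.items()
}

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x")
```

    Under that assumption, the ratios come to roughly 2.82x for the AAC encoder, 3.85x for the MPEG2 encoder, 1.29x for Tomcatv, and 1.53x for Swim.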

    Original language: English
    Title of host publication: Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers
    Publisher: Springer Verlag
    Pages: 282-296
    Number of pages: 15
    Volume: 10136 LNCS
    ISBN (Print): 9783319527086
    DOI: https://doi.org/10.1007/978-3-319-52709-3_21
    Publication status: Published - 2017
    Event: 29th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2016 - Rochester, United States
    Duration: 2016 Sep 28 to 2016 Sep 30

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 10136 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Keywords

    • Data decomposition
    • DMA
    • Global address space
    • Local memory management
    • Multicore
    • Parallelizing compiler

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Yamamoto, K., Shirakawa, T., Oki, Y., Yoshida, A., Kimura, K., & Kasahara, H. (2017). Automatic local memory management for multicores having global address space. In Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers (Vol. 10136 LNCS, pp. 282-296). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10136 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-52709-3_21

    @inproceedings{18f8e3d99f014ab79ad08ce67503072f,
    title = "Automatic local memory management for multicores having global address space",
    keywords = "Data decomposition, DMA, Global address space, Local memory management, Multicore, Parallelizing compiler",
    author = "Kouhei Yamamoto and Tomoya Shirakawa and Yoshitake Oki and Akimasa Yoshida and Keiji Kimura and Hironori Kasahara",
    year = "2017",
    doi = "10.1007/978-3-319-52709-3_21",
    language = "English",
    isbn = "9783319527086",
    volume = "10136 LNCS",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer Verlag",
    pages = "282--296",
    booktitle = "Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers",
    address = "Germany",

    }

    TY - GEN
    T1 - Automatic local memory management for multicores having global address space
    AU - Yamamoto, Kouhei
    AU - Shirakawa, Tomoya
    AU - Oki, Yoshitake
    AU - Yoshida, Akimasa
    AU - Kimura, Keiji
    AU - Kasahara, Hironori
    PY - 2017
    Y1 - 2017
    KW - Data decomposition
    KW - DMA
    KW - Global address space
    KW - Local memory management
    KW - Multicore
    KW - Parallelizing compiler
    UR - http://www.scopus.com/inward/record.url?scp=85011391983&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=85011391983&partnerID=8YFLogxK
    U2 - 10.1007/978-3-319-52709-3_21
    DO - 10.1007/978-3-319-52709-3_21
    M3 - Conference contribution
    SN - 9783319527086
    VL - 10136 LNCS
    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    SP - 282
    EP - 296
    BT - Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers
    PB - Springer Verlag
    ER -