A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing

Hironori Kasahara, Akimasa Yoshida

    Research output: Contribution to journal › Article

    10 Citations (Scopus)

    Abstract

    This paper proposes a compilation scheme for data localization using partial-static task assignment for Fortran coarse-grain parallel processing, or macro-dataflow processing, on a multiprocessor system with local memories and centralized shared memory. Data localization allows local memories to be used effectively and reduces data-transfer overhead in a dynamic task-scheduling environment. The proposed compilation scheme consists of three main parts: (1) loop-aligned decomposition, which decomposes each loop in a set of loops with data dependences among them into smaller loops and groups the decomposed loops into data-localizable groups, so that data shared among the decomposed loops inside each group can be passed via local memory and data-transfer overhead among the groups is minimized; (2) partial-static task assignment, which informs the dynamic-scheduling-routine generator in the macro-dataflow compiler that the decomposed loops inside each data-localizable group must be assigned to the same processor; (3) parallel machine code generation, which generates parallel machine code that passes shared data inside each group through local memory and transfers data among groups through the centralized shared memory. The compilation scheme has been implemented for OSCAR (Optimally SCheduled Advanced multiprocessoR), a multiprocessor system with centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow processing with the proposed data-localization scheme reduces execution time by 20% on average compared with macro-dataflow processing without data localization.
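
    The structure of parts (1) and (2) is easy to illustrate with a toy sketch. The Python below is purely illustrative: the even block decomposition, the function names (loop_aligned_decomposition, dynamic_schedule), and the round-robin group binding are assumptions for exposition, not the paper's actual compiler algorithm. It decomposes a chain of aligned loops into smaller sub-loops, collects the sub-loops covering the same iteration range into a data-localizable group, and shows how a dynamic scheduler can honor the partial-static constraint by dispatching every task of a group to a single processor.

        from dataclasses import dataclass

        @dataclass
        class SubLoop:
            loop_id: int   # which original loop this decomposed piece came from
            lo: int        # first iteration of the smaller (decomposed) loop
            hi: int        # last iteration (inclusive)
            group: int     # data-localizable group this piece belongs to

        def loop_aligned_decomposition(loop_ids, n_iters, n_groups):
            # Split each loop in a dependence chain into n_groups smaller loops and
            # align the pieces covering the same iteration range into one
            # data-localizable group, so the data they share can stay in a
            # processor's local memory.
            block = (n_iters + n_groups - 1) // n_groups
            groups = []
            for g in range(n_groups):
                lo, hi = g * block + 1, min((g + 1) * block, n_iters)
                groups.append([SubLoop(l, lo, hi, g) for l in loop_ids])
            return groups

        def dynamic_schedule(groups, n_procs):
            # Tiny dynamic scheduler: tasks are dispatched at run time, but the
            # partial-static constraint forces every sub-loop of a group onto the
            # processor to which the group was bound at its first dispatch.
            group_to_proc = {}
            proc_log = {p: [] for p in range(n_procs)}
            ready = [s for grp in groups for s in grp]  # real precedence omitted for brevity
            for task in ready:
                if task.group not in group_to_proc:
                    group_to_proc[task.group] = len(group_to_proc) % n_procs
                p = group_to_proc[task.group]
                proc_log[p].append("loop%d[%d:%d] grp%d"
                                   % (task.loop_id, task.lo, task.hi, task.group))
            return proc_log

        if __name__ == "__main__":
            # three aligned loops, 12 iterations each, 4 data-localizable groups, 2 processors
            groups = loop_aligned_decomposition([1, 2, 3], n_iters=12, n_groups=4)
            for proc, tasks in dynamic_schedule(groups, n_procs=2).items():
                print("PE%d: %s" % (proc, ", ".join(tasks)))

    Running the sketch prints each processor's task list; every group's sub-loops land on one processor, which is exactly the property that lets shared data inside a group stay in local memory even though dispatch order is decided dynamically.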

    Original language: English
    Pages (from-to): 579-596
    Number of pages: 18
    Journal: Parallel Computing
    Volume: 24
    Issue number: 3-4
    Publication status: Published - May 1998


    Keywords

    • Automatic data distribution
    • Coarse-grain parallel processing
    • Data localization
    • Dynamic scheduling
    • Parallelizing compilers

    ASJC Scopus subject areas

    • Computer Science Applications
    • Hardware and Architecture
    • Control and Systems Engineering

    Cite this

    A data-localization compilation scheme using partial-static task assignment for Fortran coarse-grain parallel processing. / Kasahara, Hironori; Yoshida, Akimasa.

    In: Parallel Computing, Vol. 24, No. 3-4, 05.1998, p. 579-596.
