Data-localization for Fortran macro-dataflow computation using partial static task assignment

Akimasa Yoshida, Kenichi Koshizuka, Hironori Kasahara

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    8 Citations (Scopus)

    Abstract

    This paper proposes a data-localization compilation scheme for macro-dataflow computation, in which coarse-grain tasks such as loops, subroutines and basic blocks in a Fortran program are automatically processed in parallel on a multiprocessor system. The data-localization scheme reduces the data-transfer overhead of passing shared data among coarse-grain tasks composed of Doall loops and sequential loops by making effective use of local memory. In this scheme, the compiler partitions coarse-grain tasks, or loops, that have data dependences among them into multiple groups by loop-aligned decomposition so that data transfer among groups is minimized, generates a dynamic scheduling routine with partial static task assignment that assigns the decomposed tasks in a group to the same processor at run time, and generates parallel machine code that passes shared data inside a group through local memory. The compiler has been implemented for OSCAR, an actual multiprocessor system that provides centralized shared memory and distributed shared memory in addition to local memory on each processor. Performance evaluation on OSCAR shows that macro-dataflow computation with the proposed data-localization scheme reduces execution time by 10% to 20% on average compared with ordinary macro-dataflow computation using centralized shared memory.
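    To make the scheme described above concrete, the following is a minimal illustrative sketch (in Python, not the paper's Fortran compiler output) of the two ideas in the abstract: loop-aligned decomposition of two dependent loops into groups so that producer/consumer iterations stay together, and partial static task assignment that keeps all decomposed tasks of a group on one processor so shared data can be passed through that processor's local memory. All names here (partition_aligned, run_group_on_processor, the two-loop chain L1 -> L2) are invented for illustration and are not taken from the paper.

# Illustrative sketch only: loop-aligned decomposition of two dependent loops
# into groups, plus partial static assignment so each group's decomposed tasks
# run on one processor and exchange data through its local memory.
from dataclasses import dataclass

@dataclass
class Task:
    loop: str          # which source loop this piece comes from ("L1" or "L2")
    iter_range: range  # the iterations assigned to this piece

def partition_aligned(n_iters: int, n_groups: int) -> list[list[Task]]:
    """Split two dependent loops L1 -> L2 over [0, n_iters) into aligned groups.

    Because L2(i) only reads data produced by L1(i) in this toy example, cutting
    both loops at the same iteration boundaries keeps every producer/consumer
    pair inside one group, so no shared data has to cross group boundaries.
    """
    groups = []
    step = (n_iters + n_groups - 1) // n_groups
    for g in range(n_groups):
        r = range(g * step, min((g + 1) * step, n_iters))
        groups.append([Task("L1", r), Task("L2", r)])
    return groups

def run_group_on_processor(group: list[Task], local_mem: dict) -> None:
    """Partial static assignment: once a group is dispatched to a processor at
    run time, every task in the group executes on that same processor, so the
    values produced by L1 reach L2 through local memory, not shared memory."""
    for task in group:
        for i in task.iter_range:
            if task.loop == "L1":
                local_mem[i] = 2 * i             # stand-in for L1's computation
            else:
                local_mem[i] = local_mem[i] + 1  # L2 consumes L1's result locally

if __name__ == "__main__":
    groups = partition_aligned(n_iters=16, n_groups=4)
    results = {}
    for group in groups:   # a real scheduler would dispatch groups dynamically
        local = {}         # to idle processors; here they run one after another
        run_group_on_processor(group, local)
        results.update(local)
    print(results)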

    Original language: English
    Title of host publication: Proceedings of the International Conference on Supercomputing
    Place of publication: New York, NY, United States
    Publisher: ACM
    Pages: 61-68
    Number of pages: 8
    Publication status: Published - 1996
    Event: Proceedings of the 1996 International Conference on Supercomputing - Philadelphia, PA, USA
    Duration: 1996 May 25 to 1996 May 28


    Fingerprint

    Macros
    Data storage equipment
    Data transfer
    Subroutines
    Scheduling
    Decomposition

    ASJC Scopus subject areas

    • Computer Science (all)

    Cite this

    Yoshida, A., Koshizuka, K., & Kasahara, H. (1996). Data-localization for Fortran macro-dataflow computation using partial static task assignment. In Proceedings of the International Conference on Supercomputing (pp. 61-68). New York, NY, United States: ACM.
