Static coarse grain task scheduling with cache optimization using OpenMP

Hirofumi Nakano, Kazuhisa Ishizaka, Motoki Obata, Keiji Kimura, Hironori Kasahara

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    2 Citations (Scopus)

    Abstract

    Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop-iteration-level parallelism. Considering these factors, this paper proposes a static scheduling scheme for coarse grain tasks that takes cache optimization into account. After decomposing tasks and data at compile time according to the cache size, the proposed scheme assigns coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via the cache. The scheme is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. On 4 processors, the proposed scheme achieves a speedup of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC 6 loop parallelizing compiler.
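
The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the OSCAR compiler's algorithm: the function names, the loop names, and the cache and data sizes are hypothetical, and it assumes that partition `p` of every loop touches the same slice of data and that partitions are independent of one another. Under those assumptions, the data is split until one partition fits in a processor's cache, and all tasks working on the same partition are placed back to back on the same thread so that the shared data can stay in cache between them.

```python
from math import ceil


def choose_partitions(working_set_bytes, cache_bytes, num_threads):
    """Return the smallest multiple of num_threads for which one
    partition of the working set fits in a single cache."""
    parts = num_threads
    while ceil(working_set_bytes / parts) > cache_bytes:
        parts += num_threads
    return parts


def static_schedule(loop_names, num_parts, num_threads):
    """Build a static (decided before execution) plan that maps each
    thread to an ordered list of (loop, partition) coarse grain tasks.

    Tasks on the same partition run consecutively on one thread, so the
    partition's data is reused while it is still likely to be cached.
    """
    schedule = {thread: [] for thread in range(num_threads)}
    for part in range(num_parts):
        thread = part % num_threads  # round-robin partitions over threads
        for loop in loop_names:
            schedule[thread].append((loop, part))
    return schedule


# Hypothetical sizes: 20 MiB of shared arrays, 4 MiB cache per processor.
parts = choose_partitions(20 * 2**20, 4 * 2**20, num_threads=4)
plan = static_schedule(["loop_a", "loop_b", "loop_c"], parts, num_threads=4)

for thread, tasks in plan.items():
    print(thread, tasks)
```

With these hypothetical numbers, four partitions would be 5 MiB each and exceed the cache, so the sketch settles on eight partitions of 2.5 MiB, two per thread. A real scheduler would also have to respect data dependences between neighbouring partitions, which this sketch ignores.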

    Original language: English
    Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Pages: 479-489
    Number of pages: 11
    Volume: 2327 LNCS
    DOIs: https://doi.org/10.1007/3-540-47847-7_44
    Publication status: Published - 2002
    Event: 4th International Symposium on High Performance Computing, ISHPC 2002 - Kansai Science City
    Duration: 2002 May 15 to 2002 May 17

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2327 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 4th International Symposium on High Performance Computing, ISHPC 2002
    City: Kansai Science City
    Period: 02/5/15 to 02/5/17

    Fingerprint

    OpenMP
    Task Scheduling
    Parallelizing Compilers
    Cache
    Scheduling
    Sun
    Parallelism
    Optimization
    Cache memory
    Thread
    Schedule
    Speedup
    Decomposition
    Iteration
    Data storage equipment
    Decompose

    ASJC Scopus subject areas

    • Computer Science(all)
    • Theoretical Computer Science

    Cite this

    Nakano, H., Ishizaka, K., Obata, M., Kimura, K., & Kasahara, H. (2002). Static coarse grain task scheduling with cache optimization using OpenMP. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2327 LNCS, pp. 479-489). https://doi.org/10.1007/3-540-47847-7_44

    @inproceedings{8e42616926e4483c87d6454bce053cfd,
    title = "Static coarse grain task scheduling with cache optimization using OpenMP",
    abstract = "Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop-iteration-level parallelism. Considering these factors, this paper proposes a static scheduling scheme for coarse grain tasks that takes cache optimization into account. After decomposing tasks and data at compile time according to the cache size, the proposed scheme assigns coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via the cache. The scheme is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. On 4 processors, the proposed scheme achieves a speedup of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC 6 loop parallelizing compiler.",
    author = "Hirofumi Nakano and Kazuhisa Ishizaka and Motoki Obata and Keiji Kimura and Hironori Kasahara",
    year = "2002",
    doi = "10.1007/3-540-47847-7_44",
    language = "English",
    isbn = "354043674X",
    volume = "2327 LNCS",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    pages = "479--489",
    booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

    }

    TY - GEN
    T1 - Static coarse grain task scheduling with cache optimization using OpenMP
    AU - Nakano, Hirofumi
    AU - Ishizaka, Kazuhisa
    AU - Obata, Motoki
    AU - Kimura, Keiji
    AU - Kasahara, Hironori
    PY - 2002
    Y1 - 2002
    N2 - Effective use of cache memory is becoming more important as the gap between processor speed and memory access speed widens. Multigrain parallelism is likewise becoming more important for improving effective performance beyond the limits of loop-iteration-level parallelism. Considering these factors, this paper proposes a static scheduling scheme for coarse grain tasks that takes cache optimization into account. After decomposing tasks and data at compile time according to the cache size, the proposed scheme assigns coarse grain tasks to threads so that data shared among coarse grain tasks can be passed via the cache. The scheme is implemented in the OSCAR Fortran multigrain parallelizing compiler and evaluated on a Sun Ultra80 four-processor SMP workstation using Swim and Tomcatv from SPEC fp 95. On 4 processors, the proposed scheme achieves a speedup of 4.56 times for Swim and 2.37 times for Tomcatv over the Sun Forte HPC 6 loop parallelizing compiler.
    UR - http://www.scopus.com/inward/record.url?scp=68749120674&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=68749120674&partnerID=8YFLogxK
    U2 - 10.1007/3-540-47847-7_44
    DO - 10.1007/3-540-47847-7_44
    M3 - Conference contribution
    AN - SCOPUS:68749120674
    SN - 354043674X
    SN - 9783540436744
    VL - 2327 LNCS
    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    SP - 479
    EP - 489
    BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    ER -