Coarse grain task parallelization of earthquake simulator GMS using OSCAR compiler on various Cc-NUMA servers

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper proposes coarse grain task parallelization of the Ground Motion Simulator (GMS), an earthquake simulation program that uses the Finite Difference Method (FDM) to solve the wave equations in a 3-D heterogeneous structure, on various cc-NUMA servers built on IBM, Intel and Fujitsu multicore processors. GMS was developed by the National Research Institute for Earth Science and Disaster Prevention (NIED) in Japan. Earthquake wave propagation simulations are important numerical applications that help save lives by predicting earthquake damage in residential areas. Running these simulations both precisely and quickly requires parallel processing with strong scaling. The proposed method uses the OSCAR compiler to exploit coarse grain task parallelism efficiently and obtain scalable speed-ups under strong scaling. The OSCAR compiler can analyze data dependence and control dependence among coarse grain tasks such as subroutines, loops and basic blocks. In addition, the paper presents locality optimizations that take the boundary calculations of FDM into account, and a new static scheduler that enables more efficient task scheduling on cc-NUMA servers. In the performance evaluation, the speed-ups over sequential execution are 110 times using 128 cores on the Hitachi SR16000 VM1, a POWER7 based 128-core cc-NUMA server; 37.2 times using 64 cores on the BS2000, a Xeon E7-8830 based 64-core cc-NUMA server; 19.8 times using 32 cores on the HA8000/RS440, a Xeon X7560 based 32-core cc-NUMA server; 99.3 times using 128 cores on the Fujitsu M9000, a SPARC64 VII based 256-core cc-NUMA server; and 9.42 times using 12 cores on the Power System S812L, a POWER8 based 12-core cc-NUMA server.
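One way to compare the reported results across machines is parallel efficiency, that is, speed-up divided by the number of cores used. The sketch below is not from the paper: it only applies that standard ratio to the speed-ups and core counts quoted in the abstract. The names `REPORTED`, `parallel_efficiency` and `efficiency_table` are hypothetical, chosen for illustration.

```python
# Speed-ups and core counts exactly as quoted in the abstract.
# Each row: (server label, cores used, speed-up over sequential execution).
REPORTED = [
    ("Hitachi SR16000 VM1 (POWER7)", 128, 110.0),
    ("BS2000 (Xeon E7-8830)", 64, 37.2),
    ("HA8000/RS440 (Xeon X7560)", 32, 19.8),
    ("Fujitsu M9000 (SPARC64 VII)", 128, 99.3),  # 128 of 256 cores were used
    ("Power System S812L (POWER8)", 12, 9.42),
]


def parallel_efficiency(speedup, cores):
    """Return speed-up divided by the number of cores used."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return speedup / cores


def efficiency_table(rows=REPORTED):
    """Append the parallel efficiency to each (label, cores, speed-up) row."""
    return [
        (label, cores, speedup, parallel_efficiency(speedup, cores))
        for label, cores, speedup in rows
    ]


if __name__ == "__main__":
    for label, cores, speedup, eff in efficiency_table():
        print(f"{label:30s} {cores:4d} cores  {speedup:6.2f}x  efficiency {eff:.2f}")
```

An efficiency of 1.0 would mean ideal linear scaling. The ratio does not account for differences in problem size, memory bandwidth or sequential baseline between machines, so it is only a rough basis for comparison.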

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 238-253
Number of pages: 16
Volume: 9519
ISBN (Print): 9783319297774
DOIs: https://doi.org/10.1007/978-3-319-29778-1_15
Publication status: Published - 2016
Event: 28th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2015 - Raleigh, United States
Duration: 2015 Sep 9 to 2015 Sep 11

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9519
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 28th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2015
Country: United States
City: Raleigh
Period: 15/9/9 to 15/9/11

Fingerprint

Earthquake
Parallelization
Compiler
Earthquakes
Simulator
Servers
Server
Simulators
Motion
Speedup
Scaling
Disaster prevention
Earth sciences
Frequency division multiplexing
Data Dependence
Simulation
Multi-core Processor
Task Scheduling
Subroutines
Wave equations

Keywords

  • Compiler
  • Earthquake
  • GMS
  • OSCAR
  • cc-NUMA
  • Task parallelism

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Shimaoka, M., Wada, Y., Kimura, K., & Kasahara, H. (2016). Coarse grain task parallelization of earthquake simulator GMS using OSCAR compiler on various Cc-NUMA servers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9519, pp. 238-253). Springer Verlag. https://doi.org/10.1007/978-3-319-29778-1_15

@inproceedings{3306d0731cb941d2877227895b5413cf,
title = "Coarse grain task parallelization of earthquake simulator GMS using OSCAR compiler on various Cc-NUMA servers",
keywords = "Compiler, Earthquake, GMS, OSCAR, cc-NUMA, Task parallelism",
author = "Mamoru Shimaoka and Yasutaka Wada and Keiji Kimura and Hironori Kasahara",
year = "2016",
doi = "10.1007/978-3-319-29778-1_15",
language = "English",
isbn = "9783319297774",
volume = "9519",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "238--253",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Coarse grain task parallelization of earthquake simulator GMS using OSCAR compiler on various Cc-NUMA servers

AU - Shimaoka, Mamoru

AU - Wada, Yasutaka

AU - Kimura, Keiji

AU - Kasahara, Hironori

PY - 2016

Y1 - 2016

KW - Compiler

KW - Earthquake

KW - GMS

KW - OSCAR

KW - cc-NUMA

KW - Task parallelism

UR - http://www.scopus.com/inward/record.url?scp=84961124496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961124496&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-29778-1_15

DO - 10.1007/978-3-319-29778-1_15

M3 - Conference contribution

SN - 9783319297774

VL - 9519

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 238

EP - 253

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

PB - Springer Verlag

ER -