Performance Evaluation of OSCAR Multi-target Automatic Parallelizing Compiler on Intel, AMD, Arm and RISC-V Multicores

Birk Martin Magnussen*, Tohma Kawasumi, Hiroki Mikami, Keiji Kimura, Hironori Kasahara

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With an increasing number of shared memory multicore processor architectures, automatic parallelizing compilers need to support multiple architectures. The OSCAR (Optimally Scheduled Advanced Multiprocessor) automatic parallelizing compiler is able to parallelize many different sequential programs, such as scientific applications, embedded real-time applications, multimedia applications, and more. The OSCAR compiler’s features include coarse-grain task parallelization with earliest execution condition analysis, analyzing both data and control dependencies, data locality optimizations over different loop nests with data dependencies, and the ability to generate parallelized code using the OSCAR API 2.1. The OSCAR API 2.1 is compatible with OpenMP for SMP multicores, with additional directives for power control and for supporting heterogeneous multicores. This allows a C or Fortran compiler with OpenMP support to generate parallel machine code for the target multicore. Additionally, using the OSCAR API analyzer allows a sequential-only compiler without OpenMP support to generate machine code for each core separately, which is then linked into one parallel application. Overall, only minor configuration changes to the OSCAR compiler are needed to run and optimize OSCAR compiler-generated code on a specific platform. This paper evaluates the performance of OSCAR compiler-generated code on different modern SMP multicore processors, including Intel and AMD x86 processors, an Arm processor, and a RISC-V processor, using scientific and multimedia benchmarks in C and Fortran.
The results show promising speedups on all platforms, such as a speedup of 7.16 for the swim program of the SPEC2000 benchmarks on an 8-core Intel x86 processor, a speedup of 9.50 for the CG program of the NAS parallel benchmarks on 8 cores of an AMD x86 processor, a speedup of 3.70 for the BT program of the NAS parallel benchmarks on a 4-core RISC-V processor, and a speedup of 2.64 for the equake program of the SPEC2000 benchmarks on 4 cores of an Arm processor.
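To put the reported speedups in relation to the number of cores used, one common derived figure is parallel efficiency, defined as speedup divided by core count. The sketch below computes it for the four results quoted in the abstract. The function name `parallel_efficiency` and the `REPORTED` table are illustrative and not part of the paper; the abstract does not state the baseline each speedup is measured against, so the values are only as comparable as the underlying measurements.

```python
# Speedups and core counts exactly as quoted in the abstract.
# The labels are shorthand chosen for this sketch.
REPORTED = [
    ("swim (SPEC2000), Intel x86", 7.16, 8),
    ("CG (NAS), AMD x86", 9.50, 8),
    ("BT (NAS), RISC-V", 3.70, 4),
    ("equake (SPEC2000), Arm", 2.64, 4),
]


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Return speedup per core (1.0 corresponds to linear scaling)."""
    if cores <= 0:
        raise ValueError("cores must be a positive integer")
    return speedup / cores


if __name__ == "__main__":
    for label, speedup, cores in REPORTED:
        eff = parallel_efficiency(speedup, cores)
        print(f"{label}: speedup {speedup:.2f} on {cores} cores, "
              f"efficiency {eff:.2f}")
```

An efficiency above 1.0, as for CG on the AMD processor, means the speedup exceeds the core count; the abstract does not state the cause.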

Original language: English
Title of host publication: Languages and Compilers for Parallel Computing - 34th International Workshop, LCPC 2021, Revised Selected Papers
Editors: Xiaoming Li, Sunita Chandrasekaran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 50-64
Number of pages: 15
ISBN (Print): 9783030993719
Publication status: Published - 2022
Event: 34th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2021 - Newark, United States
Duration: 2021 Oct 13 to 2021 Oct 14

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13181 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 34th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2021
Country/Territory: United States
City: Newark
Period: 21/10/13 to 21/10/14

Keywords

  • OSCAR
  • multicore
  • multiple platforms
  • parallelizing compiler
  • shared memory

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
