Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors

Huatao Zhao, Jiongyao Ye, Yuxin Sun, Takahiro Watanabe

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract

In modern embedded processors, superscalar techniques and deep pipeline architectures are widely used to achieve higher performance, but the branch misprediction penalty remains a significant constraint on system performance. As pipeline depth increases, refilling the pipeline becomes a major contributor to the branch misprediction penalty. In this paper, we propose a new mechanism, named Pseudo Dual Path Processing (PDPP), to reduce this penalty. The mechanism uses a small trace cache to store a set of successive decoded instructions, together with their renaming information, from the alternative path. On a trace cache hit, those instructions can skip the fetch and decode stages, and renaming for all instructions from the alternative path can be completed in one cycle by using the renaming information stored in advance. As a result, PDPP barely reduces the effective bandwidth of the front-end stages while processing instructions from two paths, yet it reduces the refill penalty without increasing design complexity or power consumption. In addition, critical path prediction is employed to improve the efficiency of PDPP by preventing non-critical branches from being forked. Experimental results show that the proposed PDPP improves IPC by 7.43% compared to a conventional processor.
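
The refill argument in the abstract can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the authors' simulator: the stage counts, the hit rate, and the function names `refill_penalty` and `average_penalty` are hypothetical, and the model assumes that a trace cache hit reduces the front-end refill cost to the single renaming cycle the abstract mentions.

```python
def refill_penalty(fetch_stages, decode_stages, rename_stages, trace_hit):
    """Front-end cycles needed to refill the pipeline after a misprediction.

    Assumption (not from the paper): on a trace cache hit the alternative-path
    instructions skip fetch and decode, and renaming takes one cycle.
    """
    if trace_hit:
        return 1  # stored renaming information is applied in one cycle
    # Conventional refill: the instructions traverse every front-end stage.
    return fetch_stages + decode_stages + rename_stages


def average_penalty(fetch_stages, decode_stages, rename_stages, hit_rate):
    """Expected refill penalty for a given (hypothetical) trace cache hit rate."""
    hit = refill_penalty(fetch_stages, decode_stages, rename_stages, True)
    miss = refill_penalty(fetch_stages, decode_stages, rename_stages, False)
    return hit_rate * hit + (1.0 - hit_rate) * miss


if __name__ == "__main__":
    # Hypothetical front end: 3 fetch, 2 decode, 2 rename stages.
    for rate in (0.0, 0.5, 1.0):
        print(rate, average_penalty(3, 2, 2, rate))
```

In this model the benefit grows with front-end depth and with the hit rate, which matches the abstract's motivation that deeper pipelines make refilling more costly. It says nothing about the reported 7.43% IPC gain, which depends on the authors' experimental setup.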

Original language: English
Title of host publication: Proceedings of International Conference on ASIC
Publisher: IEEE Computer Society
ISBN (Print): 9781467364157
DOIs: https://doi.org/10.1109/ASICON.2013.6811990
Publication status: Published - 2013
Event: 2013 IEEE 10th International Conference on ASIC, ASICON 2013 - Shenzhen
Duration: 2013 Oct 28 to 2013 Oct 31

Other

Other: 2013 IEEE 10th International Conference on ASIC, ASICON 2013
City: Shenzhen
Period: 13/10/28 to 13/10/31

Fingerprint

Processing
Pipelines
Electric power utilization
Bandwidth

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Zhao, H., Ye, J., Sun, Y., & Watanabe, T. (2013). Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors. In Proceedings of International Conference on ASIC [6811990] IEEE Computer Society. https://doi.org/10.1109/ASICON.2013.6811990

Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors. / Zhao, Huatao; Ye, Jiongyao; Sun, Yuxin; Watanabe, Takahiro.

Proceedings of International Conference on ASIC. IEEE Computer Society, 2013. 6811990.

Zhao, H, Ye, J, Sun, Y & Watanabe, T 2013, Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors. in Proceedings of International Conference on ASIC., 6811990, IEEE Computer Society, 2013 IEEE 10th International Conference on ASIC, ASICON 2013, Shenzhen, 13/10/28. https://doi.org/10.1109/ASICON.2013.6811990
Zhao H, Ye J, Sun Y, Watanabe T. Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors. In Proceedings of International Conference on ASIC. IEEE Computer Society. 2013. 6811990 https://doi.org/10.1109/ASICON.2013.6811990
Zhao, Huatao ; Ye, Jiongyao ; Sun, Yuxin ; Watanabe, Takahiro. / Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors. Proceedings of International Conference on ASIC. IEEE Computer Society, 2013.
@inproceedings{e749345d13234ff389744fb8c79437da,
title = "Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors",
abstract = "In modern embedded processors, superscalar techniques and deep pipeline architectures are widely used to achieve higher performance, but the branch misprediction penalty remains a significant constraint on system performance. As pipeline depth increases, refilling the pipeline becomes a major contributor to the branch misprediction penalty. In this paper, we propose a new mechanism, named Pseudo Dual Path Processing (PDPP), to reduce this penalty. The mechanism uses a small trace cache to store a set of successive decoded instructions, together with their renaming information, from the alternative path. On a trace cache hit, those instructions can skip the fetch and decode stages, and renaming for all instructions from the alternative path can be completed in one cycle by using the renaming information stored in advance. As a result, PDPP barely reduces the effective bandwidth of the front-end stages while processing instructions from two paths, yet it reduces the refill penalty without increasing design complexity or power consumption. In addition, critical path prediction is employed to improve the efficiency of PDPP by preventing non-critical branches from being forked. Experimental results show that the proposed PDPP improves IPC by 7.43{\%} compared to a conventional processor.",
author = "Huatao Zhao and Jiongyao Ye and Yuxin Sun and Takahiro Watanabe",
year = "2013",
doi = "10.1109/ASICON.2013.6811990",
language = "English",
isbn = "9781467364157",
booktitle = "Proceedings of International Conference on ASIC",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Pseudo Dual Path Processing to reduce the branch misprediction penalty in embedded processors

AU - Zhao, Huatao

AU - Ye, Jiongyao

AU - Sun, Yuxin

AU - Watanabe, Takahiro

PY - 2013

Y1 - 2013

AB - In modern embedded processors, superscalar techniques and deep pipeline architectures are widely used to achieve higher performance, but the branch misprediction penalty remains a significant constraint on system performance. As pipeline depth increases, refilling the pipeline becomes a major contributor to the branch misprediction penalty. In this paper, we propose a new mechanism, named Pseudo Dual Path Processing (PDPP), to reduce this penalty. The mechanism uses a small trace cache to store a set of successive decoded instructions, together with their renaming information, from the alternative path. On a trace cache hit, those instructions can skip the fetch and decode stages, and renaming for all instructions from the alternative path can be completed in one cycle by using the renaming information stored in advance. As a result, PDPP barely reduces the effective bandwidth of the front-end stages while processing instructions from two paths, yet it reduces the refill penalty without increasing design complexity or power consumption. In addition, critical path prediction is employed to improve the efficiency of PDPP by preventing non-critical branches from being forked. Experimental results show that the proposed PDPP improves IPC by 7.43% compared to a conventional processor.

UR - http://www.scopus.com/inward/record.url?scp=84901336245&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84901336245&partnerID=8YFLogxK

U2 - 10.1109/ASICON.2013.6811990

DO - 10.1109/ASICON.2013.6811990

M3 - Conference contribution

AN - SCOPUS:84901336245

SN - 9781467364157

BT - Proceedings of International Conference on ASIC

PB - IEEE Computer Society

ER -