Abstract
Modern embedded systems favor the chip multiprocessor (CMP) architecture to achieve higher performance, but they are limited by inefficient cache hierarchies. In particular, access interference and improper allocation in the last-level cache (LLC) shared by the processors cause significant energy consumption and performance degradation. In this paper, we propose a configurable, partitioned cache hierarchy in which an energy-efficient runtime mechanism manages the shared LLC according to the demands of application programs. This mechanism exploits the repeated behavior of an application's hot subroutines to select suitable partition intervals. A configurable scheme based on a low-power metric then explores the optimal allocation of the given cache resources. Each core can thus be provided with optimal allocation information to dynamically partition the shared LLC at runtime. Experimental results for a quad-core system running the SPEC2006 benchmarks show that cache access energy can be reduced by 32.5 percent on average compared with the equal partition scheme, at a cost of only 1.3 percent in performance.
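The allocation step described above can be illustrated with a small sketch. This is not the authors' method: it assumes a toy energy model (the constants `E_WAY_PROBE` and `E_MISS` are hypothetical), assumes per-core miss curves for one partition interval are already available (for example, from profiling a hot subroutine), and replaces the paper's low-power metric with a plain total-energy objective under a miss budget measured against the equal partition. The functions `core_energy` and `best_partition` and all profile numbers are made up for illustration.

```python
from itertools import product
from typing import Optional, Sequence, Tuple

# Hypothetical energy constants in arbitrary units (not from the paper).
E_WAY_PROBE = 0.05  # energy to probe one LLC way on a single access
E_MISS = 10.0       # energy of one off-chip access caused by an LLC miss


def core_energy(accesses: int, misses: int, ways: int) -> float:
    """Toy model: each access probes only the core's own ways; each miss goes off-chip."""
    return accesses * ways * E_WAY_PROBE + misses * E_MISS


def best_partition(
    accesses: Sequence[int],
    miss_curves: Sequence[Sequence[int]],
    total_ways: int,
    slack: float = 0.05,
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively search way allocations for one interval.

    miss_curves[c][w] is the (profiled) miss count of core c with w ways.
    Each core gets at least one way, unallocated ways cost nothing in this
    model, and the total miss count must stay within `slack` of the equal
    partition's total miss count.
    """
    n = len(accesses)
    equal = total_ways // n
    budget = sum(curve[equal] for curve in miss_curves) * (1.0 + slack)

    best_alloc: Optional[Tuple[int, ...]] = None
    best_energy = float("inf")
    for alloc in product(range(1, total_ways + 1), repeat=n):
        if sum(alloc) > total_ways:
            continue  # more ways requested than the LLC has
        misses = [miss_curves[c][w] for c, w in enumerate(alloc)]
        if sum(misses) > budget:
            continue  # violates the performance (miss) budget
        energy = sum(core_energy(accesses[c], misses[c], w) for c, w in enumerate(alloc))
        if energy < best_energy:
            best_alloc, best_energy = alloc, energy
    return best_alloc, best_energy


# Made-up per-interval profile for 4 cores sharing an 8-way LLC (list index = ways).
ACCESSES = [1000, 1000, 1000, 1000]
MISS_CURVES = [
    [900] * 9,                                       # streaming: insensitive to ways
    [1000, 600, 300, 150, 80, 50, 40, 35, 30],       # cache-sensitive
    [1000, 20, 20, 20, 20, 20, 20, 20, 20],          # working set fits in one way
    [1000, 500, 400, 300, 250, 200, 180, 170, 160],  # moderately sensitive
]

if __name__ == "__main__":
    print(best_partition(ACCESSES, MISS_CURVES, total_ways=8))  # ((1, 3, 1, 3), 14100.0)
```

In this made-up profile, the search gives one way each to the streaming core and the small-working-set core and moves the freed ways to the two cache-sensitive cores, lowering modeled energy from 16600 units (equal partition, two ways each) to 14100 units while staying within the miss budget. Exhaustive search is used only for clarity; its cost grows as W^N, so a practical runtime mechanism would likely need a cheaper search.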
Original language | English |
---|---|
Title of host publication | Conference Proceedings - 13th IEEE International NEW Circuits and Systems Conference, NEWCAS 2015 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Print) | 9781479988938 |
DOI | 10.1109/NEWCAS.2015.7181994 |
Publication status | Published - 2015 Aug 6 |
Event | 13th IEEE International NEW Circuits and Systems Conference, NEWCAS 2015, Grenoble, France (2015 Jun 7 → 2015 Jun 10) |
Event details
Conference | 13th IEEE International NEW Circuits and Systems Conference, NEWCAS 2015 |
---|---|
Country | France |
City | Grenoble |
Period | 2015 Jun 7 → 2015 Jun 10 |
ASJC Scopus subject areas
- Electrical and Electronic Engineering
Cite this
Application-specific shared last-level cache optimization for low-power embedded systems. / Zhao, Huatao; Ye, Jiongyao; Su, Xian; Watanabe, Takahiro.
Conference Proceedings - 13th IEEE International NEW Circuits and Systems Conference, NEWCAS 2015. Institute of Electrical and Electronics Engineers Inc., 2015. 7181994.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Application-specific shared last-level cache optimization for low-power embedded systems
AU - Zhao, Huatao
AU - Ye, Jiongyao
AU - Su, Xian
AU - Watanabe, Takahiro
PY - 2015/8/6
Y1 - 2015/8/6
UR - http://www.scopus.com/inward/record.url?scp=84945144182&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84945144182&partnerID=8YFLogxK
U2 - 10.1109/NEWCAS.2015.7181994
DO - 10.1109/NEWCAS.2015.7181994
M3 - Conference contribution
AN - SCOPUS:84945144182
SN - 9781479988938
BT - Conference Proceedings - 13th IEEE International NEW Circuits and Systems Conference, NEWCAS 2015
PB - Institute of Electrical and Electronics Engineers Inc.
ER -