A 45nm 37.3GOPS/W heterogeneous multi-core SoC

Yoichi Yuyama, Masayuki Ito, Yoshikazu Kiyoshige, Yusuke Nitta, Shigezumi Matsui, Osamu Nishii, Atsushi Hasegawa, Makoto Ishikawa, Tetsuya Yamada, Junichi Miyakoshi, Koichi Terada, Tohru Nojiri, Makoto Satoh, Hiroyuki Mizuno, Kunio Uchiyama, Yasutaka Wada, Keiji Kimura, Hironori Kasahara, Hideo Maejima

    Research output: Chapter in Book/Report/Conference proceedingConference contribution

    29 Citations (Scopus)

    Abstract

    We develop a heterogeneous multi-core SoC for applications, such as digital TV systems with IP networks (IP-TV) including image recognition and database search. Figure 5.3.1 shows the chip features. This SoC is capable of decoding 1080i audio/video data using a part of SoC (one general-purpose CPU core, video processing unit called VPU5 and sound processing unit called SPU) [1]. Four dynamically reconfigurable processors called FE [2] are integrated and have a total theoretical performance of 41.5GOPS and power consumption of 0.76W. Two 1024-way matrix-processors called MX-2 [3] are integrated and have a total theoretical performance of 36.9GOPS and power consumption of 1.10W. Overall, the performance per watt of our SoC is 37.3GOPS/W at 1.15V, the highest among comparable processors [4-6] excluding special-purpose codecs. The operation granularity of the CPU, FE and MX-2 are 32bit, 16bit, and 4bit respectively, and thus we can assign the appropriate processor for each task in an effective manner. A heterogeneous multi-core approach is one of the most promising approaches to attain high performance with low frequency, or low power, for consumer electronics application and scientific applications, compared to homogeneous multi-core SoCs [4]. For example, for image-recognition application in the IP-TV system, the FEs are assigned to calculate optical flow operation [7] of VGA (640x480) size video data at 15fps, which requires 0.62GOPS. The MX-2s are used for face detection and calculation of the feature quantity of the VGA video data at 15fps, which requires 30.6GOPS. In addition, general-purpose CPU cores are used for database search using the results of the above operations, which requires further enhancement of CPU. The automatic parallelization compilers analyze parallelism of the data flow, generate coarse grain tasks, schedule tasks to minimize execution time considering data transfer overhead for general-purpose CPU and FE.

    Original languageEnglish
    Title of host publicationDigest of Technical Papers - IEEE International Solid-State Circuits Conference
    Pages100-101
    Number of pages2
    Volume53
    DOIs
    Publication statusPublished - 2010
    Event2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - San Francisco, CA
    Duration: 2010 Feb 72010 Feb 11

    Other

    Other2010 IEEE International Solid-State Circuits Conference, ISSCC 2010
    CitySan Francisco, CA
    Period10/2/710/2/11

      Fingerprint

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Electronic, Optical and Magnetic Materials

    Cite this

    Yuyama, Y., Ito, M., Kiyoshige, Y., Nitta, Y., Matsui, S., Nishii, O., Hasegawa, A., Ishikawa, M., Yamada, T., Miyakoshi, J., Terada, K., Nojiri, T., Satoh, M., Mizuno, H., Uchiyama, K., Wada, Y., Kimura, K., Kasahara, H., & Maejima, H. (2010). A 45nm 37.3GOPS/W heterogeneous multi-core SoC. In Digest of Technical Papers - IEEE International Solid-State Circuits Conference (Vol. 53, pp. 100-101). [5434031] https://doi.org/10.1109/ISSCC.2010.5434031