A 45nm 37.3GOPS/W heterogeneous multi-core SoC

Yoichi Yuyama, Masayuki Ito, Yoshikazu Kiyoshige, Yusuke Nitta, Shigezumi Matsui, Osamu Nishii, Atsushi Hasegawa, Makoto Ishikawa, Tetsuya Yamada, Junichi Miyakoshi, Koichi Terada, Tohru Nojiri, Makoto Satoh, Hiroyuki Mizuno, Kunio Uchiyama, Yasutaka Wada, Keiji Kimura, Hironori Kasahara, Hideo Maejima

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    29 Citations (Scopus)

    Abstract

    We develop a heterogeneous multi-core SoC for applications such as digital TV systems with IP networks (IP-TV), including image recognition and database search. Figure 5.3.1 shows the chip features. The SoC can decode 1080i audio/video data using only part of the chip: one general-purpose CPU core, a video processing unit (VPU5), and a sound processing unit (SPU) [1]. Four dynamically reconfigurable processors (FEs) [2] are integrated, with a total theoretical performance of 41.5GOPS and a power consumption of 0.76W. Two 1024-way matrix processors (MX-2s) [3] are integrated, with a total theoretical performance of 36.9GOPS and a power consumption of 1.10W. Overall, the SoC achieves 37.3GOPS/W at 1.15V, the highest among comparable processors [4-6], excluding special-purpose codecs. The operation granularities of the CPU, FE, and MX-2 are 32 bits, 16 bits, and 4 bits, respectively, so each task can be assigned to the processor best suited to it. Compared with homogeneous multi-core SoCs [4], a heterogeneous multi-core approach is one of the most promising ways to attain high performance at low frequency, and hence low power, for consumer electronics and scientific applications. For example, in the image-recognition application of the IP-TV system, the FEs compute optical flow [7] on VGA (640x480) video data at 15fps, which requires 0.62GOPS. The MX-2s perform face detection and feature-quantity calculation on the VGA video data at 15fps, which requires 30.6GOPS. In addition, the general-purpose CPU cores perform database search using the results of these operations, which requires further enhancement of the CPU. The automatic parallelization compilers analyze the parallelism of the data flow, generate coarse-grain tasks, and schedule those tasks on the general-purpose CPUs and FEs to minimize execution time, taking data transfer overhead into account.
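The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the dictionary layout and the function name `summarize` are assumptions, not anything from the paper. It derives each accelerator's peak GOPS/W, the fraction of peak throughput the quoted image-recognition workload would occupy, and the implied operations per pixel for VGA video at 15fps. The chip-level 37.3GOPS/W figure is not reproduced here, since the abstract does not give the breakdown behind it.

```python
# Figures taken from the abstract; names and structure are illustrative.
ACCELERATORS = {
    # Four FEs in total: peak throughput, power, and optical-flow workload.
    "FE": {"peak_gops": 41.5, "power_w": 0.76, "workload_gops": 0.62},
    # Two MX-2s in total: peak throughput, power, and face-detection workload.
    "MX-2": {"peak_gops": 36.9, "power_w": 1.10, "workload_gops": 30.6},
}

# VGA (640x480) frames at 15 frames per second.
VGA_PIXELS_PER_SECOND = 640 * 480 * 15


def summarize(name):
    """Return derived metrics for one accelerator group."""
    acc = ACCELERATORS[name]
    return {
        # Peak efficiency of the accelerator group on its own.
        "gops_per_watt": acc["peak_gops"] / acc["power_w"],
        # Share of theoretical peak needed by the quoted workload.
        "utilization": acc["workload_gops"] / acc["peak_gops"],
        # Operations spent per input pixel by the quoted workload.
        "ops_per_pixel": acc["workload_gops"] * 1e9 / VGA_PIXELS_PER_SECOND,
    }


if __name__ == "__main__":
    for name in ACCELERATORS:
        s = summarize(name)
        print(
            f"{name}: {s['gops_per_watt']:.1f} GOPS/W peak, "
            f"{s['utilization']:.1%} of peak used, "
            f"{s['ops_per_pixel']:.0f} ops/pixel"
        )
```

Under these numbers the optical-flow task occupies only a small fraction of the FEs' theoretical peak, while face detection and feature calculation take most of the MX-2s' peak, which is consistent with the abstract's point about matching tasks to processors of different granularity.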

    Original language: English
    Title of host publication: Digest of Technical Papers - IEEE International Solid-State Circuits Conference
    Pages: 100-101
    Number of pages: 2
    Volume: 53
    DOIs: https://doi.org/10.1109/ISSCC.2010.5434031
    Publication status: Published - 2010
    Event: 2010 IEEE International Solid-State Circuits Conference, ISSCC 2010 - San Francisco, CA
    Duration: 2010 Feb 7 to 2010 Feb 11

    Fingerprint

    Program processors
    Image recognition
    Electric power utilization
    Consumer electronics
    Optical flows
    Data transfer
    Processing
    Face recognition
    Decoding
    System-on-chip
    Acoustic waves

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Electronic, Optical and Magnetic Materials

    Cite this

    Yuyama, Y., Ito, M., Kiyoshige, Y., Nitta, Y., Matsui, S., Nishii, O., ... Maejima, H. (2010). A 45nm 37.3GOPS/W heterogeneous multi-core SoC. In Digest of Technical Papers - IEEE International Solid-State Circuits Conference (Vol. 53, pp. 100-101). [5434031] https://doi.org/10.1109/ISSCC.2010.5434031

    @inproceedings{72ad00c45ca7427c9a3b7fedfc646eeb,
    title = "A 45nm 37.3GOPS/W heterogeneous multi-core SoC",
    author = "Yoichi Yuyama and Masayuki Ito and Yoshikazu Kiyoshige and Yusuke Nitta and Shigezumi Matsui and Osamu Nishii and Atsushi Hasegawa and Makoto Ishikawa and Tetsuya Yamada and Junichi Miyakoshi and Koichi Terada and Tohru Nojiri and Makoto Satoh and Hiroyuki Mizuno and Kunio Uchiyama and Yasutaka Wada and Keiji Kimura and Hironori Kasahara and Hideo Maejima",
    year = "2010",
    doi = "10.1109/ISSCC.2010.5434031",
    language = "English",
    isbn = "9781424460342",
    volume = "53",
    pages = "100--101",
    booktitle = "Digest of Technical Papers - IEEE International Solid-State Circuits Conference",

    }

    TY - GEN

    T1 - A 45nm 37.3GOPS/W heterogeneous multi-core SoC

    AU - Yuyama, Yoichi

    AU - Ito, Masayuki

    AU - Kiyoshige, Yoshikazu

    AU - Nitta, Yusuke

    AU - Matsui, Shigezumi

    AU - Nishii, Osamu

    AU - Hasegawa, Atsushi

    AU - Ishikawa, Makoto

    AU - Yamada, Tetsuya

    AU - Miyakoshi, Junichi

    AU - Terada, Koichi

    AU - Nojiri, Tohru

    AU - Satoh, Makoto

    AU - Mizuno, Hiroyuki

    AU - Uchiyama, Kunio

    AU - Wada, Yasutaka

    AU - Kimura, Keiji

    AU - Kasahara, Hironori

    AU - Maejima, Hideo

    PY - 2010

    Y1 - 2010

    UR - http://www.scopus.com/inward/record.url?scp=77952207417&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=77952207417&partnerID=8YFLogxK

    U2 - 10.1109/ISSCC.2010.5434031

    DO - 10.1109/ISSCC.2010.5434031

    M3 - Conference contribution

    AN - SCOPUS:77952207417

    SN - 9781424460342

    VL - 53

    SP - 100

    EP - 101

    BT - Digest of Technical Papers - IEEE International Solid-State Circuits Conference

    ER -