A 98 GMACs/W 32-core vector processor in 65 nm CMOS

Xun He, Xin Jin, Minghui Wang, Dajiang Zhou, Satoshi Goto

    Research output: Contribution to journal > Article


    Abstract

    This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. This hierarchical network provides more than 192 GB/s of low-latency inter-core bandwidth on average. The 4-port L2 cache architecture is likewise designed to provide 192 GB/s of L2 cache bandwidth. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The whole system, including 32 vector cores, 256 KB of L2 cache, a 64-bit DDRII PHY and two PLL units, occupies 25 mm² in 65 nm CMOS. It achieves a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.
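The headline figures in the abstract imply a few quantities that it does not state directly. The following is a minimal Python sketch of that arithmetic, not a result from the paper. It assumes that the 98 GMACs/W efficiency is measured at the same operating point as the 375 GMACs peak, which the abstract suggests but does not confirm. The function name `derived_figures` and the dictionary keys are hypothetical labels chosen for illustration.

```python
# Figures quoted in the abstract.
PEAK_GMACS = 375.0              # peak throughput, in GMACs
EFFICIENCY_GMACS_PER_W = 98.0   # energy efficiency at 1.2 V
NUM_CORES = 32                  # vector cores in the system
DIE_AREA_MM2 = 25.0             # total area in 65 nm CMOS


def derived_figures():
    """Return quantities implied by the abstract's headline numbers.

    These values are derived here for illustration; the source does not
    state them. The power figure assumes peak throughput and efficiency
    refer to the same operating point.
    """
    return {
        "implied_power_w": PEAK_GMACS / EFFICIENCY_GMACS_PER_W,
        "gmacs_per_core": PEAK_GMACS / NUM_CORES,
        "gmacs_per_mm2": PEAK_GMACS / DIE_AREA_MM2,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.2f}")
```

Under that assumption, the chip would draw a little under 4 W at peak, with each core contributing roughly 11.7 GMACs. The per-area figure covers the whole die, including the L2 cache, DDRII PHY and PLL units, so it understates the density of the vector cores alone.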

    Original language: English
    Pages (from-to): 2609-2618
    Number of pages: 10
    Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    Volume: E94-A
    Issue number: 12
    DOI: 10.1587/transfun.E94.A.2609
    Publication status: Published - 2011 Dec

    Keywords

    • Cache coherence
    • GMACs
    • Multicore processor
    • NoC
    • SIMD

    ASJC Scopus subject areas

    • Electrical and Electronic Engineering
    • Computer Graphics and Computer-Aided Design
    • Applied Mathematics
    • Signal Processing

    Cite this

    A 98 GMACs/W 32-core vector processor in 65 nm CMOS. / He, Xun; Jin, Xin; Wang, Minghui; Zhou, Dajiang; Goto, Satoshi.

    In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E94-A, No. 12, 12.2011, p. 2609-2618.

    @article{338b4871c299400980525d6e6ac1827e,
    title = "A 98 GMACs/W 32-core vector processor in 65 nm CMOS",
    abstract = "This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. This hierarchical network provides more than 192 GB/s of low-latency inter-core bandwidth on average. The 4-port L2 cache architecture is likewise designed to provide 192 GB/s of L2 cache bandwidth. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8{\%} of L1 cache energy in the 32-core case. The whole system, including 32 vector cores, 256 KB of L2 cache, a 64-bit DDRII PHY and two PLL units, occupies 25 mm$^2$ in 65 nm CMOS. It achieves a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.",
    keywords = "Cache coherence, GMACs, Multicore processor, NoC, SIMD",
    author = "Xun He and Xin Jin and Minghui Wang and Dajiang Zhou and Satoshi Goto",
    year = "2011",
    month = "12",
    doi = "10.1587/transfun.E94.A.2609",
    language = "English",
    volume = "E94-A",
    pages = "2609--2618",
    journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
    issn = "0916-8508",
    publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
    number = "12",
    }

    TY - JOUR
    T1 - A 98 GMACs/W 32-core vector processor in 65 nm CMOS
    AU - He, Xun
    AU - Jin, Xin
    AU - Wang, Minghui
    AU - Zhou, Dajiang
    AU - Goto, Satoshi
    PY - 2011/12
    Y1 - 2011/12
    AB - This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. The SIMD cores support 8/16-bit SIMD MAC instructions and vertical vector access. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. This hierarchical network provides more than 192 GB/s of low-latency inter-core bandwidth on average. The 4-port L2 cache architecture is likewise designed to provide 192 GB/s of L2 cache bandwidth. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The whole system, including 32 vector cores, 256 KB of L2 cache, a 64-bit DDRII PHY and two PLL units, occupies 25 mm² in 65 nm CMOS. It achieves a peak performance of 375 GMACs and 98 GMACs/W at 1.2 V.
    KW - Cache coherence
    KW - GMACs
    KW - Multicore processor
    KW - NoC
    KW - SIMD
    UR - http://www.scopus.com/inward/record.url?scp=82655175636&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=82655175636&partnerID=8YFLogxK
    U2 - 10.1587/transfun.E94.A.2609
    DO - 10.1587/transfun.E94.A.2609
    M3 - Article
    VL - E94-A
    SP - 2609
    EP - 2618
    JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
    SN - 0916-8508
    IS - 12
    ER -