A 98 GMACs/W 32-core vector processor in 65nm CMOS

Xun He, Dajiang Zhou, Xin Jin, Satoshi Goto

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. The proposed hierarchical network provides 192 GB/s of inter-core communication bandwidth on average. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The processor achieves a peak performance of 375 GMACs and 98 GMACs/W in 65 nm CMOS.
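
The figures quoted in the abstract can be combined into a few derived numbers. The following is a minimal sketch, not taken from the paper: the function name `derived_figures` and the dictionary keys are hypothetical, and the implied power value assumes that the 375 GMACs and 98 GMACs/W figures refer to the same operating point, which the abstract does not state.

```python
# Figures quoted in the abstract.
CORES_PER_CLUSTER = 8        # eight cores per cluster
CLUSTERS = 4                 # four clusters on the mesh network
PEAK_GMACS = 375.0           # peak performance
PEAK_GMACS_PER_W = 98.0      # peak energy efficiency
INTERCORE_BW_GBPS = 192.0    # average inter-core bandwidth, GB/s


def derived_figures():
    """Return simple ratios derived from the abstract's headline numbers."""
    cores = CORES_PER_CLUSTER * CLUSTERS
    return {
        "cores": cores,
        # Even split of the peak throughput across all cores.
        "gmacs_per_core": PEAK_GMACS / cores,
        # Even split of the average inter-core bandwidth across all cores.
        "bw_per_core_gbps": INTERCORE_BW_GBPS / cores,
        # ASSUMPTION: both peak figures are measured at the same operating
        # point. If they are not, this value is not the chip's real power.
        "implied_power_w": PEAK_GMACS / PEAK_GMACS_PER_W,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.3f}")
```

Under that assumption, the two headline figures would correspond to a power draw of a little under 4 W. The per-core values are plain averages and say nothing about how the load is actually distributed.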

    Original language: English
    Title of host publication: Proceedings of the International Symposium on Low Power Electronics and Design
    Pages: 373-378
    Number of pages: 6
    DOIs: https://doi.org/10.1109/ISLPED.2011.5993669
    Publication status: Published - 2011
    Event: 17th IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2011 - Fukuoka
    Duration: 2011 Aug 1 to 2011 Aug 3

    Other

    Other: 17th IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2011
    City: Fukuoka
    Period: 11/8/1 to 11/8/3

    Fingerprint

    Communication
    Processing

    Keywords

    • Cache coherence
    • Multicore Processor
    • NoC
    • SIMD

    ASJC Scopus subject areas

    • Engineering(all)

    Cite this

    He, X., Zhou, D., Jin, X., & Goto, S. (2011). A 98 GMACs/W 32-core vector processor in 65nm CMOS. In Proceedings of the International Symposium on Low Power Electronics and Design (pp. 373-378). [5993669] https://doi.org/10.1109/ISLPED.2011.5993669

    A 98 GMACs/W 32-core vector processor in 65nm CMOS. / He, Xun; Zhou, Dajiang; Jin, Xin; Goto, Satoshi.

    Proceedings of the International Symposium on Low Power Electronics and Design. 2011. p. 373-378 5993669.

    He, X, Zhou, D, Jin, X & Goto, S 2011, A 98 GMACs/W 32-core vector processor in 65nm CMOS. in Proceedings of the International Symposium on Low Power Electronics and Design., 5993669, pp. 373-378, 17th IEEE/ACM International Symposium on Low Power Electronics and Design, ISLPED 2011, Fukuoka, 11/8/1. https://doi.org/10.1109/ISLPED.2011.5993669
    He X, Zhou D, Jin X, Goto S. A 98 GMACs/W 32-core vector processor in 65nm CMOS. In Proceedings of the International Symposium on Low Power Electronics and Design. 2011. p. 373-378. 5993669 https://doi.org/10.1109/ISLPED.2011.5993669
    He, Xun ; Zhou, Dajiang ; Jin, Xin ; Goto, Satoshi. / A 98 GMACs/W 32-core vector processor in 65nm CMOS. Proceedings of the International Symposium on Low Power Electronics and Design. 2011. pp. 373-378
    @inproceedings{5d64e27774854c508253530980eacb4b,
    title = "A 98 GMACs/W 32-core vector processor in 65nm CMOS",
    abstract = "This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. The proposed hierarchical network provides 192 GB/s of inter-core communication bandwidth on average. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8{\%} of L1 cache energy in the 32-core case. The processor achieves a peak performance of 375 GMACs and 98 GMACs/W in 65 nm CMOS.",
    keywords = "Cache coherence, Multicore Processor, NoC, SIMD",
    author = "Xun He and Dajiang Zhou and Xin Jin and Satoshi Goto",
    year = "2011",
    doi = "10.1109/ISLPED.2011.5993669",
    language = "English",
    isbn = "9781612846590",
    pages = "373--378",
    booktitle = "Proceedings of the International Symposium on Low Power Electronics and Design",

    }

    TY - GEN

    T1 - A 98 GMACs/W 32-core vector processor in 65nm CMOS

    AU - He, Xun

    AU - Zhou, Dajiang

    AU - Jin, Xin

    AU - Goto, Satoshi

    PY - 2011

    Y1 - 2011

    N2 - This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. The proposed hierarchical network provides 192 GB/s of inter-core communication bandwidth on average. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The processor achieves a peak performance of 375 GMACs and 98 GMACs/W in 65 nm CMOS.

    AB - This paper presents a high-performance dual-issue 32-core SIMD platform for image and video processing. Eight cores with a 4-port L2 cache are connected by a CIB bus to form a cluster, and four clusters are connected by a mesh network. The proposed hierarchical network provides 192 GB/s of inter-core communication bandwidth on average. To reduce coherence operations in large-scale SMP, an application-specific protocol is proposed. Compared with MOESI, it saves 67.8% of L1 cache energy in the 32-core case. The processor achieves a peak performance of 375 GMACs and 98 GMACs/W in 65 nm CMOS.

    KW - Cache coherence

    KW - Multicore Processor

    KW - NoC

    KW - SIMD

    UR - http://www.scopus.com/inward/record.url?scp=80052735296&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=80052735296&partnerID=8YFLogxK

    U2 - 10.1109/ISLPED.2011.5993669

    DO - 10.1109/ISLPED.2011.5993669

    M3 - Conference contribution

    AN - SCOPUS:80052735296

    SN - 9781612846590

    SP - 373

    EP - 378

    BT - Proceedings of the International Symposium on Low Power Electronics and Design

    ER -