A VLSI array processing oriented fast Fourier transform algorithm and hardware implementation

Zhenyu Liu, Yang Song, Takeshi Ikenaga, Satoshi Goto

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, any 2^k butterfly units (BUs) can be scheduled to work in parallel on n = 2^s data points (k = 0, 1, ⋯, s - 1). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, throughput increases linearly and system latency decreases in proportion. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s × 2^(s-k)) × t_clk and the throughput is n/(s × 2^(s-k) × t_clk), where t_clk is the system clock period. Based on this mapping algorithm, several 1024-point FFT processors with 18-bit word length, implemented in TSMC 0.18 μm CMOS technology, are presented to demonstrate its scalability and high performance. The core area of the 4-BU design is 2.991 × 1.121 mm² and its clock frequency is 326 MHz under typical conditions (1.8 V, 25°C). This processor completes a 1024-point FFT calculation in 7.839 μs.
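The latency and throughput formulas in the abstract can be checked against the reported 4-BU figures with a few lines of arithmetic. The following is a minimal sketch, not code from the paper: the function names are hypothetical, and it assumes the clock period is exactly the reciprocal of the quoted 326 MHz.

```python
def fft_latency_seconds(s: int, k: int, t_clk: float) -> float:
    """Latency (s * 2^(s-k)) * t_clk, as stated in the abstract.

    s     : log2 of the transform size (n = 2^s)
    k     : log2 of the number of butterfly units (2^k BUs)
    t_clk : system clock period in seconds
    """
    if not 0 <= k <= s - 1:
        raise ValueError("k must satisfy 0 <= k <= s - 1")
    return s * 2 ** (s - k) * t_clk


def fft_throughput(s: int, k: int, t_clk: float) -> float:
    """Throughput n / (s * 2^(s-k) * t_clk), in data points per second."""
    n = 2 ** s
    return n / fft_latency_seconds(s, k, t_clk)


# 1024-point FFT (s = 10) on the 4-BU design (k = 2).
s, k = 10, 2
t_clk = 1 / 326e6  # assumption: period taken from the quoted 326 MHz clock

cycles = s * 2 ** (s - k)
latency = fft_latency_seconds(s, k, t_clk)

print(f"clock cycles : {cycles}")
print(f"latency      : {latency * 1e6:.3f} us")
print(f"throughput   : {fft_throughput(s, k, t_clk) / 1e6:.1f} Mpoints/s")
```

Under these assumptions the formula gives 2560 clock cycles and a latency of roughly 7.85 μs, within about 0.2% of the 7.839 μs reported in the abstract. Raising k by one doubles the number of BUs and halves the cycle count, which is the scaling behaviour the abstract describes.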

Original language: English
Pages (from-to): 3523-3529
Number of pages: 7
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Volume: E88-A
Issue number: 12
DOI: 10.1093/ietfec/e88-a.12.3523
Publication status: Published - 2005 Dec

Fingerprint

Array processing
Hardware Implementation
Fast Fourier transform
Hardware
Latency
Unit
Clocks
Throughput
Linearly
Data storage equipment
Data Transfer
Scalability
System Performance
Permutation
High Performance
Trade-offs
Decrease
Costs

Keywords

  • Array processing
  • Fast Fourier transform (FFT)
  • Singleton algorithm

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Hardware and Architecture
  • Information Systems

Cite this

A VLSI array processing oriented fast Fourier transform algorithm and hardware implementation. / Liu, Zhenyu; Song, Yang; Ikenaga, Takeshi; Goto, Satoshi.

In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E88-A, No. 12, 12.2005, p. 3523-3529.


@article{30cbbf3a90a54c4d8630c64f0d1b14cc,
title = "A {VLSI} array processing oriented fast {Fourier} transform algorithm and hardware implementation",
abstract = "Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, any $2^{k}$ butterfly units (BUs) can be scheduled to work in parallel on $n = 2^{s}$ data points ($k = 0, 1, \dots, s - 1$). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, throughput increases linearly and system latency decreases in proportion. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is $(s \times 2^{s-k}) \times t_{clk}$ and the throughput is $n/(s \times 2^{s-k} \times t_{clk})$, where $t_{clk}$ is the system clock period. Based on this mapping algorithm, several 1024-point FFT processors with 18-bit word length, implemented in TSMC 0.18 μm CMOS technology, are presented to demonstrate its scalability and high performance. The core area of the 4-BU design is 2.991 × 1.121 mm$^{2}$ and its clock frequency is 326 MHz under typical conditions (1.8 V, 25°C). This processor completes a 1024-point FFT calculation in 7.839 μs.",
keywords = "Array processing, Fast Fourier transform (FFT), Singleton algorithm",
author = "Zhenyu Liu and Yang Song and Takeshi Ikenaga and Satoshi Goto",
year = "2005",
month = "12",
doi = "10.1093/ietfec/e88-a.12.3523",
language = "English",
volume = "E88-A",
pages = "3523--3529",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "12",
}

TY - JOUR

T1 - A VLSI array processing oriented fast Fourier transform algorithm and hardware implementation

AU - Liu, Zhenyu

AU - Song, Yang

AU - Ikenaga, Takeshi

AU - Goto, Satoshi

PY - 2005/12

Y1 - 2005/12


AB - Many parallel Fast Fourier Transform (FFT) algorithms adopt a multi-stage architecture to increase performance. However, data permutation between stages consumes considerable memory and processing time. This paper proposes an FFT array processing mapping algorithm to overcome this drawback. In this algorithm, any 2^k butterfly units (BUs) can be scheduled to work in parallel on n = 2^s data points (k = 0, 1, ⋯, s - 1). Because no inter-stage data transfer is required, memory consumption and system latency are both greatly reduced. Moreover, as the number of BUs increases, throughput increases linearly and system latency decreases in proportion. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. In theory, the system latency is (s × 2^(s-k)) × t_clk and the throughput is n/(s × 2^(s-k) × t_clk), where t_clk is the system clock period. Based on this mapping algorithm, several 1024-point FFT processors with 18-bit word length, implemented in TSMC 0.18 μm CMOS technology, are presented to demonstrate its scalability and high performance. The core area of the 4-BU design is 2.991 × 1.121 mm² and its clock frequency is 326 MHz under typical conditions (1.8 V, 25°C). This processor completes a 1024-point FFT calculation in 7.839 μs.

KW - Array processing

KW - Fast Fourier transform (FFT)

KW - Singleton algorithm

UR - http://www.scopus.com/inward/record.url?scp=29144533436&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=29144533436&partnerID=8YFLogxK

U2 - 10.1093/ietfec/e88-a.12.3523

DO - 10.1093/ietfec/e88-a.12.3523

M3 - Article

VL - E88-A

SP - 3523

EP - 3529

JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

SN - 0916-8508

IS - 12

ER -