A VLSI array processing oriented Fast Fourier Transform algorithm and hardware implementation

Zhenyu Liu, Yang Song, Takeshi Ikenaga, Satoshi Goto

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

Many parallel Fast Fourier Transform (FFT) algorithms adopt a multistage architecture to increase performance. However, data permutation between stages consumes a large amount of memory and processing time. To overcome this drawback, this paper proposes an FFT array processing mapping algorithm. In this algorithm, an arbitrary number 2^k of butterfly units (BUs) can be scheduled to work in parallel on n = 2^s data points (k = 0, 1, ..., s-1). Because no inter-stage data transfer is required, memory consumption is reduced to one third of that of the original algorithm. Moreover, as the number of BUs increases, not only does throughput increase linearly, but system latency also decreases linearly. This array-processing-oriented architecture provides a flexible tradeoff between hardware cost and system performance. An 18-bit word-length, 1024-point FFT architecture with 4 BUs is presented to demonstrate the mapping algorithm. The design is implemented in TSMC 0.18 μm CMOS technology. The core area is 2.99 × 1.12 mm², and the clock frequency is 326 MHz under typical conditions (1.8 V, 25 °C). The processor completes a 1024-point FFT in 7.839 μs.
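The keyword list names the Singleton algorithm, which is presumably Singleton's constant-geometry radix-2 FFT; the paper's actual mapping is not reproduced on this page, so the following Python sketch is only an illustration under that assumption. It shows why a constant-geometry FFT suits an array of 2^k butterfly units: every stage reads operands m and m + n/2 and writes results 2m and 2m + 1, so the same addressing is reused at every stage, and the n/2 butterflies of a stage can simply be dealt out to the units. The names `constant_geometry_fft`, `bit_reverse` and `num_bus`, and the round-robin schedule, are hypothetical; the sketch uses two ping-pong buffers for clarity and does not model the paper's memory organization.

```python
import cmath


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def constant_geometry_fft(x, num_bus=4):
    """Radix-2 constant-geometry (Singleton-style) decimation-in-frequency FFT.

    Hypothetical model of an array of `num_bus` butterfly units (BUs):
    butterfly m of every stage runs on BU m % num_bus in cycle m // num_bus.
    Returns (spectrum in natural order, number of butterfly cycles).
    """
    n = len(x)
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("length must be a power of two >= 2")
    half = n // 2
    if half % num_bus != 0:
        raise ValueError("num_bus must divide n/2")
    data = [complex(v) for v in x]
    cycles = 0
    for stage in range(stages):
        out = [0j] * n  # second (ping-pong) buffer, kept for clarity
        for cycle in range(half // num_bus):
            # In hardware these num_bus butterflies would share one cycle.
            for bu in range(num_bus):
                m = cycle * num_bus + bu
                a, b = data[m], data[m + half]  # same read pattern every stage
                # Twiddle exponent: m with its lowest `stage` bits cleared.
                k = (m >> stage) << stage
                w = cmath.exp(-2j * cmath.pi * k / n)
                out[2 * m] = a + b  # same write pattern every stage
                out[2 * m + 1] = (a - b) * w
            cycles += 1
        data = out
    # The constant-geometry DIF form leaves the spectrum in bit-reversed order.
    spectrum = [data[bit_reverse(k, stages)] for k in range(n)]
    return spectrum, cycles


if __name__ == "__main__":
    spectrum, cycles = constant_geometry_fft([1.0] * 1024, num_bus=4)
    print(round(abs(spectrum[0])), cycles)  # 1024 1280
```

Under this idealized schedule one stage takes (n/2)/2^k cycles, so a full transform takes (n/2)·log2(n)/2^k butterfly cycles: doubling the number of units halves the cycle count. For n = 1024 and 4 units this gives 1280 butterfly cycles; for comparison, the reported 7.839 μs at 326 MHz corresponds to about 2556 clock cycles, and the abstract does not say how that cycle budget is broken down.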

Original language: English
Title of host publication: Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
Pages: 291-295
Number of pages: 5
Publication status: Published - 2005
Event: 2005 ACM Great Lakes Symposium on VLSI, GLSVLSI'05, Chicago, IL
Duration: 2005 Apr 17 - 2005 Apr 19

Keywords

  • Array Processing
  • Fast Fourier Transform (FFT)
  • Singleton Algorithm

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Liu, Z., Song, Y., Ikenaga, T., & Goto, S. (2005). A VLSI array processing oriented Fast Fourier Transform algorithm and hardware implementation. In Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI (pp. 291-295). [S8.5S]

@inproceedings{04274ae7d3c4409ba5ec1b2f2643f686,
title = "A VLSI array processing oriented Fast Fourier Transform algorithm and hardware implementation",
keywords = "Array Processing, Fast Fourier Transform (FFT), Singleton Algorithm",
author = "Zhenyu Liu and Yang Song and Takeshi Ikenaga and Satoshi Goto",
year = "2005",
language = "English",
pages = "291--295",
booktitle = "Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI",

}

TY - GEN

T1 - A VLSI array processing oriented Fast Fourier Transform algorithm and hardware implementation

AU - Liu, Zhenyu

AU - Song, Yang

AU - Ikenaga, Takeshi

AU - Goto, Satoshi

PY - 2005

Y1 - 2005

KW - Array Processing

KW - Fast Fourier Transform (FFT)

KW - Singleton Algorithm

UR - http://www.scopus.com/inward/record.url?scp=29244492369&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=29244492369&partnerID=8YFLogxK

M3 - Conference contribution

SP - 291

EP - 295

BT - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI

ER -