Deep Griffin-Lim Iteration

Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

3 Citations (Scopus)

Abstract

This paper presents a novel method for reconstructing the phase only from a given amplitude spectrogram by combining a signal-processing-based approach and a deep neural network (DNN). To retrieve a time-domain signal from its amplitude spectrogram, the corresponding phase is required. One of the most popular phase reconstruction methods is the Griffin-Lim algorithm (GLA), which exploits the redundancy of the short-time Fourier transform. However, GLA often requires many iterations and produces low-quality signals owing to the lack of prior knowledge about the target signal. To address these issues, this study proposes an architecture that stacks sub-blocks, each consisting of two GLA-inspired fixed layers and a DNN. The number of stacked sub-blocks is adjustable, so performance can be traded against computational load according to the requirements of the application. The effectiveness of the proposed method is investigated by reconstructing phases from the amplitude spectrograms of speech signals.
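
To make the stacked sub-block structure concrete, below is a minimal PyTorch sketch of one DeGLI-style sub-block written only from the description in the abstract. It assumes (this is not the authors' released implementation) that the two fixed layers are GLA's amplitude-replacement and consistency projections, and that a small hypothetical CNN (ResidualDNN) estimates a residual that is subtracted from the projected spectrogram. The layer ordering, network architecture, STFT parameters, and all function names are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Assumed STFT parameters; the paper may use different settings.
    N_FFT = 512
    HOP = 128


    def stft(x):
        """STFT used by the fixed layers (batch of waveforms -> complex spectrograms)."""
        window = torch.hann_window(N_FFT, device=x.device)
        return torch.stft(x, N_FFT, hop_length=HOP, window=window, return_complex=True)


    def istft(X, length):
        """Inverse STFT back to the time domain."""
        window = torch.hann_window(N_FFT, device=X.device)
        return torch.istft(X, N_FFT, hop_length=HOP, window=window, length=length)


    def amplitude_projection(X, amplitude, eps=1e-8):
        """P_A: keep the phase of X and replace its magnitude by the given amplitude."""
        return amplitude * X / (X.abs() + eps)


    def consistency_projection(X, length):
        """P_C: project onto the set of consistent spectrograms via iSTFT then STFT."""
        return stft(istft(X, length))


    class ResidualDNN(nn.Module):
        """Hypothetical small CNN; the network used in the paper may differ."""

        def __init__(self, hidden=32):
            super().__init__()
            # Three complex spectrograms -> 6 real channels; output is a 2-channel residual.
            self.net = nn.Sequential(
                nn.Conv2d(6, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, 2, 3, padding=1),
            )

        def forward(self, x, y, z):
            feats = torch.cat(
                [torch.view_as_real(t).permute(0, 3, 1, 2) for t in (x, y, z)], dim=1)
            res = self.net(feats).permute(0, 2, 3, 1).contiguous()
            return torch.view_as_complex(res)


    def degli_sub_block(x, amplitude, dnn, length):
        """One sub-block: two GLA-inspired fixed layers, then residual estimation."""
        y = amplitude_projection(x, amplitude)   # fixed layer 1
        z = consistency_projection(y, length)    # fixed layer 2
        return z - dnn(x, y, z)                  # DNN removes the remaining error


    # Example: run a few sub-blocks starting from a zero-phase initialization.
    if __name__ == "__main__":
        waveform = torch.randn(1, 16384)                  # stand-in for a speech signal
        target_amp = stft(waveform).abs()                 # given amplitude spectrogram
        dnn = ResidualDNN()
        x = target_amp.to(torch.complex64)                # zero-phase initial estimate
        for _ in range(3):                                # depth trades quality vs. cost
            x = degli_sub_block(x, target_amp, dnn, length=waveform.shape[-1])
        reconstructed = istft(x, length=waveform.shape[-1])

The number of loop iterations in the example plays the role of the adjustable stacking depth mentioned in the abstract; how the DNN is trained and whether its parameters are shared across sub-blocks are details left to the paper itself.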

Original language: English
Title of host publication: 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 61-65
Number of pages: 5
ISBN (Electronic): 9781479981311
DOIs: https://doi.org/10.1109/ICASSP.2019.8682744
Publication status: Published - 2019 May 1
Event: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Brighton, United Kingdom
Duration: 2019 May 12 - 2019 May 17

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2019-May
ISSN (Print): 1520-6149

Conference

Conference: 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Country: United Kingdom
City: Brighton
Period: 19/5/12 - 19/5/17

Fingerprint

  • Redundancy
  • Fourier transforms
  • Signal processing
  • Deep neural networks

Keywords

  • deep neural network
  • phase reconstruction
  • residual learning
  • spectrogram consistency

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Masuyama, Y., Yatabe, K., Koizumi, Y., Oikawa, Y., & Harada, N. (2019). Deep Griffin-Lim Iteration. In 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings (pp. 61-65). [8682744] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2019.8682744
