TY - JOUR
T1 - Deep Griffin–Lim Iteration
T2 - Trainable Iterative Phase Reconstruction Using Neural Network
AU - Masuyama, Yoshiki
AU - Yatabe, Kohei
AU - Koizumi, Yuma
AU - Oikawa, Yasuhiro
AU - Harada, Noboru
N1 - Publisher Copyright:
CC BY
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2020
Y1 - 2020
N2 - In this paper, we propose a phase reconstruction framework named Deep Griffin–Lim Iteration (DeGLI). Phase reconstruction is a fundamental technique for improving the quality of sound obtained through processing in the time-frequency domain. Recent methods using deep neural networks (DNNs) have been shown to outperform conventional iterative phase reconstruction methods such as the Griffin–Lim algorithm (GLA). However, the computational cost of DNN-based methods is not adjustable at inference time, which may limit their range of applications. To address this problem, we combine the iterative structure of GLA with a DNN so that the computational cost becomes adjustable by changing the number of iterations of the proposed DNN-based component. We also propose a training method that is independent of the number of inference iterations, minimizing the computational cost of training. This training method, named sub-block training by denoising (SBTD), avoids recursive use of the DNN and enables training of DeGLI with a single sub-block (corresponding to one GLA iteration). Furthermore, we propose a complex DNN based on complex convolution layers with gated mechanisms and investigate its performance within the proposed framework. Through several experiments, we found that DeGLI significantly improved both objective and subjective measures over GLA by incorporating the DNN, and its sound quality was comparable to that of neural vocoders.
AB - In this paper, we propose a phase reconstruction framework named Deep Griffin–Lim Iteration (DeGLI). Phase reconstruction is a fundamental technique for improving the quality of sound obtained through processing in the time-frequency domain. Recent methods using deep neural networks (DNNs) have been shown to outperform conventional iterative phase reconstruction methods such as the Griffin–Lim algorithm (GLA). However, the computational cost of DNN-based methods is not adjustable at inference time, which may limit their range of applications. To address this problem, we combine the iterative structure of GLA with a DNN so that the computational cost becomes adjustable by changing the number of iterations of the proposed DNN-based component. We also propose a training method that is independent of the number of inference iterations, minimizing the computational cost of training. This training method, named sub-block training by denoising (SBTD), avoids recursive use of the DNN and enables training of DeGLI with a single sub-block (corresponding to one GLA iteration). Furthermore, we propose a complex DNN based on complex convolution layers with gated mechanisms and investigate its performance within the proposed framework. Through several experiments, we found that DeGLI significantly improved both objective and subjective measures over GLA by incorporating the DNN, and its sound quality was comparable to that of neural vocoders.
KW - Computational efficiency
KW - Griffin-Lim algorithm
KW - Image reconstruction
KW - Iterative methods
KW - Neural networks
KW - Spectrogram
KW - Time-domain analysis
KW - Training
KW - complex neural network
KW - phase reconstruction
KW - spectrogram consistency
KW - sub-block training by denoising (SBTD)
UR - http://www.scopus.com/inward/record.url?scp=85096393103&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096393103&partnerID=8YFLogxK
U2 - 10.1109/JSTSP.2020.3034486
DO - 10.1109/JSTSP.2020.3034486
M3 - Article
AN - SCOPUS:85096393103
JO - IEEE Journal on Selected Topics in Signal Processing
JF - IEEE Journal on Selected Topics in Signal Processing
SN - 1932-4553
ER -