TY - GEN
T1 - Adversarial autoencoder for reducing nonlinear distortion
AU - Tawara, Naohiro
AU - Kobayashi, Tetsunori
AU - Fujieda, Masaru
AU - Katagiri, Kazuhiro
AU - Yazu, Takashi
AU - Ogawa, Tetsuji
N1 - Publisher Copyright:
© 2018 APSIPA organization.
PY - 2019/3/4
Y1 - 2019/3/4
N2 - A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce unpleasant distortion into speech (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, applying this technique as a post-processor for TF masking cannot reduce the nonlinear distortion, because some TF components are missing after TF masking. Furthermore, such missing information is difficult to embed using an autoencoder. To recover the missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering yields improvements in speech quality over TF masking.
AB - A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce unpleasant distortion into speech (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, applying this technique as a post-processor for TF masking cannot reduce the nonlinear distortion, because some TF components are missing after TF masking. Furthermore, such missing information is difficult to embed using an autoencoder. To recover the missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering yields improvements in speech quality over TF masking.
UR - http://www.scopus.com/inward/record.url?scp=85063472429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063472429&partnerID=8YFLogxK
U2 - 10.23919/APSIPA.2018.8659540
DO - 10.23919/APSIPA.2018.8659540
M3 - Conference contribution
AN - SCOPUS:85063472429
T3 - 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
SP - 1669
EP - 1673
BT - 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Y2 - 12 November 2018 through 15 November 2018
ER -