Adversarial autoencoder for reducing nonlinear distortion

Naohiro Tawara, Tetsunori Kobayashi, Masaru Fujieda, Kazuhiro Katagiri, Takashi Yazu, Tetsuji Ogawa

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce unpleasant speech distortion (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, applying this technique directly as a post-processor for TF masking does not reduce the nonlinear distortion, because some TF components are missing after TF masking. Furthermore, the missing information is difficult to embed using an autoencoder. To recover these missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering improves speech quality over TF masking.
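The input construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the ideal binary mask, the random toy spectrograms, the use of the unprocessed mixture as the auxiliary reference, the channel-wise stacking, and the names `binary_tf_mask` and `build_postfilter_input` are all assumptions made for illustration. The abstract only states that a reference containing the target components is concatenated with the enhanced signal.

```python
import numpy as np


def binary_tf_mask(target_mag, interferer_mag):
    """Hypothetical ideal binary mask: 1 where the target dominates a TF bin, else 0."""
    return (target_mag > interferer_mag).astype(target_mag.dtype)


def build_postfilter_input(enhanced_mag, reference_mag):
    """Stack the masked signal and the auxiliary reference as two input channels.

    Channel-wise stacking is an assumption; the abstract only says "concatenated".
    """
    if enhanced_mag.shape != reference_mag.shape:
        raise ValueError("enhanced and reference spectrograms must share a shape")
    return np.stack([enhanced_mag, reference_mag], axis=0)


# Toy magnitude spectrograms (frequency bins x frames); random data, not speech.
rng = np.random.default_rng(0)
n_freq, n_frames = 129, 50
target = rng.random((n_freq, n_frames))
interferer = rng.random((n_freq, n_frames))
mixture = target + interferer  # crude magnitude-domain mixing, for illustration only

# TF masking zeroes every bin where the interferer dominates, so any target
# energy that was present in those bins is lost from the enhanced signal.
mask = binary_tf_mask(target, interferer)
enhanced = mask * mixture
missing_fraction = float(np.mean(mask == 0))

# The mixture still contains the target components in the masked-out bins,
# so here it plays the role of the auxiliary reference (an assumption).
postfilter_input = build_postfilter_input(enhanced, mixture)
print(postfilter_input.shape, f"{missing_fraction:.2f} of TF bins zeroed by the mask")
```

In this sketch, the second channel gives a downstream autoencoder access to the bins that the mask removed, which is the motivation the abstract gives for adding the reference signal. The GAN-based autoencoder itself is not shown.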

Original language: English
Title of host publication: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1669-1673
Number of pages: 5
ISBN (Electronic): 9789881476852
DOI: https://doi.org/10.23919/APSIPA.2018.8659540
Publication status: Published - 2019 Mar 4
Event: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Honolulu, United States
Duration: 2018 Nov 12 to 2018 Nov 15

Publication series

Name: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

Conference

Conference: 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Country: United States
City: Honolulu
Period: 18/11/12 to 18/11/15

Fingerprint

Nonlinear distortion
Speech intelligibility
Speech enhancement
Acoustic waves
Processing

ASJC Scopus subject areas

  • Information Systems

Cite this

Tawara, N., Kobayashi, T., Fujieda, M., Katagiri, K., Yazu, T., & Ogawa, T. (2019). Adversarial autoencoder for reducing nonlinear distortion. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings (pp. 1669-1673). [8659540] (2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/APSIPA.2018.8659540

@inproceedings{afb7358e9aed4b218cd771816e5f8074,
title = "Adversarial autoencoder for reducing nonlinear distortion",
abstract = "A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce unpleasant speech distortion (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, applying this technique directly as a post-processor for TF masking does not reduce the nonlinear distortion, because some TF components are missing after TF masking. Furthermore, the missing information is difficult to embed using an autoencoder. To recover these missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering improves speech quality over TF masking.",
author = "Naohiro Tawara and Tetsunori Kobayashi and Masaru Fujieda and Kazuhiro Katagiri and Takashi Yazu and Tetsuji Ogawa",
year = "2019",
month = "3",
day = "4",
doi = "10.23919/APSIPA.2018.8659540",
language = "English",
series = "2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1669--1673",
booktitle = "2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings",

}

TY - GEN

T1 - Adversarial autoencoder for reducing nonlinear distortion

AU - Tawara, Naohiro

AU - Kobayashi, Tetsunori

AU - Fujieda, Masaru

AU - Katagiri, Kazuhiro

AU - Yazu, Takashi

AU - Ogawa, Tetsuji

PY - 2019/3/4

Y1 - 2019/3/4

N2 - A novel post-filtering method using generative adversarial networks (GANs) is proposed to correct the nonlinear distortion caused by time-frequency (TF) masking. TF masking is a powerful framework for attenuating interfering sounds, but it can introduce unpleasant speech distortion (e.g., musical noise). A GAN-based autoencoder was recently shown to be effective for single-channel speech enhancement; however, applying this technique directly as a post-processor for TF masking does not reduce the nonlinear distortion, because some TF components are missing after TF masking. Furthermore, the missing information is difficult to embed using an autoencoder. To recover these missing components, an auxiliary reference signal that includes the target source components is concatenated with the enhanced signal, and the result is used as the input to the GAN-based autoencoder. Experimental comparisons show that the proposed post-filtering improves speech quality over TF masking.

UR - http://www.scopus.com/inward/record.url?scp=85063472429&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063472429&partnerID=8YFLogxK

U2 - 10.23919/APSIPA.2018.8659540

DO - 10.23919/APSIPA.2018.8659540

M3 - Conference contribution

T3 - 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

SP - 1669

EP - 1673

BT - 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -