Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning

Taira Tsuchiya, Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

    Research output: Conference contribution

    8 Citations (Scopus)

    Abstract

    We introduce a novel type of representation learning to obtain speaker-invariant features for zero-resource languages. Speaker adaptation is an important technique for building a robust acoustic model. For a zero-resource language, however, conventional model-dependent speaker adaptation methods such as constrained maximum likelihood linear regression are insufficient because the acoustic model of the target language is not accessible. Therefore, we introduce a model-independent feature extraction method based on a neural network. Specifically, we introduce multi-task learning into a bottleneck feature-based approach to make the bottleneck features invariant to changes of speaker. The proposed network simultaneously tackles two tasks: phoneme classification and speaker classification. The network trains a feature extractor in an adversarial manner so that it maps input data into a representation that is discriminative for predicting phonemes but from which speakers are difficult to predict. We conducted phone discrimination experiments on the Zero Resource Speech Challenge 2017. The experimental results showed that our multi-task network yielded more discriminative features by eliminating speaker variability.
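
The abstract does not say how the adversarial training is implemented. As an illustration only, the sketch below uses gradient reversal, one common way to train a shared feature extractor against an auxiliary classifier; whether the paper uses exactly this scheme is an assumption. The toy data, the linear "bottleneck" `z`, and the names `gradients`, `sgd_step`, and `lam` are all hypothetical and chosen for the example.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Toy data (assumption): each input is (phoneme_cue, speaker_cue);
# the phoneme and speaker labels are simply the signs of the two cues.
DATA = [((p, s), int(p > 0), int(s > 0))
        for p in (-1.0, 1.0) for s in (-1.0, 1.0)]

def gradients(w, a, b, lam):
    """Return (grad_w, grad_a, grad_b), averaged over DATA.

    w   : weights of a linear feature extractor, z = w[0]*x[0] + w[1]*x[1]
    a   : weight of the phoneme head, P(phoneme=1) = sigmoid(a*z)
    b   : weight of the speaker head, P(speaker=1) = sigmoid(b*z)
    lam : strength of the reversed speaker gradient (hypothetical knob)
    """
    gw = [0.0, 0.0]
    ga = 0.0
    gb = 0.0
    for x, y, k in DATA:
        z = w[0] * x[0] + w[1] * x[1]
        err_ph = sigmoid(a * z) - y  # d(cross-entropy)/d(logit), phoneme head
        err_sp = sigmoid(b * z) - k  # d(cross-entropy)/d(logit), speaker head
        ga += err_ph * z             # phoneme head descends its own loss
        gb += err_sp * z             # speaker head descends its own loss
        # Extractor: descend the phoneme loss, ascend the speaker loss
        # (the speaker gradient enters with a flipped sign).
        dz = err_ph * a - lam * err_sp * b
        gw[0] += dz * x[0]
        gw[1] += dz * x[1]
    n = len(DATA)
    return [g / n for g in gw], ga / n, gb / n

def sgd_step(w, a, b, lam, lr=0.1):
    """One simultaneous gradient step on the extractor and both heads."""
    gw, ga, gb = gradients(w, a, b, lam)
    new_w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
    return new_w, a - lr * ga, b - lr * gb

if __name__ == "__main__":
    w, a, b = [0.5, 0.5], 1.0, 1.0
    for step in range(5):
        w, a, b = sgd_step(w, a, b, lam=1.0)
        print(step, [round(v, 3) for v in w])
```

In this toy setting, with `lam=0` the extractor only follows the phoneme loss, while with `lam>0` the weight on the speaker cue (`w[1]`) is additionally pushed down because the speaker head's gradient is applied with reversed sign. Adversarial updates of this kind can oscillate over long runs, so the sketch makes no claim about convergence.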

    Original language: English
    Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2381-2385
    Number of pages: 5
    Volume: 2018-April
    ISBN (Print): 9781538646588
    DOI: https://doi.org/10.1109/ICASSP.2018.8461648
    Publication status: Published - 10 Sep 2018
    Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
    Duration: 15 Apr 2018 to 20 Apr 2018

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Electrical and Electronic Engineering

    Cite this

    Tsuchiya, T., Tawara, N., Ogawa, T., & Kobayashi, T. (2018). Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (Vol. 2018-April, pp. 2381-2385). [8461648] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461648