Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning

Taira Tsuchiya, Naohiro Tawara, Tetsuji Ogawa, Tetsunori Kobayashi

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    7 Citations (Scopus)

    Abstract

    We introduce a novel type of representation learning that yields speaker-invariant features for zero-resource languages. Speaker adaptation is an important technique for building robust acoustic models. For a zero-resource language, however, conventional model-dependent speaker adaptation methods such as constrained maximum likelihood linear regression are insufficient because the acoustic model of the target language is not accessible. We therefore introduce a model-independent feature extraction method based on a neural network. Specifically, we add multi-task learning to a bottleneck-feature-based approach so that the bottleneck features become invariant to speaker changes. The proposed network tackles two tasks simultaneously: phoneme classification and speaker classification. The feature extractor is trained adversarially so that it maps input data to a representation that is discriminative for predicting phonemes but makes speakers difficult to predict. We conducted phone discrimination experiments in the Zero Resource Speech Challenge 2017. The results showed that our multi-task network yielded more discriminative features by removing speaker variability.
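The adversarial two-task idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss form, so everything here is an assumption for illustration: the function names, the trade-off weight `lam`, and the choice to subtract a weighted speaker cross-entropy from the phoneme cross-entropy. The speaker classifier minimises its own loss, while the feature extractor is rewarded when the speaker classifier does poorly.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Negative log-probability of the target class."""
    return -math.log(softmax(logits)[target])

def speaker_head_objective(speaker_logits, speaker_target):
    """Loss minimised by the speaker classifier (hypothetical name)."""
    return cross_entropy(speaker_logits, speaker_target)

def extractor_objective(phone_logits, phone_target,
                        speaker_logits, speaker_target, lam=0.5):
    """Loss minimised by the feature extractor (assumed form).

    Low phoneme loss is rewarded; low speaker loss is penalised, so the
    extractor is pushed toward features from which speakers are hard
    to predict. `lam` is an assumed trade-off weight.
    """
    l_phone = cross_entropy(phone_logits, phone_target)
    l_speaker = cross_entropy(speaker_logits, speaker_target)
    return l_phone - lam * l_speaker

# Same phoneme prediction, two different speaker predictions.
confident = extractor_objective([5.0, 0.0], 0, [5.0, 0.0], 0)
confused = extractor_objective([5.0, 0.0], 0, [0.0, 0.0], 0)
print(confused < confident)
```

Under these assumptions, the extractor's objective is lower when the speaker classifier is confused than when it identifies the speaker confidently, which is the behaviour the abstract describes. In a real network the same effect is often obtained by reversing the speaker-loss gradient before it reaches the shared layers; whether the authors did so is not stated here.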

    Original language: English
    Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 2381-2385
    Number of pages: 5
    Volume: 2018-April
    ISBN (Print): 9781538646588
    DOIs: https://doi.org/10.1109/ICASSP.2018.8461648
    Publication status: Published - 2018 Sep 10
    Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
    Duration: 2018 Apr 15 to 2018 Apr 20

    Other

    Other: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
    Country: Canada
    City: Calgary
    Period: 18/4/15 to 18/4/20

    Fingerprint

    Feature extraction
    Acoustics
    Linear regression
    Maximum likelihood
    Neural networks
    Experiments

    Keywords

    • Adversarial multi-task learning
    • FMLLR
    • Representation learning
    • Speaker invariant feature
    • Zero resource speech challenge

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Electrical and Electronic Engineering

    Cite this

    Tsuchiya, T., Tawara, N., Ogawa, T., & Kobayashi, T. (2018). Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (Vol. 2018-April, pp. 2381-2385). [8461648] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2018.8461648

    Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. / Tsuchiya, Taira; Tawara, Naohiro; Ogawa, Tetsuji; Kobayashi, Tetsunori.

    2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc., 2018. p. 2381-2385, 8461648.

    Tsuchiya, T, Tawara, N, Ogawa, T & Kobayashi, T 2018, Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. vol. 2018-April, 8461648, Institute of Electrical and Electronics Engineers Inc., pp. 2381-2385, 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018, Calgary, Canada, 18/4/15. https://doi.org/10.1109/ICASSP.2018.8461648
    Tsuchiya T, Tawara N, Ogawa T, Kobayashi T. Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc. 2018. p. 2381-2385. 8461648 https://doi.org/10.1109/ICASSP.2018.8461648
    Tsuchiya, Taira; Tawara, Naohiro; Ogawa, Tetsuji; Kobayashi, Tetsunori. / Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning. 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. Vol. 2018-April. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 2381-2385
    @inproceedings{d35c1e64e80042c58db604497a757c4b,
    title = "Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning",
    keywords = "Adversarial multi-task learning, FMLLR, Representation learning, Speaker invariant feature, Zero resource speech challenge",
    author = "Taira Tsuchiya and Naohiro Tawara and Tetsuji Ogawa and Tetsunori Kobayashi",
    year = "2018",
    month = "9",
    day = "10",
    doi = "10.1109/ICASSP.2018.8461648",
    language = "English",
    isbn = "9781538646588",
    volume = "2018-April",
    pages = "2381--2385",
    booktitle = "2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",

    }

    TY - GEN

    T1 - Speaker Invariant Feature Extraction for Zero-Resource Languages with Adversarial Learning

    AU - Tsuchiya, Taira

    AU - Tawara, Naohiro

    AU - Ogawa, Tetsuji

    AU - Kobayashi, Tetsunori

    PY - 2018/9/10

    Y1 - 2018/9/10

    KW - Adversarial multi-task learning

    KW - FMLLR

    KW - Representation learning

    KW - Speaker invariant feature

    KW - Zero resource speech challenge

    UR - http://www.scopus.com/inward/record.url?scp=85054272708&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85054272708&partnerID=8YFLogxK

    U2 - 10.1109/ICASSP.2018.8461648

    DO - 10.1109/ICASSP.2018.8461648

    M3 - Conference contribution

    AN - SCOPUS:85054272708

    SN - 9781538646588

    VL - 2018-April

    SP - 2381

    EP - 2385

    BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings

    PB - Institute of Electrical and Electronics Engineers Inc.

    ER -