Effect of frequency weighting on MLP-based speaker canonicalization

Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Accurate and efficient speaker canonicalization is proposed to improve the performance of speaker-independent ASR systems. Vocal tract length normalization (VTLN) is often applied for speaker canonicalization in ASR; however, it requires parallel decoding of speech to estimate the optimal warping parameter. In addition, VTLN applies the same linear spectral transformation throughout an utterance, although the optimal mapping function differs among phonemes. In this study, we propose a novel speaker canonicalization method using a multilayer perceptron (MLP) trained on a data set of vowels to map an input spectrum to the output spectrum of a standard or canonical speaker. The proposed method integrates MLP-based mapping and identity mapping in a frequency-band-dependent manner and achieves accurate recognition without any run-time tuning of the mapping function. Experiments on a continuous digit recognition task showed that the proposed method reduces intra-class variability in both the vowel and consonant parts and outperforms VTLN.
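    The abstract describes the canonicalizer only at a high level: an MLP mapping and an identity mapping, integrated in a way that depends on the frequency band. One simple way to picture this, offered here as an assumption rather than as the authors' formulation, is a per-band linear interpolation y_k = w_k · MLP(x)_k + (1 − w_k) · x_k, where x is the input spectrum and w_k ∈ [0, 1] is the weight for band k. The Python sketch below implements that reading. `TinyMLP`, `band_weights`, `canonicalize`, the sigmoid hidden layer, the layer sizes, and the step-shaped weighting are all hypothetical choices made for illustration; the MLP weights are random rather than trained on vowel data.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron mapping a spectrum (one value
    per frequency band) to a spectrum of the same size.

    Weights are random here; a real canonicalizer would train them on vowel
    frames so that the output approximates a canonical speaker's spectrum.
    Training is omitted from this sketch.
    """

    def __init__(self, n_bands: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_bands)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_bands)]
        self.b2 = [0.0] * n_bands

    def __call__(self, x: Sequence[float]) -> Vector:
        # Hidden layer: affine transform followed by a logistic sigmoid.
        hidden = [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(self.w1, self.b1)
        ]
        # Linear output layer: one value per frequency band.
        return [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.w2, self.b2)
        ]


def band_weights(n_bands: int, cutoff: int) -> Vector:
    """Hypothetical step weighting: use the MLP output for bands below
    `cutoff` and pass the remaining bands through unchanged."""
    return [1.0 if k < cutoff else 0.0 for k in range(n_bands)]


def canonicalize(x: Sequence[float], mlp: TinyMLP, weights: Sequence[float]) -> Vector:
    """Blend MLP mapping and identity mapping band by band:
    y[k] = w[k] * mlp(x)[k] + (1 - w[k]) * x[k], with 0 <= w[k] <= 1."""
    if len(x) != len(weights):
        raise ValueError("spectrum and weight vector must have the same length")
    mapped = mlp(x)
    return [w * m + (1.0 - w) * xi for w, m, xi in zip(weights, mapped, x)]


if __name__ == "__main__":
    mlp = TinyMLP(n_bands=8, n_hidden=16)
    frame = [0.5] * 8  # stand-in for one frame of band energies
    out = canonicalize(frame, mlp, band_weights(8, cutoff=4))
    print(out[4:] == frame[4:])  # True: upper bands use the identity mapping
```

    With every weight at 0 the function reduces to the identity mapping, and with every weight at 1 it returns the MLP output alone. Because the weights are fixed per band in this sketch, nothing is searched for at run time, which mirrors the contrast the abstract draws with estimating a VTLN warping parameter.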

    Original language: English
    Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
    Publisher: International Speech Communication Association
    Pages: 2987-2991
    Number of pages: 5
    Publication status: Published, 2014
    Event: 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages, INTERSPEECH 2014, Singapore, Singapore
    Duration: 14 Sep 2014 to 18 Sep 2014

    Keywords

    • Connected digit recognition
    • Feature extraction
    • Multilayer perceptron
    • Speaker canonicalization

    ASJC Scopus subject areas

    • Language and Linguistics
    • Human-Computer Interaction
    • Signal Processing
    • Software
    • Modelling and Simulation

    Cite this

    Kubota, Y., Omachi, M., Ogawa, T., Kobayashi, T., & Nitta, T. (2014). Effect of frequency weighting on MLP-based speaker canonicalization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (pp. 2987-2991). International Speech Communication Association.

    @inproceedings{28bfdb2ff76b44fab8ef1560fef9e012,
    title = "Effect of frequency weighting on MLP-based speaker canonicalization",
    keywords = "Connected digit recognition, Feature extraction, Multilayer perceptron, Speaker canonicalization",
    author = "Yuichi Kubota and Motoi Omachi and Tetsuji Ogawa and Tetsunori Kobayashi and Tsuneo Nitta",
    year = "2014",
    language = "English",
    pages = "2987--2991",
    booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - Effect of frequency weighting on MLP-based speaker canonicalization

    AU - Kubota, Yuichi

    AU - Omachi, Motoi

    AU - Ogawa, Tetsuji

    AU - Kobayashi, Tetsunori

    AU - Nitta, Tsuneo

    PY - 2014

    Y1 - 2014

    KW - Connected digit recognition

    KW - Feature extraction

    KW - Multilayer perceptron

    KW - Speaker canonicalization

    UR - http://www.scopus.com/inward/record.url?scp=84910027717&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84910027717&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:84910027717

    SP - 2987

    EP - 2991

    BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

    PB - International Speech Communication Association

    ER -