Analysis of the phone level contributions to objective evaluation of English speech by non-natives

Yasuo Suzuki, Makiko Muto, Katsuhiko Shirai, Yoshinori Sagisaka

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Aiming at automatic estimation of the naturalness of timing control in non-native speech, we analyzed the timing characteristics of non-native speech and correlated them with the corresponding subjective naturalness scores given by native speakers. In addition to word-level statistics showing the differences between natives and non-natives, we analyzed phone- and syllable-level statistics to obtain an objective measure that better fits natives' judgments. A corpus of English speech spoken by Japanese speakers was collected together with temporal naturalness judgments by native speakers. The analysis showed that timing differences between natives and non-natives in average syllable duration, weak vowel duration, and vowel duration of function words were highly correlated with natives' naturalness evaluations. A linear regression model and a regression tree model were employed to estimate the naturalness score from the differences between native and non-native speech. The estimation accuracy of the proposed models was tested on open data. The root mean square errors between the scores predicted by the two models and the scores given by natives were 0.63 and 0.66, respectively, comparable to the 0.70 difference in scores among native listeners. These accuracies were better than that of a model using word statistics only.
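
    The evaluation described in the abstract (fit a linear regression and a regression tree on timing differences, then report the root mean square error on data not used for fitting) can be illustrated with a minimal sketch. This is not the authors' implementation: the single predictor, the depth-one tree, the function names, and all numbers below are hypothetical toy values chosen only to show the computation, and reading "open data" as held-out data is an assumption.

```python
from math import sqrt

def rmse(predicted, observed):
    """Root mean square error between two equal-length score lists."""
    n = len(observed)
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (slope, intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_stump(x, y):
    """Depth-one regression tree: returns (threshold, left_mean, right_mean).

    The threshold is the midpoint between adjacent distinct x values that
    minimizes the summed squared error of the two leaf means.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

# Hypothetical toy data: x is a native/non-native duration difference,
# y is a listener naturalness score. None of these values come from the paper.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [5.0, 4.0, 3.0, 2.0]
test_x = [0.5, 2.5]          # held-out ("open") samples
test_y = [4.0, 3.0]

slope, intercept = fit_line(train_x, train_y)
threshold, left_mean, right_mean = fit_stump(train_x, train_y)

linear_pred = [slope * xi + intercept for xi in test_x]
tree_pred = [left_mean if xi <= threshold else right_mean for xi in test_x]

linear_rmse = rmse(linear_pred, test_y)
tree_rmse = rmse(tree_pred, test_y)
print(linear_rmse, tree_rmse)
```

    In practice the predictors would be several timing-difference statistics (for example syllable, weak-vowel, and function-word vowel durations) and the tree would be deeper; the sketch keeps one predictor so that both models and the error measure fit in a few lines.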

    Original language: English
    Title of host publication: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Publisher: International Speech Communication Association
    Pages: 1673-1676
    Number of pages: 4
    Publication status: Published - 2004
    Event: 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of
    Duration: 2004 Oct 4 to 2004 Oct 8

    Other

    Other: 8th International Conference on Spoken Language Processing, ICSLP 2004
    Country: Korea, Republic of
    City: Jeju, Jeju Island
    Period: 04/10/4 to 04/10/8

    Fingerprint

    evaluation
    statistics
    regression
    Phone
    Naturalness
    listener
    Vowel Duration

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language

    Cite this

    Suzuki, Y., Muto, M., Shirai, K., & Sagisaka, Y. (2004). Analysis of the phone level contributions to objective evaluation of English speech by non-natives. In 8th International Conference on Spoken Language Processing, ICSLP 2004 (pp. 1673-1676). International Speech Communication Association.

    @inproceedings{74c14eaea7724dbe8e2d184c062e9cb1,
    title = "Analysis of the phone level contributions to objective evaluation of English speech by non-natives",
    author = "Yasuo Suzuki and Makiko Muto and Katsuhiko Shirai and Yoshinori Sagisaka",
    year = "2004",
    language = "English",
    pages = "1673--1676",
    booktitle = "8th International Conference on Spoken Language Processing, ICSLP 2004",
    publisher = "International Speech Communication Association",

    }

    TY - GEN

    T1 - Analysis of the phone level contributions to objective evaluation of English speech by non-natives

    AU - Suzuki, Yasuo

    AU - Muto, Makiko

    AU - Shirai, Katsuhiko

    AU - Sagisaka, Yoshinori

    PY - 2004

    Y1 - 2004

    UR - http://www.scopus.com/inward/record.url?scp=85009115805&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85009115805&partnerID=8YFLogxK

    M3 - Conference contribution

    SP - 1673

    EP - 1676

    BT - 8th International Conference on Spoken Language Processing, ICSLP 2004

    PB - International Speech Communication Association

    ER -