Effect of speaking rate on the acceptability of change in segment duration

Makiko Muto, Hiroaki Kato, Minoru Tsuzaki, Yoshinori Sagisaka

    Research output: Contribution to journal › Article

    3 Citations (Scopus)

    Abstract

    The acceptability of changes in segment duration at different speaking rates is studied to identify perceptual characteristics useful for designing an objective naturalness measure for speech synthesis. Building on a series of previous studies on the intra-phrase positional dependency of perceptual acceptability, we investigate three factors: (1) speaking rate, (2) position within a phrase, and (3) presence or absence of a carrier sentence, using three-mora (three-syllable) phrases spoken at three rates (fast, normal, and slow) with or without a carrier sentence (Experiment 1). Seven listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments is either lengthened or shortened by up to 50 ms. Moreover, to interpret the observed results within a psychophysical, auditory-based framework rather than in terms of language-dependent features, we replicate the temporal structures of the speech stimuli in simplified form and investigate the corresponding three factors (Experiment 2). Ten listeners rate the difference between standard and comparison stimuli in which one of the durations is either lengthened or shortened by up to 40 ms.

    The speech experiment shows that acceptability for the same absolute change decreases as speaking rate increases; that is, listeners respond more sensitively to the same absolute duration change when the speaking rate is fast than when it is slow. Similarly, the non-speech experiment shows that detectability for the same absolute change increases as tempo increases. In addition, the speech experiment shows that the decline in acceptability differs with intra-phrase position at the three speaking rates, and the non-speech experiment likewise shows that detectability differs with temporal position at the three tempi. These agreements between the speech and non-speech experiments suggest that the two experiments share a common perceptual mechanism for processing temporal differences. On the other hand, the speech experiment shows no consistent effect of the presence or absence of a carrier sentence on the decline in acceptability, while the non-speech experiment shows, in several cases, that the presence of a carrier context can lower detectability.

    Original language: English
    Pages (from-to): 277-289
    Number of pages: 13
    Journal: Speech Communication
    Volume: 47
    Issue number: 3
    DOI: 10.1016/j.specom.2005.02.012
    Publication status: Published - Nov 2005

    Keywords

    • Acceptability
    • Naturalness
    • Speaking rate
    • Speech perception
    • Temporal perception

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering
    • Experimental and Cognitive Psychology
    • Linguistics and Language

    Cite this

    Muto, Makiko; Kato, Hiroaki; Tsuzaki, Minoru; Sagisaka, Yoshinori. Effect of speaking rate on the acceptability of change in segment duration. In: Speech Communication, Vol. 47, No. 3, Nov. 2005, pp. 277-289.

    @article{e2b97a9e61af460aa43352d2defd46f9,
    title = "Effect of speaking rate on the acceptability of change in segment duration",
    keywords = "Acceptability, Naturalness, Speaking rate, Speech perception, Temporal perception",
    author = "Makiko Muto and Hiroaki Kato and Minoru Tsuzaki and Yoshinori Sagisaka",
    year = "2005",
    month = "11",
    doi = "10.1016/j.specom.2005.02.012",
    language = "English",
    volume = "47",
    pages = "277--289",
    journal = "Speech Communication",
    issn = "0167-6393",
    publisher = "Elsevier",
    number = "3",

    }

    TY - JOUR

    T1 - Effect of speaking rate on the acceptability of change in segment duration

    AU - Muto, Makiko

    AU - Kato, Hiroaki

    AU - Tsuzaki, Minoru

    AU - Sagisaka, Yoshinori

    PY - 2005/11

    Y1 - 2005/11

    KW - Acceptability

    KW - Naturalness

    KW - Speaking rate

    KW - Speech perception

    KW - Temporal perception

    UR - http://www.scopus.com/inward/record.url?scp=26444513612&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=26444513612&partnerID=8YFLogxK

    U2 - 10.1016/j.specom.2005.02.012

    DO - 10.1016/j.specom.2005.02.012

    M3 - Article

    AN - SCOPUS:26444513612

    VL - 47

    SP - 277

    EP - 289

    JO - Speech Communication

    JF - Speech Communication

    SN - 0167-6393

    IS - 3

    ER -