Effect of intra-phrase position on acceptability of change in segment duration in sentence speech

Makiko Muto, Hiroaki Kato, Minoru Tsuzaki, Yoshinori Sagisaka

    Research output: Contribution to journal (Article)

    2 Citations (Scopus)

    Abstract

    For use as a naturalness criterion for duration rules in speech synthesis, human acceptability of change in segment duration is investigated with regard to the temporal position within a phrase. Three perceptual experiments are carried out to introduce variations in the attribute and context of a phrase in sentence speech: (1) the length of a phrase and the type of a phrase accent (2 lengths × 3 types), (2) variation in carrier sentence (3 carriers + 1 without carrier), and (3) the position of a phrase in a breath group (two positions). In total, 22 listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments is either lengthened or shortened by up to 50 ms. Overall results show that a duration change in the phrase-initial segment is generally the least acceptable and one in the phrase-final segment the most acceptable, with changes at intermediate positions in the phrase falling in between. This position-dependent tendency is observed regardless of the variations in phrase length, accent type, carrier sentence, presence of carrier sentence, and position in a breath group. These results suggest that the error criteria of duration modeling should be reconsidered by taking into account such perceptual characteristics in order to improve temporal naturalness in synthesized speech.
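One way to picture the closing suggestion of the abstract is an error measure for duration models that counts phrase-initial errors more heavily than phrase-final ones. The following is a minimal sketch under stated assumptions, not the authors' method: the `Segment` type, the `weighted_rms_error` function, and the numeric weights are all hypothetical. Only the ordering of the weights (initial above medial above final) reflects the tendency reported in the abstract.

```python
from dataclasses import dataclass

# Hypothetical weights: a larger value means a duration error at that
# position is assumed to be less acceptable to listeners. Only the ordering
# (initial > medial > final) follows the abstract; the numbers are placeholders.
POSITION_WEIGHTS = {"initial": 1.5, "medial": 1.0, "final": 0.5}


@dataclass
class Segment:
    position: str        # "initial", "medial", or "final" within the phrase
    target_ms: float     # reference (natural) duration in milliseconds
    predicted_ms: float  # duration produced by the duration model


def weighted_rms_error(segments):
    """Root-mean-square duration error with per-position weights."""
    total_weight = sum(POSITION_WEIGHTS[s.position] for s in segments)
    weighted_sq = sum(
        POSITION_WEIGHTS[s.position] * (s.predicted_ms - s.target_ms) ** 2
        for s in segments
    )
    return (weighted_sq / total_weight) ** 0.5


# The same 20 ms error placed at the start versus the end of a phrase.
error_at_start = [
    Segment("initial", 80.0, 100.0),
    Segment("medial", 90.0, 90.0),
    Segment("final", 120.0, 120.0),
]
error_at_end = [
    Segment("initial", 80.0, 80.0),
    Segment("medial", 90.0, 90.0),
    Segment("final", 120.0, 140.0),
]
print(weighted_rms_error(error_at_start) > weighted_rms_error(error_at_end))
```

An unweighted RMS error would score both phrases identically. With position weights, the phrase with the initial-segment error scores worse, which is the kind of perceptual asymmetry the abstract argues an error criterion should capture.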

    Original language: English
    Pages (from-to): 361-372
    Number of pages: 12
    Journal: Speech Communication
    Volume: 45
    Issue number: 4
    DOI: 10.1016/j.specom.2004.11.004
    Publication status: Published - 2005 Apr

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering
    • Experimental and Cognitive Psychology
    • Linguistics and Language

    Cite this

    Effect of intra-phrase position on acceptability of change in segment duration in sentence speech. / Muto, Makiko; Kato, Hiroaki; Tsuzaki, Minoru; Sagisaka, Yoshinori.

    In: Speech Communication, Vol. 45, No. 4, 04.2005, p. 361-372.

    @article{bb0fc7327ae84259b4febe8560c64cea,
    title = "Effect of intra-phrase position on acceptability of change in segment duration in sentence speech",
    author = "Makiko Muto and Hiroaki Kato and Minoru Tsuzaki and Yoshinori Sagisaka",
    year = "2005",
    month = "4",
    doi = "10.1016/j.specom.2004.11.004",
    language = "English",
    volume = "45",
    pages = "361--372",
    journal = "Speech Communication",
    issn = "0167-6393",
    publisher = "Elsevier",
    number = "4",

    }

    TY - JOUR

    T1 - Effect of intra-phrase position on acceptability of change in segment duration in sentence speech

    AU - Muto, Makiko

    AU - Kato, Hiroaki

    AU - Tsuzaki, Minoru

    AU - Sagisaka, Yoshinori

    PY - 2005/4

    Y1 - 2005/4

    UR - http://www.scopus.com/inward/record.url?scp=15844399704&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=15844399704&partnerID=8YFLogxK

    U2 - 10.1016/j.specom.2004.11.004

    DO - 10.1016/j.specom.2004.11.004

    M3 - Article

    AN - SCOPUS:15844399704

    VL - 45

    SP - 361

    EP - 372

    JO - Speech Communication

    JF - Speech Communication

    SN - 0167-6393

    IS - 4

    ER -