Functional differences between vowel onsets and offsets in temporal perception of speech

Local-change detection and speaking-rate discrimination

Hiroaki Kato, Minoru Tsuzaki, Yoshinori Sagisaka

    Research output: Contribution to journal (Article)

    13 Citations (Scopus)

    Abstract

    To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.

    Original language: English
    Pages (from-to): 3379-3389
    Number of pages: 11
    Journal: Journal of the Acoustical Society of America
    Volume: 113
    Issue number: 6
    DOI: 10.1121/1.1568760
    Publication status: Published - 2003 Jun 1

    Fingerprint

    change detection
    vowels
    discrimination
    intervals
    stimuli
    marking
    manipulators
    experiment
    listeners
    onset
    evaluation
    synthesis
    estimates

    ASJC Scopus subject areas

    • Acoustics and Ultrasonics

    Cite this

    @article{39ad02e0a5424b28a360e59bc85ef66e,
    title = "Functional differences between vowel onsets and offsets in temporal perception of speech: Local-change detection and speaking-rate discrimination",
    abstract = "To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5{\%}-5{\%} in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.",
    author = "Hiroaki Kato and Minoru Tsuzaki and Yoshinori Sagisaka",
    year = "2003",
    month = "6",
    day = "1",
    doi = "10.1121/1.1568760",
    language = "English",
    volume = "113",
    pages = "3379--3389",
    journal = "Journal of the Acoustical Society of America",
    issn = "0001-4966",
    publisher = "Acoustical Society of America",
    number = "6",
    }

    TY - JOUR

    T1 - Functional differences between vowel onsets and offsets in temporal perception of speech

    T2 - Local-change detection and speaking-rate discrimination

    AU - Kato, Hiroaki

    AU - Tsuzaki, Minoru

    AU - Sagisaka, Yoshinori

    PY - 2003/6/1

    Y1 - 2003/6/1

    N2 - To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.

    UR - http://www.scopus.com/inward/record.url?scp=0037945402&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0037945402&partnerID=8YFLogxK

    U2 - 10.1121/1.1568760

    DO - 10.1121/1.1568760

    M3 - Article

    VL - 113

    SP - 3379

    EP - 3389

    JO - Journal of the Acoustical Society of America

    JF - Journal of the Acoustical Society of America

    SN - 0001-4966

    IS - 6

    ER -