To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate how vowel (V) onsets and V-offsets differ in marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications applied to four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed: the inter-offset intervals were changed while the inter-onset intervals were preserved. These manipulations did not change the duration of the entire word. Each modified word was paired with its unmodified counterpart, and listeners were asked to rate the difference between the two words in each pair. The results show no significant difference between the V-onset and V-offset conditions in the listeners' ability to detect the temporal modification. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rate for the same stimulus set as in the first experiment. Here the results show a clear difference in the listeners' performance between the two conditions. Specifically, changing the V-onset intervals changed the perceived speaking rate, which was linearly related to the interval change (r = -0.9) even though the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rate. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.
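The V-onset manipulation described above can be illustrated with a small Python sketch. It is not the authors' procedure. The abstract does not say how the onsets were anchored, so scaling about the mean onset time is an assumption, and the function names and the timing values (in milliseconds) are hypothetical. The sketch shows only that vowel inter-onset intervals can be scaled uniformly while every vowel offset, and therefore the word's end point, stays fixed.

```python
from typing import List, Sequence, Tuple


def intervals(times: Sequence[float]) -> List[float]:
    """Return the differences between successive time points."""
    return [b - a for a, b in zip(times, times[1:])]


def scale_onset_intervals(
    onsets: Sequence[float],
    offsets: Sequence[float],
    ratio: float,
    word_start: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Scale vowel inter-onset intervals by `ratio`, leaving offsets untouched.

    ASSUMPTION: onsets are scaled about their mean time. The abstract does
    not specify the anchor point used in the actual experiments.
    """
    if len(onsets) != len(offsets):
        raise ValueError("onsets and offsets must have the same length")

    center = sum(onsets) / len(onsets)
    new_onsets = [center + ratio * (t - center) for t in onsets]

    # Each vowel must still start after the preceding boundary
    # (word start or previous vowel offset) and before its own offset.
    previous_boundary = word_start
    for onset, offset in zip(new_onsets, offsets):
        if not (previous_boundary < onset < offset):
            raise ValueError("ratio too extreme for these segment timings")
        previous_boundary = offset

    return new_onsets, list(offsets)


if __name__ == "__main__":
    # Hypothetical CVCVCVCV timings in ms; the word spans 0 to 600 ms.
    vowel_onsets = [50.0, 200.0, 350.0, 500.0]
    vowel_offsets = [150.0, 300.0, 450.0, 600.0]

    new_onsets, new_offsets = scale_onset_intervals(
        vowel_onsets, vowel_offsets, ratio=1.05
    )
    print("inter-onset intervals:", intervals(new_onsets))
    print("inter-offset intervals:", intervals(new_offsets))
    print("word end:", new_offsets[-1])
```

Under the same assumption, the V-offset condition would be the mirror image: scale the offsets about their mean and leave the onsets fixed. In that case the final vowel offset moves, so preserving total word duration would require an additional constraint that the abstract does not describe.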