TY - JOUR
T1 - Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling
AU - Greenberg, Yoko
AU - Shibuya, Nagisa
AU - Tsuzaki, Minoru
AU - Kato, Hiroaki
AU - Sagisaka, Yoshinori
N1 - Funding Information: This work was partly supported by the Waseda University RISE research project entitled “Analysis and modeling of human mechanism in speech and language processing” and by Grants-in-Aid for Scientific Research (B) No. 18300063 and (A) No. 16200016 from the Japan Society for the Promotion of Science.
PY - 2009/7
Y1 - 2009/7
N2 - A multi-dimensional perceptual space for communicative speech prosodies was derived using a psychometric method from multi-dimensional expressions of impressions to characterize paralinguistic information conveyed by prosody in communication. Single-word utterances of "n" were employed to allow freedom from lexical effects and to cover communicative prosodic variations as much as possible. The analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of "n" as differences in average height (high-low) and dynamic patterns (rise, fall, gradual fall, and rise & fall). Using controlled single utterances of "n", multiple dimensional scaling analysis was applied to a mutual distance matrix obtained from 26-dimensional vectors expressing perceptual impressions. The result showed a three-dimensional structure of the perceptual impression space, in which each dimension corresponded to different F0 control characteristics. The positive-negative impression can be controlled by average F0 height, while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike the conventional categorical classification of prosodic patterns frequently observed in studies of emotional prosody, this control characterization enables us to describe prosodic impressions flexibly and quantitatively. These experimental results open the possibility of specifying input for communicative prosody generation using impression vectors, with control through average F0 height and F0 dynamic patterns. Instead of generating speech with categorical prototypical prosody, more adequate communicative speech synthesis can be approached through input specification and its correspondence with control characteristics.
KW - Communicative speech synthesis
KW - Fundamental frequency control
KW - Nonverbal information
KW - Paralinguistic prosody
UR - http://www.scopus.com/inward/record.url?scp=67349265582&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67349265582&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2007.10.006
DO - 10.1016/j.specom.2007.10.006
M3 - Article
AN - SCOPUS:67349265582
SN - 0167-6393
VL - 51
SP - 585
EP - 593
JO - Speech Communication
JF - Speech Communication
IS - 7
ER -