Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling

Yoko Greenberg, Nagisa Shibuya, Minoru Tsuzaki, Hiroaki Kato, Yoshinori Sagisaka

    Research output: Contribution to journal (Article)

    10 Citations (Scopus)

    Abstract

    To characterize the paralinguistic information that prosody conveys in communication, a multi-dimensional perceptual space for communicative speech prosodies was derived by a psychometric method from multi-dimensional expressions of impressions. Single-word utterances of "n" were used to avoid lexical effects and to cover as many communicative prosodic variations as possible. An analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of "n" as differences in average height (high-low) and in dynamic pattern (rise, fall, gradual fall, and rise&fall). Using controlled single utterances of "n", multiple dimensional scaling analysis was applied to a mutual distance matrix computed from 26-dimensional vectors expressing perceptual impressions. The result revealed a three-dimensional structure of the perceptual impression space, with each dimension corresponding to a different F0 control characteristic. The positive-negative impression can be controlled by average F0 height, while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike the conventional categorical classification of prosodic patterns frequently seen in studies of emotional prosody, this control characterization makes it possible to describe prosodic impressions flexibly and quantitatively. These experimental results suggest that communicative prosody generation could take impression vectors as input specifications and be controlled through average F0 height and F0 dynamic patterns. Instead of generating speech with categorical, prototypical prosody, more adequate communicative speech synthesis can be approached through this input specification and its correspondence with control characteristics.
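The scaling step described in the abstract can be illustrated with a minimal sketch. It assumes classical (Torgerson) scaling and Euclidean distances, neither of which the abstract specifies. The ratings are random placeholder data, the stimulus count of 12 is hypothetical, and the function names are invented for illustration. Only the 26 impression dimensions and the three-dimensional result come from the abstract.

```python
import numpy as np


def pairwise_distances(vectors):
    """Euclidean distance between every pair of row vectors (assumed metric)."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dist, n_dims=3):
    """Embed points in n_dims dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip tiny negative eigenvalues caused by rounding before the square root.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


rng = np.random.default_rng(0)
# Hypothetical data: 12 stimuli, each rated on 26 impression scales (1 to 7).
ratings = rng.uniform(1.0, 7.0, size=(12, 26))
dist = pairwise_distances(ratings)       # mutual distance matrix
coords = classical_mds(dist, n_dims=3)   # one 3-D point per stimulus
print(coords.shape)
```

With real ratings, each row of `coords` would place one utterance of "n" in the derived space, and the axes could then be compared against average F0 height and F0 dynamic pattern.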

    Original language: English
    Pages (from-to): 585-593
    Number of pages: 9
    Journal: Speech Communication
    Volume: 51
    Issue number: 7
    DOI: 10.1016/j.specom.2007.10.006
    Publication status: Published - 2009 Jul

    Keywords

    • Communicative speech synthesis
    • Fundamental frequency control
    • Nonverbal information
    • Paralinguistic prosody

    ASJC Scopus subject areas

    • Modelling and Simulation
    • Computer Science Applications
    • Computer Vision and Pattern Recognition
    • Software
    • Communication
    • Linguistics and Language

    Cite this

    Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling. / Greenberg, Yoko; Shibuya, Nagisa; Tsuzaki, Minoru; Kato, Hiroaki; Sagisaka, Yoshinori.

    In: Speech Communication, Vol. 51, No. 7, 07.2009, p. 585-593.

    @article{9bbc87992a0948248d688f49e3fb56ff,
    title = "Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling",
    abstract = "A multi-dimensional perceptual space for communicative speech prosodies was derived using a psychometric method from multi-dimensional expressions of impressions to characterize paralinguistic information conveyed by prosody in communication. Single word utterances of {"}n{"} were employed to allow freedom from lexical effects and to cover communicative prosodic variations as much as possible. The analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of {"}n{"} as differences of average height (high-low) and dynamic patterns (rise, fall, gradual fall, and rise&fall). Using controlled single utterances of {"}n{"}, multiple dimensional scaling analysis was applied to a mutual distance matrix obtained by 26 dimensional vectors expressing perceptual impressions. The result showed the three-dimensional structure of a perceptual impression space, and each dimension corresponded to different F0 control characteristics. The positive-negative impression can be controlled by average F0 height while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike conventional categorical classification of prosodic patterns frequently observed in studies of emotional prosody, this control characterization enables us to flexibly and quantitatively describe prosodic impressions. These experimental results allow the possibility of input specifications for communicative prosody generation using impression vectors and control through average F0 height and F0 dynamic patterns. Instead of the generation of speech with categorical prototypical prosody, more adequate communicative speech synthesis can be approached through input specification and its correspondence with control characteristics.",
    keywords = "Communicative speech synthesis, Fundamental frequency control, Nonverbal information, Paralinguistic prosody",
    author = "Yoko Greenberg and Nagisa Shibuya and Minoru Tsuzaki and Hiroaki Kato and Yoshinori Sagisaka",
    year = "2009",
    month = "7",
    doi = "10.1016/j.specom.2007.10.006",
    language = "English",
    volume = "51",
    pages = "585--593",
    journal = "Speech Communication",
    issn = "0167-6393",
    publisher = "Elsevier",
    number = "7",
    }

    TY - JOUR

    T1 - Analysis on paralinguistic prosody control in perceptual impression space using multiple dimensional scaling

    AU - Greenberg, Yoko

    AU - Shibuya, Nagisa

    AU - Tsuzaki, Minoru

    AU - Kato, Hiroaki

    AU - Sagisaka, Yoshinori

    PY - 2009/7

    Y1 - 2009/7

    N2 - A multi-dimensional perceptual space for communicative speech prosodies was derived using a psychometric method from multi-dimensional expressions of impressions to characterize paralinguistic information conveyed by prosody in communication. Single word utterances of "n" were employed to allow freedom from lexical effects and to cover communicative prosodic variations as much as possible. The analysis of daily conversations showed that conversational speech impressions were manifested in the global F0 control of "n" as differences of average height (high-low) and dynamic patterns (rise, fall, gradual fall, and rise&fall). Using controlled single utterances of "n", multiple dimensional scaling analysis was applied to a mutual distance matrix obtained by 26 dimensional vectors expressing perceptual impressions. The result showed the three-dimensional structure of a perceptual impression space, and each dimension corresponded to different F0 control characteristics. The positive-negative impression can be controlled by average F0 height while confident-doubtful or allowable-unacceptable impressions can be controlled by F0 dynamic patterns. Unlike conventional categorical classification of prosodic patterns frequently observed in studies of emotional prosody, this control characterization enables us to flexibly and quantitatively describe prosodic impressions. These experimental results allow the possibility of input specifications for communicative prosody generation using impression vectors and control through average F0 height and F0 dynamic patterns. Instead of the generation of speech with categorical prototypical prosody, more adequate communicative speech synthesis can be approached through input specification and its correspondence with control characteristics.

    KW - Communicative speech synthesis

    KW - Fundamental frequency control

    KW - Nonverbal information

    KW - Paralinguistic prosody

    UR - http://www.scopus.com/inward/record.url?scp=67349265582&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=67349265582&partnerID=8YFLogxK

    U2 - 10.1016/j.specom.2007.10.006

    DO - 10.1016/j.specom.2007.10.006

    M3 - Article

    AN - SCOPUS:67349265582

    VL - 51

    SP - 585

    EP - 593

    JO - Speech Communication

    JF - Speech Communication

    SN - 0167-6393

    IS - 7

    ER -