Generation and perception of F0 markedness for communicative speech synthesis

Yoshinori Sagisaka, Takumi Yamashita, Yoko Kokenawa

    Research output: Contribution to journal (Article)

    17 Citations (Scopus)

    Abstract

    Aiming at natural F0 control for conversational speech synthesis using attributes of constituent output words, F0 characteristics are analyzed from both generation and perception viewpoints. We recorded commonly used two-phrase utterances consisting of Japanese adjective and adverb phrases expressing different degrees of markedness under designed conversational situations, and compared their F0 characteristics. The comparison showed consistent F0 control dependencies not only on the adverbs themselves but also on the attribute of the following adjective phrases. A strong positive or negative correlation is observed between the markedness of adverbs and F0 height when an adjective phrase showing positiveness or negativeness follows the current adverb phrase. These consistencies have been perceptually confirmed by naturalness evaluation tests using the same two-phrase samples with different F0 heights. Finally, a computational model of conversational F0 control is proposed using lexical information of adjectives showing positiveness or negativeness and adverbs expressing markedness. F0 estimation experiments quantitatively showed the possibility of F0 control for natural conversational speech synthesis using the attributes of constituent output words.
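
The relationship described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the linear form, the 0 to 1 markedness scale, the polarity coding, and the names `Phrase`, `estimate_f0_height`, `base_hz`, and `gain_semitones` are all assumptions made for illustration, and the numeric defaults are placeholders rather than values from the paper. The sketch only encodes the qualitative reading that F0 height rises with adverb markedness before a positive adjective and falls with it before a negative one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """Hypothetical lexical attributes of an adverb + adjective pair."""
    adverb_markedness: float  # assumed scale: 0.0 (unmarked) to 1.0 (strongly marked)
    adjective_polarity: int   # assumed coding: +1 positive, -1 negative, 0 neutral


def estimate_f0_height(phrase: Phrase,
                       base_hz: float = 200.0,
                       gain_semitones: float = 4.0) -> float:
    """Return an illustrative F0 height in Hz for the adverb phrase.

    The shift is linear in markedness and its sign follows the polarity
    of the following adjective. Both defaults are placeholder values.
    """
    if phrase.adjective_polarity not in (-1, 0, 1):
        raise ValueError("adjective_polarity must be -1, 0, or +1")
    if not 0.0 <= phrase.adverb_markedness <= 1.0:
        raise ValueError("adverb_markedness must lie in [0, 1]")
    shift = gain_semitones * phrase.adjective_polarity * phrase.adverb_markedness
    # Convert the semitone shift to a frequency ratio (12 semitones = one octave).
    return base_hz * 2.0 ** (shift / 12.0)
```

Under these assumptions, a strongly marked adverb before a positive adjective yields a value above `base_hz`, the same adverb before a negative adjective yields a value below it, and an unmarked adverb leaves the baseline unchanged.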

    Original language: English
    Pages (from-to): 376-384
    Number of pages: 9
    Journal: Speech Communication
    Volume: 46
    Issue number: 3-4
    DOI: 10.1016/j.specom.2005.03.017
    Publication status: Published - 2005 Jul

    Fingerprint

    Speech synthesis
    Attribute
    Test evaluation
    Output
    Computational model
    Perception
    Adverb
    Markedness
    Evaluation
    Experiment
    Dependency (Psychology)

    Keywords

    • Computational prosody modeling
    • Conversational speech prosody
    • Corpus-based speech synthesis
    • Fundamental frequency control
    • Perceptual markedness

    ASJC Scopus subject areas

    • Signal Processing
    • Electrical and Electronic Engineering
    • Experimental and Cognitive Psychology
    • Linguistics and Language

    Cite this

    Sagisaka, Yoshinori; Yamashita, Takumi; Kokenawa, Yoko. Generation and perception of F0 markedness for communicative speech synthesis. In: Speech Communication, Vol. 46, No. 3-4, 07.2005, pp. 376-384.

    @article{90f9b3c8affa4d60b6b63eac2780349c,
    title = "Generation and perception of F0 markedness for communicative speech synthesis",
    keywords = "Computational prosody modeling, Conversational speech prosody, Corpus-based speech synthesis, Fundamental frequency control, Perceptual markedness",
    author = "Yoshinori Sagisaka and Takumi Yamashita and Yoko Kokenawa",
    year = "2005",
    month = "7",
    doi = "10.1016/j.specom.2005.03.017",
    language = "English",
    volume = "46",
    pages = "376--384",
    journal = "Speech Communication",
    issn = "0167-6393",
    publisher = "Elsevier",
    number = "3-4",

    }

    TY - JOUR
    T1 - Generation and perception of F0 markedness for communicative speech synthesis
    AU - Sagisaka, Yoshinori
    AU - Yamashita, Takumi
    AU - Kokenawa, Yoko
    PY - 2005/7
    Y1 - 2005/7
    KW - Computational prosody modeling
    KW - Conversational speech prosody
    KW - Corpus-based speech synthesis
    KW - Fundamental frequency control
    KW - Perceptual markedness
    UR - http://www.scopus.com/inward/record.url?scp=21844444303&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=21844444303&partnerID=8YFLogxK
    U2 - 10.1016/j.specom.2005.03.017
    DO - 10.1016/j.specom.2005.03.017
    M3 - Article
    VL - 46
    SP - 376
    EP - 384
    JO - Speech Communication
    JF - Speech Communication
    SN - 0167-6393
    IS - 3-4
    ER -