F0 analysis for Japanese conversational speech synthesis

Hideharu Nakajima, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

2 Citations (Scopus)

Abstract

This paper proposes a conversational-style text-to-speech synthesis scheme based on an analysis of the fundamental frequency (F0). The analysis confirms that conversational F0 can be represented by the superpositional model using three components, spanning the utterance, the major phrase, and the minor phrase. We compare each component of the model between the conversational style and the reading style to investigate where large F0 discrepancies are found, which linguistic factors are related to them, and to what extent they occur. This paper uses real-domain data that contains rich linguistic context. The analysis shows that large differences occur in the global components, i.e., those spanning whole utterances and phrases, and that these differences occur at or around domain-specific expressions. It also reveals that the local components are nearly identical in both styles. These results indicate that, to realize conversational synthesis in the superpositional manner, the utterance and phrase components must be estimated from word attributes other than grammatical cues.
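As a rough illustration of the superpositional idea mentioned in the abstract, the sketch below assumes (an assumption, not the authors' published formulation) that each minor phrase is summarized by a single F0 value, and that the three components are estimated hierarchically in the log domain: the utterance component is the mean log F0 over all minor phrases, each major-phrase component is that phrase's offset from the utterance mean, and each minor-phrase component is the remaining residual. The function name `decompose_log_f0` and its input layout are hypothetical.

```python
import math
from statistics import mean


def decompose_log_f0(phrases):
    """Split per-minor-phrase F0 values into three additive log-F0 components.

    Hypothetical input layout: `phrases` is a list of major phrases, each a
    list of minor-phrase F0 values in Hz. Illustrative sketch only; this is
    not the estimation procedure used in the paper.
    """
    # Work in the log domain so that the components combine by addition.
    logs = [[math.log(f0) for f0 in major] for major in phrases]

    # Utterance component: mean log F0 over every minor phrase in the utterance.
    utterance = mean(v for major in logs for v in major)

    result = []
    for major in logs:
        # Major-phrase component: offset of this major phrase from the utterance mean.
        major_comp = mean(major) - utterance
        # Minor-phrase component: residual left after removing the two global terms.
        result.append(
            [(utterance, major_comp, v - utterance - major_comp) for v in major]
        )
    return result
```

By construction, exponentiating the sum of the three components recovers each original F0 value. Comparing the utterance and major-phrase terms between conversational and read renditions of the same text would be one simple way to quantify the "global" differences described above, under the assumptions stated here.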

Original language: English
Title of host publication: 2009 8th International Symposium on Natural Language Processing, SNLP '09
Pages: 137-142
Number of pages: 6
DOI: https://doi.org/10.1109/SNLP.2009.5340932
Publication status: Published, 2009 Dec 28
Event: 2009 8th International Symposium on Natural Language Processing, SNLP '09, Bangkok, Thailand
Duration: 2009 Oct 20 to 2009 Oct 22

Publication series

Name: 2009 8th International Symposium on Natural Language Processing, SNLP '09

Conference

Conference: 2009 8th International Symposium on Natural Language Processing, SNLP '09
Country: Thailand
City: Bangkok
Period: 2009/10/20 to 2009/10/22

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Software


Cite this

    Nakajima, H., & Sagisaka, Y. (2009). F0 analysis for Japanese conversational speech synthesis. In 2009 8th International Symposium on Natural Language Processing, SNLP '09 (pp. 137-142). [5340932] (2009 8th International Symposium on Natural Language Processing, SNLP '09). https://doi.org/10.1109/SNLP.2009.5340932