GENERATION OF PROSODY IN SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

Naohiro Sakurai, Takemi Mochida, Tetsunori Kobayashi, Katsuhiko Shirai

Research output: Contribution to conference › Paper › peer-review

1 Citation (Scopus)

Abstract

To improve the naturalness of synthetic speech in Japanese text-to-speech or concept-to-speech conversion, we introduce a new scheme that synthesizes arbitrary sentences using a data-base of natural sentence speech. In our synthesis method, a series of synthesis parameters is generated from patterns extracted from natural speech waveforms. In the first step, a basic sentence is selected from the data-base for the target sentence. The selection factors are the phrase dependency structure (separation degree), the number of morae, the accent type, and the phonemic labels. In the second step, if necessary, a basic accent phrase is selected from the same data-base for each target accent phrase. The factors considered in selecting each accent phrase are again the separation degree, the number of morae, the accent type, and the phonemic labels. In the third step, the pitch pattern is generated from the waveform units selected in the first and second steps. In the last step, the phonemic parameters are generated. The phonemic parameters for several morae have already been extracted in the preceding three steps, so in this step only the parameters for ill-suited morae need to be replaced. Because the pitch pattern is generated from patterns extracted directly from real speech, it is expected to be more natural than a pattern estimated by a model. We have so far tested this method on Japanese sentence speech and confirmed that the synthetic sound preserves human-like features fairly well.
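The two selection steps described in the abstract can be illustrated with a small sketch. The abstract names the four selection factors but does not say how they are combined, so the weighted mismatch cost, the `WEIGHTS` values, the `threshold` parameter, and every name below (`AccentPhrase`, `phrase_cost`, `select_basic_sentence`, `select_accent_phrases`) are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AccentPhrase:
    """One accent phrase, described by the four factors named in the abstract."""
    separation_degree: int        # from the phrase dependency structure
    mora_count: int
    accent_type: int
    phonemes: Tuple[str, ...]     # phonemic labels


# Hypothetical weights: the abstract does not state how the factors are combined.
WEIGHTS = {"separation": 4.0, "mora": 2.0, "accent": 2.0, "phoneme": 1.0}


def phrase_cost(target: AccentPhrase, candidate: AccentPhrase) -> float:
    """Weighted mismatch between two accent phrases (0.0 means identical)."""
    length = max(len(target.phonemes), len(candidate.phonemes))
    matching = sum(a == b for a, b in zip(target.phonemes, candidate.phonemes))
    phoneme_mismatch = (length - matching) / length if length else 0.0
    return (
        WEIGHTS["separation"] * abs(target.separation_degree - candidate.separation_degree)
        + WEIGHTS["mora"] * abs(target.mora_count - candidate.mora_count)
        + WEIGHTS["accent"] * (target.accent_type != candidate.accent_type)
        + WEIGHTS["phoneme"] * phoneme_mismatch
    )


def sentence_cost(target: Sequence[AccentPhrase],
                  candidate: Sequence[AccentPhrase]) -> float:
    """Sum of phrase costs; sentences with different phrase counts never match."""
    if len(target) != len(candidate):
        return float("inf")
    return sum(phrase_cost(t, c) for t, c in zip(target, candidate))


def select_basic_sentence(target: Sequence[AccentPhrase],
                          database: Sequence[Sequence[AccentPhrase]]) -> int:
    """Step 1: index of the data-base sentence closest to the target."""
    return min(range(len(database)),
               key=lambda i: sentence_cost(target, database[i]))


def select_accent_phrases(target: Sequence[AccentPhrase],
                          basic: Sequence[AccentPhrase],
                          phrase_pool: Sequence[AccentPhrase],
                          threshold: float = 1.0) -> List[AccentPhrase]:
    """Step 2: keep well-matched phrases of the basic sentence and swap the
    others for the closest phrase available (assumes equal phrase counts)."""
    selected = []
    for t, b in zip(target, basic):
        if phrase_cost(t, b) <= threshold:
            selected.append(b)
        else:
            selected.append(min([b, *phrase_pool],
                                key=lambda c: phrase_cost(t, c)))
    return selected
```

In this sketch the pitch pattern of step three would then be taken from the waveform units belonging to the returned phrases; that part depends on signal-processing details the abstract does not give, so it is left out.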

Original language: English
Pages: 747-750
Number of pages: 4
Publication status: Published - 1994
Event: 3rd International Conference on Spoken Language Processing, ICSLP 1994 - Yokohama, Japan
Duration: 1994 Sep 18 to 1994 Sep 22

Conference

Conference: 3rd International Conference on Spoken Language Processing, ICSLP 1994
Country/Territory: Japan
City: Yokohama
Period: 94/9/18 to 94/9/22

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
