Automatic extraction of fundamental frequency control rules by statistical analysis

Toshio Hirai, Naoto Iwahashi, Norio Higuchi, Yoshinori Sagisaka

Research output: Contribution to journal › Article

Abstract

This paper aims to improve the naturalness of Japanese synthetic speech and proposes a method for automatically extracting the rules that control the voice fundamental frequency (written as F0). The proposed method consists of two steps. (1) The F0 time-series patterns of a sufficiently large amount of speech data are represented by the parameters of Fujisaki's model. (2) The F0 control rules that estimate the parameter values from the language information are extracted by statistical analysis. The method is applied to 200 Japanese sentences read by a single speaker, and the relation between the language information and the parameter values is derived by analyzing the extracted F0 control rules. The following properties are identified. (1) The phrase command diminishes when the number of morae in the preceding phrase decreases. (2) The accent component is reduced when the number of morae in the higher-pitched part of the accented phrase is larger. Relation (2) refines knowledge previously obtained from the analysis of a small number of samples. These results show that adequate F0 control rules can be extracted automatically by the proposed method.
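As an illustration of step (1), the following is a minimal sketch of how Fujisaki's superpositional model turns a set of phrase and accent commands into an F0 contour. It is not the authors' implementation. The function names are hypothetical, and the constants ALPHA, BETA and GAMMA are commonly cited default values assumed here for illustration; they do not come from the abstract.

```python
import math

# Assumed, commonly cited defaults (not taken from the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism; zero before onset."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrase_cmds, accent_cmds):
    """F0 in Hz at time t (seconds).

    fb          -- baseline frequency in Hz
    phrase_cmds -- list of (onset_time, magnitude) pairs
    accent_cmds -- list of (onset_time, offset_time, amplitude) triples
    """
    # The components are summed in the log-frequency domain.
    log_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        log_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        log_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(log_f0)


if __name__ == "__main__":
    # Hypothetical commands: one phrase command and one accent command.
    phrases = [(0.0, 0.5)]
    accents = [(0.2, 0.6, 0.4)]
    for k in range(0, 11):
        t = k * 0.1
        print(f"t={t:.1f}s  F0={fujisaki_f0(t, 120.0, phrases, accents):.1f} Hz")
```

In this sketch, the phrase command magnitudes and accent command amplitudes are the parameter values that step (2) would estimate from the language information, for example from the mora counts mentioned in the abstract.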

Original language: English
Pages (from-to): 91-100
Number of pages: 10
Journal: Systems and Computers in Japan
Volume: 28
Issue number: 3
Publication status: Published - 1997 Mar
Externally published: Yes

Keywords

  • Fujisaki model
  • Speech synthesis
  • Statistical analysis
  • Superpositional fundamental frequency control model
  • Voice fundamental frequency control

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Hardware and Architecture
  • Information Systems
  • Theoretical Computer Science

Cite this

Automatic extraction of fundamental frequency control rules by statistical analysis. / Hirai, Toshio; Iwahashi, Naoto; Higuchi, Norio; Sagisaka, Yoshinori.

In: Systems and Computers in Japan, Vol. 28, No. 3, 03.1997, p. 91-100.

@article{1369c9066ac648deb76f4397b3c011a2,
title = "Automatic extraction of fundamental frequency control rules by statistical analysis",
abstract = "This paper aims at the improvement of the naturalness of Japanese synthetic speech and proposes a method of extracting automatically the rules for controlling the voice fundamental frequency (written as F0). The proposed method is composed of two steps. (1) The F0 time-series pattern of a sufficient amount of speech data is represented by parameters under Fujisaki's model. (2) The F0 control rule that estimates the parameter values from the language information is extracted by statistical analysis. The proposed method is applied to 200 Japanese sentences read by a speaker, and the relation between the language information and the parameter values is derived by analyzing the obtained F0 control rules. The following properties are identified. (1) The phrase command diminishes when the number of morae in the preceding phrase decreases. (2) The accent component is reduced when the number of morae in the higher-pitched part of the accented phrase is larger. Relation (2) is a refinement of knowledge already obtained by the analysis of a small number of samples. Thus, it is shown that adequate F0 control rules can be extracted automatically by the proposed method.",
keywords = "Fujisaki model, Speech synthesis, Statistical analysis, Superpositional fundamental frequency control model, Voice fundamental frequency control",
author = "Toshio Hirai and Naoto Iwahashi and Norio Higuchi and Yoshinori Sagisaka",
year = "1997",
month = "3",
language = "English",
volume = "28",
pages = "91--100",
journal = "Systems and Computers in Japan",
issn = "0882-1666",
publisher = "John Wiley and Sons Inc.",
number = "3",

}

TY - JOUR

T1 - Automatic extraction of fundamental frequency control rules by statistical analysis

AU - Hirai, Toshio

AU - Iwahashi, Naoto

AU - Higuchi, Norio

AU - Sagisaka, Yoshinori

PY - 1997/3

Y1 - 1997/3

N2 - This paper aims at the improvement of the naturalness of Japanese synthetic speech and proposes a method of extracting automatically the rules for controlling the voice fundamental frequency (written as F0). The proposed method is composed of two steps. (1) The F0 time-series pattern of a sufficient amount of speech data is represented by parameters under Fujisaki's model. (2) The F0 control rule that estimates the parameter values from the language information is extracted by statistical analysis. The proposed method is applied to 200 Japanese sentences read by a speaker, and the relation between the language information and the parameter values is derived by analyzing the obtained F0 control rules. The following properties are identified. (1) The phrase command diminishes when the number of morae in the preceding phrase decreases. (2) The accent component is reduced when the number of morae in the higher-pitched part of the accented phrase is larger. Relation (2) is a refinement of knowledge already obtained by the analysis of a small number of samples. Thus, it is shown that adequate F0 control rules can be extracted automatically by the proposed method.

KW - Fujisaki model

KW - Speech synthesis

KW - Statistical analysis

KW - Superpositional fundamental frequency control model

KW - Voice fundamental frequency control

UR - http://www.scopus.com/inward/record.url?scp=0031084194&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031084194&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0031084194

VL - 28

SP - 91

EP - 100

JO - Systems and Computers in Japan

JF - Systems and Computers in Japan

SN - 0882-1666

IS - 3

ER -