Statistical modelling of speech segment duration by constrained tree regression

Naoto Iwahashi, Yoshinori Sagisaka

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, referred to as Constrained Tree Regression (CTR), can suitably represent the complex effects of prosody control factors with a moderate amount of learning data. It is based on recursive splits of the predictor variable space and the partial imposition of linear independence constraints among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have conventionally been used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to account for the hierarchical structure in prosody control. The new method is applied to the modelling of speech segmental duration. Experimental results show that the proposed regression method yields better duration models than linear and tree regressions with the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.
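
The abstract does not state the form of the hierarchical error function. As a rough illustration only, the sketch below assumes one plausible form: the squared error of each phoneme duration plus a weighted squared error of each syllable duration, where a syllable's duration is the sum of its phonemes. The function name hierarchical_error, the syllable_weight parameter, and the additive two-level form are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def hierarchical_error(
    actual: Sequence[Sequence[float]],
    predicted: Sequence[Sequence[float]],
    syllable_weight: float = 1.0,
) -> float:
    """Two-level squared error over phoneme and syllable durations.

    Each inner sequence holds the phoneme durations (e.g. in ms) of one
    syllable. ASSUMPTION: the additive form and `syllable_weight` are
    illustrative; the paper's actual error function may differ.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same number of syllables")

    phoneme_term = 0.0
    syllable_term = 0.0
    for act_syl, pred_syl in zip(actual, predicted):
        if len(act_syl) != len(pred_syl):
            raise ValueError("each syllable must have the same number of phonemes")
        # Lower level: error of every individual phoneme duration.
        phoneme_term += sum((a - p) ** 2 for a, p in zip(act_syl, pred_syl))
        # Upper level: error of the syllable total (sum of its phonemes).
        syllable_term += (sum(act_syl) - sum(pred_syl)) ** 2
    return phoneme_term + syllable_weight * syllable_term


# Hypothetical durations: two syllables, with two and one phonemes.
actual = [[60, 90], [80]]
predicted = [[70, 80], [85]]
total = hierarchical_error(actual, predicted)
phoneme_only = hierarchical_error(actual, predicted, syllable_weight=0.0)
```

Under this assumed form, phoneme errors that cancel within a syllable (the first syllable above) add nothing to the syllable-level term, while an error in the syllable total (the second syllable) is penalised at both levels.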

Original language: English
Pages (from-to): 1550-1559
Number of pages: 10
Journal: IEICE Transactions on Information and Systems
Volume: E83-D
Issue number: 7
Publication status: Published - 2000
Externally published: Yes

Fingerprint

Speech synthesis

Keywords

  • Regression
  • Speech segmental duration
  • Statistical modelling

ASJC Scopus subject areas

  • Information Systems
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Statistical modelling of speech segment duration by constrained tree regression. / Iwahashi, Naoto; Sagisaka, Yoshinori.

In: IEICE Transactions on Information and Systems, Vol. E83-D, No. 7, 2000, p. 1550-1559.

@article{bf65a58cded74842865bfca51f32d65e,
title = "Statistical modelling of speech segment duration by constrained tree regression",
abstract = "This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can make suitable representation of complex effects of control factors for prosody with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have been conventionally used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to consider hierarchical structure in prosody control. This new method is applied to modelling of speech segmental duration. Experimental results show that better duration models are obtained by using the proposed regression method compared with linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.",
keywords = "Regression, Speech segmental duration, Statistical modelling",
author = "Naoto Iwahashi and Yoshinori Sagisaka",
year = "2000",
language = "English",
volume = "E83-D",
pages = "1550--1559",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "7",

}

TY - JOUR

T1 - Statistical modelling of speech segment duration by constrained tree regression

AU - Iwahashi, Naoto

AU - Sagisaka, Yoshinori

PY - 2000

Y1 - 2000

N2 - This paper presents a new method for statistical modelling of prosody control in speech synthesis. The proposed method, which is referred to as Constrained Tree Regression (CTR), can make suitable representation of complex effects of control factors for prosody with a moderate amount of learning data. It is based on recursive splits of predictor variable spaces and partial imposition of constraints of linear independence among predictor variables. It incorporates both linear and tree regressions with categorical predictor variables, which have been conventionally used for prosody control, and extends them to more general models. In addition, a hierarchical error function is presented to consider hierarchical structure in prosody control. This new method is applied to modelling of speech segmental duration. Experimental results show that better duration models are obtained by using the proposed regression method compared with linear and tree regressions using the same number of free parameters. It is also shown that the hierarchical structure of phoneme and syllable durations can be represented efficiently using the hierarchical error function.

KW - Regression

KW - Speech segmental duration

KW - Statistical modelling

UR - http://www.scopus.com/inward/record.url?scp=33746384049&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33746384049&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:33746384049

VL - E83-D

SP - 1550

EP - 1559

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 7

ER -