Tree-based unit selection for English speech synthesis

Wern Jun Wang, W. N. Campbell, Naoto Iwahashi, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

In concatenative speech synthesis for English, scarcity of speech data for many contexts is a serious problem. In this paper, we propose a new unit selection scheme using a decision-tree-based clustering method that combines acoustic and linguistic knowledge with statistical modeling. This approach allows us not only to find a trainable and consistent set of generalized allophonic models but also to achieve some local optimality with respect to the limited training data. To evaluate the validity of this algorithm, regression trees were generated for both vowels and consonants from 200 phonetically balanced sentences read by a female speaker. Experimental results show that regression trees offer a promising solution to the data scarcity problem.
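The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea behind regression-tree clustering of phone tokens by context. It assumes yes/no questions about the neighbouring phones and a split criterion based on reduction in squared error of one acoustic value per token. The names `sse`, `best_question`, and `grow`, the question set, and the toy duration values are all hypothetical and are not taken from the paper.

```python
from statistics import fmean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)

def best_question(tokens, questions, min_leaf):
    """Return (gain, name) for the question that most reduces SSE, or None."""
    parent = sse([t["y"] for t in tokens])
    best = None
    for name, pred in questions.items():
        yes = [t["y"] for t in tokens if pred(t)]
        no = [t["y"] for t in tokens if not pred(t)]
        # Refuse splits that leave too few tokens on either side,
        # so that every cluster stays trainable on sparse data.
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue
        gain = parent - sse(yes) - sse(no)
        if best is None or gain > best[0]:
            best = (gain, name)
    return best

def grow(tokens, questions, min_leaf=2, min_gain=1e-6):
    """Greedily grow a tree; each leaf is one generalized context cluster."""
    choice = best_question(tokens, questions, min_leaf)
    if choice is None or choice[0] <= min_gain:
        return {"n": len(tokens), "mean": fmean(t["y"] for t in tokens)}
    pred = questions[choice[1]]
    return {
        "question": choice[1],
        "yes": grow([t for t in tokens if pred(t)], questions, min_leaf, min_gain),
        "no": grow([t for t in tokens if not pred(t)], questions, min_leaf, min_gain),
    }

# Hypothetical context questions (assumed, not the paper's question set).
questions = {
    "right_is_nasal": lambda t: t["right"] == "nasal",
    "left_is_stop": lambda t: t["left"] == "stop",
}

# Toy tokens of one vowel; "y" is an invented duration in milliseconds.
tokens = [
    {"left": "stop", "right": "nasal", "y": 120.0},
    {"left": "stop", "right": "nasal", "y": 124.0},
    {"left": "fricative", "right": "nasal", "y": 118.0},
    {"left": "stop", "right": "stop", "y": 80.0},
    {"left": "fricative", "right": "stop", "y": 84.0},
    {"left": "fricative", "right": "stop", "y": 82.0},
]

tree = grow(tokens, questions)
```

In this sketch the `min_leaf` threshold is what addresses data scarcity: contexts with too few tokens are never split off on their own and instead share a leaf with acoustically similar contexts. Because each split is chosen greedily, the resulting tree is at best locally optimal for the data it was grown on.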

Original language: English
Title of host publication: Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
Place of Publication: Piscataway, NJ, United States
Publisher: Publ by IEEE
Volume: 2
ISBN (Print): 0780309464
Publication status: Published - 1993
Externally published: Yes
Event: 1993 IEEE International Conference on Acoustics, Speech and Signal Processing - Minneapolis, MN, USA
Duration: 1993 Apr 27 to 1993 Apr 30

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Acoustics and Ultrasonics

Cite this

Wang, W. J., Campbell, W. N., Iwahashi, N., & Sagisaka, Y. (1993). Tree-based unit selection for English speech synthesis. In Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing (Vol. 2). Publ by IEEE.