Speech segment network approach for optimization of synthesis unit set

Naoto Iwahashi, Yoshinori Sagisaka

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

In this paper, a speech segment network approach for the construction of a suitable synthesis unit set with which high-quality speech can be synthesized, and yet which is of small enough size to be practical, is proposed. The speech segment network approach selects a synthesis unit set in which segmental and/or inter-segmental distortions are minimized by using combinatorial optimization methods such as iterative improvement and simulated annealing. Experimental results using diphone segments have shown that the suitable diphone unit sets, with total or maximum of inter-segmental distortion reduced by about 35 and 30%, respectively, can be constructed using this method. This reduction rate was enhanced as the segment candidate population increased. Effectiveness of this unit set design was also perceptually confirmed by a listening test, using speech synthesized with the selected diphone unit set.
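The selection step described in the abstract can be illustrated with a minimal simulated annealing sketch. This is not the authors' implementation: the data layout (each candidate segment reduced to a hypothetical pair of scalar boundary features), the distortion measure (absolute boundary mismatch), the cooling schedule, and all function names are assumptions made for illustration only. It picks one candidate per unit type so that the total inter-segmental distortion over connectable unit pairs is small.

```python
import math
import random


def total_distortion(choice, candidates, edges):
    """Sum of boundary mismatches over all connectable unit pairs.

    candidates[u][i] is an assumed (left_feature, right_feature) pair
    for candidate i of unit type u; edges lists (a, b) pairs where
    unit a may be followed by unit b.
    """
    total = 0.0
    for a, b in edges:
        right_of_a = candidates[a][choice[a]][1]
        left_of_b = candidates[b][choice[b]][0]
        total += abs(right_of_a - left_of_b)
    return total


def anneal(candidates, edges, steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Choose one candidate per unit type by simulated annealing."""
    rng = random.Random(seed)
    units = sorted(candidates)
    choice = {u: 0 for u in units}  # start from the first candidate of each unit
    cost = total_distortion(choice, candidates, edges)
    best, best_cost = dict(choice), cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / steps)
        unit = rng.choice(units)
        old = choice[unit]
        choice[unit] = rng.randrange(len(candidates[unit]))
        new_cost = total_distortion(choice, candidates, edges)
        # Accept improvements always, and worse moves with Boltzmann probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / t):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(choice), cost
        else:
            choice[unit] = old  # reject the move
    return best, best_cost


# Toy data: two diphone types, two candidate segments each (made-up values).
toy_candidates = {
    "a-b": [(0.0, 5.0), (0.0, 1.0)],
    "b-a": [(4.0, 0.0), (1.0, 0.5)],
}
toy_edges = [("a-b", "b-a"), ("b-a", "a-b")]

if __name__ == "__main__":
    selected, distortion = anneal(toy_candidates, toy_edges)
    print(selected, distortion)
```

Dropping the probabilistic acceptance (accepting only moves with `new_cost <= cost`) turns the same loop into plain iterative improvement, the other method the abstract names. Replacing the sum in `total_distortion` with a maximum would target the maximum rather than the total inter-segmental distortion.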

Original language: English
Pages (from-to): 335-352
Number of pages: 18
Journal: Computer Speech and Language
Volume: 9
Issue number: 4
DOIs: 10.1006/csla.1995.0016
Publication status: Published - 1995 Oct
Externally published: Yes

Fingerprint

Synthesis
Unit
Optimization
Combinatorial optimization
Simulated annealing
candidacy
Optimization Methods
Speech
Population
Experimental Results

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language

Cite this

Speech segment network approach for optimization of synthesis unit set. / Iwahashi, Naoto; Sagisaka, Yoshinori.

In: Computer Speech and Language, Vol. 9, No. 4, 10.1995, p. 335-352.

@article{b1eabc975c3c4e5d9d347b308511c176,
title = "Speech segment network approach for optimization of synthesis unit set",
abstract = "In this paper, a speech segment network approach for the construction of a suitable synthesis unit set with which high-quality speech can be synthesized, and yet which is of small enough size to be practical, is proposed. The speech segment network approach selects a synthesis unit set in which segmental and/or inter-segmental distortions are minimized by using combinatorial optimization methods such as iterative improvement and simulated annealing. Experimental results using diphone segments have shown that the suitable diphone unit sets, with total or maximum of inter-segmental distortion reduced by about 35 and 30{\%}, respectively, can be constructed using this method. This reduction rate was enhanced as the segment candidate population increased. Effectiveness of this unit set design was also perceptually confirmed by a listening test, using speech synthesized with the selected diphone unit set.",
author = "Naoto Iwahashi and Yoshinori Sagisaka",
year = "1995",
month = "10",
doi = "10.1006/csla.1995.0016",
language = "English",
volume = "9",
pages = "335--352",
journal = "Computer Speech and Language",
issn = "0885-2308",
publisher = "Academic Press Inc.",
number = "4",
}

TY - JOUR
T1 - Speech segment network approach for optimization of synthesis unit set
AU - Iwahashi, Naoto
AU - Sagisaka, Yoshinori
PY - 1995/10
Y1 - 1995/10
N2 - In this paper, a speech segment network approach for the construction of a suitable synthesis unit set with which high-quality speech can be synthesized, and yet which is of small enough size to be practical, is proposed. The speech segment network approach selects a synthesis unit set in which segmental and/or inter-segmental distortions are minimized by using combinatorial optimization methods such as iterative improvement and simulated annealing. Experimental results using diphone segments have shown that the suitable diphone unit sets, with total or maximum of inter-segmental distortion reduced by about 35 and 30%, respectively, can be constructed using this method. This reduction rate was enhanced as the segment candidate population increased. Effectiveness of this unit set design was also perceptually confirmed by a listening test, using speech synthesized with the selected diphone unit set.

UR - http://www.scopus.com/inward/record.url?scp=0029386592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029386592&partnerID=8YFLogxK
U2 - 10.1006/csla.1995.0016
DO - 10.1006/csla.1995.0016
M3 - Article
AN - SCOPUS:0029386592
VL - 9
SP - 335
EP - 352
JO - Computer Speech and Language
JF - Computer Speech and Language
SN - 0885-2308
IS - 4
ER -