Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization

Naoto Iwahashi, Nobuyoshi Kaiki, Yoshinori Sagisaka

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

This paper proposes a new scheme for concatenative speech synthesis that improves the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segments and the desired spectrum for the target, without the use of heuristics. Four types of distortion are formulated as acoustic quantities and used as measures for minimization: a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments. A search method for selecting segments from a large speech database is also described; it uses a three-step optimization based on dynamic programming to minimize the four types of distortion. A perceptual test shows that the proposed segment selection method with minimum distortion criteria produces high-quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.
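
The abstract does not spell out the three-step optimization, so the following is only a minimal sketch of the general idea: a single-pass dynamic programming search that picks one candidate segment per target position while minimizing a sum of per-segment and boundary distortions. It is not the authors' algorithm. The names select_segments, unit_cost, and join_cost are hypothetical, and folding the four distortion types into two cost functions (distortions a to c into unit_cost, distortion d into join_cost) is an assumption made for illustration.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # a candidate segment; its representation is an assumption


def select_segments(
    candidates: Sequence[Sequence[S]],
    unit_cost: Callable[[int, S], float],
    join_cost: Callable[[S, S], float],
) -> Tuple[float, List[S]]:
    """Choose one candidate per target position with minimum total cost.

    unit_cost(t, s): distortion of using segment s at target position t
    join_cost(a, b): discontinuity at the boundary when a is followed by b
    """
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: minimum cost of any path ending in candidate j at the current position
    best = [unit_cost(0, s) for s in candidates[0]]
    back: List[List[int]] = []  # back[t - 1][j]: best predecessor of candidate j at position t

    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_best: List[float] = []
        pointers: List[int] = []
        for s in candidates[t]:
            # Cheapest predecessor for s, counting the boundary discontinuity.
            k = min(range(len(prev)), key=lambda i: best[i] + join_cost(prev[i], s))
            new_best.append(best[k] + join_cost(prev[k], s) + unit_cost(t, s))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace the minimum-cost path back from the last position.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [candidates[-1][j]]
    for t in range(len(candidates) - 1, 0, -1):
        j = back[t - 1][j]
        path.append(candidates[t - 1][j])
    path.reverse()
    return total, path
```

With segments reduced to single numbers standing in for spectra, unit_cost could be the absolute difference from a target value and join_cost the absolute difference between neighbours. The search then trades closeness to the target against smoothness at the boundaries, which are the two measures the perceptual test in the abstract found most important.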

Original language: English
Pages (from-to): 1942-1948
Number of pages: 7
Journal: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Volume: E76-A
Issue number: 11
Publication status: Published - 1993 Nov
Externally published: Yes

Fingerprint

Acoustics
Synthesis
Acoustic distortion
Speech synthesis
Dynamic programming
Concatenation
Degradation
Discontinuity
Target
Speech
Selection Procedures
Search Methods
Heuristics
Minimise
Optimization

ASJC Scopus subject areas

  • Hardware and Architecture
  • Information Systems
  • Electrical and Electronic Engineering

Cite this

Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization. / Iwahashi, Naoto; Kaiki, Nobuyoshi; Sagisaka, Yoshinori.

In: IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E76-A, No. 11, Nov. 1993, p. 1942-1948.

@article{4c4557147f9b4fa182f749af47c4de30,
title = "Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization",
author = "Naoto Iwahashi and Nobuyoshi Kaiki and Yoshinori Sagisaka",
year = "1993",
month = "11",
language = "English",
volume = "E76-A",
pages = "1942--1948",
journal = "IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences",
issn = "0916-8508",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "11",

}

TY - JOUR

T1 - Speech Segment Selection for Concatenative Synthesis Based on Spectral Distortion Minimization

AU - Iwahashi, Naoto

AU - Kaiki, Nobuyoshi

AU - Sagisaka, Yoshinori

PY - 1993/11

Y1 - 1993/11

AB - This paper proposes a new scheme for concatenative speech synthesis that improves the speech segment selection procedure. The proposed scheme selects a segment sequence for concatenation by minimizing acoustic distortions between the selected segments and the desired spectrum for the target, without the use of heuristics. Four types of distortion are formulated as acoustic quantities and used as measures for minimization: a) the spectral prototypicality of a segment, b) the spectral difference between the source and target contexts, c) the degradation resulting from concatenation of phonemes, and d) the acoustic discontinuity between the concatenated segments. A search method for selecting segments from a large speech database is also described; it uses a three-step optimization based on dynamic programming to minimize the four types of distortion. A perceptual test shows that the proposed segment selection method with minimum distortion criteria produces high-quality synthesized speech, and that contextual spectral difference and acoustic discontinuity at the segment boundary are important measures for improving the quality.

UR - http://www.scopus.com/inward/record.url?scp=0027699809&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027699809&partnerID=8YFLogxK

M3 - Article

VL - E76-A

SP - 1942

EP - 1948

JO - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

JF - IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

SN - 0916-8508

IS - 11

ER -