Concatenative speech synthesis by minimum distortion criteria

Naoto Iwahashi, Nobuyoshi Kaiki, Yoshinori Sagisaka

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

A new scheme is proposed for concatenative speech synthesis to improve the segment selection procedure by minimizing acoustic distortions between the selected segment and the desired spectrum for the target. The spectral prototypicality of a segment, the spectral difference between the source and target contexts, the degradation resulting from concatenation of phonemes, and the acoustic continuity between the concatenated segments are all considered as measures. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization is used for distortion minimization. A perceptual test shows that contextual spectral difference and acoustic continuity at the segment boundary are important measures for improving the quality of synthesized speech.
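To make the selection idea concrete, here is a minimal sketch of choosing one segment per target position so that the summed distortion is smallest. The abstract does not spell out the three-step optimization, so the sketch substitutes a standard Viterbi-style dynamic programme; it should not be read as the authors' search procedure. The `Segment` type, the `unit_cost` and `join_cost` functions, and all numbers are hypothetical: `unit_cost` stands in for the per-segment measures (prototypicality, contextual spectral difference) and `join_cost` for the boundary measures (concatenation degradation, acoustic continuity), with scalars standing in for spectral vectors.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Segment(NamedTuple):
    """Hypothetical candidate segment; scalars stand in for spectral vectors."""
    body: float   # representative spectrum of the segment
    start: float  # spectrum at the left boundary
    end: float    # spectrum at the right boundary


def select_segments(
    candidates: Sequence[Sequence[Segment]],
    unit_cost: Callable[[int, Segment], float],
    join_cost: Callable[[Segment, Segment], float],
) -> Tuple[List[int], float]:
    """Return (chosen index per position, total distortion) via dynamic programming."""
    n = len(candidates)
    if n == 0:
        return [], 0.0
    # best[i][j]: lowest total distortion of any path ending at candidate j of position i
    best = [[unit_cost(0, c) for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, n):
        row, ptr = [], []
        for cur in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, cur)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_min] + unit_cost(i, cur))
            ptr.append(k_min)
        best.append(row)
        back.append(ptr)
    # Pick the cheapest final state, then follow back-pointers to recover the path.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


# Toy data (assumed values, not from the paper).
targets = [1.0, 3.0, 2.0]
candidates = [
    [Segment(1.0, 0.0, 4.0), Segment(1.5, 0.0, 2.0)],
    [Segment(3.0, 2.0, 3.0), Segment(3.5, 4.0, 0.0)],
    [Segment(2.0, 3.0, 1.0), Segment(2.5, 0.0, 1.0)],
]


def unit_cost(i: int, seg: Segment) -> float:
    # Distance between the segment and the desired target spectrum.
    return abs(seg.body - targets[i])


def join_cost(prev: Segment, cur: Segment) -> float:
    # Spectral mismatch at the boundary between consecutive segments.
    return abs(prev.end - cur.start)


path, total = select_segments(candidates, unit_cost, join_cost)
print(path, total)  # [1, 0, 0] 0.5
```

In this toy case, picking the best-matching segment at each position independently would give indices `[0, 0, 0]` with zero unit cost but a boundary mismatch of 2.0. The joint search accepts a slightly worse first segment to obtain smooth joins, which illustrates why boundary continuity is treated as a measure alongside the per-segment distortions.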

Original language: English
Title of host publication: ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 65-68
Number of pages: 4
Volume: 2
ISBN (Electronic): 0780305329
DOI: https://doi.org/10.1109/ICASSP.1992.226119
Publication status: Published - 1992
Externally published: Yes
Event: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1992 - San Francisco, United States
Duration: 1992 Mar 23 to 1992 Mar 26

Other

Other: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 1992
Country: United States
City: San Francisco
Period: 92/3/23 to 92/3/26

Fingerprint

Speech synthesis
Acoustic distortion
Acoustics
Degradation

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Iwahashi, N., Kaiki, N., & Sagisaka, Y. (1992). Concatenative speech synthesis by minimum distortion criteria. In ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing (Vol. 2, pp. 65-68). [226119] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.1992.226119

@inproceedings{4fafb1a9b24e44aa9832ce153f4bb733,
title = "Concatenative speech synthesis by minimum distortion criteria",
abstract = "A new scheme is proposed for concatenative speech synthesis to improve the segment selection procedure by minimizing acoustic distortions between the selected segment and the desired spectrum for the target. The spectral prototypicality of a segment, the spectral difference between the source and target contexts, the degradation resulting from concatenation of phonemes, and the acoustic continuity between the concatenated segments are all considered as measures. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization is used for distortion minimization. A perceptual test shows that contextual spectral difference and acoustic continuity at the segment boundary are important measures for improving the quality of synthesized speech.",
author = "Naoto Iwahashi and Nobuyoshi Kaiki and Yoshinori Sagisaka",
year = "1992",
doi = "10.1109/ICASSP.1992.226119",
language = "English",
volume = "2",
pages = "65--68",
booktitle = "ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Concatenative speech synthesis by minimum distortion criteria

AU - Iwahashi, Naoto

AU - Kaiki, Nobuyoshi

AU - Sagisaka, Yoshinori

PY - 1992

Y1 - 1992

AB - A new scheme is proposed for concatenative speech synthesis to improve the segment selection procedure by minimizing acoustic distortions between the selected segment and the desired spectrum for the target. The spectral prototypicality of a segment, the spectral difference between the source and target contexts, the degradation resulting from concatenation of phonemes, and the acoustic continuity between the concatenated segments are all considered as measures. A search method for selecting segments from a large speech database is also described. In this method, a three-step optimization is used for distortion minimization. A perceptual test shows that contextual spectral difference and acoustic continuity at the segment boundary are important measures for improving the quality of synthesized speech.

UR - http://www.scopus.com/inward/record.url?scp=85009071260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009071260&partnerID=8YFLogxK

U2 - 10.1109/ICASSP.1992.226119

DO - 10.1109/ICASSP.1992.226119

M3 - Conference contribution

VL - 2

SP - 65

EP - 68

BT - ICASSP 1992 - 1992 International Conference on Acoustics, Speech, and Signal Processing

PB - Institute of Electrical and Electronics Engineers Inc.

ER -