TY - JOUR
T1 - Speech spectrum transformation by speaker interpolation
AU - Iwahashi, Naoto
AU - Sagisaka, Yoshinori
N1 - Publisher Copyright:
© 1994 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
PY - 1994
Y1 - 1994
N2 - In this paper, we propose a speech spectrum transformation method by interpolating spectral patterns between pre-stored multiple speakers for speech synthesis. The interpolation is carried out using spectral parameters such as cepstrum and log area ratio to generate new spectrum patterns. The spectral patterns can be transformed smoothly as the interpolation ratio is gradually changed, and speech individuality can easily be controlled between interpolated speakers. Adaptation to a target speaker can be performed by this interpolation, which uses only a small amount of training data to generate a new speech spectrum sequence close to the target speaker's. An adaptation experiment was carried out in the case of using only one word spoken by the target speaker for learning. It was shown that the distance between the target speaker's spectrum and the spectrum generated by the proposed interpolation method is reduced by about 40% compared with the distance between the target speaker's spectrum and the spectrum of the speaker closest to the target among the pre-stored ones.
AB - In this paper, we propose a speech spectrum transformation method by interpolating spectral patterns between pre-stored multiple speakers for speech synthesis. The interpolation is carried out using spectral parameters such as cepstrum and log area ratio to generate new spectrum patterns. The spectral patterns can be transformed smoothly as the interpolation ratio is gradually changed, and speech individuality can easily be controlled between interpolated speakers. Adaptation to a target speaker can be performed by this interpolation, which uses only a small amount of training data to generate a new speech spectrum sequence close to the target speaker's. An adaptation experiment was carried out in the case of using only one word spoken by the target speaker for learning. It was shown that the distance between the target speaker's spectrum and the spectrum generated by the proposed interpolation method is reduced by about 40% compared with the distance between the target speaker's spectrum and the spectrum of the speaker closest to the target among the pre-stored ones.
UR - http://www.scopus.com/inward/record.url?scp=85064715894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064715894&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.1994.389256
DO - 10.1109/ICASSP.1994.389256
M3 - Conference article
AN - SCOPUS:85064715894
VL - 1
SP - I461
EP - I464
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
SN - 0736-7791
M1 - 389256
T2 - Proceedings of the 1994 IEEE International Conference on Acoustics, Speech and Signal Processing. Part 2 (of 6)
Y2 - 19 April 1994 through 22 April 1994
ER -
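
The abstract describes generating new spectra by mixing spectral parameter sequences from several pre-stored speakers with a tunable interpolation ratio. Below is a minimal sketch of that idea, assuming time-aligned per-frame cepstral vectors and using NumPy; the function interpolate_speakers and all parameter names are illustrative assumptions, not taken from the paper.

import numpy as np

def interpolate_speakers(speaker_cepstra, ratios):
    """Blend time-aligned cepstral sequences from several pre-stored speakers.

    speaker_cepstra: list of arrays, each of shape (n_frames, n_coeffs)
    ratios: one interpolation weight per speaker (normalized below to sum to 1)
    """
    ratios = np.asarray(ratios, dtype=float)
    ratios = ratios / ratios.sum()                # keep the mixture a convex combination
    stacked = np.stack(speaker_cepstra)           # shape: (n_speakers, n_frames, n_coeffs)
    return np.tensordot(ratios, stacked, axes=1)  # weighted sum over the speaker axis

# Sweeping the ratio moves the generated spectrum smoothly from speaker A toward
# speaker B, mirroring the gradual change of speech individuality described above.
# for r in np.linspace(0.0, 1.0, 5):
#     blended = interpolate_speakers([cep_a, cep_b], [1.0 - r, r])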