TY - JOUR
T1 - Spectrum conversion using prosodic information
AU - Mochizuki, Ryo
AU - Okubo, Tadashi
AU - Kobayashi, Tetsunori
PY - 2007/9/1
Y1 - 2007/9/1
N2 - For speaker conversion with spectral conversion using GMM, a method is proposed for adding information relating to prosody to the characteristic values and improving conversion precision. In conventional spectral conversion using GMM, only the unaltered spectral parameters are used as input information. However, the voice spectrum is generally related to the closeness of the base frequencies during speech, and therefore, improvement in the quality of the converted voice can be expected with the consideration of prosodic information at the time of conversion. Thus, a method is proposed for spectrum conversion with good precision which assumes the application to actual synthesis by rule, and performs GMM training using the prosodic information of the conversion source and conversion target. Also, the proposed spectrum conversion is applied to speech conversion in a voice synthesis framework. At this time, a method is proposed for preparing triphone joint vectors to ensure training data of a greater number of prosodic conditions using a parallel corpus. A physical evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. An auditory evaluation was performed of voice quality and speech characteristics after conversion with a conventional method and the proposed method, and indicated that the proposed method is effective in an auditory sense as well.
AB - For speaker conversion with spectral conversion using GMM, a method is proposed for adding information relating to prosody to the characteristic values and improving conversion precision. In conventional spectral conversion using GMM, only the unaltered spectral parameters are used as input information. However, the voice spectrum is generally related to the closeness of the base frequencies during speech, and therefore, improvement in the quality of the converted voice can be expected with the consideration of prosodic information at the time of conversion. Thus, a method is proposed for spectrum conversion with good precision which assumes the application to actual synthesis by rule, and performs GMM training using the prosodic information of the conversion source and conversion target. Also, the proposed spectrum conversion is applied to speech conversion in a voice synthesis framework. At this time, a method is proposed for preparing triphone joint vectors to ensure training data of a greater number of prosodic conditions using a parallel corpus. A physical evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. An auditory evaluation was performed of voice quality and speech characteristics after conversion with a conventional method and the proposed method, and indicated that the proposed method is effective in an auditory sense as well.
KW - Cepstrum
KW - GMM
KW - Prosodic information
KW - Speaker conversion
KW - Voice synthesis
UR - http://www.scopus.com/inward/record.url?scp=34547311133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547311133&partnerID=8YFLogxK
U2 - 10.1002/scj.20667
DO - 10.1002/scj.20667
M3 - Article
AN - SCOPUS:34547311133
VL - 38
SP - 12
EP - 20
JO - Systems and Computers in Japan
JF - Systems and Computers in Japan
SN - 0882-1666
IS - 10
ER -