Spectrum conversion using prosodic information

Ryo Mochizuki*, Tadashi Okubo, Tetsunori Kobayashi


Research output: Article (peer-reviewed)


For speaker conversion based on GMM spectral conversion, a method is proposed that adds prosody-related information to the feature vectors in order to improve conversion precision. In conventional GMM-based spectral conversion, only the unaltered spectral parameters are used as input. However, the voice spectrum is generally related to the fundamental frequency of the speech, so taking prosodic information into account at conversion time can be expected to improve the quality of the converted voice. A high-precision spectrum conversion method is therefore proposed that is intended for use in actual rule-based synthesis and that trains the GMM using the prosodic information of both the conversion source and the conversion target. The proposed spectrum conversion is also applied to speech conversion within a voice synthesis framework. For this purpose, a method is proposed for constructing triphone joint vectors from a parallel corpus so that the training data cover a greater number of prosodic conditions. An objective evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. A listening evaluation of voice quality and speaker characteristics after conversion, comparing a conventional method with the proposed method, indicates that the proposed method is also effective perceptually.
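The abstract does not state the conversion equations, so the following is only an illustrative sketch and not the authors' implementation. It assumes the widely used GMM mapping of the form y = Σ_m P(m|x) [μ_m^y + A_m (x − μ_m^x)], where A_m stands in for Σ_m^{yx}(Σ_m^{xx})^{-1}, and it assumes log F0 as the single prosodic feature appended to the source spectral vector. All function names, parameter names, and toy values are hypothetical, and the source-side covariances are taken to be diagonal to keep the example short.

```python
import math


def augment(spectral, log_f0):
    """Append a prosodic feature (assumed here: log F0) to a spectral vector."""
    return list(spectral) + [log_f0]


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, weights, means_x, vars_x):
    """P(m | x) for each mixture component, computed stably in the log domain."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means_x, vars_x)
    ]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x, weights, means_x, vars_x, means_y, reg):
    """Posterior-weighted sum of per-component linear mappings.

    reg[m] is a (dim_y x dim_x) matrix playing the role of
    Sigma_yx * inverse(Sigma_xx) for component m.
    """
    post = posteriors(x, weights, means_x, vars_x)
    dim_y = len(means_y[0])
    y = [0.0] * dim_y
    for p, mx, my, a in zip(post, means_x, means_y, reg):
        diff = [xi - mi for xi, mi in zip(x, mx)]
        for d in range(dim_y):
            y[d] += p * (my[d] + sum(c * e for c, e in zip(a[d], diff)))
    return y


# Toy model (hypothetical values): two components that differ only in log F0.
WEIGHTS = [0.5, 0.5]
MEANS_X = [[0.0, 0.0, 4.6], [0.0, 0.0, 5.3]]    # [spec1, spec2, log F0]
VARS_X = [[1.0, 1.0, 0.01], [1.0, 1.0, 0.01]]
MEANS_Y = [[1.0, 1.0], [2.0, 2.0]]              # target spectral means
REG = [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]] * 2  # copy spectral offset, ignore F0 offset

low_pitch = convert(augment([0.5, -0.5], 4.6), WEIGHTS, MEANS_X, VARS_X, MEANS_Y, REG)
high_pitch = convert(augment([0.5, -0.5], 5.3), WEIGHTS, MEANS_X, VARS_X, MEANS_Y, REG)
print(low_pitch, high_pitch)
```

In this toy setup the same spectral input is mapped to different outputs depending on the appended log F0 value, which is the effect that adding prosodic information to the feature vector is meant to capture. A real system would estimate the mixture parameters from training data rather than fixing them by hand.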

Journal: Systems and Computers in Japan
Publication status: Published - Sep 1, 2007

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics

