Spectrum conversion using prosodic information

Ryo Mochizuki, Tadashi Okubo, Tetsunori Kobayashi

Research output: Article, peer-reviewed

Abstract

For speaker conversion based on spectral conversion with a Gaussian mixture model (GMM), a method is proposed that adds prosody-related information to the feature values in order to improve conversion precision. In conventional GMM-based spectral conversion, only the unaltered spectral parameters are used as input information. However, the voice spectrum is generally related to the closeness of the base frequencies during speech, and therefore the quality of the converted voice can be expected to improve when prosodic information is taken into account at conversion time. Thus, a method is proposed for precise spectrum conversion that assumes application to actual synthesis by rule and performs GMM training using the prosodic information of both the conversion source and the conversion target. The proposed spectrum conversion is also applied to speech conversion in a voice synthesis framework. For this purpose, a method is proposed for preparing triphone joint vectors from a parallel corpus so as to secure training data covering a greater number of prosodic conditions. A physical evaluation using the cepstrum distance indicates that the use of prosodic information is effective in improving the precision of spectrum conversion. An auditory evaluation of voice quality and speech characteristics after conversion with a conventional method and with the proposed method indicated that the proposed method is also effective perceptually.
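The idea of feeding prosody into a GMM-based spectral mapping can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the paper: the prosodic features are taken to be the log fundamental frequency of the source and of the target (reading "base frequency" as F0), the conversion function is the widely used joint-density GMM regression (the conditional mean of the target spectrum given the augmented source vector), and the distance is one common cepstral distance in dB. The function names `augment`, `convert_frame`, and `cepstral_distance` are hypothetical, and the GMM parameters are assumed to have been trained elsewhere on joint vectors.

```python
import numpy as np


def augment(spectrum, src_logf0, tgt_logf0):
    """Append prosodic values (assumed: source and target log F0) to a spectral vector."""
    return np.concatenate([spectrum, [src_logf0, tgt_logf0]])


def _gaussian_logpdf(x, mean, cov):
    """Log density of a full-covariance Gaussian evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)


def convert_frame(x, weights, means, covs, dx):
    """Joint-density GMM regression: return E[y | x].

    x       : augmented source vector of length dx
    weights : (M,) mixture weights
    means   : (M, dx + dy) joint means, source part first
    covs    : (M, dx + dy, dx + dy) joint covariances
    """
    n_mix = len(weights)
    # Posterior probability of each mixture component given x.
    log_post = np.array([
        np.log(weights[m]) + _gaussian_logpdf(x, means[m, :dx], covs[m, :dx, :dx])
        for m in range(n_mix)
    ])
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # Posterior-weighted sum of per-component linear regressions.
    y = np.zeros(means.shape[1] - dx)
    for m in range(n_mix):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx = covs[m, :dx, :dx]
        s_yx = covs[m, dx:, :dx]
        y += post[m] * (mu_y + s_yx @ np.linalg.solve(s_xx, x - mu_x))
    return y


def cepstral_distance(c_ref, c_conv):
    """A common cepstral distance in dB (the paper's exact definition is not given)."""
    diff = np.asarray(c_ref) - np.asarray(c_conv)
    return (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2))
```

In such a setup, a conventional baseline would correspond to calling `convert_frame` on the spectral vector alone, so the two variants differ only in whether `augment` is applied before training and conversion.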

Original language: English
Pages (from-to): 12-20
Number of pages: 9
Journal: Systems and Computers in Japan
Volume: 38
Issue number: 10
DOI
Publication status: Published - 1 September 2007

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics
