Cyborg speech: Deep multilingual speech synthesis for generating segmental foreign accent with natural prosody

Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Mariko Kondo, Junichi Yamagishi

Research output: Conference contribution

2 citations (Scopus)

Abstract

We describe a new application of deep-learning-based speech synthesis, namely multilingual speech synthesis for generating controllable foreign accent. Specifically, we train a DBLSTM-based acoustic model on non-accented multilingual speech recordings from a speaker native in several languages. By copying durations and pitch contours from a pre-recorded utterance of the desired prompt, natural prosody is achieved. We call this paradigm 'cyborg speech' as it combines human and machine speech parameters. Segmentally accented speech is produced by interpolating specific quinphone linguistic features towards phones from the other language that represent non-native mispronunciations. Experiments on synthetic American-English-accented Japanese speech show that subjective synthesis quality matches monolingual synthesis, that natural pitch is maintained, and that naturalistic phone substitutions generate output that is perceived as having an American foreign accent, even though only non-accented training data was used.
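The phone-substitution step described in the abstract can be illustrated with a minimal Python sketch. The sketch rests on several assumptions that are not taken from the paper:

- Phone identity is encoded as one-hot vectors over a combined Japanese-English inventory.
- "Interpolating" means a linear blend controlled by a weight `alpha`.
- The phone inventories, the `ja_`/`en_` prefixes, and the functions `one_hot`, `interpolate` and `accent_quinphone` are hypothetical.

The actual DBLSTM input features cover far more linguistic context than phone identity alone.

```python
from typing import Dict, List

# Hypothetical, heavily reduced phone inventories (assumption, not the paper's phone sets).
JAPANESE_PHONES = ["a", "i", "u", "e", "o", "r", "ts"]
ENGLISH_PHONES = ["ae", "ih", "uw", "eh", "ow", "r", "l"]
INVENTORY = [f"ja_{p}" for p in JAPANESE_PHONES] + [f"en_{p}" for p in ENGLISH_PHONES]


def one_hot(phone: str) -> List[float]:
    """One-hot phone-identity vector over the combined multilingual inventory."""
    vec = [0.0] * len(INVENTORY)
    vec[INVENTORY.index(phone)] = 1.0
    return vec


def interpolate(native: str, substitute: str, alpha: float) -> List[float]:
    """Linearly blend a native phone's identity towards a substitute phone.

    alpha = 0.0 keeps the native phone; alpha = 1.0 is a full substitution.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    a, b = one_hot(native), one_hot(substitute)
    return [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]


def accent_quinphone(
    quinphone: List[str], substitutions: Dict[str, str], alpha: float
) -> List[List[float]]:
    """Encode a quinphone context (LL, L, C, R, RR), interpolating every
    position whose phone has a listed non-native substitute (assumption:
    all five positions are treated alike)."""
    encoded = []
    for phone in quinphone:
        if phone in substitutions:
            encoded.append(interpolate(phone, substitutions[phone], alpha))
        else:
            encoded.append(one_hot(phone))
    return encoded


if __name__ == "__main__":
    # Hypothetical substitution: Japanese flap /r/ pulled halfway towards English /r/.
    context = accent_quinphone(
        ["ja_u", "ja_a", "ja_r", "ja_i", "ja_e"], {"ja_r": "en_r"}, alpha=0.5
    )
    # Centre phone: 0.5 at ja_r and 0.5 at en_r, 0.0 elsewhere.
    print(context[2])
```

In a cyborg-speech setup, the resulting feature vectors would drive the acoustic model. Durations and pitch contours would be copied from the natural recording rather than predicted. The weight `alpha` is one plausible way to make the accent strength controllable.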

Original language: English
Host publication title: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4799-4803
Number of pages: 5
ISBN (print): 9781538646588
Publication status: Published - 10 Sep 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (print): 1520-6149

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
