Probabilistic integration of joint density model and speaker model for voice conversion

Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

This paper describes a novel approach to voice conversion that uses both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian mixture model (GMM) of the probability density of joint vectors of the source and target speakers are widely used to estimate the transformation. To reach sufficient quality, however, they require a parallel corpus containing many utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from overtraining when the amount of training data is small. To address these problems, we propose integrating the speaker GMM of the target with the joint density model through a probabilistic formulation. The proposed method trains the joint density model on a few parallel utterances and, independently, the speaker model on non-parallel data from the target speaker, which eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.
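As a rough illustration of the joint density GMM baseline that the abstract refers to, the sketch below applies the standard conditional-expectation mapping E[y | x] for scalar features. It is not the paper's proposed method: the abstract does not specify how the target speaker GMM is integrated, so that step is left out. The function names, the dictionary layout, and the toy parameters are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert(x, components):
    """Map a scalar source feature x to E[y | x] under a joint-density GMM.

    Each component is a dict with keys (hypothetical layout):
      w    - mixture weight
      mu_x - mean of the source feature
      mu_y - mean of the target feature
      s_xx - variance of the source feature
      s_yx - cross-covariance between target and source
    """
    # Posterior probability of each component given the source feature only.
    likelihoods = [c["w"] * gaussian_pdf(x, c["mu_x"], c["s_xx"]) for c in components]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]
    # Posterior-weighted sum of the per-component linear regressions.
    return sum(
        p * (c["mu_y"] + c["s_yx"] / c["s_xx"] * (x - c["mu_x"]))
        for p, c in zip(posteriors, components)
    )


# Toy two-component model; the numbers are made up for illustration.
toy_gmm = [
    {"w": 0.5, "mu_x": -1.0, "mu_y": -2.0, "s_xx": 1.0, "s_yx": 0.5},
    {"w": 0.5, "mu_x": 1.0, "mu_y": 2.0, "s_xx": 1.0, "s_yx": 0.5},
]
```

In practice the component parameters would be estimated with EM from time-aligned parallel utterances, which is where the small-corpus overtraining noted in the abstract arises.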

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Pages: 1728-1731
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba
Duration: 2010 Sep 26 → 2010 Sep 30

Other

Other: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010
City: Makuhari, Chiba
Period: 10/9/26 → 10/9/30

Keywords

  • Joint density model
  • Probabilistic unification
  • Speaker model
  • Voice conversion

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Saito, D., Watanabe, S., Nakamura, A., & Minematsu, N. (2010). Probabilistic integration of joint density model and speaker model for voice conversion. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 1728-1731).

Probabilistic integration of joint density model and speaker model for voice conversion. / Saito, Daisuke; Watanabe, Shinji; Nakamura, Atsushi; Minematsu, Nobuaki.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. 2010. p. 1728-1731.

Saito, D, Watanabe, S, Nakamura, A & Minematsu, N 2010, Probabilistic integration of joint density model and speaker model for voice conversion. in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. pp. 1728-1731, 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010, Makuhari, Chiba, 10/9/26.

Saito D, Watanabe S, Nakamura A, Minematsu N. Probabilistic integration of joint density model and speaker model for voice conversion. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. 2010. p. 1728-1731.

Saito, Daisuke ; Watanabe, Shinji ; Nakamura, Atsushi ; Minematsu, Nobuaki. / Probabilistic integration of joint density model and speaker model for voice conversion. Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010. 2010. pp. 1728-1731.

@inproceedings{b24bed51a78546a49883e9bc5cdfba6e,
title = "Probabilistic integration of joint density model and speaker model for voice conversion",
abstract = "This paper describes a novel approach to voice conversion that uses both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian mixture model (GMM) of the probability density of joint vectors of the source and target speakers are widely used to estimate the transformation. To reach sufficient quality, however, they require a parallel corpus containing many utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from overtraining when the amount of training data is small. To address these problems, we propose integrating the speaker GMM of the target with the joint density model through a probabilistic formulation. The proposed method trains the joint density model on a few parallel utterances and, independently, the speaker model on non-parallel data from the target speaker, which eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.",
keywords = "Joint density model, Probabilistic unification, Speaker model, Voice conversion",
author = "Daisuke Saito and Shinji Watanabe and Atsushi Nakamura and Nobuaki Minematsu",
year = "2010",
language = "English",
pages = "1728--1731",
booktitle = "Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010",
}

TY - GEN

T1 - Probabilistic integration of joint density model and speaker model for voice conversion

AU - Saito, Daisuke

AU - Watanabe, Shinji

AU - Nakamura, Atsushi

AU - Minematsu, Nobuaki

PY - 2010

Y1 - 2010

AB - This paper describes a novel approach to voice conversion that uses both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian mixture model (GMM) of the probability density of joint vectors of the source and target speakers are widely used to estimate the transformation. To reach sufficient quality, however, they require a parallel corpus containing many utterances with the same linguistic content spoken by both speakers. In addition, joint density GMM methods often suffer from overtraining when the amount of training data is small. To address these problems, we propose integrating the speaker GMM of the target with the joint density model through a probabilistic formulation. The proposed method trains the joint density model on a few parallel utterances and, independently, the speaker model on non-parallel data from the target speaker, which eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.

KW - Joint density model

KW - Probabilistic unification

KW - Speaker model

KW - Voice conversion

UR - http://www.scopus.com/inward/record.url?scp=79959834571&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79959834571&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:79959834571

SP - 1728

EP - 1731

BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

ER -