High accurate model-integration-based voice conversion using dynamic features and model structure optimization

Daisuke Saito, Shinji Watanabe, Atsushi Nakamura, Nobuaki Minematsu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

This paper combines a parameter generation algorithm and a model optimization approach with model-integration-based voice conversion (MIVC). We previously proposed MIVC, a probabilistic integration of a joint density model and a speaker model, to relax the parallel-corpus requirement of voice conversion (VC) based on Gaussian mixture models (GMMs). Like other VC methods, MIVC suffers from two problems: degraded perceptual quality caused by discontinuities in the parameter trajectory, and the difficulty of optimizing the model structure. To address the first problem, this paper proposes a parameter generation algorithm constrained by dynamic features; to address the second, it proposes an information criterion that accounts for the mutual influence between the joint density model and the speaker model. Experimental results show that the first approach improved VC performance and that the second appropriately predicted the optimal number of mixture components of the speaker model for MIVC.
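
The dynamic-feature constraint mentioned in the abstract can be illustrated with a generic maximum-likelihood trajectory generation sketch. This is not the authors' algorithm, whose details are not given here. It assumes a one-dimensional feature, per-frame diagonal Gaussian statistics for static and delta features, and a simple centered-difference delta window. The function names `build_window_matrix` and `generate_trajectory` are hypothetical.

```python
import numpy as np


def build_window_matrix(num_frames):
    """Stack a static (identity) operator on top of a delta operator.

    Assumption: delta_t = 0.5 * (c[t+1] - c[t-1]), with indices clamped
    at the sequence edges. The result has shape (2T, T).
    """
    T = num_frames
    static = np.eye(T)
    delta = np.zeros((T, T))
    for t in range(T):
        delta[t, min(t + 1, T - 1)] += 0.5
        delta[t, max(t - 1, 0)] -= 0.5
    return np.vstack([static, delta])


def generate_trajectory(static_mean, delta_mean, static_var, delta_var):
    """Return the static trajectory c that maximizes the Gaussian
    likelihood of the stacked static+delta observation W @ c.

    Closed form: solve (W^T P W) c = W^T P mu, where P is the diagonal
    precision matrix (inverse variances) and mu stacks the target means.
    """
    T = len(static_mean)
    W = build_window_matrix(T)
    mu = np.concatenate([static_mean, delta_mean])
    precision = 1.0 / np.concatenate([static_var, delta_var])
    WtP = W.T * precision  # same as W.T @ diag(precision)
    return np.linalg.solve(WtP @ W, WtP @ mu)


# Usage: a step-like sequence of static means is smoothed when the
# delta statistics favor slow change (zero mean, small variance).
step = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
smooth = generate_trajectory(
    static_mean=step,
    delta_mean=np.zeros(6),
    static_var=np.ones(6),
    delta_var=np.full(6, 0.1),
)
```

The smaller the delta variance relative to the static variance, the more the generated trajectory trades fidelity to the per-frame means for continuity across frames.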

Original language: English
Title of host publication: 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
Pages: 4576-4579
Number of pages: 4
DOI: https://doi.org/10.1109/ICASSP.2011.5947373
Publication status: Published - 2011 Aug 18
Externally published: Yes
Event: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Prague, Czech Republic
Duration: 2011 May 22 to 2011 May 27

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Country: Czech Republic
City: Prague
Period: 11/5/22 to 11/5/27

Keywords

  • Voice conversion
  • dynamic features
  • information criterion
  • probabilistic integration

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

  • Cite this

    Saito, D., Watanabe, S., Nakamura, A., & Minematsu, N. (2011). High accurate model-integration-based voice conversion using dynamic features and model structure optimization. In 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings (pp. 4576-4579). [5947373] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). https://doi.org/10.1109/ICASSP.2011.5947373