Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks

Naoto Iwahashi, Yoshinori Sagisaka

Research output: Contribution to journal › Article

41 Citations (Scopus)

Abstract

This paper describes a speech spectrum transformation method that interpolates the spectral patterns of multiple speakers and uses a multi-functional representation weighted by radial basis function (RBF) networks. The interpolation is carried out on spectral parameters across the utterance data of multiple pre-stored speakers to generate new spectrum patterns. This interpolation can adapt to a target speaker using only a small amount of training data, generating new speech spectrum sequences close to those of the target speaker. Moreover, to obtain more precise adaptation when a larger amount of training data is available, the transformation is represented by multiple interpolating functions. The outputs of these functions are combined as a weighted sum, with the weights given by RBF networks. The parameters of this multi-functional transformation are adapted by gradient descent. Adaptation experiments were carried out using data from four pre-stored speakers. With only one word spoken by the target speaker for training, the distance between the target speaker's spectrum and the spectrum generated by a single interpolating function was about 35% smaller than the distance between the target speaker's spectrum and the spectrum of the pre-stored speaker closest to the target. With ten training words, the multi-functional transformation increased the reduction to 48%.
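The scheme summarized in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the 2-D "spectral" vectors, the synthetic target, the unconstrained interpolation weights, the normalized Gaussian gates, and the choice of gating on the first speaker's frame are all assumptions made for illustration, and every function name is hypothetical. Gradient descent is shown only for the single interpolating function; the multi-functional transformation is shown as a forward pass.

```python
import math
import random

def interpolate(weights, frames):
    """One interpolating function: weighted sum of the pre-stored speakers' frames.
    frames[k] is speaker k's parameter vector for the same time-aligned frame."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def sq_dist(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_weights(data, n_speakers, lr=0.05, steps=2000):
    """Fit the interpolation weights by gradient descent on the mean squared error.
    data is a list of (frames, target_vector) pairs."""
    w = [1.0 / n_speakers] * n_speakers
    for _ in range(steps):
        grad = [0.0] * n_speakers
        for frames, target in data:
            out = interpolate(w, frames)
            for k in range(n_speakers):
                grad[k] += sum(2.0 * (o - t) * x
                               for o, t, x in zip(out, target, frames[k]))
        w = [wk - lr * g / len(data) for wk, g in zip(w, grad)]
    return w

def rbf_gates(x, centers, width):
    """Normalized Gaussian RBF activations, used as mixing weights (assumed form)."""
    acts = [math.exp(-sq_dist(x, c) / (2.0 * width ** 2)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def multi_interpolate(weight_sets, centers, width, frames):
    """Weighted sum of several interpolating functions, gated by RBF outputs.
    Gating on frames[0] is an assumption made for this sketch."""
    gates = rbf_gates(frames[0], centers, width)
    outs = [interpolate(w, frames) for w in weight_sets]
    dim = len(outs[0])
    return [sum(g * o[d] for g, o in zip(gates, outs)) for d in range(dim)]

# Toy data: four pre-stored speakers, synthetic target built as a fixed mixture.
random.seed(0)
K, DIM, N = 4, 2, 20
TRUE_W = [0.5, 0.3, 0.1, 0.1]  # hypothetical, used only to synthesize a target
data = []
for _ in range(N):
    frames = [[random.random() for _ in range(DIM)] for _ in range(K)]
    data.append((frames, interpolate(TRUE_W, frames)))

def mean_dist(predict):
    """Mean squared distance between predicted and target frames."""
    return sum(sq_dist(predict(f), t) for f, t in data) / len(data)

w = fit_weights(data, K)
d_closest = min(mean_dist(lambda f, k=k: f[k]) for k in range(K))
d_fitted = mean_dist(lambda f: interpolate(w, f))
reduction = 1.0 - d_fitted / d_closest
print(f"distance reduction vs. closest pre-stored speaker: {reduction:.1%}")

# Forward pass of a two-function transformation with hypothetical RBF centers.
centers = [[0.25] * DIM, [0.75] * DIM]
mixed = multi_interpolate([w, TRUE_W], centers, 0.5, data[0][0])
```

The `reduction` value mirrors how the abstract reports its results: the distance to the target after interpolation, relative to the distance from the closest pre-stored speaker. Because the toy target is an exact mixture of the stored speakers, the number printed here says nothing about the 35% and 48% figures reported for real speech.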

Original language: English
Pages (from-to): 139-151
Number of pages: 13
Journal: Speech Communication
Volume: 16
Issue number: 2
DOI: 10.1016/0167-6393(94)00051-B
Publication status: Published - 1995
Externally published: Yes

Fingerprint

Radial basis function networks
Weighting
Interpolation
Target
Gradient descent method
Speech
Experiments
Values
Training
Output

Keywords

  • Multiple functional representation
  • Radial basis function
  • Speaker adaptation
  • Speaker interpolation
  • Speech spectrum conversion
  • Voice conversion

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Software
  • Modelling and Simulation
  • Linguistics and Language
  • Communication
  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology

Cite this

@article{a63f9f81b1a4402aae5a4d3a1ff3d75d,
title = "Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks",
abstract = "This paper describes a speech spectrum transformation method by interpolating multi-speakers' spectral patterns and multi-functional representation with Radial Basis Function networks. The interpolation is carried out using spectral parameters between pre-stored multiple speakers' utterance data to generate new spectrum patterns. Adaptation to a target speaker can be performed by this interpolation, which uses only a small amount of training data to generate new speech spectrum sequences close to those of the target speaker. Moreover, to obtain more precise adaptation by using a larger amount of training data, the transformation is represented by multiple interpolating functions. The multiple functions' outputs are weighted-summed, using weighting values given by RBF networks. The parameters of this multi-functional transformation are adapted by the gradient descent method. Adaptation experiments were carried out using four pre-stored speakers' data. Using only one word spoken by the target speaker for training, the distance between the target speaker's spectrum and the spectrum generated by the single interpolating function was reduced by about 35{\%} compared with the distance between the target speaker's spectrum and the spectrum of the pre-stored speaker closest to the target. Using ten training words, the reduction rate increased to 48{\%} by the multi-functional transformation.",
keywords = "Multiple functional representation, Radial basis function, Speaker adaptation, Speaker interpolation, Speech spectrum conversion, Voice conversion",
author = "Naoto Iwahashi and Yoshinori Sagisaka",
year = "1995",
doi = "10.1016/0167-6393(94)00051-B",
language = "English",
volume = "16",
pages = "139--151",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "2",

}

TY - JOUR
T1 - Speech spectrum conversion based on speaker interpolation and multi-functional representation with weighting by radial basis function networks
AU - Iwahashi, Naoto
AU - Sagisaka, Yoshinori
PY - 1995
Y1 - 1995
AB - This paper describes a speech spectrum transformation method by interpolating multi-speakers' spectral patterns and multi-functional representation with Radial Basis Function networks. The interpolation is carried out using spectral parameters between pre-stored multiple speakers' utterance data to generate new spectrum patterns. Adaptation to a target speaker can be performed by this interpolation, which uses only a small amount of training data to generate new speech spectrum sequences close to those of the target speaker. Moreover, to obtain more precise adaptation by using a larger amount of training data, the transformation is represented by multiple interpolating functions. The multiple functions' outputs are weighted-summed, using weighting values given by RBF networks. The parameters of this multi-functional transformation are adapted by the gradient descent method. Adaptation experiments were carried out using four pre-stored speakers' data. Using only one word spoken by the target speaker for training, the distance between the target speaker's spectrum and the spectrum generated by the single interpolating function was reduced by about 35% compared with the distance between the target speaker's spectrum and the spectrum of the pre-stored speaker closest to the target. Using ten training words, the reduction rate increased to 48% by the multi-functional transformation.

KW - Multiple functional representation
KW - Radial basis function
KW - Speaker adaptation
KW - Speaker interpolation
KW - Speech spectrum conversion
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=0029251946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0029251946&partnerID=8YFLogxK
U2 - 10.1016/0167-6393(94)00051-B
DO - 10.1016/0167-6393(94)00051-B
M3 - Article
VL - 16
SP - 139
EP - 151
JO - Speech Communication
JF - Speech Communication
SN - 0167-6393
IS - 2
ER -