Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model

Sadao Hiroya, Masaaki Honda

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is statistically constructed using actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract as well as the articulatory behavior in the reference model are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from an unknown speaker's speech spectrum using the reference model. Second, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters for the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated articulatory parameters and the observed ones is 1.65 mm. The improvement rate over the speaker-independent model is 56.1%.
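The two figures reported at the end of the abstract can be illustrated with a minimal sketch. The abstract does not state how the RMS error is pooled or how the improvement rate is defined, so the code assumes that the error is averaged over all frames and articulatory dimensions and that the improvement rate is the relative error reduction in percent. The function names `rms_error` and `improvement_rate` and the toy values are hypothetical and are not taken from the paper.

```python
import math


def rms_error(estimated, observed):
    """RMS error pooled over all frames and dimensions (assumed pooling)."""
    if len(estimated) != len(observed):
        raise ValueError("trajectories must have the same number of frames")
    total = 0.0
    count = 0
    for est_frame, obs_frame in zip(estimated, observed):
        if len(est_frame) != len(obs_frame):
            raise ValueError("frames must have the same dimensionality")
        for est, obs in zip(est_frame, obs_frame):
            total += (est - obs) ** 2
            count += 1
    if count == 0:
        raise ValueError("trajectories must not be empty")
    return math.sqrt(total / count)


def improvement_rate(baseline_error, adapted_error):
    """Relative error reduction in percent (assumed definition)."""
    return 100.0 * (baseline_error - adapted_error) / baseline_error


# Toy trajectories in mm: two frames, two articulatory dimensions each.
observed = [[0.0, 0.0], [0.0, 0.0]]
estimated = [[3.0, 4.0], [0.0, 0.0]]

toy_rms = rms_error(estimated, observed)
toy_gain = improvement_rate(4.0, 1.0)
```

Under these assumptions, a lower RMS error after adaptation gives a positive improvement rate, and an unchanged error gives a rate of zero.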

Original language: English
Pages (from-to): 1071-1078
Number of pages: 8
Journal: IEICE Transactions on Information and Systems
Volume: E87-D
Issue number: 5
Publication status: Published - May 2004

Fingerprint

Hidden Markov models
Acoustics

Keywords

  • Articulatory-to-acoustic mapping
  • HMM-based speech production model
  • Speaker adaptation
  • Speech inversion

ASJC Scopus subject areas

  • Information Systems
  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model. / Hiroya, Sadao; Honda, Masaaki.

In: IEICE Transactions on Information and Systems, Vol. E87-D, No. 5, 05.2004, p. 1071-1078.

@article{4c3c0b8dca6f458fab88582e59c80fa6,
title = "Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model",
abstract = "We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is statistically constructed by using actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract as well as the articulatory behavior in the reference model are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from an unknown speaker's speech spectrum using the reference model. Secondly, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters for the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated articulatory parameters and the observed ones is 1.65 mm. The improvement rate over the speaker independent model is 56.1 {\%}.",
keywords = "Articulatory-to-acoustic mapping, HMM-based speech production model, Speaker adaptation, Speech inversion",
author = "Sadao Hiroya and Masaaki Honda",
year = "2004",
month = "5",
language = "English",
volume = "E87-D",
pages = "1071--1078",
journal = "IEICE Transactions on Information and Systems",
issn = "0916-8532",
publisher = "Maruzen Co., Ltd/Maruzen Kabushikikaisha",
number = "5",

}

TY - JOUR

T1 - Speaker adaptation method for acoustic-to-articulatory inversion using an HMM-based speech production model

AU - Hiroya, Sadao

AU - Honda, Masaaki

PY - 2004/5

Y1 - 2004/5

AB - We present a speaker adaptation method that makes it possible to determine articulatory parameters from an unknown speaker's speech spectrum using an HMM (Hidden Markov Model)-based speech production model. The model consists of HMMs of articulatory parameters for each phoneme and an articulatory-to-acoustic mapping that transforms the articulatory parameters into a speech spectrum for each HMM state. The model is statistically constructed by using actual articulatory-acoustic data. In the adaptation method, geometrical differences in the vocal tract as well as the articulatory behavior in the reference model are statistically adjusted to an unknown speaker. First, the articulatory parameters are estimated from an unknown speaker's speech spectrum using the reference model. Secondly, the articulatory-to-acoustic mapping is adjusted by maximizing the output probability of the acoustic parameters for the estimated articulatory parameters of the unknown speaker. With the adaptation method, the RMS error between the estimated articulatory parameters and the observed ones is 1.65 mm. The improvement rate over the speaker independent model is 56.1 %.

KW - Articulatory-to-acoustic mapping

KW - HMM-based speech production model

KW - Speaker adaptation

KW - Speech inversion

UR - http://www.scopus.com/inward/record.url?scp=2642528734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2642528734&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:2642528734

VL - E87-D

SP - 1071

EP - 1078

JO - IEICE Transactions on Information and Systems

JF - IEICE Transactions on Information and Systems

SN - 0916-8532

IS - 5

ER -