Acoustic model adaptation based on coarse/fine training of transfer vectors using directional statistics

Shinji Watanabe, Atsushi Nakamura

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we reformulate an adaptation scheme of Coarse/Fine Training (CFT) of transfer vectors in acoustic modeling by using directional statistics. In CFT, the transfer vector is decomposed into a unit direction vector and a scaling factor. By using coarse tied Gaussian class (coarse class) estimation for the unit direction vector, and by using fine tied Gaussian class (fine class) estimation for the scaling factor, we can obtain accurate transfer vectors with a small number of free parameters. Directional statistics is a method for analyzing geometric parameters (e.g. angle and unit vector) using directional data, and is suited for the analysis of the CFT representation. Using directional statistics as a basis, we construct expectation-maximization algorithms for CFT parameters analytically using the maximum likelihood and Bayesian (maximum a posteriori) approaches. In particular, with the Bayesian approach, prior and posterior distributions for unit direction vectors are represented with a von Mises distribution, a representative distribution in directional statistics. Speaker adaptation experiments show that our proposal improves the performance of large vocabulary continuous speech recognition due to the efficient coarse/fine representation of transfer vectors, compared with the conventional transfer vector adaptation.
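
The coarse/fine representation described in the abstract can be illustrated with a minimal sketch. It covers only the decomposition of a transfer vector into a scaling factor and a unit direction vector, and the sharing of directions across coarse classes; it does not reproduce the paper's expectation-maximization estimation or the von Mises prior. The function names, the `fine_to_coarse` mapping, and the parameter-count comparison (which assumes the conventional scheme stores one full transfer vector per fine class) are assumptions made for illustration, not taken from the paper.

```python
import numpy as np


def decompose(transfer):
    """Split a transfer vector b into (scale, unit direction) with b = scale * direction."""
    transfer = np.asarray(transfer, dtype=float)
    scale = float(np.linalg.norm(transfer))
    if scale == 0.0:
        raise ValueError("a zero transfer vector has no direction")
    return scale, transfer / scale


def compose_transfer_vectors(directions, scales, fine_to_coarse):
    """Rebuild one transfer vector per fine class.

    directions:     (n_coarse, dim) unit vectors, one per coarse class
    scales:         (n_fine,) scaling factors, one per fine class
    fine_to_coarse: (n_fine,) index of the coarse class each fine class belongs to
                    (hypothetical mapping, assumed given)
    """
    directions = np.asarray(directions, dtype=float)
    scales = np.asarray(scales, dtype=float)
    fine_to_coarse = np.asarray(fine_to_coarse, dtype=int)
    # Each fine class reuses the direction of its coarse class and applies its own scale.
    return scales[:, None] * directions[fine_to_coarse]


def free_parameters(dim, n_coarse, n_fine):
    """Return (coarse/fine count, assumed conventional count).

    A unit vector in `dim` dimensions has dim - 1 degrees of freedom.
    The conventional count assumes one full vector per fine class.
    """
    coarse_fine = n_coarse * (dim - 1) + n_fine
    conventional = n_fine * dim
    return coarse_fine, conventional
```

Under these assumptions, sharing directions at the coarse level while keeping scales at the fine level gives one transfer vector per fine class with far fewer free parameters than storing each vector in full, which is the trade-off the abstract refers to.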

Original language: English
Title of host publication: 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings
Volume: 1
Publication status: Published - 2006
Externally published: Yes
Event: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006 - Toulouse
Duration: 2006 May 14 to 2006 May 19

Other

Other: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006
City: Toulouse
Period: 06/5/14 to 06/5/19

Fingerprint

transfer of training
Acoustics
Statistics
education
Continuous speech recognition
scaling
speech recognition
Maximum likelihood

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Watanabe, S., & Nakamura, A. (2006). Acoustic model adaptation based on coarse/fine training of transfer vectors using directional statistics. In 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings (Vol. 1). [1660193]

@inproceedings{b426a03598c9462b8e0ba8eded22a25d,
title = "Acoustic model adaptation based on coarse/fine training of transfer vectors using directional statistics",
abstract = "In this paper, we reformulate an adaptation scheme of Coarse/Fine Training (CFT) of transfer vectors in acoustic modeling by using directional statistics. In CFT, the transfer vector is decomposed into a unit direction vector and a scaling factor. By using coarse tied Gaussian class (coarse class) estimation for the unit direction vector, and by using fine tied Gaussian class (fine class) estimation for the scaling factor, we can obtain accurate transfer vectors with a small number of free parameters. Directional statistics is a method for analyzing geometric parameters (e.g. angle and unit vector) using directional data, and is suited for the analysis of the CFT representation. Using directional statistics as a basis, we construct expectation-maximization algorithms for CFT parameters analytically using the maximum likelihood and Bayesian (maximum a posteriori) approaches. In particular, with the Bayesian approach, prior and posterior distributions for unit direction vectors are represented with a von Mises distribution, a representative distribution in directional statistics. Speaker adaptation experiments show that our proposal improves the performance of large vocabulary continuous speech recognition due to the efficient coarse/fine representation of transfer vectors, compared with the conventional transfer vector adaptation.",
author = "Shinji Watanabe and Atsushi Nakamura",
year = "2006",
language = "English",
isbn = "142440469X",
volume = "1",
booktitle = "2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings",

}

TY - GEN

T1 - Acoustic model adaptation based on coarse/fine training of transfer vectors using directional statistics

AU - Watanabe, Shinji

AU - Nakamura, Atsushi

PY - 2006

Y1 - 2006

AB - In this paper, we reformulate an adaptation scheme of Coarse/Fine Training (CFT) of transfer vectors in acoustic modeling by using directional statistics. In CFT, the transfer vector is decomposed into a unit direction vector and a scaling factor. By using coarse tied Gaussian class (coarse class) estimation for the unit direction vector, and by using fine tied Gaussian class (fine class) estimation for the scaling factor, we can obtain accurate transfer vectors with a small number of free parameters. Directional statistics is a method for analyzing geometric parameters (e.g. angle and unit vector) using directional data, and is suited for the analysis of the CFT representation. Using directional statistics as a basis, we construct expectation-maximization algorithms for CFT parameters analytically using the maximum likelihood and Bayesian (maximum a posteriori) approaches. In particular, with the Bayesian approach, prior and posterior distributions for unit direction vectors are represented with a von Mises distribution, a representative distribution in directional statistics. Speaker adaptation experiments show that our proposal improves the performance of large vocabulary continuous speech recognition due to the efficient coarse/fine representation of transfer vectors, compared with the conventional transfer vector adaptation.

UR - http://www.scopus.com/inward/record.url?scp=33947638496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947638496&partnerID=8YFLogxK

M3 - Conference contribution

SN - 142440469X

SN - 9781424404698

VL - 1

BT - 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings

ER -