Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN

Hiroki Kanagawa, Yuuki Tachioka, Shinji Watanabe, Jun Ishii

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Feature-space maximum-likelihood linear regression (fMLLR) adapts acoustic features by multiplying them by a single transformation matrix. Adaptation can therefore be performed efficiently as a preprocessing step that is independent of the decoding process, so it can also be applied to deep neural networks (DNNs). Constrained MLLR (CMLLR), on the other hand, uses multiple transformation matrices based on a regression tree, which yields a further improvement over fMLLR. However, model-space adaptation of this kind has two problems. First, it cannot be applied to DNNs, because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, the transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework from the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix for each frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.
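The frame-by-frame weighted sum described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `[A | b]` affine layout of each transform, and the per-frame weights (imagined here as soft assignments of a frame to regression classes) are all assumptions made for illustration, and the estimation of the matrices themselves and the fSMAPLR priors are not shown.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # assumed layout: d rows, d + 1 columns, i.e. [A | b]


def apply_affine(W: Matrix, x: Sequence[float]) -> Vector:
    """Return A x + b for W = [A | b] (an fMLLR-style feature transform)."""
    d = len(x)
    return [sum(W[i][j] * x[j] for j in range(d)) + W[i][d] for i in range(len(W))]


def combine_transforms(transforms: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of the given transforms; weights are normalised to sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    rows, cols = len(transforms[0]), len(transforms[0][0])
    return [
        [sum((w / total) * W[i][j] for w, W in zip(weights, transforms)) for j in range(cols)]
        for i in range(rows)
    ]


def adapt_features(
    frames: Sequence[Sequence[float]],
    transforms: Sequence[Matrix],
    frame_weights: Sequence[Sequence[float]],
) -> List[Vector]:
    """Build one combined transform per frame and apply it to that frame."""
    return [
        apply_affine(combine_transforms(transforms, w), x)
        for x, w in zip(frames, frame_weights)
    ]


# Toy example with two hypothetical regression-class transforms in 2-D.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # A = I, b = 0
scaled = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0]]     # A = 2I, b = (1, 1)
frames = [[1.0, 2.0], [1.0, 2.0]]
frame_weights = [[1.0, 0.0], [0.5, 0.5]]        # assumed per-frame class weights

print(adapt_features(frames, [identity, scaled], frame_weights))
# prints [[1.0, 2.0], [2.0, 3.5]]
```

Because the combined matrix is applied to the features before any acoustic model sees them, the output can be fed to a DNN in the same way as ordinary fMLLR-adapted features.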

Original language: English
Title of host publication: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 86-92
Number of pages: 7
ISBN (Electronic): 9789881476807
DOIs: https://doi.org/10.1109/APSIPA.2015.7415425
Publication status: Published - 2016 Feb 19
Externally published: Yes
Event: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 - Hong Kong, Hong Kong
Duration: 2015 Dec 16 to 2015 Dec 19

Other

Other: 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Country: Hong Kong
City: Hong Kong
Period: 15/12/16 to 15/12/19

Fingerprint

Regression Tree
Transformation Matrix
Feature Space
Neural Networks
Linear regression
Maximum likelihood
Decoding
Maximum a Posteriori
Gaussian Mixture Model
MAP Estimation
Generative Models
Weighted Sums
Prior distribution
Deep neural networks
Multiplication
Acoustics
Alignment
Tend
Transform

ASJC Scopus subject areas

  • Artificial Intelligence
  • Modelling and Simulation
  • Signal Processing

Cite this

Kanagawa, H., Tachioka, Y., Watanabe, S., & Ishii, J. (2016). Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN. In 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015 (pp. 86-92). [7415425] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/APSIPA.2015.7415425

@inproceedings{7cc371c3b0504061afe54868b236b146,
title = "Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN",
abstract = "Feature-space maximum-likelihood linear regression (fMLLR) adapts acoustic features by multiplying them by a single transformation matrix. Adaptation can therefore be performed efficiently as a preprocessing step that is independent of the decoding process, so it can also be applied to deep neural networks (DNNs). Constrained MLLR (CMLLR), on the other hand, uses multiple transformation matrices based on a regression tree, which yields a further improvement over fMLLR. However, model-space adaptation of this kind has two problems. First, it cannot be applied to DNNs, because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, the transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework from the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix for each frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.",
author = "Hiroki Kanagawa and Yuuki Tachioka and Shinji Watanabe and Jun Ishii",
year = "2016",
month = "2",
day = "19",
doi = "10.1109/APSIPA.2015.7415425",
language = "English",
pages = "86--92",
booktitle = "2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN

AU - Kanagawa, Hiroki

AU - Tachioka, Yuuki

AU - Watanabe, Shinji

AU - Ishii, Jun

PY - 2016/2/19

Y1 - 2016/2/19

AB - Feature-space maximum-likelihood linear regression (fMLLR) adapts acoustic features by multiplying them by a single transformation matrix. Adaptation can therefore be performed efficiently as a preprocessing step that is independent of the decoding process, so it can also be applied to deep neural networks (DNNs). Constrained MLLR (CMLLR), on the other hand, uses multiple transformation matrices based on a regression tree, which yields a further improvement over fMLLR. However, model-space adaptation of this kind has two problems. First, it cannot be applied to DNNs, because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, the transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework from the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix for each frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.

UR - http://www.scopus.com/inward/record.url?scp=84986224268&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84986224268&partnerID=8YFLogxK

U2 - 10.1109/APSIPA.2015.7415425

DO - 10.1109/APSIPA.2015.7415425

M3 - Conference contribution

AN - SCOPUS:84986224268

SP - 86

EP - 92

BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -