TY - GEN
T1 - Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN
AU - Kanagawa, Hiroki
AU - Tachioka, Yuuki
AU - Watanabe, Shinji
AU - Ishii, Jun
N1 - Publisher Copyright:
© 2015 Asia-Pacific Signal and Information Processing Association.
PY - 2016/2/19
Y1 - 2016/2/19
N2 - Feature-space maximum-likelihood linear regression (fMLLR) transforms acoustic features into adapted ones by multiplying them with a single transformation matrix. This property enables efficient adaptation performed within a preprocessing step, which is independent of the decoding process, and this type of adaptation can be applied to deep neural networks (DNNs). On the other hand, constrained MLLR (CMLLR) uses multiple transformation matrices based on a regression tree, which provides further improvement over fMLLR. However, there are two problems with model-space adaptations. First, these types of adaptation cannot be applied to DNNs because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework according to the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix frame by frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.
AB - Feature-space maximum-likelihood linear regression (fMLLR) transforms acoustic features into adapted ones by multiplying them with a single transformation matrix. This property enables efficient adaptation performed within a preprocessing step, which is independent of the decoding process, and this type of adaptation can be applied to deep neural networks (DNNs). On the other hand, constrained MLLR (CMLLR) uses multiple transformation matrices based on a regression tree, which provides further improvement over fMLLR. However, there are two problems with model-space adaptations. First, these types of adaptation cannot be applied to DNNs because adaptation and decoding must share the same generative model, i.e., a Gaussian mixture model (GMM). Second, transformation matrices tend to overfit when the amount of adaptation data is small. This paper proposes using multiple transformation matrices within a feature-space adaptation framework. The proposed method first estimates multiple transformation matrices in the GMM framework according to the first-pass decoding results and the alignments, and then takes a weighted sum of these matrices to obtain a single feature transformation matrix frame by frame. In addition, to address the second problem, we propose feature-space structural maximum a posteriori linear regression (fSMAPLR), which introduces hierarchical prior distributions to regularize the MAP estimation. Experimental results show that the proposed fSMAPLR outperformed fMLLR.
UR - http://www.scopus.com/inward/record.url?scp=84986224268&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986224268&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2015.7415425
DO - 10.1109/APSIPA.2015.7415425
M3 - Conference contribution
AN - SCOPUS:84986224268
T3 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
SP - 86
EP - 92
BT - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2015
Y2 - 16 December 2015 through 19 December 2015
ER -