TY - GEN
T1 - Sequence summarizing neural network for speaker adaptation
AU - Vesely, Karel
AU - Watanabe, Shinji
AU - Zmolikova, Katerina
AU - Karafiat, Martin
AU - Burget, Lukas
AU - Cernocky, Jan Honza
N1 - Funding Information:
The work reported here was carried out during the 2015 Jelinek Memorial Summer Workshop on Speech and Language Technologies at the University of Washington, Seattle, and was supported by Johns Hopkins University via NSF Grant No. IIS-1005411 and gifts from Google, Microsoft Research, Amazon, Mitsubishi Electric, and MERL. The BUT authors were also supported by the Technology Agency of the Czech Republic project No. TA04011311 MINT and the Czech Ministry of the Interior project No. VI20152020025 DRAPAK.
Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
AB - In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to an i-vector extractor, the SSNN produces a "summary vector" representing an acoustic summary of an utterance. This vector is then appended to the input of the main network, and both networks are trained together by optimizing a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of the two techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the "summary vector" to the FBANK features leads to an additional improvement, comparable to the performance of an FMLLR-adapted DNN system.
KW - DNN
KW - SSNN
KW - adaptation
KW - i-vector
KW - sequence summary
UR - http://www.scopus.com/inward/record.url?scp=84973294668&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973294668&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472692
DO - 10.1109/ICASSP.2016.7472692
M3 - Conference contribution
AN - SCOPUS:84973294668
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5315
EP - 5319
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -