Sequence summarizing neural network for speaker adaptation

Karel Vesely, Shinji Watanabe, Katerina Zmolikova, Martin Karafiat, Lukas Burget, Jan Honza Cernocky

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

27 Citations (Scopus)

Abstract

In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Like the i-vector extractor, the SSNN produces a "summary vector" representing an acoustic summary of an utterance. This vector is then appended to the input of the main network, and both networks are trained together, optimizing a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of the two techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the "summary vector" to the FBANK features leads to an additional improvement, comparable to the performance of an FMLLR-adapted DNN system.
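The adaptation scheme described above can be sketched in a few lines of numpy. This is a minimal illustrative sketch, not the paper's implementation: the layer sizes, the mean-pooling choice for summarizing the utterance, and all weight initializations are assumptions made for the example.

```python
import numpy as np

# Hedged sketch of the SSNN adaptation idea from the abstract.
# All dimensions and the mean-pooling step are illustrative assumptions.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

feat_dim, summary_dim, hidden_dim, n_states = 40, 16, 64, 10
T = 200  # number of frames in one utterance

# Per-frame FBANK-like features for one utterance.
fbank = rng.normal(size=(T, feat_dim))

# SSNN: a per-frame network whose outputs are pooled over the whole
# utterance into a single "summary vector" (here: mean pooling).
W_ssnn = rng.normal(scale=0.1, size=(feat_dim, summary_dim))
summary = relu(fbank @ W_ssnn).mean(axis=0)        # shape (summary_dim,)

# Main network: each frame is augmented with the summary vector,
# analogous to appending an i-vector to the input features.
aug = np.concatenate([fbank, np.tile(summary, (T, 1))], axis=1)
W1 = rng.normal(scale=0.1, size=(feat_dim + summary_dim, hidden_dim))
W2 = rng.normal(scale=0.1, size=(hidden_dim, n_states))
logits = relu(aug @ W1) @ W2                        # shape (T, n_states)

# Frame-level state posteriors via softmax; in training, both W_ssnn
# and the main network would be updated by backpropagating a single
# frame-classification cross-entropy loss.
post = np.exp(logits - logits.max(axis=1, keepdims=True))
post /= post.sum(axis=1, keepdims=True)
```

The key point the sketch captures is that the summary vector is a differentiable function of the same utterance frames, so one loss function can drive both networks jointly, unlike a separately trained i-vector extractor.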

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5315-5319
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: 10.1109/ICASSP.2016.7472692
Publication status: Published - 2016 May 18
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 2016 Mar 20 - 2016 Mar 25

Other

Name: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country: China
City: Shanghai
Period: 16/3/20 - 16/3/25

Keywords

  • adaptation
  • DNN
  • i-vector
  • sequence summary
  • SSNN

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Vesely, K., Watanabe, S., Zmolikova, K., Karafiat, M., Burget, L., & Cernocky, J. H. (2016). Sequence summarizing neural network for speaker adaptation. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 5315-5319). [7472692] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472692

