Sequence summarizing neural network for speaker adaptation

Karel Vesely, Shinji Watanabe, Katerina Zmolikova, Martin Karafiat, Lukas Burget, Jan Honza Cernocky

Research output: Chapter in Book/Report/Conference proceedingConference contribution

53 Citations (Scopus)

Abstract

In this paper, we propose a DNN adaptation technique, where the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to i-vector extractor, the SSNN produces a «summary vector», representing an acoustic summary of an utterance. Such vector is then appended to the input of main network, while both networks are trained together optimizing single loss function. Both the i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of both techniques on FBANK system with frame-classification training. Moreover, appending both the i-vector and «summary vector» to the FBANK features leads to additional improvement comparable to the performance of FMLLR adapted DNN system.

Original languageEnglish
Title of host publication2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5315-5319
Number of pages5
ISBN (Electronic)9781479999880
DOIs
Publication statusPublished - 2016 May 18
Externally publishedYes
Event41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 2016 Mar 202016 Mar 25

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2016-May
ISSN (Print)1520-6149

Other

Other41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/TerritoryChina
CityShanghai
Period16/3/2016/3/25

Keywords

  • DNN
  • SSNN
  • adaptation
  • i-vector
  • sequence summary

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Sequence summarizing neural network for speaker adaptation'. Together they form a unique fingerprint.

Cite this