Minimum word error training of long short-term memory recurrent neural network language models for speech recognition

Takaaki Hori, Chiori Hori, Shinji Watanabe, John R. Hershey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper describes minimum word error (MWE) training of recurrent neural network language models (RNNLMs) for speech recognition. RNNLMs are usually trained to minimize the cross entropy of estimated word probabilities against the correct word sequence, which corresponds to the maximum likelihood criterion. However, this training does not necessarily maximize a performance measure on the target task, i.e., it does not explicitly minimize word error rate (WER) in speech recognition. To address this problem, several discriminative training methods have been proposed for n-gram language models, but such methods for RNNLMs have not been sufficiently investigated. In this paper, we propose an MWE training method for RNNLMs and report significant WER reductions when applying the MWE method to a standard Elman-type RNNLM and to a more advanced model, a Long Short-Term Memory (LSTM) RNNLM. We also present efficient MWE training with N-best lists on Graphics Processing Units (GPUs).
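The abstract's central contrast, cross-entropy versus minimum word error training, can be made concrete with a small sketch. The snippet below (assuming PyTorch; the function name, the linear score combination, and the `scale` factor are illustrative assumptions, not the paper's implementation) computes an expected-word-error loss over an N-best list: each hypothesis is weighted by its posterior under combined acoustic and language-model scores, so minimizing the weighted sum pushes the RNNLM toward low-WER hypotheses.

```python
import torch

def expected_wer_loss(lm_scores, am_scores, word_errors, scale=1.0):
    """Expected word error over an N-best list (an MWE/MBR-style objective).

    lm_scores   -- (N,) RNNLM log-probabilities of each hypothesis (requires grad)
    am_scores   -- (N,) fixed acoustic/decoder log-scores for the same hypotheses
    word_errors -- (N,) edit distance of each hypothesis to the reference transcript
    scale       -- hypothetical posterior smoothing factor (an assumption here)
    """
    # Hypothesis posteriors via a softmax over the combined N-best scores.
    posteriors = torch.softmax(scale * (am_scores + lm_scores), dim=0)
    # Differentiable expected number of word errors; gradients reach the
    # RNNLM only through lm_scores, since am_scores and word_errors are fixed.
    return torch.sum(posteriors * word_errors)

# Toy usage with a 3-best list (values are made up for illustration):
lm = torch.tensor([-3.2, -4.1, -5.0], requires_grad=True)
am = torch.tensor([-10.0, -9.5, -9.8])
err = torch.tensor([0.0, 2.0, 3.0])
loss = expected_wer_loss(lm, am, err)
loss.backward()  # gradients on lm favor low-error hypotheses
```

Minimizing this loss with respect to the RNNLM parameters is what distinguishes MWE training from the usual cross-entropy objective, which scores only the reference word sequence and ignores competing hypotheses entirely.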

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5990-5994
Number of pages: 5
Volume: 2016-May
ISBN (Electronic): 9781479999880
DOI: 10.1109/ICASSP.2016.7472827
Publication status: Published - 2016 May 18
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 2016 Mar 20 - 2016 Mar 25



Keywords

  • Long short-term memory
  • Minimum word error training
  • Recurrent neural network language model
  • Speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Hori, T., Hori, C., Watanabe, S., & Hershey, J. R. (2016). Minimum word error training of long short-term memory recurrent neural network language models for speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings (Vol. 2016-May, pp. 5990-5994). [7472827] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2016.7472827
