Automated structure discovery and parameter tuning of neural network language model based on evolution strategy

Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, Takaaki Hori, Kevin Duh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation evolution strategy (CMA-ES), which has demonstrated robustness in other black-box hyper-parameter optimization problems. By flexibly allowing optimization of various meta-parameters, including layer-wise unit types, our method automatically finds a configuration that gives improved recognition performance. Further, by using a Pareto-based multi-objective CMA-ES, both word error rate (WER) and computational time were reduced jointly: after 10 generations, the relative WER and decoding-time reductions were 4.1% and 22.7%, respectively, compared to an initial baseline system whose WER was 8.7%.
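The sample-select-recombine loop underlying the abstract's approach can be illustrated with a toy evolution strategy. This is a deliberately simplified sketch, not the paper's method: full CMA-ES additionally adapts a full covariance matrix and step size via evolution paths, and the real objective would be the WER (and decoding time) obtained by training an LSTM language model with the candidate hyper-parameters. All names and the quadratic stand-in objective below are illustrative assumptions.

```python
import numpy as np

def simple_es(objective, x0, sigma=0.5, lam=16, mu=4, generations=30, decay=0.9, seed=0):
    """Toy (mu, lambda) evolution strategy: sample candidates around a mean,
    keep the best, recombine, and shrink the step size. A simplified stand-in
    for CMA-ES (no covariance adaptation or evolution paths)."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(generations):
        # Sample lambda candidate hyper-parameter vectors around the current mean.
        pop = mean + sigma * rng.standard_normal((lam, mean.size))
        scores = np.array([objective(x) for x in pop])
        elite = pop[np.argsort(scores)[:mu]]  # keep the mu lowest-scoring candidates
        mean = elite.mean(axis=0)             # recombine into the new search mean
        sigma *= decay                        # shrink the sampling spread
    return mean

# Toy objective standing in for "train the LM, measure WER"; minimum at (1.0, -2.0).
best = simple_es(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2, x0=[0.0, 0.0])
```

In the paper's setting each `objective` call is expensive (a full training run), which is why the population evaluations are natural to parallelize and why only a modest number of generations (10 in the abstract) is used.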

Original language: English
Title of host publication: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 665-671
Number of pages: 7
ISBN (Electronic): 9781509049035
DOI: 10.1109/SLT.2016.7846334
Publication status: Published - 2017 Feb 7
Externally published: Yes
Event: 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - San Diego, United States
Duration: 2016 Dec 13 - 2016 Dec 16



Keywords

  • Evolution strategy
  • Language model
  • Large vocabulary speech recognition
  • Long short-term memory
  • Multi-objective optimization

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Artificial Intelligence
  • Language and Linguistics
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Tanaka, T., Moriya, T., Shinozaki, T., Watanabe, S., Hori, T., & Duh, K. (2017). Automated structure discovery and parameter tuning of neural network language model based on evolution strategy. In 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings (pp. 665-671). [7846334] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SLT.2016.7846334
