Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy

Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

When building a state-of-the-art speech recognition system, the laborious effort that human experts spend tuning numerous parameters remains a prominent obstacle. The goal of this paper is to automate that process. We propose to tune DNN-HMM based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization. This optimizes systems for both high accuracy and compact model size. An additional advantage of our approach is that it parallelizes efficiently and adapts easily to cloud computing services. We performed experiments on the Corpus of Spontaneous Japanese (CSJ) using the TSUBAME 2.5 supercomputer. Compared with a strong manually tuned configuration borrowed from a similar system, our approach automatically discovered systems with a WER lower by 0.48%, and systems with a 59% smaller model size while keeping WER constant. The optimized training script is released in the Kaldi speech recognition toolkit as the first publicly available recipe for Japanese large vocabulary speech recognition.
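The multi-objective Pareto selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes two objectives to minimize, WER and model size, and keeps only the candidates that no other candidate dominates. The candidate names and numbers are hypothetical illustrations, not results from the paper, and the sketch covers only the Pareto filtering step, not CMA-ES itself or the authors' implementation.

```python
from typing import List, Tuple

# (name, WER in percent, model size in millions of parameters).
# All values below are made up for illustration only.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives
    and strictly better on at least one (both are minimized)."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that are not dominated by any other candidate."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


candidates: List[Candidate] = [
    ("baseline", 12.0, 40.0),
    ("sys_a", 11.5, 42.0),   # lower WER, slightly larger model
    ("sys_b", 12.0, 16.0),   # same WER as baseline, much smaller model
    ("sys_c", 13.0, 45.0),   # worse on both objectives
    ("sys_d", 12.5, 30.0),   # dominated by sys_b
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

In this toy population the front holds one system that trades size for accuracy and one that matches the baseline WER with a smaller model, which mirrors the two kinds of outcome the abstract reports.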

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 610-616
Number of pages: 7
ISBN (Electronic): 9781479972913
DOIs: https://doi.org/10.1109/ASRU.2015.7404852
Publication status: Published - 2016 Feb 10
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 2015 Dec 13 to 2015 Dec 17

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country: United States
City: Scottsdale
Period: 15/12/13 to 15/12/17

Fingerprint

Speech recognition
Automation
Supercomputers
Cloud computing
Covariance matrix
Tuning
Experiments

Keywords

  • deep neural network
  • evolution strategy
  • Japanese spontaneous speech recognition
  • large vocabulary speech recognition
  • multi-objective optimization

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition

Cite this

Moriya, T., Tanaka, T., Shinozaki, T., Watanabe, S., & Duh, K. (2016). Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings (pp. 610-616). [7404852] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASRU.2015.7404852

@inproceedings{c1ad5c43c0bc40a281aef76d3fa5e326,
title = "Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy",
keywords = "deep neural network, evolution strategy, Japanese spontaneous speech recognition, large vocabulary speech recognition, multi-objective optimization",
author = "Takafumi Moriya and Tomohiro Tanaka and Takahiro Shinozaki and Shinji Watanabe and Kevin Duh",
year = "2016",
month = "2",
day = "10",
doi = "10.1109/ASRU.2015.7404852",
language = "English",
pages = "610--616",
booktitle = "2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy

AU - Moriya, Takafumi

AU - Tanaka, Tomohiro

AU - Shinozaki, Takahiro

AU - Watanabe, Shinji

AU - Duh, Kevin

PY - 2016/2/10

Y1 - 2016/2/10

KW - deep neural network

KW - evolution strategy

KW - Japanese spontaneous speech recognition

KW - large vocabulary speech recognition

KW - multi-objective optimization

UR - http://www.scopus.com/inward/record.url?scp=84964555888&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964555888&partnerID=8YFLogxK

U2 - 10.1109/ASRU.2015.7404852

DO - 10.1109/ASRU.2015.7404852

M3 - Conference contribution

SP - 610

EP - 616

BT - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

ER -