Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy

Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh

Research output: Conference contribution

18 Citations (Scopus)

Abstract

When building a state-of-the-art speech recognition system, the laborious effort required of human experts to tune numerous parameters remains a prominent obstacle. The goal of this paper is to automate that process. We propose tuning DNN-HMM-based large vocabulary speech recognition systems using the covariance matrix adaptation evolution strategy (CMA-ES) with multi-objective Pareto optimization. This optimizes systems for both high accuracy and compact model size. An additional advantage of our approach is that it is efficiently parallelizable and easily adapted to cloud computing services. We performed experiments on the Corpus of Spontaneous Japanese (CSJ) using the TSUBAME 2.5 supercomputer. Compared with a strong manually tuned configuration borrowed from a similar system, our approach automatically discovered systems with a 0.48% lower word error rate (WER), and systems with a 59% smaller model size at the same WER. The optimized training script is released in the Kaldi speech recognition toolkit as the first publicly available recipe for Japanese large vocabulary speech recognition.
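As a rough illustration of the multi-objective selection described in the abstract, the sketch below keeps only the Pareto-optimal candidates when both WER and model size are to be minimized. It is not the paper's implementation: the CMA-ES sampling step is omitted, the names `dominates` and `pareto_front` are hypothetical, and the candidate values are made up for illustration rather than taken from the experiments.

```python
from typing import List, Tuple

# (WER in %, model size in millions of parameters); both are minimized.
Candidate = Tuple[float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates, in input order."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical (WER, size) pairs, NOT results from the paper.
candidates = [(12.5, 40.0), (12.0, 55.0), (12.5, 20.0), (13.5, 18.0), (12.9, 30.0)]
front = pareto_front(candidates)
print(front)
```

Each surviving candidate represents a different trade-off between accuracy and size; an evolution strategy would typically rank or weight sampled configurations by this kind of dominance relation before updating its search distribution.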

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 610-616
Number of pages: 7
ISBN (electronic): 9781479972913
Publication status: Published - 10 Feb 2016
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: 13 Dec 2015 to 17 Dec 2015

Publication series

Name: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory: United States
City: Scottsdale
Period: 13 Dec 2015 to 17 Dec 2015

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
