TY - JOUR
T1 - Evolution-strategy-based automation of system development for high-performance speech recognition
AU - Moriya, Takafumi
AU - Tanaka, Tomohiro
AU - Shinozaki, Takahiro
AU - Watanabe, Shinji
AU - Duh, Kevin
N1 - Funding Information:
Manuscript received March 30, 2018; revised July 23, 2018 and September 2, 2018; accepted September 5, 2018. Date of publication September 24, 2018; date of current version October 15, 2018. The work of T. Moriya, T. Tanaka, T. Shinozaki, and K. Duh was supported by JSPS KAKENHI under Grants 26280055 and 17K20001. The work of S. Watanabe was supported by MERL. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Jinyu Li. (Corresponding author: Takafumi Moriya.) T. Moriya, T. Tanaka, and T. Shinozaki are with the Tokyo Institute of Technology, Yokohama 226-8502, Japan (e-mail: it.moriya.777@gmail.com; tm.gdmt.tchg.jt26@gmail.com; staka@u.washington.edu).
Publisher Copyright:
© 2014 IEEE.
PY - 2019/1
Y1 - 2019/1
N2 - The state-of-the-art large vocabulary speech recognition systems consist of several components including hidden Markov model and deep neural network. To realize the highest recognition performance, numerous meta-parameters specifying the designs and training setups of these components must be optimized. A prominent obstacle in system development is the laborious effort required by human experts in tuning these meta-parameters. To automate the process, we propose to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with a multi-objective Pareto optimization. As the result of the evolution, the system is optimized for both low word error rate and compact model size. Since the approach requires repeated training and evaluation of the recognition systems that require large computation, we make use of parallel computation on cloud computers. Experimental results show the effectiveness of the proposed approach by discovering appropriate configuration for large vocabulary speech recognition systems automatically.
KW - Speech recognition
KW - covariance matrix adaptation evolution strategy (CMA-ES)
KW - deep neural network (DNN)
KW - genetic algorithm
KW - multi-objective optimization
UR - http://www.scopus.com/inward/record.url?scp=85054398695&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054398695&partnerID=8YFLogxK
U2 - 10.1109/TASLP.2018.2871755
DO - 10.1109/TASLP.2018.2871755
M3 - Article
AN - SCOPUS:85054398695
SN - 2329-9290
VL - 27
SP - 77
EP - 88
JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing
JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing
IS - 1
M1 - 8470178
ER -