Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition

Takafumi Moriya, Tomohiro Tanaka, Takahiro Shinozaki, Shinji Watanabe, Kevin Duh

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

State-of-the-art large vocabulary speech recognition systems consist of several components, including hidden Markov models (HMMs) and deep neural networks (DNNs). To realize the highest recognition performance, numerous meta-parameters specifying the designs and training setups of these components must be optimized. A prominent obstacle in system development is the laborious effort required of human experts in tuning these meta-parameters. To automate the process, we propose to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with multi-objective Pareto optimization. As a result of the evolution, the system is optimized for both low word error rate (WER) and compact model size. Since the approach requires repeated training and evaluation of recognition systems, which is computationally expensive, we make use of parallel computation on cloud computers. Experimental results show the effectiveness of the proposed approach, which automatically discovers appropriate configurations for large vocabulary speech recognition systems.
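
The two-objective selection mentioned in the abstract (low WER and compact model size) can be illustrated with a minimal Pareto-front filter. This is only a sketch under stated assumptions, not the authors' implementation: the `Candidate` type, the function names, and all numbers are hypothetical and chosen for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """A hypothetical evaluated system configuration (illustrative only)."""
    name: str
    wer: float      # word error rate in percent, lower is better
    size_mb: float  # model size in megabytes, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.wer <= b.wer and a.size_mb <= b.size_mb
    strictly_better = a.wer < b.wer or a.size_mb < b.size_mb
    return no_worse and strictly_better


def pareto_front(population: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in population
        if not any(dominates(other, c) for other in population)
    ]


if __name__ == "__main__":
    # Made-up numbers purely for illustration; not results from the paper.
    population = [
        Candidate("A", wer=12.0, size_mb=80.0),
        Candidate("B", wer=11.5, size_mb=120.0),
        Candidate("C", wer=12.5, size_mb=90.0),  # dominated by A
        Candidate("D", wer=13.0, size_mb=40.0),
    ]
    for c in pareto_front(population):
        print(c.name, c.wer, c.size_mb)
```

In an evolutionary loop, the non-dominated candidates would typically be the ones favored when forming the next generation; how the paper combines this with the evolution strategy is not specified in the abstract.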

Original language: English
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
DOIs: 10.1109/TASLP.2018.2871755
Publication status: Accepted/In press - 2018 Jan 1
Externally published: Yes

Fingerprint

Evolution Strategies
Speech Recognition
System Development
Automation
High Performance
education
Hidden Markov models
Pareto Optimization
Tuning
Parallel Computation
optimization
Multi-objective Optimization
evaluation
Markov Model
Error Rate
configurations

Keywords

  • CMA-ES
  • Deep neural network
  • Genetic algorithm
  • Genetic algorithms
  • Hidden Markov models
  • Multi-objective optimization
  • Optimization
  • Speech processing
  • Speech recognition
  • Training
  • Vocabulary

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

Cite this

Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition. / Moriya, Takafumi; Tanaka, Tomohiro; Shinozaki, Takahiro; Watanabe, Shinji; Duh, Kevin.

In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 01.01.2018.

@article{65c99a21cd35427c85c7abd5e33ea2d7,
title = "Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition",
abstract = "State-of-the-art large vocabulary speech recognition systems consist of several components, including hidden Markov models (HMMs) and deep neural networks (DNNs). To realize the highest recognition performance, numerous meta-parameters specifying the designs and training setups of these components must be optimized. A prominent obstacle in system development is the laborious effort required of human experts in tuning these meta-parameters. To automate the process, we propose to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with multi-objective Pareto optimization. As a result of the evolution, the system is optimized for both low word error rate (WER) and compact model size. Since the approach requires repeated training and evaluation of recognition systems, which is computationally expensive, we make use of parallel computation on cloud computers. Experimental results show the effectiveness of the proposed approach, which automatically discovers appropriate configurations for large vocabulary speech recognition systems.",
keywords = "CMA-ES, Deep neural network, Genetic algorithm, Genetic algorithms, Hidden Markov models, Multi-objective optimization, Optimization, Speech processing, Speech recognition, Training, Vocabulary",
author = "Takafumi Moriya and Tomohiro Tanaka and Takahiro Shinozaki and Shinji Watanabe and Kevin Duh",
year = "2018",
month = "1",
day = "1",
doi = "10.1109/TASLP.2018.2871755",
language = "English",
journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",
issn = "2329-9290",
publisher = "IEEE",
}

TY - JOUR

T1 - Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition

AU - Moriya, Takafumi

AU - Tanaka, Tomohiro

AU - Shinozaki, Takahiro

AU - Watanabe, Shinji

AU - Duh, Kevin

PY - 2018/1/1

Y1 - 2018/1/1

AB - State-of-the-art large vocabulary speech recognition systems consist of several components, including hidden Markov models (HMMs) and deep neural networks (DNNs). To realize the highest recognition performance, numerous meta-parameters specifying the designs and training setups of these components must be optimized. A prominent obstacle in system development is the laborious effort required of human experts in tuning these meta-parameters. To automate the process, we propose to tune the meta-parameters of a whole large vocabulary speech recognition system using the evolution strategy with multi-objective Pareto optimization. As a result of the evolution, the system is optimized for both low word error rate (WER) and compact model size. Since the approach requires repeated training and evaluation of recognition systems, which is computationally expensive, we make use of parallel computation on cloud computers. Experimental results show the effectiveness of the proposed approach, which automatically discovers appropriate configurations for large vocabulary speech recognition systems.

KW - CMA-ES

KW - Deep neural network

KW - Genetic algorithm

KW - Genetic algorithms

KW - Hidden Markov models

KW - Multi-objective optimization

KW - Optimization

KW - Speech processing

KW - Speech recognition

KW - Training

KW - Vocabulary

UR - http://www.scopus.com/inward/record.url?scp=85054398695&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054398695&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2018.2871755

DO - 10.1109/TASLP.2018.2871755

M3 - Article

AN - SCOPUS:85054398695

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

SN - 2329-9290

ER -