Black box optimization for automatic speech recognition

Shinji Watanabe, Jonathan Le Roux

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

State-of-the-art automatic speech recognition (ASR) systems are very complex: they combine multiple techniques and involve many types of tuning parameters (e.g., the numbers of states and Gaussians in HMMs, and the numbers of neurons and layers and the learning rates in neural networks). Reaching optimal performance in such systems requires a deep understanding of, and expertise in, each component, which limits the development of ASR systems to skilled experts. To overcome this problem, this paper studies black box optimization, which tunes systems automatically without any prior knowledge. We treat an ASR system as a function that takes tuning parameters as input and returns speech recognition performance (e.g., word accuracy) as output, and we investigate two probabilistic black box optimization techniques: the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Bayesian optimization using Gaussian processes. Medium-vocabulary speech recognition experiments show the effectiveness of black box optimization: it automatically obtains performance that approaches that of systems fine-tuned by experts and/or exceeds that of sub-optimal systems.
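The abstract frames tuning as black box optimization of a score over a vector of tuning parameters. As a minimal sketch of the evolution-strategy side, the code below runs a simplified, separable CMA-ES (diagonal covariance only, with the enlarged learning rates used by the sep-CMA-ES variant). This is a swapped-in simplification, not necessarily the full-covariance CMA-ES configuration used in the paper. The objective `toy_word_error_rate` is hypothetical: a cheap quadratic standing in for "train, decode, and score a development set", with both coordinates assumed to be already rescaled tuning parameters. Minimizing word error rate is used here in place of maximizing word accuracy.

```python
import math
import random


def toy_word_error_rate(x):
    # Hypothetical stand-in for "train and decode an ASR system with tuning
    # parameters x, then report WER (%) on a development set".
    return 8.0 + 20.0 * (x[0] - 0.3) ** 2 + 5.0 * (x[1] + 1.2) ** 2


def sep_cma_es(f, x0, sigma0, max_gens=100, seed=0):
    """Minimize f with a simplified separable (diagonal-covariance) CMA-ES."""
    rng = random.Random(seed)
    n = len(x0)
    lam = 4 + int(3 * math.log(n))          # population size
    mu = lam // 2                           # number of selected parents
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    w = [r / sum(raw) for r in raw]         # positive recombination weights
    mueff = 1.0 / sum(wi * wi for wi in w)
    cc = (4 + mueff / n) / (n + 4 + 2 * mueff / n)
    cs = (mueff + 2) / (n + mueff + 5)
    c1 = 2 / ((n + 1.3) ** 2 + mueff)
    cmu = 2 * (mueff - 2 + 1 / mueff) / ((n + 2) ** 2 + mueff)
    # Only n variances are learned, so larger covariance learning rates are used.
    c1 = min(1.0, c1 * (n + 2) / 3)
    cmu = min(1.0 - c1, cmu * (n + 2) / 3)
    damps = 1 + 2 * max(0.0, math.sqrt((mueff - 1) / (n + 1)) - 1) + cs
    chi_n = math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))

    m, sigma = list(x0), sigma0
    diag_c = [1.0] * n                      # diagonal of the covariance matrix C
    ps, pc = [0.0] * n, [0.0] * n           # evolution paths
    best_x, best_f = list(x0), f(x0)
    for g in range(max_gens):
        pop = []
        for _ in range(lam):
            y = [math.sqrt(diag_c[i]) * rng.gauss(0.0, 1.0) for i in range(n)]
            x = [m[i] + sigma * y[i] for i in range(n)]
            pop.append((f(x), x, y))
        pop.sort(key=lambda t: t[0])        # rank candidates, best first
        if pop[0][0] < best_f:
            best_f, best_x = pop[0][0], pop[0][1]
        yw = [sum(w[k] * pop[k][2][i] for k in range(mu)) for i in range(n)]
        m = [m[i] + sigma * yw[i] for i in range(n)]
        # Cumulative step-size adaptation; C^{-1/2} is elementwise for diagonal C.
        a = math.sqrt(cs * (2 - cs) * mueff)
        ps = [(1 - cs) * ps[i] + a * yw[i] / math.sqrt(diag_c[i]) for i in range(n)]
        ps_norm = math.sqrt(sum(v * v for v in ps))
        hsig = ps_norm / math.sqrt(1 - (1 - cs) ** (2 * (g + 1))) / chi_n < 1.4 + 2 / (n + 1)
        b = math.sqrt(cc * (2 - cc) * mueff) if hsig else 0.0
        pc = [(1 - cc) * pc[i] + b * yw[i] for i in range(n)]
        # Rank-one and rank-mu covariance updates, restricted to the diagonal.
        delta = 0.0 if hsig else cc * (2 - cc)
        diag_c = [
            (1 - c1 - cmu) * diag_c[i]
            + c1 * (pc[i] ** 2 + delta * diag_c[i])
            + cmu * sum(w[k] * pop[k][2][i] ** 2 for k in range(mu))
            for i in range(n)
        ]
        sigma *= math.exp((cs / damps) * (ps_norm / chi_n - 1))
        if sigma * math.sqrt(max(diag_c)) < 1e-10:   # search distribution collapsed
            break
    return best_x, best_f


best_x, best_f = sep_cma_es(toy_word_error_rate, x0=[0.0, 0.0], sigma0=0.5)
print(best_x, best_f)
```

In a real setting each objective call would be a full ASR training and decoding run, so the population size and the generation budget (`max_gens`) dominate the cost. Because the update uses only the ranking of the scores, the method is invariant to order-preserving transformations of the objective.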

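For the Bayesian optimization side, here is a minimal sketch assuming a single tuning parameter rescaled to [0, 1], a zero-mean Gaussian process with a squared-exponential kernel (a fixed, hypothetical length scale of 0.2 rather than fitted hyperparameters), and expected improvement maximized over a fixed grid of candidates. All names (`bayes_opt`, `toy_word_error_rate`, the grid, the three-point initial design) are illustrative assumptions, not the paper's setup.

```python
import math


def toy_word_error_rate(x):
    # Hypothetical 1-D stand-in for an expensive ASR tuning run; x in [0, 1]
    # could encode, e.g., a rescaled log learning rate.
    return 10.0 + 30.0 * (x - 0.62) ** 2


def rbf(a, b, length=0.2):
    # Squared-exponential kernel with unit signal variance (assumed hyperparameters).
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(k):
    n = len(k)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = k[i][j] - sum(low[i][p] * low[j][p] for p in range(j))
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def forward(low, b):   # solve L x = b
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][p] * x[p] for p in range(i))) / low[i][i])
    return x


def backward(low, b):  # solve L^T x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[p][i] * x[p] for p in range(i + 1, n))) / low[i][i]
    return x


def expected_improvement(mu, sd, best):
    # EI for minimization: E[max(best - Y, 0)] with Y ~ N(mu, sd^2).
    if sd < 1e-12:
        return 0.0
    z = (best - mu) / sd
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return (best - mu) * cdf + sd * pdf


def bayes_opt(f, n_iter=12, noise=1e-6):
    xs = [0.0, 0.5, 1.0]                          # initial design
    ys = [f(x) for x in xs]
    grid = [i / 200 for i in range(201)]          # candidate settings
    for _ in range(n_iter):
        mean_y = sum(ys) / len(ys)
        std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / len(ys)) or 1.0
        t = [(y - mean_y) / std_y for y in ys]    # standardized targets
        k = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        low = cholesky(k)
        alpha = backward(low, forward(low, t))    # alpha = K^{-1} t
        best = min(t)
        scored = []
        for c in grid:
            if c in xs:                           # skip settings already evaluated
                continue
            ks = [rbf(c, x) for x in xs]
            mu = sum(a * b for a, b in zip(ks, alpha))    # posterior mean
            v = forward(low, ks)
            var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # posterior variance
            scored.append((expected_improvement(mu, math.sqrt(var), best), c))
        _, x_next = max(scored)
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


best_x, best_y = bayes_opt(toy_word_error_rate)
print(best_x, best_y)
```

The GP posterior provides both a predicted score and an uncertainty for untried settings. Expected improvement trades these off, so the sketch spends its few expensive evaluations near promising settings while still probing uncertain regions.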
Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3256-3260
Number of pages: 5
ISBN (Print): 9781479928927
DOIs: https://doi.org/10.1109/ICASSP.2014.6854202
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence
Duration: 2014 May 4 to 2014 May 9

Other

Other: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
City: Florence
Period: 14/5/4 to 14/5/9

Fingerprint

Speech recognition
Tuning
Optimal systems
Neurons
Neural networks
Experiments

Keywords

  • Bayesian optimization
  • Black box optimization
  • CMA-ES
  • Gaussian process
  • Speech recognition

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Watanabe, S., & Le Roux, J. (2014). Black box optimization for automatic speech recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 (pp. 3256-3260). [6854202] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854202

Black box optimization for automatic speech recognition. / Watanabe, Shinji; Le Roux, Jonathan.

2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 3256-3260 6854202.

Watanabe, S & Le Roux, J 2014, Black box optimization for automatic speech recognition. in 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, 6854202, Institute of Electrical and Electronics Engineers Inc., pp. 3256-3260, 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014, Florence, 14/5/4. https://doi.org/10.1109/ICASSP.2014.6854202
Watanabe S, Le Roux J. Black box optimization for automatic speech recognition. In 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 3256-3260. 6854202 https://doi.org/10.1109/ICASSP.2014.6854202
Watanabe, Shinji ; Le Roux, Jonathan. / Black box optimization for automatic speech recognition. 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 3256-3260
@inproceedings{2808c0316813455f9e55d94bf54088ae,
title = "Black box optimization for automatic speech recognition",
abstract = "State-of-the-art automatic speech recognition (ASR) systems are very complex: they combine multiple techniques and involve many types of tuning parameters (e.g., the numbers of states and Gaussians in HMMs, and the numbers of neurons and layers and the learning rates in neural networks). Reaching optimal performance in such systems requires a deep understanding of, and expertise in, each component, which limits the development of ASR systems to skilled experts. To overcome this problem, this paper studies black box optimization, which tunes systems automatically without any prior knowledge. We treat an ASR system as a function that takes tuning parameters as input and returns speech recognition performance (e.g., word accuracy) as output, and we investigate two probabilistic black box optimization techniques: the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Bayesian optimization using Gaussian processes. Medium-vocabulary speech recognition experiments show the effectiveness of black box optimization: it automatically obtains performance that approaches that of systems fine-tuned by experts and/or exceeds that of sub-optimal systems.",
keywords = "Bayesian optimization, Black box optimization, CMA-ES, Gaussian process, Speech recognition",
author = "Shinji Watanabe and {Le Roux}, Jonathan",
year = "2014",
doi = "10.1109/ICASSP.2014.6854202",
language = "English",
isbn = "9781479928927",
pages = "3256--3260",
booktitle = "2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY  - GEN

T1  - Black box optimization for automatic speech recognition

AU  - Watanabe, Shinji

AU  - Le Roux, Jonathan

PY  - 2014

Y1  - 2014

N2  - State-of-the-art automatic speech recognition (ASR) systems are very complex: they combine multiple techniques and involve many types of tuning parameters (e.g., the numbers of states and Gaussians in HMMs, and the numbers of neurons and layers and the learning rates in neural networks). Reaching optimal performance in such systems requires a deep understanding of, and expertise in, each component, which limits the development of ASR systems to skilled experts. To overcome this problem, this paper studies black box optimization, which tunes systems automatically without any prior knowledge. We treat an ASR system as a function that takes tuning parameters as input and returns speech recognition performance (e.g., word accuracy) as output, and we investigate two probabilistic black box optimization techniques: the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Bayesian optimization using Gaussian processes. Medium-vocabulary speech recognition experiments show the effectiveness of black box optimization: it automatically obtains performance that approaches that of systems fine-tuned by experts and/or exceeds that of sub-optimal systems.

AB  - State-of-the-art automatic speech recognition (ASR) systems are very complex: they combine multiple techniques and involve many types of tuning parameters (e.g., the numbers of states and Gaussians in HMMs, and the numbers of neurons and layers and the learning rates in neural networks). Reaching optimal performance in such systems requires a deep understanding of, and expertise in, each component, which limits the development of ASR systems to skilled experts. To overcome this problem, this paper studies black box optimization, which tunes systems automatically without any prior knowledge. We treat an ASR system as a function that takes tuning parameters as input and returns speech recognition performance (e.g., word accuracy) as output, and we investigate two probabilistic black box optimization techniques: the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Bayesian optimization using Gaussian processes. Medium-vocabulary speech recognition experiments show the effectiveness of black box optimization: it automatically obtains performance that approaches that of systems fine-tuned by experts and/or exceeds that of sub-optimal systems.

KW  - Bayesian optimization

KW  - Black box optimization

KW  - CMA-ES

KW  - Gaussian process

KW  - Speech recognition

UR  - http://www.scopus.com/inward/record.url?scp=84905216973&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=84905216973&partnerID=8YFLogxK

U2  - 10.1109/ICASSP.2014.6854202

DO  - 10.1109/ICASSP.2014.6854202

M3  - Conference contribution

AN  - SCOPUS:84905216973

SN  - 9781479928927

SP  - 3256

EP  - 3260

BT  - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014

PB  - Institute of Electrical and Electronics Engineers Inc.

ER  - 