The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition

Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe

Research output: Conference contribution

36 Citations (Scopus)

Abstract

This paper introduces the MERL/SRI system designed for the 3rd CHiME speech separation and recognition challenge (CHiME-3). The proposed system uses recurrent neural networks (RNNs) throughout the pipeline, from front-end speech enhancement to language modeling. Two different types of beamforming are used to combine the multi-microphone signals into a single higher-quality signal. The beamformed signal is further processed by a single-channel bidirectional long short-term memory (LSTM) enhancement network, whose output is used to extract stacked mel-frequency cepstral coefficient (MFCC) features. In addition, two proposed noise-robust feature extraction methods are applied to the beamformed signal. The resulting features are decoded by speech recognition systems with deep neural network (DNN) based acoustic models and large-scale RNN language models to achieve high recognition accuracy in noisy environments. The training methodology includes data augmentation and speaker adaptive training, and at test time model combination is used to improve generalization. Results on the CHiME-3 benchmark show that the full set of techniques substantially reduced the word error rate (WER). Combining hypotheses from different robust-feature systems ultimately achieved 9.10% WER on the real test data, a 72.4% relative reduction from the baseline of 32.99% WER.
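
The 72.4% figure quoted in the abstract is a relative reduction, that is, the drop in WER expressed as a fraction of the baseline WER. A minimal sketch of that arithmetic follows; the helper name `relative_reduction` is hypothetical and chosen only for illustration, while the two WER values are those given in the abstract.

```python
def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# WER values (in percent) quoted in the abstract for the real test data.
baseline = 32.99
combined = 9.10

print(f"{relative_reduction(baseline, combined):.1f}% relative reduction")
```

The absolute reduction is 32.99 - 9.10 = 23.89 percentage points; dividing by the 32.99% baseline gives the relative figure.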

Original language: English
Title of host publication: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 475-481
Number of pages: 7
ISBN (Electronic): 9781479972913
DOI
Publication status: Published - Feb 10, 2016
Externally published: Yes
Event: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Scottsdale, United States
Duration: Dec 13, 2015 to Dec 17, 2015

Other

Other: IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Country/Territory: United States
City: Scottsdale
Period: Dec 13, 2015 to Dec 17, 2015

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
