An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

Huaibo Zhao*, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Conference contribution

Abstract

This paper combines Mask-CTC with the triggered attention mechanism to build a streaming end-to-end automatic speech recognition (ASR) system that achieves high accuracy with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by CTC spikes, has been shown to be effective in streaming ASR. However, its performance depends on accurate alignment estimation from the CTC outputs, and maintaining that accuracy requires decoding with some future input (i.e., with higher latency). In streaming ASR, it is desirable to achieve high recognition accuracy while keeping latency low. The present study therefore introduces Mask-CTC, which can learn feature representations that anticipate future information (i.e., that can consider long-term contexts), into encoder pre-training. Experimental comparisons on WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.
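
As an illustration of the idea of decoding "triggered by CTC spikes", the following minimal sketch locates spikes in a sequence of per-frame greedy CTC labels and computes how far into the encoder output a decoder would be allowed to look for each token. It is not the authors' implementation: the function name `ctc_triggers`, the `lookahead` parameter, and the toy label sequence are all assumptions made for this example, and a real system would work from CTC posteriors rather than hard labels.

```python
from typing import List, Tuple


def ctc_triggers(
    frame_labels: List[int], blank: int = 0, lookahead: int = 0
) -> List[Tuple[int, int, int]]:
    """Return (token, spike_frame, last_visible_frame) for each CTC spike.

    A spike is taken to be the first frame of a new non-blank label,
    following the usual CTC collapse rule (merge repeats, drop blanks).
    `lookahead` is a hypothetical knob for how many future frames the
    decoder may see after a spike; larger values mean higher latency.
    """
    triggers = []
    prev = blank
    last = len(frame_labels) - 1
    for t, label in enumerate(frame_labels):
        if label != blank and label != prev:
            # Clamp the visible range to the end of the utterance.
            triggers.append((label, t, min(t + lookahead, last)))
        prev = label
    return triggers


# Toy per-frame argmax labels (0 is the blank symbol).
frames = [0, 0, 3, 3, 0, 5, 0, 0, 3, 0]
for token, spike, visible in ctc_triggers(frames, lookahead=2):
    print(f"token {token}: spike at frame {spike}, attend to frames 0..{visible}")
```

In this sketch, setting `lookahead` to 0 gives the lowest latency but leaves the decoder with no future context, which is the trade-off the abstract describes.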

Original language: English
Title of host publication: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 477-483
Number of pages: 7
ISBN (Electronic): 9789881476890
Publication status: Published - 2021
Event: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 14 Dec 2021 to 17 Dec 2021

Publication series

Name: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/Territory: Japan
City: Tokyo
Period: 21/12/14 to 21/12/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation
