NON-AUTOREGRESSIVE END-TO-END AUTOMATIC SPEECH RECOGNITION INCORPORATING DOWNSTREAM NATURAL LANGUAGE PROCESSING

Motoi Omachi, Yuya Fujita, Shinji Watanabe, Tianzi Wang

Research output: Conference contribution

Abstract

We propose a fast and accurate end-to-end (E2E) model that performs automatic speech recognition (ASR) and downstream natural language processing (NLP) simultaneously. The proposed approach predicts, from speech, a single aligned sequence of transcriptions and linguistic annotations such as part-of-speech (POS) tags and named entity (NE) tags. We use non-autoregressive (NAR) decoding instead of autoregressive (AR) decoding to reduce execution time, since NAR decoding can output multiple tokens in parallel across time. To predict the single aligned sequence accurately, we use the connectionist temporal classification (CTC) model with mask-predict, i.e., Mask-CTC. Mask-CTC improves performance by jointly training CTC with a conditioned masked language model and by refining low-confidence output tokens conditioned on reliable output tokens and audio embeddings. The proposed method jointly performs ASR and the downstream NLP task, i.e., POS or NE tagging, in a NAR manner. Experiments using the Corpus of Spontaneous Japanese and the Spoken Language Understanding Resource Package show that the proposed E2E model predicts transcriptions and linguistic annotations with consistently better performance than vanilla CTC using greedy decoding, and runs 15-97x faster than a Transformer-based AR model.
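
The first two stages described in the abstract, greedy CTC decoding followed by masking of low-confidence tokens, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The blank index, the `MASK` id, the 0.5 threshold, the toy posteriors, and the choice of the maximum frame posterior as a token's confidence are all assumptions made for illustration. The refinement step, in which a conditioned masked language model fills the masked positions, is deliberately left out.

```python
from typing import List, Sequence, Tuple

BLANK = 0   # assumed index of the CTC blank symbol
MASK = -1   # hypothetical id standing in for a <mask> token


def ctc_greedy_decode(
    frame_probs: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks.

    Returns (token, confidence) pairs. Confidence is taken here as the
    highest frame posterior among the merged frames (an assumption).
    """
    out: List[Tuple[int, float]] = []
    prev = None
    for probs in frame_probs:
        label = max(range(len(probs)), key=probs.__getitem__)
        p = probs[label]
        if label == prev:
            # Repeated label: merge into the previous token, keep best score.
            if label != BLANK:
                tok, conf = out[-1]
                out[-1] = (tok, max(conf, p))
        elif label != BLANK:
            out.append((label, p))
        prev = label
    return out


def mask_low_confidence(
    tokens: Sequence[Tuple[int, float]], threshold: float
) -> List[int]:
    """Replace tokens whose confidence is below the threshold with MASK."""
    return [tok if conf >= threshold else MASK for tok, conf in tokens]


# Toy posteriors over a 3-symbol vocabulary (index 0 is blank), 5 frames.
frames = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.30, 0.30, 0.40],
    [0.10, 0.85, 0.05],
]
decoded = ctc_greedy_decode(frames)
masked = mask_low_confidence(decoded, threshold=0.5)
```

In this toy setup the integer ids could stand for either transcription tokens or annotation tags, since the abstract describes both as sharing one aligned output sequence. Every frame is scored independently, which is what allows the tokens to be produced in parallel rather than one at a time.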

Original language: English
Title of host publication: 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6772-6776
Number of pages: 5
ISBN (electronic): 9781665405409
Publication status: Published - 2022
Externally published: Yes
Event: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore
Duration: 23 May 2022 to 27 May 2022

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2022-May
ISSN (print): 1520-6149

Conference

Conference: 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022
Country/Territory: Singapore
City: Virtual, Online
Period: 23 May 2022 to 27 May 2022

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
