Text-Only Domain Adaptation Based on Intermediate CTC

Hiroaki Sato*, Tomoyasu Komori, Takeshi Mishima, Yoshihiko Kawai, Takahiro Mochizuki, Shoei Sato, Tetsuji Ogawa

*Corresponding author for this work

Research output: Conference article, peer-reviewed

Abstract

We propose a domain adaptation method that enables connectionist temporal classification (CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models to adapt to a target domain using unpaired text data. The performance of ASR models deteriorates for words and topics not present in the training data, such as the latest news. Although it is difficult to collect paired speech and text data for such subjects, unpaired text data is relatively easy to obtain. We therefore build the method on an E2E ASR model based on intermediate CTC. The model introduces an adaptation branch that embeds acoustic and linguistic information in the same latent space, which allows domain adaptation using only unpaired text data from the target domain. Experimental comparisons across multiple out-of-domain settings demonstrate that the proposed text-only domain adaptation achieves performance comparable to or better than existing shallow-fusion-based domain adaptation, and that integrating it with shallow fusion yields further improvement.
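The shallow-fusion baseline mentioned in the abstract combines an ASR score with an external language-model score through a weighted sum of log-probabilities. The sketch below illustrates only that general idea and is not the authors' implementation: it rescores a finished n-best list instead of fusing scores inside beam search, which is a simplification, and the function name `shallow_fusion_rescore`, the weight `lm_weight`, and all numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def shallow_fusion_rescore(
    hypotheses: List[Tuple[str, float]],
    lm_log_probs: Dict[str, float],
    lm_weight: float = 0.3,
) -> List[Tuple[str, float]]:
    """Rank hypotheses by asr_log_prob + lm_weight * lm_log_prob.

    hypotheses: (text, ASR log-probability) pairs.
    lm_log_probs: external language-model log-probability for each text.
    lm_weight: interpolation weight (hypothetical value, normally tuned).
    """
    scored = [
        (text, asr_lp + lm_weight * lm_log_probs[text])
        for text, asr_lp in hypotheses
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy scores, invented for illustration only.
nbest = [("the whether today", -1.0), ("the weather today", -1.2)]
lm_scores = {"the whether today": -9.0, "the weather today": -4.0}

ranked = shallow_fusion_rescore(nbest, lm_scores, lm_weight=0.3)
best_text = ranked[0][0]
```

With a target-domain language model supplying `lm_scores`, the ranking shifts toward in-domain wording. Setting `lm_weight` to 0 recovers the ASR-only ranking.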

Original language: English
Pages (from-to): 2208-2212
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2022-September
Publication status: Published - 2022
Event: 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
Duration: 18 Sep 2022 to 22 Sep 2022

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
