Leveraging end-to-end ASR for endangered language documentation: An empirical study on Yoloxóchitl Mixtec

Jiatong Shi, Jonathan D. Amith, Rey Castillo García, Esteban Guadalupe Sierra, Kevin Duh, Shinji Watanabe

Research output: Conference contribution

Abstract

“Transcription bottlenecks”, created by a shortage of effective human transcribers, are one of the main challenges to endangered language (EL) documentation. Automatic speech recognition (ASR) has been suggested as a tool to overcome such bottlenecks. Following this suggestion, we investigated the effectiveness for EL documentation of end-to-end ASR, which, unlike Hidden Markov Model ASR systems, eschews linguistic resources but is instead more dependent on large-data settings. We open source a Yoloxóchitl Mixtec EL corpus. First, we review our method for building an end-to-end ASR system in a way that would be reproducible by the ASR community. We then propose a novice transcription correction task and demonstrate how ASR systems and novice transcribers can work together to improve EL documentation. We believe this combinatory methodology would mitigate the transcription bottleneck and transcriber shortage that hinder EL documentation.

Original language: English
Title of host publication: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 1134-1145
Number of pages: 12
ISBN (Electronic): 9781954085022
Publication status: Published - 2021
Externally published: Yes
Event: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021 - Virtual, Online
Duration: 19 Apr 2021 to 23 Apr 2021

Publication series

Name: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 16th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2021
City: Virtual, Online
Period: 19 Apr 2021 to 23 Apr 2021

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Linguistics and Language
