Arabic speech recognition by end-to-end, modular systems and human

Amir Hussein*, Shinji Watanabe, Ahmed Ali

*Corresponding author for this work

Research output: Article (peer-reviewed)

Abstract

Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to those of human transcribers, which has led researchers to debate whether machines have reached human performance. Previous work focused on the English language and on modular hidden Markov model-deep neural network (HMM–DNN) systems. In this paper, we perform a comprehensive benchmarking of end-to-end transformer ASR, modular HMM–DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For HSR, we evaluate linguist performance and lay native speaker performance on a new dataset collected as part of this study. For ASR, the end-to-end system achieves word error rates (WER) of 12.5%, 27.5%, and 33.8% on the MGB2, MGB3, and MGB5 challenges, respectively, a new performance milestone for each. Our results suggest that human performance on Arabic is still considerably better than machine performance, with an absolute WER gap of 3.5% on average.

Original language: English
Article number: 101272
Journal: Computer Speech and Language
Volume: 71
DOI
Publication status: Published - Jan 2022
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Human-Computer Interaction
