Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis

Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Research output: Conference contribution

6 Citations (Scopus)

Abstract

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation. With technical advances in systems dealing with speech separation, speaker diarization, and automatic speech recognition (ASR) in the last decade, it has become possible to build pipelines that achieve reasonable error rates on this task. In this paper, we propose an end-to-end modular system for the LibriCSS meeting data, which combines independently trained separation, diarization, and recognition components, in that order. We study the effect of different state-of-the-art methods at each stage of the pipeline, and report results using task-specific metrics like SDR and DER, as well as downstream WER. Experiments indicate that the problem of overlapping speech for diarization and ASR can be effectively mitigated with the presence of a well-trained separation module. Our best system achieves a speaker-attributed WER of 12.7%, which is close to that of a non-overlapping ASR.
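The abstract reports word error rate (WER) as its downstream metric. As a rough illustration only, and not the authors' scoring tooling, plain WER can be sketched as the word-level edit distance between a reference and a hypothesis transcript, divided by the number of reference words. The function name `wer` is hypothetical, and the sketch assumes whitespace tokenization with no text normalization. It does not handle the speaker-attribution step that the reported 12.7% figure involves.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Illustrative sketch; assumes whitespace-separated words and no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion against four reference words.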

Original language: English
Title of host publication: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 897-904
Number of pages: 8
ISBN (Electronic): 9781728170664
DOI
Publication status: Published - 19 Jan 2021
Externally published: Yes
Event: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Virtual, Shenzhen, China
Duration: 19 Jan 2021 to 22 Jan 2021

Publication series

Name: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings

Conference

Conference: 2021 IEEE Spoken Language Technology Workshop, SLT 2021
Country/Territory: China
City: Virtual, Shenzhen
Period: 2021/1/19 to 2021/1/22

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
