Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives

Aswin Shanmugam Subramanian, Chao Weng, Meng Yu, Shi Xiong Zhang, Yong Xu, Shinji Watanabe, Dong Yu

研究成果: Conference contribution

14 被引用数 (Scopus)

抄録

Target speech extraction is a specific case of source separation where an auxiliary information like the location or some pre-saved anchor speech examples of the target speaker is used to resolve the permutation ambiguity. Traditionally such systems are optimized based on signal reconstruction objectives. Recently end-to-end automatic speech recognition (ASR) methods have enabled to optimize source separation systems with only the transcription based objective. This paper proposes a method to jointly optimize a location guided target speech extraction module along with a speech recognition module only with ASR error minimization criteria. Experimental comparisons with corresponding conventional pipeline systems verify that this task can be realized by end-to-end ASR training objectives without using parallel clean data. We show promising target speech recognition results in mixtures of two speakers and noise, and discuss interesting properties of the proposed system in terms of speech enhancement/separation objectives and word error rates. Finally, we design a system that can take both location and anchor speech as input at the same time and show that the performance can be further improved.

本文言語English
ホスト出版物のタイトル2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ7299-7303
ページ数5
ISBN(電子版)9781509066315
DOI
出版ステータスPublished - 2020 5
外部発表はい
イベント2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Barcelona, Spain
継続期間: 2020 5 42020 5 8

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2020-May
ISSN(印刷版)1520-6149

Conference

Conference2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
国/地域Spain
CityBarcelona
Period20/5/420/5/8

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル