Speaker indexing and speech enhancement in real meetings/conversations

Shoko Araki, Masakiyo Fujimoto, Kentaro Ishizuka, Hiroshi Sawada, Shoji Makino

研究成果: Conference contribution

19 被引用数 (Scopus)

抄録

This paper presents a speaker indexing method that uses a small number of microphones to estimate who spoke when. Our proposed speaker indexing is realized by using a noise robust voice activity detector (VAD), a GCC-PHAT based direction of arrival (DOA) estimator, and a DOA classifier. Using the estimated speaker indexing information, we can also enhance the utterances of each speaker with a maximum signal-to-noise-ratio (MaxSNR) beamformer. This paper applies our system to real recorded meetings / conversations recorded in a room with a reverberation time of 350 ms, and evaluates the performance by a standard measure: the diarization error rate (DER). Even for the real conversations, which have many speaker turn-takings and overlaps, the speaker error time was very small with our proposed system. We are planning to demonstrate a real-time speaker indexing system at ICASSP2008.

本文言語English
ホスト出版物のタイトル2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
ページ93-96
ページ数4
DOI
出版ステータスPublished - 2008
外部発表はい
イベント2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
継続期間: 2008 3 312008 4 4

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN(印刷版)1520-6149

Conference

Conference2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
国/地域United States
CityLas Vegas, NV
Period08/3/3108/4/4

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「Speaker indexing and speech enhancement in real meetings/conversations」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル