Two-layered audio-visual speech recognition for robots in noisy environments

Takami Yoshida, Kazuhiro Nakadai, Hiroshi G. Okuno

Research output: Conference contribution

7 Citations (Scopus)

Abstract

Audio-visual (AV) integration is one of the key ideas for improving perception in noisy real-world environments. This paper describes automatic speech recognition (ASR) to improve human-robot interaction based on AV integration. We developed AV-integrated ASR, which has two AV integration layers, that is, voice activity detection (VAD) and ASR. However, the system has three difficulties: 1) VAD and ASR have been studied separately although these processes are mutually dependent, 2) VAD and ASR assumed that high-resolution images are available although this assumption never holds in the real world, and 3) the optimal weight between the audio and visual streams was fixed although their reliabilities change with the environment. To solve these problems, we propose a new VAD algorithm taking ASR characteristics into account, and a linear-regression-based optimal weight estimation method. We evaluate the algorithm on auditorily and/or visually contaminated data. Preliminary results show that the robustness of VAD improved even when the resolution of the images is low, and the AV speech recognition (AVSR) using the estimated stream weight shows the effectiveness of AV integration.

Original language: English
Title of host publication: IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings
Pages: 988-993
Number of pages: 6
DOI: https://doi.org/10.1109/IROS.2010.5651205
Publication status: Published - 2010
Externally published: Yes
Event: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Taipei
Duration: 2010-10-18 to 2010-10-22

Other

Other: 23rd IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010
Taipei
Period: 2010-10-18 to 2010-10-22

ASJC Scopus subject areas

  • Artificial Intelligence
  • Human-Computer Interaction
  • Control and Systems Engineering

Cite this

Yoshida, T., Nakadai, K., & Okuno, H. G. (2010). Two-layered audio-visual speech recognition for robots in noisy environments. In IEEE/RSJ 2010 International Conference on Intelligent Robots and Systems, IROS 2010 - Conference Proceedings (pp. 988-993). [5651205] https://doi.org/10.1109/IROS.2010.5651205