Speech recognition for a humanoid with motor noise utilizing missing feature theory

Yoshitaka Nishimura, Mitsuru Ishizuka, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Citations (Scopus)

Abstract

Automatic speech recognition (ASR) is essential for human-humanoid communication. One of the main problems with ASR is that a humanoid inevitably generates motor noise. This noise is easily captured by the humanoid's microphones because the noise sources are closer to the microphones than the target speech source. As a result, the signal-to-noise ratio (SNR) of the input speech becomes quite low (sometimes less than 0 dB). However, the noise can be estimated from information about the humanoid's own motions and gestures. In this paper we propose a method that uses this motion/gesture information to improve ASR for a humanoid with motor noise. The method consists of psychologically inspired noise suppression and missing-feature-theory-based ASR (MFT-ASR). The proposed noise suppression technique adds white noise after noise suppression; this does not improve SNR, but it makes the signal suitable for MFT-ASR. It is inspired by the fact that noise addition sometimes helps human perception, as described in Gestalt psychology. MFT-ASR improves ASR by masking unreliable acoustic features in the input sound. The motion/gesture information is used to estimate the reliability of acoustic features in MFT-ASR. We evaluated the proposed method with noisy speech recorded by Honda ASIMO in a room with reverberation. The noise data contained 32 kinds of noise, including motor noise without motion, gesture noise, and walking noise. The experimental results show that the proposed method outperforms the conventional multi-condition training technique.
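The two ideas named in the abstract, masking unreliable features and adding white noise after suppression, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the mask is derived from motion data or how the acoustic model scores features, so the hard binary mask, the diagonal-Gaussian score, and the names `mask_from_motion`, `masked_log_likelihood`, and `add_white_noise` are all hypothetical.

```python
import math
import random


def mask_from_motion(num_features, noisy_bands):
    """Build a hard mask: 0 for features assumed corrupted by the current
    motion, 1 otherwise. The motion-to-band mapping is a made-up input."""
    return [0 if i in noisy_bands else 1 for i in range(num_features)]


def masked_log_likelihood(features, mask, means, variances):
    """Diagonal-Gaussian log-likelihood that skips masked (unreliable)
    features, so they cannot penalize the acoustic score."""
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:  # only reliable features contribute
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def add_white_noise(samples, noise_std, seed=0):
    """Add zero-mean Gaussian white noise to a suppressed signal.
    The noise level is an assumed free parameter."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in samples]
```

With a feature vector whose second component is badly corrupted, the masked score ignores that component, while the unmasked score is dominated by it. That is the intuition behind estimating feature reliability from known motions.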

Original language: English
Title of host publication: Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS
Pages: 26-33
Number of pages: 8
DOI: https://doi.org/10.1109/ICHR.2006.321359
Publication status: Published - 2006
Externally published: Yes
Event: 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS - Genoa
Duration: 2006 Dec 4 to 2006 Dec 6

Other

Other: 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS
City: Genoa
Period: 06/12/4 to 06/12/6

Fingerprint

Speech recognition
Acoustic noise
Microphones
Signal to noise ratio
Speech intelligibility
Reverberation
White noise
Acoustics
Acoustic waves
Communication

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Electrical and Electronic Engineering

Cite this

Nishimura, Y., Ishizuka, M., Nakadai, K., Nakano, M., & Tsujino, H. (2006). Speech recognition for a humanoid with motor noise utilizing missing feature theory. In Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS (pp. 26-33). [4115576] https://doi.org/10.1109/ICHR.2006.321359

@inproceedings{ec8c2f47f16e40a48fbe2516f21b372e,
title = "Speech recognition for a humanoid with motor noise utilizing missing feature theory",
author = "Yoshitaka Nishimura and Mitsuru Ishizuka and Kazuhiro Nakadai and Mikio Nakano and Hiroshi Tsujino",
year = "2006",
doi = "10.1109/ICHR.2006.321359",
language = "English",
isbn = "142440200X",
pages = "26--33",
booktitle = "Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS",

}

TY - GEN

T1 - Speech recognition for a humanoid with motor noise utilizing missing feature theory

AU - Nishimura, Yoshitaka

AU - Ishizuka, Mitsuru

AU - Nakadai, Kazuhiro

AU - Nakano, Mikio

AU - Tsujino, Hiroshi

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=48149111531&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=48149111531&partnerID=8YFLogxK

U2 - 10.1109/ICHR.2006.321359

DO - 10.1109/ICHR.2006.321359

M3 - Conference contribution

AN - SCOPUS:48149111531

SN - 142440200X

SN - 9781424402007

SP - 26

EP - 33

BT - Proceedings of the 2006 6th IEEE-RAS International Conference on Humanoid Robots, HUMANOIDS

ER -