Two-channel-based voice activity detection for humanoid robots in noisy home environments

Hyun Don Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

The purpose of this research is to accurately classify speech signals originating from the front, even in noisy home environments. This ability can help robots improve speech recognition and spot keywords. We therefore developed a new voice activity detection (VAD) method based on the complex spectrum circle centroid (CSCC) method. It classifies speech signals arriving at the front of two microphones by comparing the spectral energy of the observed signals with that of the target signals estimated by CSCC. It works in real time without training filter coefficients beforehand, even in noisy environments (SNR > 0 dB), and can cope with speech noise generated by audio-visual equipment such as televisions and audio devices. Because the CSCC method requires the directions of the noise signals, we also developed a sound source localization system that integrates cross-power spectrum phase (CSP) analysis with an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.
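
The abstract names cross-power spectrum phase (CSP) analysis as the basis of the localization step. As a rough illustration only, the sketch below estimates the delay between two microphone signals with a phase-only cross-correlation, which is how CSP is commonly understood (it is also known as GCC-PHAT). It is not the authors' implementation: the function names `dft` and `csp_delay` are hypothetical, the delay is modelled as a circular shift, and the EM clustering and the CSCC-based VAD described above are not included.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def csp_delay(left, right, eps=1e-12):
    """Return the lag in samples by which `right` trails `left`.

    Assumes both frames have the same length and that the delay behaves
    like a circular shift within the frame (an idealisation).
    """
    n = len(left)
    spec_l = dft(left)
    spec_r = dft(right)
    # Cross-power spectrum with the magnitude divided out, so only the
    # phase difference between the two channels remains.
    cross = []
    for a, b in zip(spec_l, spec_r):
        c = b * a.conjugate()
        cross.append(c / (abs(c) + eps))
    # Back in the time domain, the phase-only correlation peaks at the delay.
    csp = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: csp[i])
    # Indices above n/2 represent negative lags of the circular correlation.
    return peak if peak <= n // 2 else peak - n


# Usage: a noise frame and a copy delayed by 5 samples (circularly).
rng = random.Random(0)
left = [rng.uniform(-1.0, 1.0) for _ in range(64)]
right = left[-5:] + left[:-5]
print(csp_delay(left, right))  # prints 5
```

With a known microphone spacing and sampling rate, such a delay could in principle be converted into a direction of arrival; those parameters are not given in this record, so the conversion is left out.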

Original language: English
Title of host publication: Proceedings - IEEE International Conference on Robotics and Automation
Pages: 3495-3501
Number of pages: 7
DOI: https://doi.org/10.1109/ROBOT.2008.4543745
Publication status: Published - 2008
Externally published: Yes
Event: 2008 IEEE International Conference on Robotics and Automation, ICRA 2008 - Pasadena, CA
Duration: 2008 May 19 to 2008 May 23

Other

Other: 2008 IEEE International Conference on Robotics and Automation, ICRA 2008
City: Pasadena, CA
Period: 08/5/19 to 08/5/23

Fingerprint

Robots
Microphones
Acoustic waves
Power spectrum
Television
Speech recognition
Acoustic noise

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering

Cite this

Kim, H. D., Komatani, K., Ogata, T., & Okuno, H. G. (2008). Two-channel-based voice activity detection for humanoid robots in noisy home environments. In Proceedings - IEEE International Conference on Robotics and Automation (pp. 3495-3501). [4543745] https://doi.org/10.1109/ROBOT.2008.4543745

Two-channel-based voice activity detection for humanoid robots in noisy home environments. / Kim, Hyun Don; Komatani, Kazunori; Ogata, Tetsuya; Okuno, Hiroshi G.

Proceedings - IEEE International Conference on Robotics and Automation. 2008. p. 3495-3501 4543745.

Kim, HD, Komatani, K, Ogata, T & Okuno, HG 2008, Two-channel-based voice activity detection for humanoid robots in noisy home environments. in Proceedings - IEEE International Conference on Robotics and Automation., 4543745, pp. 3495-3501, 2008 IEEE International Conference on Robotics and Automation, ICRA 2008, Pasadena, CA, 08/5/19. https://doi.org/10.1109/ROBOT.2008.4543745
Kim HD, Komatani K, Ogata T, Okuno HG. Two-channel-based voice activity detection for humanoid robots in noisy home environments. In Proceedings - IEEE International Conference on Robotics and Automation. 2008. p. 3495-3501. 4543745 https://doi.org/10.1109/ROBOT.2008.4543745
Kim, Hyun Don ; Komatani, Kazunori ; Ogata, Tetsuya ; Okuno, Hiroshi G. / Two-channel-based voice activity detection for humanoid robots in noisy home environments. Proceedings - IEEE International Conference on Robotics and Automation. 2008. pp. 3495-3501
@inproceedings{89b8cfdcd68b49b6935be6908b365201,
title = "Two-channel-based voice activity detection for humanoid robots in noisy home environments",
abstract = "The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.",
author = "Kim, {Hyun Don} and Kazunori Komatani and Tetsuya Ogata and Okuno, {Hiroshi G.}",
year = "2008",
doi = "10.1109/ROBOT.2008.4543745",
language = "English",
isbn = "9781424416479",
pages = "3495--3501",
booktitle = "Proceedings - IEEE International Conference on Robotics and Automation",

}

TY - GEN

T1 - Two-channel-based voice activity detection for humanoid robots in noisy home environments

AU - Kim, Hyun Don

AU - Komatani, Kazunori

AU - Ogata, Tetsuya

AU - Okuno, Hiroshi G.

PY - 2008

Y1 - 2008

AB - The purpose of this research is to accurately classify the speech signals originating from the front even in noisy home environments. This ability can help robots to improve speech recognition and to spot keywords. We therefore developed a new voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method. It can classify the speech signals that are received at the front of two microphones by comparing the spectral energy of observed signals with that of target signals estimated by CSCC. Also, it can work in real time without training filter coefficients beforehand even in noisy environments (SNR > 0 dB) and can cope with speech noises generated by audio-visual equipments such as televisions and audio devices. Since the CSCC method requires the directions of the noise signals, we also developed a sound source localization system integrated with cross-power spectrum phase (CSP) analysis and an expectation-maximization (EM) algorithm. This system was demonstrated to enable a robot to cope with multiple sound sources using two microphones.

UR - http://www.scopus.com/inward/record.url?scp=51649123542&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=51649123542&partnerID=8YFLogxK

U2 - 10.1109/ROBOT.2008.4543745

DO - 10.1109/ROBOT.2008.4543745

M3 - Conference contribution

SN - 9781424416479

SP - 3495

EP - 3501

BT - Proceedings - IEEE International Conference on Robotics and Automation

ER -