A speaker diarization system with robust speaker localization and voice activity detection

Yangyang Huang, Takuma Otsuka, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)

Abstract

In real-world auditory scene analysis for human-robot interaction, three types of information must be extracted from the observed data: who speaks, when, and where. We present a speaker diarization system that extracts this information. Multiple signal classification (MUSIC) is a powerful method for voice activity detection (VAD) and direction of arrival (DOA) estimation. We describe our system and compare its VAD and DOA estimation performance with that of a MUSIC-based method.
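For readers unfamiliar with the MUSIC baseline named in the abstract, the following is a minimal sketch of narrowband MUSIC DOA estimation. It is not the authors' system or their exact baseline. The uniform linear array, half-wavelength spacing, single simulated source, noise level, and all function names (`steering_vector`, `simulate_snapshots`, `music_spectrum`) are assumptions made purely for illustration.

```python
import numpy as np


def steering_vector(theta_deg, n_mics, spacing_wavelengths=0.5):
    """Array response of a uniform linear array (assumed geometry)."""
    theta = np.deg2rad(theta_deg)
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta))


def simulate_snapshots(doa_deg, n_mics=8, n_snapshots=200, noise_std=0.1, seed=0):
    """Hypothetical test data: one narrowband source plus white noise."""
    rng = np.random.default_rng(seed)
    source = rng.standard_normal(n_snapshots) + 1j * rng.standard_normal(n_snapshots)
    noise = noise_std * (
        rng.standard_normal((n_mics, n_snapshots))
        + 1j * rng.standard_normal((n_mics, n_snapshots))
    )
    return np.outer(steering_vector(doa_deg, n_mics), source) + noise


def music_spectrum(snapshots, n_sources, angles_deg, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum over candidate angles.

    snapshots: complex array of shape (n_mics, n_snapshots).
    """
    n_mics, n_snapshots = snapshots.shape
    # Sample spatial covariance matrix.
    cov = snapshots @ snapshots.conj().T / n_snapshots
    # eigh returns eigenvalues in ascending order, so the first
    # (n_mics - n_sources) eigenvectors span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : n_mics - n_sources]

    spectrum = np.empty(len(angles_deg))
    for i, angle in enumerate(angles_deg):
        a = steering_vector(angle, n_mics, spacing_wavelengths)
        # Energy of the steering vector projected onto the noise subspace;
        # it approaches zero at a true source direction.
        proj = noise_subspace.conj().T @ a
        spectrum[i] = 1.0 / max(np.real(np.vdot(proj, proj)), 1e-12)
    return spectrum


if __name__ == "__main__":
    angles = np.arange(-90, 91)           # 1-degree search grid
    x = simulate_snapshots(doa_deg=20.0)  # simulated source at 20 degrees
    p = music_spectrum(x, n_sources=1, angles_deg=angles)
    print("estimated DOA (deg):", angles[np.argmax(p)])
```

The peak of the pseudo-spectrum gives the DOA estimate. A MUSIC-based VAD could, for example, threshold that peak value per time frame, though the abstract does not say how the compared method does this.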

Original language: English
Title of host publication: Studies in Computational Intelligence
Pages: 77-82
Number of pages: 6
Volume: 489
DOIs: https://doi.org/10.1007/978-3-319-00651-2-11
Publication status: Published - 2013
Externally published: Yes

Publication series

Name: Studies in Computational Intelligence
Volume: 489
ISSN (Print): 1860-949X

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Huang, Y., Otsuka, T., & Okuno, H. G. (2013). A speaker diarization system with robust speaker localization and voice activity detection. In Studies in Computational Intelligence (Vol. 489, pp. 77-82). (Studies in Computational Intelligence; Vol. 489). https://doi.org/10.1007/978-3-319-00651-2-11