A speaker diarization system with robust speaker localization and voice activity detection

Yangyang Huang, Takuma Otsuka, Hiroshi G. Okuno

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)

Abstract

In real-world auditory scene analysis for human-robot interaction, three kinds of information must be extracted from the observed data: who speaks, when, and where. We present a speaker diarization system that extracts this information. Multiple signal classification (MUSIC) is a powerful method for voice activity detection (VAD) and direction-of-arrival (DOA) estimation. We compare the performance of our system in VAD and DOA estimation with that of a method based on the MUSIC algorithm.

Original language: English
Title of host publication: Studies in Computational Intelligence
Pages: 77-82
Number of pages: 6
Volume: 489
DOI: https://doi.org/10.1007/978-3-319-00651-2-11
Publication status: Published - 2013
Externally published: Yes

Publication series

Name: Studies in Computational Intelligence
Volume: 489
ISSN (Print): 1860-949X

Fingerprint

Direction of arrival
Human robot interaction

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Huang, Y., Otsuka, T., & Okuno, H. G. (2013). A speaker diarization system with robust speaker localization and voice activity detection. In Studies in Computational Intelligence (Vol. 489, pp. 77-82). (Studies in Computational Intelligence; Vol. 489). https://doi.org/10.1007/978-3-319-00651-2-11

@inbook{6ea94dd659654c11b233abe317b608d6,
title = "A speaker diarization system with robust speaker localization and voice activity detection",
abstract = "In real-world auditory scene analysis of human-robot interactions, three types of information are essential and need to be extracted from the observation data - who speaks when and where. We present a speaker diarization system that is used to accomplish the resolution. Multiple signal classification (MUSIC) is a powerful method for voice activity detection (VAD) and direction of arrival (DOA) estimation. We propose our system and compare its performance in VAD and DOA with the method based on MUSIC algorithm.",
author = "Yangyang Huang and Takuma Otsuka and Okuno, {Hiroshi G.}",
year = "2013",
doi = "10.1007/978-3-319-00651-2-11",
language = "English",
isbn = "9783319006505",
volume = "489",
series = "Studies in Computational Intelligence",
pages = "77--82",
booktitle = "Studies in Computational Intelligence",

}