Online unsupervised classification with model comparison in the variational Bayes framework for voice activity detection

David Cournapeau, Shinji Watanabe, Atsushi Nakamura, Tatsuya Kawahara

Research output: Contribution to journal › Article

12 Citations (Scopus)

Abstract

A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier in an online fashion and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Thanks to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
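
The model comparison described in the abstract can be written in the generic variational form below. This is a sketch under stated assumptions, not the paper's exact formulation: the symbols $X$ (observed frames), $Z$ (latent class labels), $\theta$ (model parameters), $q$ (variational posterior), and the model labels $M_{\mathrm{n}}$ (noise only) and $M_{\mathrm{sn}}$ (speech and noise) are introduced here for illustration. The sign convention is also an assumption; some authors define the free energy as the negative of this quantity, in which case the comparison becomes a minimization.

```latex
% Generic variational lower bound on the log evidence of a model M.
% Notation is illustrative and not taken from the paper.
\begin{align}
  \ln p(X \mid M)
    &= \mathcal{F}_M[q]
       + \mathrm{KL}\!\left( q(Z,\theta) \,\middle\|\, p(Z,\theta \mid X, M) \right)
     \;\ge\; \mathcal{F}_M[q], \\
  \mathcal{F}_M[q]
    &= \mathbb{E}_{q}\!\left[ \ln p(X, Z, \theta \mid M) \right]
       - \mathbb{E}_{q}\!\left[ \ln q(Z, \theta) \right], \\
  \hat{M}
    &= \operatorname*{arg\,max}_{M \in \{ M_{\mathrm{n}},\, M_{\mathrm{sn}} \}}
       \mathcal{F}_M[q_M].
\end{align}
```

Because the Kullback-Leibler term is non-negative, $\mathcal{F}_M[q]$ never exceeds the log evidence, so comparing the two bounds at a given time frame serves as a tractable proxy for comparing the evidence of the two models.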

Original language: English
Article number: 5586640
Pages (from-to): 1071-1083
Number of pages: 13
Journal: IEEE Journal on Selected Topics in Signal Processing
Volume: 4
Issue number: 6
DOIs: 10.1109/JSTSP.2010.2080821
Publication status: Published - 2010 Dec
Externally published: Yes

Fingerprint

Classifiers
Free energy
Statistical Models

Keywords

  • Sequential estimation
  • speech analysis
  • variational Bayes (VB)
  • voice activity detection (VAD)

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing

Cite this

Online unsupervised classification with model comparison in the variational Bayes framework for voice activity detection. / Cournapeau, David; Watanabe, Shinji; Nakamura, Atsushi; Kawahara, Tatsuya.

In: IEEE Journal on Selected Topics in Signal Processing, Vol. 4, No. 6, 5586640, 12.2010, p. 1071-1083.


@article{5cf5591290cf422f9a627acc3cbacc5f,
title = "Online unsupervised classification with model comparison in the variational {B}ayes framework for voice activity detection",
abstract = "A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. The conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed VAD method is based on the Variational Bayes (VB) approach to the online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case, and the other for speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence called free energy. It is used to assess the reliability of the classifier in an online fashion, and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database designed for VAD evaluations. With the effect of the model comparison, the proposed scheme outperforms the conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust with respect to changes of the noise type.",
keywords = "Sequential estimation, speech analysis, variational Bayes (VB), voice activity detection (VAD)",
author = "David Cournapeau and Shinji Watanabe and Atsushi Nakamura and Tatsuya Kawahara",
year = "2010",
month = dec,
doi = "10.1109/JSTSP.2010.2080821",
language = "English",
volume = "4",
pages = "1071--1083",
journal = "IEEE Journal on Selected Topics in Signal Processing",
issn = "1932-4553",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6",
}

TY - JOUR

T1 - Online unsupervised classification with model comparison in the variational Bayes framework for voice activity detection

AU - Cournapeau, David

AU - Watanabe, Shinji

AU - Nakamura, Atsushi

AU - Kawahara, Tatsuya

PY - 2010/12

Y1 - 2010/12

AB - A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. The conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed VAD method is based on the Variational Bayes (VB) approach to the online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case, and the other for speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence called free energy. It is used to assess the reliability of the classifier in an online fashion, and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database designed for VAD evaluations. With the effect of the model comparison, the proposed scheme outperforms the conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust with respect to changes of the noise type.

KW - Sequential estimation

KW - speech analysis

KW - variational Bayes (VB)

KW - voice activity detection (VAD)

UR - http://www.scopus.com/inward/record.url?scp=78649271854&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649271854&partnerID=8YFLogxK

U2 - 10.1109/JSTSP.2010.2080821

DO - 10.1109/JSTSP.2010.2080821

M3 - Article

VL - 4

SP - 1071

EP - 1083

JO - IEEE Journal on Selected Topics in Signal Processing

JF - IEEE Journal on Selected Topics in Signal Processing

SN - 1932-4553

IS - 6

M1 - 5586640

ER -