TY - JOUR
T1 - Online unsupervised classification with model comparison in the variational Bayes framework for voice activity detection
AU - Cournapeau, David
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
AU - Kawahara, Tatsuya
PY - 2010/12
Y1 - 2010/12
N2 - A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier in an online fashion and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Owing to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
AB - A new online, unsupervised method for Voice Activity Detection (VAD) is proposed. Conventional VAD methods often rely on heuristics to adapt the decision threshold to the estimated SNR. The proposed method is based on the Variational Bayes (VB) approach to online Expectation Maximization (EM), so that it can automatically adapt the decision level and the statistical model at the same time. We consider two parallel classifiers, one for the noise-only case and the other for the speech-and-noise case. Both models are trained concurrently and online using the VB framework. The VB framework also provides an explicit approximation of the log evidence, called the free energy, which is used to assess the reliability of the classifier in an online fashion and to decide which model is more appropriate at a given time frame. Experimental evaluations were conducted on the CENSREC-1-C database, which is designed for VAD evaluation. Owing to the model comparison, the proposed scheme outperforms conventional VAD algorithms, especially in the remote recording condition. It is also shown to be more robust to changes in the noise type.
KW - Sequential estimation
KW - speech analysis
KW - variational Bayes (VB)
KW - voice activity detection (VAD)
UR - http://www.scopus.com/inward/record.url?scp=78649271854&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78649271854&partnerID=8YFLogxK
U2 - 10.1109/JSTSP.2010.2080821
DO - 10.1109/JSTSP.2010.2080821
M3 - Article
AN - SCOPUS:78649271854
VL - 4
SP - 1071
EP - 1083
JO - IEEE Journal on Selected Topics in Signal Processing
JF - IEEE Journal on Selected Topics in Signal Processing
SN - 1932-4553
IS - 6
M1 - 5586640
ER -