Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection

Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

This paper proposes a robust voice activity detection (VAD) method that operates in the presence of noise. For noise-robust VAD, we have already proposed statistical models and a switching Kalman filter (SKF)-based technique. In this paper, we focus on a model re-estimation method using Gaussian pruning with weight normalization. The statistical model for SKF-based VAD is constructed using Gaussian mixture models (GMMs), and consists of pre-trained silence and clean speech GMMs and a sequentially estimated noise GMM. However, the composed model is not optimal in that it does not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed model, we investigate a method for its re-estimation that reflects the characteristics of the observed signal sequence. Since our VAD method works through the use of frame-wise sequential processing, processing with the smallest latency is very important. In this case, there are insufficient re-training data for a re-estimation of all the Gaussian parameters. To solve this problem, we propose a model re-estimation method that involves the extraction of reliable characteristics using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning non-dominant Gaussian distributions that express the local characteristics of each frame and by normalizing the Gaussian weights of the remaining distributions. In an experiment using a speech corpus for VAD evaluation, CENSREC-1-C, the proposed method significantly improved the VAD performance compared with that of the original SKF-based VAD. This result confirmed that the proposed Gaussian pruning contributes to an improvement in VAD accuracy.
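
The pruning-and-normalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the pruning criterion, so the code below assumes diagonal-covariance Gaussians, ranks components by their posterior probability for the current frame, and keeps a fixed number of them. The function name `prune_and_renormalize` and the parameter `keep` are hypothetical.

```python
import numpy as np


def prune_and_renormalize(weights, means, variances, frame, keep=2):
    """Keep the `keep` most dominant Gaussians for one frame and
    renormalize their mixture weights so they sum to one.

    Assumptions (not taken from the paper): diagonal covariances,
    dominance measured by per-frame posterior, fixed `keep`.
    """
    weights = np.asarray(weights, dtype=float)      # shape (K,)
    means = np.asarray(means, dtype=float)          # shape (K, D)
    variances = np.asarray(variances, dtype=float)  # shape (K, D)
    frame = np.asarray(frame, dtype=float)          # shape (D,)

    # Log-likelihood of the frame under each diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (frame - means) ** 2 / variances,
        axis=1,
    )

    # Posterior of each component given the frame, computed in the log domain.
    log_post = np.log(weights) + log_lik
    log_post -= np.logaddexp.reduce(log_post)
    posteriors = np.exp(log_post)

    # Prune: keep the indices of the `keep` largest posteriors.
    kept = np.sort(np.argsort(posteriors)[-keep:])

    # Normalize the weights of the surviving components.
    new_weights = weights[kept] / weights[kept].sum()
    return kept, new_weights, posteriors


# Toy one-dimensional mixture with three components.
kept, new_weights, posteriors = prune_and_renormalize(
    weights=[0.5, 0.3, 0.2],
    means=[[0.0], [5.0], [10.0]],
    variances=[[1.0], [1.0], [1.0]],
    frame=[4.8],
    keep=2,
)
print(kept, new_weights)
```

In this toy example the third component is pruned, and the two remaining weights (0.5 and 0.3) are rescaled to 0.625 and 0.375. Only the weights change; the means and variances of the surviving Gaussians are left as they were, which is consistent with the abstract's point that a single frame provides too little data to re-estimate all Gaussian parameters.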

Original language: English
Pages (from-to): 229-244
Number of pages: 16
Journal: Speech Communication
Volume: 54
Issue number: 2
DOI: 10.1016/j.specom.2011.08.005
Publication status: Published - 2012 Feb
Externally published: Yes

Keywords

  • Gaussian pruning
  • Gaussian weight normalization
  • Posterior probability
  • Switching Kalman filter
  • Voice activity detection

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Communication
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Modelling and Simulation

Cite this

Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection. / Fujimoto, Masakiyo; Watanabe, Shinji; Nakatani, Tomohiro.

In: Speech Communication, Vol. 54, No. 2, 02.2012, p. 229-244.


@article{60945d943b5d44b995d2affc2cb9b1e7,
title = "Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection",
abstract = "This paper proposes a robust voice activity detection (VAD) method that operates in the presence of noise. For noise-robust VAD, we have already proposed statistical models and a switching Kalman filter (SKF)-based technique. In this paper, we focus on a model re-estimation method using Gaussian pruning with weight normalization. The statistical model for SKF-based VAD is constructed using Gaussian mixture models (GMMs), and consists of pre-trained silence and clean speech GMMs and a sequentially estimated noise GMM. However, the composed model is not optimal in that it does not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed model, we investigate a method for its re-estimation that reflects the characteristics of the observed signal sequence. Since our VAD method works through the use of frame-wise sequential processing, processing with the smallest latency is very important. In this case, there are insufficient re-training data for a re-estimation of all the Gaussian parameters. To solve this problem, we propose a model re-estimation method that involves the extraction of reliable characteristics using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning non-dominant Gaussian distributions that express the local characteristics of each frame and by normalizing the Gaussian weights of the remaining distributions. In an experiment using a speech corpus for VAD evaluation, CENSREC-1-C, the proposed method significantly improved the VAD performance compared with that of the original SKF-based VAD. This result confirmed that the proposed Gaussian pruning contributes to an improvement in VAD accuracy.",
keywords = "Gaussian pruning, Gaussian weight normalization, Posterior probability, Switching Kalman filter, Voice activity detection",
author = "Masakiyo Fujimoto and Shinji Watanabe and Tomohiro Nakatani",
year = "2012",
month = feb,
doi = "10.1016/j.specom.2011.08.005",
language = "English",
volume = "54",
pages = "229--244",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "2",
}

TY - JOUR
T1 - Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection
AU - Fujimoto, Masakiyo
AU - Watanabe, Shinji
AU - Nakatani, Tomohiro
PY - 2012/2
Y1 - 2012/2
N2 - This paper proposes a robust voice activity detection (VAD) method that operates in the presence of noise. For noise-robust VAD, we have already proposed statistical models and a switching Kalman filter (SKF)-based technique. In this paper, we focus on a model re-estimation method using Gaussian pruning with weight normalization. The statistical model for SKF-based VAD is constructed using Gaussian mixture models (GMMs), and consists of pre-trained silence and clean speech GMMs and a sequentially estimated noise GMM. However, the composed model is not optimal in that it does not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed model, we investigate a method for its re-estimation that reflects the characteristics of the observed signal sequence. Since our VAD method works through the use of frame-wise sequential processing, processing with the smallest latency is very important. In this case, there are insufficient re-training data for a re-estimation of all the Gaussian parameters. To solve this problem, we propose a model re-estimation method that involves the extraction of reliable characteristics using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning non-dominant Gaussian distributions that express the local characteristics of each frame and by normalizing the Gaussian weights of the remaining distributions. In an experiment using a speech corpus for VAD evaluation, CENSREC-1-C, the proposed method significantly improved the VAD performance compared with that of the original SKF-based VAD. This result confirmed that the proposed Gaussian pruning contributes to an improvement in VAD accuracy.
AB - This paper proposes a robust voice activity detection (VAD) method that operates in the presence of noise. For noise-robust VAD, we have already proposed statistical models and a switching Kalman filter (SKF)-based technique. In this paper, we focus on a model re-estimation method using Gaussian pruning with weight normalization. The statistical model for SKF-based VAD is constructed using Gaussian mixture models (GMMs), and consists of pre-trained silence and clean speech GMMs and a sequentially estimated noise GMM. However, the composed model is not optimal in that it does not fully reflect the characteristics of the observed signal. Thus, to ensure the optimality of the composed model, we investigate a method for its re-estimation that reflects the characteristics of the observed signal sequence. Since our VAD method works through the use of frame-wise sequential processing, processing with the smallest latency is very important. In this case, there are insufficient re-training data for a re-estimation of all the Gaussian parameters. To solve this problem, we propose a model re-estimation method that involves the extraction of reliable characteristics using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning non-dominant Gaussian distributions that express the local characteristics of each frame and by normalizing the Gaussian weights of the remaining distributions. In an experiment using a speech corpus for VAD evaluation, CENSREC-1-C, the proposed method significantly improved the VAD performance compared with that of the original SKF-based VAD. This result confirmed that the proposed Gaussian pruning contributes to an improvement in VAD accuracy.
KW - Gaussian pruning
KW - Gaussian weight normalization
KW - Posterior probability
KW - Switching Kalman filter
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=80055089790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80055089790&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2011.08.005
DO - 10.1016/j.specom.2011.08.005
M3 - Article
AN - SCOPUS:80055089790
VL - 54
SP - 229
EP - 244
JO - Speech Communication
JF - Speech Communication
SN - 0167-6393
IS - 2
ER -