Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization

Masakiyo Fujimoto, Shinji Watanabe, Tomohiro Nakatani

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

This paper proposes a frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise-robust voice activity detection (VAD). Our previous work, switching Kalman filter-based VAD, sequentially estimates a non-stationary noise Gaussian mixture model (GMM) and constructs GMMs of the observed noisy speech signal by composing pre-trained silence and clean GMMs with the sequentially estimated noise GMMs. However, the composed models are not optimal, because they do not fully reflect the characteristics of the observed signal. To bring the composed models closer to optimal, we investigate a method for re-estimating them. Since our VAD method operates frame by frame, there is insufficient re-training data to re-estimate all of the model parameters. To solve this problem, we propose a model re-estimation method that extracts reliable information using Gaussian pruning with weight normalization. Namely, the proposed method re-estimates the model by pruning the Gaussian distributions that are non-dominant in expressing the local characteristics of each frame and by normalizing the weights of the remaining distributions.
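The pruning and weight normalization step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the pruning criterion, so this example assumes one: the components with the largest weighted log-likelihood for the current frame are kept, and their mixture weights are rescaled to sum to one. The function names, the `keep` parameter, the diagonal-covariance form, and the toy values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def log_gaussian_diag(x, means, variances):
    """Log density of frame x under each diagonal-covariance Gaussian (one per row)."""
    diff = x - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff**2 / variances, axis=1)

def prune_and_normalize(x, weights, means, variances, keep=2):
    """Keep the `keep` components that dominate frame x, then renormalize weights.

    Assumed criterion (not from the paper): rank components by
    log(weight) + log N(x; mean, variance) and keep the top `keep`.
    """
    scores = np.log(weights) + log_gaussian_diag(x, means, variances)
    kept = np.sort(np.argsort(scores)[-keep:])          # indices of dominant components
    new_weights = weights[kept] / weights[kept].sum()   # weights sum to one again
    return kept, new_weights, means[kept], variances[kept]

# Toy three-component GMM in two dimensions (illustrative values only).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [5.0, 5.0], [0.5, -0.5]])
variances = np.ones((3, 2))
frame = np.array([0.2, -0.1])

kept, new_weights, kept_means, kept_variances = prune_and_normalize(
    frame, weights, means, variances, keep=2
)
print(kept, new_weights)
```

In this toy case the component centered far from the frame is pruned, and the two remaining weights are rescaled so that the pruned model is still a valid mixture. A posterior threshold could be used in place of a fixed `keep` count; that choice is likewise an assumption here.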

Original language: English
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Pages: 3102-3105
Number of pages: 4
Publication status: Published - 2010
Externally published: Yes
Event: 11th Annual Conference of the International Speech Communication Association: Spoken Language Processing for All, INTERSPEECH 2010 - Makuhari, Chiba
Duration: 2010 Sep 26 to 2010 Sep 30

Keywords

  • Gaussian pruning
  • Gaussian weight normalization
  • Switching Kalman filter
  • Voice activity detection

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing

Cite this

Fujimoto, M., Watanabe, S., & Nakatani, T. (2010). Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization. In Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 (pp. 3102-3105)
