TY - GEN
T1 - Gated convolutional neural network-based voice activity detection under high-level noise environments
AU - Li, Li
AU - Yamaoka, Kouei
AU - Koshino, Yuki
AU - Matsumoto, Mitsuo
AU - Makino, Shoji
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number 19H04131, SECOM Science and Technology Foundation, and Strategic Core Technology Advancement Program (Supporting Industry Program).
Publisher Copyright:
© 2019 Proceedings of the International Congress on Acoustics. All rights reserved.
PY - 2019
Y1 - 2019
N2 - This paper deals with voice activity detection (VAD) tasks in high-level noise environments where signal-to-noise ratios (SNRs) are lower than -5 dB. With the increasing need for hands-free applications, critically low SNR situations are unavoidable; the noise can be internally generated ego noise or external noise occurring in the environment, e.g., for rescue robots in a disaster or navigation in a fast-moving car. To achieve accurate VAD results under such conditions, this paper proposes a gated convolutional neural network-based approach that captures long- and short-term dependencies in time series as cues for detection. Experimental evaluations using the high-level ego noise of a hose-shaped rescue robot revealed that the proposed method achieved an average VAD accuracy of about 86% in environments with SNRs ranging from -30 dB to -5 dB.
AB - This paper deals with voice activity detection (VAD) tasks in high-level noise environments where signal-to-noise ratios (SNRs) are lower than -5 dB. With the increasing need for hands-free applications, critically low SNR situations are unavoidable; the noise can be internally generated ego noise or external noise occurring in the environment, e.g., for rescue robots in a disaster or navigation in a fast-moving car. To achieve accurate VAD results under such conditions, this paper proposes a gated convolutional neural network-based approach that captures long- and short-term dependencies in time series as cues for detection. Experimental evaluations using the high-level ego noise of a hose-shaped rescue robot revealed that the proposed method achieved an average VAD accuracy of about 86% in environments with SNRs ranging from -30 dB to -5 dB.
KW - Ego noise
KW - Gated convolutional neural networks
KW - Low SNR
KW - Rescue robot
KW - Voice activity detection (VAD)
UR - http://www.scopus.com/inward/record.url?scp=85099330420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099330420&partnerID=8YFLogxK
U2 - 10.18154/RWTH-CONV-239667
DO - 10.18154/RWTH-CONV-239667
M3 - Conference contribution
AN - SCOPUS:85099330420
T3 - Proceedings of the International Congress on Acoustics
SP - 2862
EP - 2869
BT - Proceedings of the 23rd International Congress on Acoustics
A2 - Ochmann, Martin
A2 - Vorlander, Michael
A2 - Fels, Janina
PB - International Commission for Acoustics (ICA)
T2 - 23rd International Congress on Acoustics: Integrating 4th EAA Euroregio, ICA 2019
Y2 - 9 September 2019 through 13 September 2019
ER -