This paper describes a method for automatically detecting filled (vocalized) pauses, which are one of the hesitation phenomena that current speech recognizers typically cannot handle. The detection of these pauses is important in spontaneous speech dialogue systems because they play valuable roles, such as helping a speaker keep a conversational turn, in oral communication. Although a few speech recognition systems have processed filled pauses within subword-based connected word recognition or word-spotting frameworks, they did not detect the pauses individually and consequently could not consider their roles. In this paper we propose a method that detects filled pauses and word lengthening on the basis of small fundamental frequency transition and small spectral envelope deformation under the assumption that speakers do not change articulator parameters during filled pauses. Experimental results for a Japanese spoken dialogue corpus show that our real-time filled-pause-detection system yielded a recall rate of 84.9% and a precision rate of 91.5%.
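The two cues named in the abstract (small fundamental frequency transition and small spectral envelope deformation) can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `detect_stable_segments`, the per-frame inputs `f0` and `spec_change`, and all thresholds are hypothetical choices made for illustration. It assumes `f0` is in Hz with 0 marking unvoiced frames, and that `spec_change[i]` is some distance between the spectral envelopes of frames `i-1` and `i`.

```python
def detect_stable_segments(f0, spec_change, f0_tol=2.0, spec_tol=0.5, min_frames=5):
    """Return (start, end) frame ranges (end exclusive) of sustained stable voicing.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if len(f0) != len(spec_change):
        raise ValueError("f0 and spec_change must have the same length")
    segments = []
    run_start = None
    for i in range(len(f0)):
        # A frame is "stable" if it and its predecessor are voiced, F0 moved
        # little, and the spectral envelope changed little.
        stable = (
            i > 0
            and f0[i] > 0 and f0[i - 1] > 0
            and abs(f0[i] - f0[i - 1]) <= f0_tol
            and spec_change[i] <= spec_tol
        )
        if stable:
            if run_start is None:
                run_start = i - 1  # include the frame the run started from
        else:
            # Close the current run and keep it only if it lasted long enough.
            if run_start is not None and i - run_start >= min_frames:
                segments.append((run_start, i))
            run_start = None
    if run_start is not None and len(f0) - run_start >= min_frames:
        segments.append((run_start, len(f0)))
    return segments


# Toy input: frames 2..7 hold a nearly flat F0 with little spectral change.
f0 = [0, 0, 120, 121, 120, 121, 120, 121, 180, 0]
spec_change = [0.0, 0.0, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.0]
candidates = detect_stable_segments(f0, spec_change)
```

Requiring a minimum run length reflects the idea that a filled pause or lengthened word is a sustained sound rather than a momentary plateau; a real system would also need robust F0 and envelope estimation, which this sketch leaves out.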
|Publication status||Published - 1999|
|Event||6th European Conference on Speech Communication and Technology, EUROSPEECH 1999 - Budapest, Hungary|
Duration: 5 Sep 1999 → 9 Sep 1999
ASJC Scopus subject areas
- Computer Science Applications