Speech emotion recognition based on attention weight correction using word-level confidence measure

Jennifer Santoso, Takeshi Yamada, Shoji Makino, Kenkichi Ishizuka, Takekatsu Hiramura

Research output: Conference contribution

Abstract

Emotion recognition is essential for human behavior analysis and can be performed using various inputs, such as speech and images. However, in practical situations such as call center analysis, the only available information is speech. This motivates the study of speech emotion recognition (SER), which is a challenging task given the complexity of emotions. Recently, automatic speech recognition (ASR) has been used to obtain text information from speech, and combining speech with ASR results has improved SER performance. However, ASR results are strongly affected by speech recognition errors. Although a method exists to improve ASR performance on emotional speech, it requires fine-tuning the ASR system, which is costly. To mitigate the effect of ASR errors on SER, we propose combining a self-attention mechanism with a word-level confidence measure (CM), which indicates the reliability of ASR results, to reduce the importance of words that are likely to be misrecognized. Experimental results confirmed that combining the self-attention mechanism with the CM reduced the effects of incorrectly recognized words in ASR results and gave better focus to the words that are decisive for emotion recognition. Our proposed method outperformed state-of-the-art methods on the IEMOCAP dataset.
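The abstract does not give the exact correction formula. As a minimal sketch, assuming the word-level CM is a score in [0, 1] per recognized word and that "correction" means scaling each attention weight by the confidence of the attended word and then renormalizing, the idea can be illustrated in plain Python. The function names, the toy embeddings, and the multiply-and-renormalize rule are all hypothetical illustrations, not the authors' implementation; a real model would also use learned query, key, and value projections.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention weights.

    Simplification: queries and keys are the word vectors themselves
    (no learned projections).
    """
    d = len(x[0])
    return [
        softmax([sum(a * b for a, b in zip(xi, xj)) / math.sqrt(d) for xj in x])
        for xi in x
    ]


def correct_weights(weights: Matrix, confidence: List[float], eps: float = 1e-12) -> Matrix:
    """Hypothetical CM-based correction.

    Scales the weight on word j by its confidence c_j, then renormalizes each
    row so it sums to 1. Low-confidence (likely misrecognized) words receive
    less attention.
    """
    corrected = []
    for row in weights:
        scaled = [w * c for w, c in zip(row, confidence)]
        total = sum(scaled)
        # If every confidence is ~0, keep the uncorrected row instead of dividing by 0.
        corrected.append([s / total for s in scaled] if total > eps else list(row))
    return corrected


def attend(weights: Matrix, values: Matrix) -> Matrix:
    """Weighted sum of value vectors for each query position."""
    dim = len(values[0])
    return [
        [sum(w * v[k] for w, v in zip(row, values)) for k in range(dim)]
        for row in weights
    ]


if __name__ == "__main__":
    # Toy example: identical embeddings give uniform attention (1/3 per word).
    x = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    cm = [0.9, 0.2, 0.9]  # hypothetical ASR confidences; word 2 is unreliable
    w = self_attention_weights(x)
    wc = correct_weights(w, cm)
    print([round(v, 2) for v in wc[0]])  # approximately [0.45, 0.1, 0.45]
```

Multiplying before renormalizing keeps each row a valid probability distribution, so downstream pooling is unchanged; only the relative share of low-confidence words shrinks.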

Original language: English
Host publication title: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 301-305
Number of pages: 5
ISBN (Electronic): 9781713836902
DOI
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 2021 Aug 30 to 2021 Sep 3

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 1
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 21/8/30 to 21/9/3

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
