Performance Improvement of Speech Emotion Recognition by Neutral Speech Detection Using Autoencoder and Intermediate Representation

Jennifer Santoso, Takeshi Yamada, Kenkichi Ishizuka, Taiichi Hashimoto, Shoji Makino

研究成果: Conference article査読

1 被引用数 (Scopus)

抄録

In recent years, classification-based speech emotion recognition (SER) methods have achieved high overall performance. However, these methods tend to have lower performance for neutral speeches, which account for a large proportion in most practical situations. To solve the problem and improve the SER performance, we propose a neutral speech detector (NSD) based on the anomaly detection approach, which uses an autoencoder, the intermediate layer output of a pretrained SER classifier and only neutral data for training. The intermediate layer output of a pretrained SER classifier enables the reconstruction of both acoustic and text features, which are optimized for SER tasks. We then propose the combination of the SER classifier and the NSD used as a screening mechanism for correcting the class probability of the incorrectly recognized neutral speeches. Results of our experiment using the IEMOCAP dataset indicate that the NSD can reconstruct both the acoustic and textual features, achieving a satisfactory performance for use as a reliable screening method. Furthermore, we evaluated the performance of our proposed screening mechanism, and our experiments show significant improvement of 12.9% in the F-score of the neutral class to 80.3%, and 8.4% in the class-average weighted accuracy to 84.5% compared with state-of-the-art SER classifiers.

本文言語English
ページ(範囲)4700-4704
ページ数5
ジャーナルProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
2022-September
DOI
出版ステータスPublished - 2022
イベント23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of
継続期間: 2022 9月 182022 9月 22

ASJC Scopus subject areas

  • 言語および言語学
  • 人間とコンピュータの相互作用
  • 信号処理
  • ソフトウェア
  • モデリングとシミュレーション

フィンガープリント

「Performance Improvement of Speech Emotion Recognition by Neutral Speech Detection Using Autoencoder and Intermediate Representation」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル