To address speaker-independent emotion recognition, a multi-level speech emotion recognition system is proposed that classifies six speech emotions (sadness, anger, surprise, fear, happiness, and disgust) from coarse to fine. The key idea is that the emotions separated at each level correspond closely to the emotional features of speech. At each level, appropriate features are selected from 288 candidate features by Fisher ratio, and the selected features serve as the input for training a support vector machine (SVM). On the Beihang emotional speech database and the Berlin emotional speech database, four comparative experiments are designed using principal component analysis (PCA) for dimension reduction and an artificial neural network (ANN) for classification: Fisher+SVM, PCA+SVM, Fisher+ANN, and PCA+ANN. The experimental results show that the Fisher criterion outperforms PCA for dimension reduction, and that SVM generalizes better than ANN for speaker-independent speech emotion recognition. The similar results obtained on the two databases suggest good cross-cultural adaptability.
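The feature-selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy data, the number of retained features, and the RBF-kernel SVM are assumptions; the Fisher ratio is computed per feature as between-class variance over within-class variance, with the highest-ranked features passed to the SVM.

```python
import numpy as np
from sklearn.svm import SVC

def fisher_ratio(X, y):
    """Per-feature Fisher ratio: between-class variance divided by
    within-class variance (higher means more discriminative)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / within

# Toy stand-in for the 288 candidate acoustic features:
# 40 samples, 10 features, 2 emotion classes (hypothetical data).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 10))
X[:, 3] += y * 2.0                      # make feature 3 discriminative

ratios = fisher_ratio(X, y)
top_k = np.argsort(ratios)[::-1][:4]    # keep the 4 best-ranked features
clf = SVC(kernel="rbf").fit(X[:, top_k], y)
```

In the multi-level scheme, a selection of this kind would be repeated at each level of the coarse-to-fine hierarchy, with a separate SVM trained on the features most relevant to that level's emotion split.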
|Journal||Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence|
|Publication status||Published - Aug. 2012|
ASJC Scopus subject areas
- Artificial Intelligence
- Computer Vision and Pattern Recognition