Paper ID | SPE-23.2 |
Paper Title |
Progressive Co-teaching for Ambiguous Speech Emotion Recognition |
Authors |
Yifei Yin, Yu Gu, Longshan Yao, Ying Zhou, Xuefeng Liang, Xidian University, China; He Zhang, Northwest University, China |
Session | SPE-23: Speech Emotion 1: Speech Emotion Recognition |
Location | Gather.Town |
Session Time: | Wednesday, 09 June, 15:30 - 16:15 |
Presentation Time: | Wednesday, 09 June, 15:30 - 16:15 |
Presentation |
Poster |
Topic |
Speech Processing: [SPE-ANLS] Speech Analysis |
Abstract |
Speech emotion recognition is a challenging task because emotion is ambiguous, which makes it difficult for machine learning algorithms to learn the features of emotion data. However, previous studies conventionally ignore this ambiguity and treat all emotion data as equally difficult, which results in low recognition accuracy. Motivated by human and animal learning studies, we propose a novel method named Progressive Co-teaching (PCT) to learn speech emotion features from simple to difficult. The PCT method automatically identifies the difficulty level of the data using loss values, and each network then passes its easy, small-loss instances to its peer network for early training. The remaining large-loss instances are added gradually for later training. The experimental results demonstrate that our method improves on the state of the art by 3.8% and 1.27% on two benchmark corpora (MAS and IEMOCAP), respectively. |
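The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear schedule, and the starting fraction of 0.5 are all assumptions made for illustration, and the per-sample losses are toy values standing in for the outputs of two real networks.

```python
def keep_ratio(epoch, total_epochs, start=0.5):
    """Hypothetical linear schedule: fraction of a batch kept at a given epoch.

    Starts at `start` (easy instances only) and grows to 1.0 (all instances).
    The abstract only says hard instances are "added gradually"; the linear
    shape and the default start value are assumptions.
    """
    if total_epochs <= 1:
        return 1.0
    frac = start + (1.0 - start) * epoch / (total_epochs - 1)
    return min(1.0, max(start, frac))


def small_loss_indices(losses, ratio):
    """Return the indices of the `ratio` fraction of samples with smallest loss."""
    n_keep = max(1, int(round(ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])


def exchange(losses_a, losses_b, ratio):
    """Each network selects its own small-loss samples and hands them to its peer."""
    picked_by_a = small_loss_indices(losses_a, ratio)
    picked_by_b = small_loss_indices(losses_b, ratio)
    # Network A is updated on B's picks, and network B on A's picks.
    return {"train_a_on": picked_by_b, "train_b_on": picked_by_a}


# Toy per-sample losses for one batch of four utterances (illustrative values).
losses_a = [0.2, 1.5, 0.1, 0.9]
losses_b = [0.3, 0.4, 1.2, 0.8]

early = exchange(losses_a, losses_b, keep_ratio(0, 5))  # first of 5 epochs
late = exchange(losses_a, losses_b, keep_ratio(4, 5))   # last of 5 epochs
```

In this sketch, the first epoch trains each network on only the half of the batch that its peer finds easiest, while the last epoch uses every instance. In a real training loop the losses would be recomputed from both networks for each batch.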