Paper ID | SPE-24.3 |
Paper Title |
MAEC: Multi-instance learning with an Adversarial Auto-encoder-based Classifier for Speech Emotion Recognition |
Authors |
Changzeng Fu, Osaka University, Japan; Chaoran Liu, Carlos Toshinori Ishi, Advanced Telecommunications Research Institute International, Japan; Hiroshi Ishiguro, Osaka University, Japan |
Session | SPE-24: Speech Emotion 2: Neural Networks for Speech Emotion Recognition |
Location | Gather.Town |
Session Time | Wednesday, 09 June, 15:30 - 16:15 |
Presentation Time | Wednesday, 09 June, 15:30 - 16:15 |
Presentation |
Poster |
Topic |
Speech Processing: [SPE-ANLS] Speech Analysis |
Abstract |
In this paper, we propose an adversarial auto-encoder-based classifier that regularizes the distribution of the latent representation to smooth the boundaries between categories. Moreover, we adopt multi-instance learning by dividing speech into a bag of segments to capture the most salient moments for expressing an emotion. The proposed model was trained on the IEMOCAP dataset and evaluated on an in-corpus validation set (IEMOCAP) and a cross-corpus validation set (MELD). Experimental results show that our model outperforms the baseline on in-corpus validation and, with regularization, achieves higher scores on cross-corpus validation. |
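The multi-instance learning step described in the abstract treats one utterance as a bag whose instances are speech segments. The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. It assumes frame-level features, a fixed segment length and hop (`seg_len`, `hop`, both hypothetical parameters), and max pooling over instance-level class probabilities as the bag-level aggregation. Max pooling is one common MIL choice for letting the most salient segment drive the prediction; the paper's actual segmentation and aggregation may differ. The helper names `make_bag` and `bag_prediction` are hypothetical.

```python
from typing import List, Sequence


def make_bag(features: Sequence[Sequence[float]], seg_len: int, hop: int) -> List[List[Sequence[float]]]:
    """Split frame-level features into overlapping segments (the instances of one bag).

    seg_len and hop are hypothetical parameters chosen for illustration.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    n = len(features)
    if n <= seg_len:
        # Utterance shorter than one segment: the bag holds a single instance.
        return [list(features)]
    starts = range(0, n - seg_len + 1, hop)
    return [list(features[s:s + seg_len]) for s in starts]


def bag_prediction(instance_probs: Sequence[Sequence[float]]) -> List[float]:
    """Max-pool instance-level class probabilities into one bag-level score per class.

    Assumption: max pooling lets the single most salient segment decide each class score.
    The pooled scores are per-class maxima and need not sum to 1.
    """
    num_classes = len(instance_probs[0])
    return [max(p[c] for p in instance_probs) for c in range(num_classes)]


if __name__ == "__main__":
    frames = [[float(i)] for i in range(10)]  # 10 frames of 1-dim toy features
    bag = make_bag(frames, seg_len=4, hop=2)
    print(len(bag))  # 4 segments starting at frames 0, 2, 4, 6
    print(bag_prediction([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))  # [0.6, 0.7, 0.2]
```

In a full model, each segment in the bag would first pass through the classifier to obtain `instance_probs`; an attention-weighted average is a common alternative to max pooling when several segments are expected to carry emotional cues.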