2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Technical Program

Paper ID	SPE-24.3
Paper Title	MAEC: Multi-instance learning with an Adversarial Auto-encoder-based Classifier for Speech Emotion Recognition
Authors	Changzeng Fu, Osaka University, Japan; Chaoran Liu, Carlos Toshinori Ishi, Advanced Telecommunications Research Institute International, Japan; Hiroshi Ishiguro, Osaka University, Japan
Session	SPE-24: Speech Emotion 2: Neural Networks for Speech Emotion Recognition
Location	Gather.Town
Session Time	Wednesday, 09 June, 15:30 - 16:15
Presentation Time	Wednesday, 09 June, 15:30 - 16:15
Presentation	Poster
Topic	Speech Processing: [SPE-ANLS] Speech Analysis
Abstract	In this paper, we propose an adversarial auto-encoder-based classifier that regularizes the distribution of the latent representations to smooth the boundaries among categories. Moreover, we adopt multi-instance learning, dividing each speech utterance into a bag of segments to capture the most salient moments that express an emotion. The proposed model was trained on the IEMOCAP dataset and evaluated on an in-corpus validation set (IEMOCAP) and a cross-corpus validation set (MELD). The experimental results show that our model outperforms the baseline on in-corpus validation and that regularization increases the scores on cross-corpus validation.
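
The multi-instance idea in the abstract (an utterance treated as a bag of segments, with the most salient segments driving the decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-length overlapping windows, the max-pooling aggregation, the function names `make_bag` and `bag_prediction`, and the toy scores are all assumptions made for illustration, since the abstract does not specify how segments are formed or combined.

```python
import numpy as np

def make_bag(signal, seg_len, hop):
    """Split a 1-D signal into a bag of fixed-length, possibly overlapping segments.

    Assumption: fixed-length windows with a constant hop; trailing samples
    that do not fill a whole window are dropped.
    """
    if len(signal) < seg_len:
        # Zero-pad short utterances so the bag holds at least one segment.
        signal = np.pad(signal, (0, seg_len - len(signal)))
    starts = range(0, len(signal) - seg_len + 1, hop)
    return np.stack([signal[s:s + seg_len] for s in starts])

def bag_prediction(instance_scores):
    """Max-pool per-segment class scores into one utterance-level score vector.

    Assumption: max pooling stands in for "capturing the most salient
    moments"; other aggregations (mean, attention) are equally plausible.
    """
    return instance_scores.max(axis=0)

# Toy stand-in for a waveform: 10 samples, windows of 4 with hop 2.
signal = np.arange(10, dtype=float)
bag = make_bag(signal, seg_len=4, hop=2)

# Hypothetical per-segment scores for two emotion classes (one row per segment).
scores = np.array([[0.1, 0.9], [0.7, 0.2], [0.3, 0.3], [0.2, 0.1]])
utterance_scores = bag_prediction(scores)
predicted_class = int(np.argmax(utterance_scores))
```

In this sketch the utterance-level decision is driven by whichever segment scores highest for each class, so a single strongly emotional segment can determine the label even when the rest of the utterance is neutral.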