Paper ID | AUD-12.3
Paper Title | UNSUPERVISED AND SEMI-SUPERVISED FEW-SHOT ACOUSTIC EVENT CLASSIFICATION
Authors | Hsin-Ping Huang, University of California, Merced, United States; Krishna Puvvada, Ming Sun, Chao Wang, Amazon Alexa, United States
Session | AUD-12: Detection and Classification of Acoustic Scenes and Events 1: Few-shot learning
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract
Few-shot Acoustic Event Classification (AEC) aims to learn a model that recognizes novel acoustic events from very limited labeled data. Previous work relies on supervised pre-training and meta-learning approaches, both of which depend heavily on labeled data. Here, we study unsupervised and semi-supervised learning approaches for few-shot AEC. Our work builds upon recent advances in unsupervised representation learning introduced for speech recognition and language modeling: we learn audio representations from a large amount of unlabeled data and use the resulting representations for few-shot AEC. We further extend our model in a semi-supervised fashion. Our unsupervised representation learning approach outperforms supervised pre-training methods, and our semi-supervised learning approach outperforms meta-learning methods for few-shot AEC. We also show that our approach is more robust under domain mismatch.
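
The abstract describes a two-stage pipeline: learn audio representations from unlabeled data, then reuse them to classify novel events from a handful of labeled examples. The sketch below illustrates only that second stage, under stated assumptions: the encode() stand-in, the 128-dimensional embeddings, and the nearest-class-mean (prototype) classifier are hypothetical choices for illustration, not the paper's actual model or evaluation setup.

import numpy as np

def encode(audio_clips):
    """Stand-in for a pretrained audio encoder.
    Assumption: it maps each clip to a fixed-size embedding;
    here it is faked with random vectors so the sketch runs."""
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(audio_clips), 128))

def few_shot_classify(support_emb, support_labels, query_emb):
    """Label each query by cosine similarity to per-class prototypes
    (the mean embedding of each class's few labeled support clips)."""
    classes = sorted(set(support_labels))
    labels = np.array(support_labels)
    protos = np.stack(
        [support_emb[labels == c].mean(axis=0) for c in classes]
    )
    # L2-normalize so a plain dot product equals cosine similarity.
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    query_emb = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    return [classes[i] for i in (query_emb @ protos.T).argmax(axis=1)]

# Toy 2-way, 3-shot episode with placeholder clip names.
support_clips = [f"clip_{i}.wav" for i in range(6)]
support_labels = ["dog_bark"] * 3 + ["glass_break"] * 3
query_clips = [f"query_{i}.wav" for i in range(4)]

predictions = few_shot_classify(
    encode(support_clips), support_labels, encode(query_clips)
)
print(predictions)

Nearest-class-mean on frozen embeddings is a standard few-shot baseline; how the paper actually classifies with the learned representations, and how its semi-supervised extension exploits unlabeled data, are detailed in the full text.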