Paper ID | AUD-12.3
Paper Title | UNSUPERVISED AND SEMI-SUPERVISED FEW-SHOT ACOUSTIC EVENT CLASSIFICATION
Authors | Hsin-Ping Huang, University of California, Merced, United States; Krishna Puvvada, Ming Sun, Chao Wang, Amazon Alexa, United States
Session | AUD-12: Detection and Classification of Acoustic Scenes and Events 1: Few-shot learning
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract | Few-shot Acoustic Event Classification (AEC) aims to learn a model that recognizes novel acoustic events from very limited labeled data. Previous work relies on supervised pre-training and meta-learning approaches, both of which depend heavily on labeled data. Here, we study unsupervised and semi-supervised learning approaches for few-shot AEC. Our work builds on recent advances in unsupervised representation learning introduced for speech recognition and language modeling. We learn audio representations from a large amount of unlabeled data and use the resulting representations for few-shot AEC. We further extend our model in a semi-supervised fashion. Our unsupervised representation learning approach outperforms supervised pre-training methods, and our semi-supervised learning approach outperforms meta-learning methods for few-shot AEC. We also show that our approaches are more robust under domain mismatch.
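
To make the pipeline in the abstract concrete, below is a minimal sketch of its first two steps: extracting audio representations with an encoder pre-trained on unlabeled data, then classifying a query clip against class prototypes built from a few labeled examples. The torchaudio `WAV2VEC2_BASE` bundle is used here only as a stand-in for the paper's unsupervised encoder, and the nearest-prototype classifier is a common few-shot baseline, not necessarily the authors' exact method; the paper's actual architecture and training details are not given in this listing.

```python
# Sketch: unsupervised representations + few-shot nearest-prototype AEC.
# WAV2VEC2_BASE is an assumption standing in for the paper's encoder.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE   # self-supervised pre-trained model
encoder = bundle.get_model().eval()

@torch.no_grad()
def embed(waveform: torch.Tensor, sample_rate: int) -> torch.Tensor:
    """Mean-pool frame-level features into one fixed-size clip embedding."""
    if sample_rate != bundle.sample_rate:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, bundle.sample_rate
        )
    features, _ = encoder.extract_features(waveform)  # per-layer feature list
    return features[-1].mean(dim=1).squeeze(0)        # shape: (hidden_dim,)

def prototypes(support: dict[str, list[torch.Tensor]]) -> dict[str, torch.Tensor]:
    """One prototype per event class: the mean of its few support embeddings."""
    return {label: torch.stack(embs).mean(dim=0) for label, embs in support.items()}

def classify(query: torch.Tensor, protos: dict[str, torch.Tensor]) -> str:
    """Assign the query clip to the nearest prototype by cosine similarity."""
    sims = {
        label: torch.cosine_similarity(query, proto, dim=0).item()
        for label, proto in protos.items()
    }
    return max(sims, key=sims.get)
```

In a hypothetical 2-way 3-shot episode, one would call `embed` on the three labeled clips per class, build prototypes from those six embeddings, and label each query clip with `classify`; the semi-supervised extension mentioned in the abstract would additionally exploit unlabeled in-domain clips, which this sketch does not cover.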