Paper ID | AUD-13.6
Paper Title | SOUND EVENT DETECTION BY CONSISTENCY TRAINING AND PSEUDO-LABELING WITH FEATURE-PYRAMID CONVOLUTIONAL RECURRENT NEURAL NETWORKS
Authors | Chih-Yuan Koh, You-Siang Chen, Yi-Wen Liu, Mingsian Bai, National Tsing Hua University, Taiwan
Session | AUD-13: Detection and Classification of Acoustic Scenes and Events 2: Weak supervision
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract | Due to the high cost of large-scale strong labeling, sound event detection (SED) using only weakly labeled and unlabeled data has drawn increasing attention in recent years. To exploit a large amount of unlabeled in-domain data efficiently, we applied three semi-supervised learning strategies: interpolation consistency training (ICT), shift consistency training (SCT), and weak pseudo-labeling. In addition, we propose FP-CRNN, a convolutional recurrent neural network (CRNN) that contains feature-pyramid (FP) components, to leverage temporal information by utilizing features at different scales. Experiments were conducted on DCASE 2020 Task 4. In terms of event-based F-measure, these approaches outperform the official baseline system, which scores 34.8%; the highest F-measure of 48.0% is achieved by an FP-CRNN trained with the combination of all three strategies.
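The abstract names two consistency-training strategies, ICT and SCT. The sketch below is a minimal, hypothetical illustration of how such consistency terms are commonly computed for a frame-level SED model (mixup-style interpolation for ICT, time-shift invariance for SCT). The model interface, tensor layout, and hyperparameters here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of ICT and SCT consistency terms for a SED model.
# Assumed layout: input x is (batch, 1, time, freq); model output is
# frame-level class probabilities' logits of shape (batch, frames, classes),
# with one output frame per input frame (no pooling along time).
import torch
import torch.nn.functional as F


def ict_loss(model, ema_model, x_unlab, alpha=1.0):
    """Interpolation consistency training: the student's prediction on a
    mixup of two unlabeled batches should match the same mixup of the
    (EMA) teacher's predictions on the original inputs."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x_unlab.size(0))
    x_mix = lam * x_unlab + (1.0 - lam) * x_unlab[perm]
    with torch.no_grad():
        p_teacher = torch.sigmoid(ema_model(x_unlab))
        target = lam * p_teacher + (1.0 - lam) * p_teacher[perm]
    p_student = torch.sigmoid(model(x_mix))
    return F.mse_loss(p_student, target)


def sct_loss(model, x_unlab, max_shift=16):
    """Shift consistency training: predictions on a time-shifted input
    should equal the correspondingly shifted predictions on the original."""
    shift = int(torch.randint(1, max_shift + 1, (1,)))
    x_shifted = torch.roll(x_unlab, shifts=shift, dims=2)  # shift along time
    with torch.no_grad():
        p_orig = torch.sigmoid(model(x_unlab))             # (batch, frames, classes)
    p_shift = torch.sigmoid(model(x_shifted))
    # Roll the reference predictions by the same amount along the frame axis
    # (assumes a 1:1 ratio between input frames and output frames).
    return F.mse_loss(p_shift, torch.roll(p_orig, shifts=shift, dims=1))
```

In a typical semi-supervised setup these terms would be added to the supervised (weak-label) classification loss with ramp-up weights, and weak pseudo-labeling would assign clip-level labels to unlabeled clips whose teacher predictions exceed a confidence threshold; the specific weighting and thresholds used in the paper are not reproduced here.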