2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information



Paper Detail

Paper ID: AUD-13.6
Paper Title: SOUND EVENT DETECTION BY CONSISTENCY TRAINING AND PSEUDO-LABELING WITH FEATURE-PYRAMID CONVOLUTIONAL RECURRENT NEURAL NETWORKS
Authors: Chih-Yuan Koh, You-Siang Chen, Yi-Wen Liu, Mingsian Bai; National Tsing Hua University, Taiwan
Session: AUD-13: Detection and Classification of Acoustic Scenes and Events 2: Weak supervision
Location: Gather.Town
Session Time: Wednesday, 09 June, 15:30 - 16:15
Presentation Time: Wednesday, 09 June, 15:30 - 16:15
Presentation: Poster
Topic: Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract: Due to the high cost of large-scale strong labeling, sound event detection (SED) using only weakly-labeled and unlabeled data has drawn increasing attention in recent years. To exploit large amounts of unlabeled in-domain data efficiently, we applied three semi-supervised learning strategies: interpolation consistency training (ICT), shift consistency training (SCT), and weakly pseudo-labeling. In addition, we propose FP-CRNN, a convolutional recurrent neural network (CRNN) that contains feature-pyramid (FP) components, to leverage temporal information by utilizing features at different scales. Experiments were conducted on DCASE 2020 Task 4. In terms of event-based F-measure, these approaches outperform the official baseline system, which scores 34.8%; the highest F-measure, 48.0%, was achieved by an FP-CRNN trained with the combination of all three strategies.