Paper ID | AUD-30.6 |
Paper Title | SLOW-FAST AUDITORY STREAMS FOR AUDIO RECOGNITION |
Authors | Evangelos Kazakos, University of Bristol, United Kingdom; Arsha Nagrani, Andrew Zisserman, University of Oxford, United Kingdom; Dima Damen, University of Bristol, United Kingdom |
Session | AUD-30: Detection and Classification of Acoustic Scenes and Events 5: Scenes |
Location | Gather.Town |
Session Time | Friday, 11 June, 13:00 - 13:45 |
Presentation Time | Friday, 11 June, 13:00 - 13:45 |
Presentation | Poster |
Topic |
Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events |
Abstract |
We propose a two-stream convolutional network for audio recognition that operates on time-frequency spectrogram inputs. Following similar success in visual recognition, we learn Slow-Fast auditory streams with separable convolutions and multi-level lateral connections. The Slow pathway has high channel capacity, while the Fast pathway operates at a fine-grained temporal resolution. We showcase the importance of our two-stream proposal on two diverse datasets, VGG-Sound and EPIC-KITCHENS-100, and achieve state-of-the-art results on both. |
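
The two-stream idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The temporal stride `alpha`, the channel ratio `beta`, the channel width of 64, and the function names are all hypothetical, chosen only for illustration. The toy lateral connection (stride the Fast features in time, then concatenate) is an assumption modeled on the visual Slow-Fast design, since the abstract does not specify how the lateral connections work.

```python
# Illustrative sketch only: alpha, beta and the channel widths are assumed
# values, not figures taken from the abstract.

def split_streams(spectrogram, alpha=4):
    """Return (slow_input, fast_input) from a list of time frames.

    spectrogram: list of frames, each frame a list of frequency-bin values.
    The Fast input keeps every frame (fine temporal resolution); the Slow
    input keeps every alpha-th frame (coarser temporal resolution).
    """
    fast_input = list(spectrogram)
    slow_input = spectrogram[::alpha]
    return slow_input, fast_input


def pathway_channels(slow_channels=64, beta=8):
    """Return (slow, fast) channel widths; Fast is beta times narrower."""
    return slow_channels, slow_channels // beta


def lateral_fuse(slow_feats, fast_feats, alpha=4):
    """Toy lateral connection (assumed form): stride the Fast features in
    time by alpha so they align with the Slow features, then concatenate
    the two feature lists at each time step."""
    strided = fast_feats[::alpha]
    return [s + f for s, f in zip(slow_feats, strided)]


# Usage: a dummy spectrogram with 16 time frames and 3 frequency bins.
spec = [[float(t)] * 3 for t in range(16)]
slow, fast = split_streams(spec, alpha=4)
fused = lateral_fuse(slow, fast, alpha=4)
```

In this sketch the Slow stream sees a quarter of the frames but would be given more channels, while the Fast stream sees every frame with fewer channels. That mirrors the trade-off the abstract describes between channel capacity and temporal resolution.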