Paper ID | AUD-13.4
Paper Title | THE BENEFIT OF TEMPORALLY-STRONG LABELS IN AUDIO EVENT CLASSIFICATION
Authors | Shawn Hershey, Daniel P. W. Ellis, Eduardo Fonseca, Aren Jansen, Caroline Liu, R Channing Moore, Manoj Plakal, Google, United States
Session | AUD-13: Detection and Classification of Acoustic Scenes and Events 2: Weak supervision
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-CLAS] Detection and Classification of Acoustic Scenes and Events
Abstract | To reveal the importance of temporal precision in ground-truth audio event labels, we collected precise (∼0.1 sec resolution) "strong" labels for a portion of the AudioSet dataset. We devised a temporally-strong evaluation set (including explicit negatives of varying difficulty) and a small strongly-labeled training subset of 67k clips (compared to the original dataset's 1.8M clips labeled at 10 sec resolution). We show that fine-tuning with a mix of weakly- and strongly-labeled data can substantially improve classifier performance, even when evaluated using only the original weak labels. For a ResNet50 architecture, d′ on the strong evaluation data, including explicit negatives, improves from 1.13 to 1.41. The new labels are available as an update to AudioSet.
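The d′ (d-prime) figures above are a separability score. A common convention, assumed here rather than taken from this listing, derives d′ from ROC AUC under an equal-variance Gaussian model: d′ = √2 · Φ⁻¹(AUC), where Φ⁻¹ is the inverse standard-normal CDF. The sketch below computes a per-class AUC from positive and negative scores (the explicit negatives in the strong evaluation set would supply the negative side), converts it to d′, and inverts the mapping to show roughly what AUC the quoted d′ values imply. The function names and toy scores are hypothetical; how the paper aggregates d′ across classes is not stated here.

```python
import math
from statistics import NormalDist


def auc_from_scores(pos_scores, neg_scores):
    """ROC AUC as the probability a random positive outscores a random negative.

    Ties count as half. O(len(pos) * len(neg)), fine for a small illustration.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def d_prime_from_auc(auc):
    """Assumed convention: d' = sqrt(2) * Phi^{-1}(AUC) (equal-variance Gaussians)."""
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    return math.sqrt(2.0) * NormalDist().inv_cdf(auc)


def auc_from_d_prime(d):
    """Inverse of the mapping above: AUC = Phi(d' / sqrt(2))."""
    return NormalDist().cdf(d / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical scores for one class; not data from the paper.
    positives = [0.9, 0.8, 0.6, 0.55]
    negatives = [0.7, 0.4, 0.3, 0.2, 0.1]
    auc = auc_from_scores(positives, negatives)
    print(f"AUC = {auc:.3f}, d' = {d_prime_from_auc(auc):.3f}")
    # AUCs implied by the d' values quoted in the abstract, under this convention.
    for d in (1.13, 1.41):
        print(f"d' = {d:.2f} -> AUC = {auc_from_d_prime(d):.3f}")
```

Under this convention the gain from 1.13 to 1.41 corresponds to ROC AUC rising from roughly 0.79 to roughly 0.84. Reporting d′ instead of AUC spreads out differences near the top of the AUC range, which is one reason it is used for large multi-label benchmarks.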