2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDMLSP-46.1
Paper Title SAPAUGMENT: LEARNING A SAMPLE ADAPTIVE POLICY FOR DATA AUGMENTATION
Authors Ting-Yao Hu, Carnegie Mellon University, United States; Ashish Shrivastava, Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalinli, Oncel Tuzel, Apple, United States
SessionMLSP-46: Theory and Applications
LocationGather.Town
Session Time:Friday, 11 June, 13:00 - 13:45
Presentation Time:Friday, 11 June, 13:00 - 13:45
Presentation Poster
Topic Machine Learning for Signal Processing: [MLR-MUSAP] Applications in music and audio processing
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to learn the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also corrupt the annotation and make it too hard to learn from. Furthermore, a well classified sample (with low training loss) should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy learning framework and obviates hand-crafting augmentation parameters by trial-and-error. We apply our method on an automatic speech recognition (ASR) task and show substantial improvement, 21% relative reduction in word error rate on LibriSpeech dataset, over the state-of-the-art speech augmentation method.