Paper ID | MLSP-23.6
Paper Title | MICAUGMENT: ONE-SHOT MICROPHONE STYLE TRANSFER
Authors | Zalán Borsos, ETH Zurich, Switzerland; Yunpeng Li, Beat Gfeller, Marco Tagliasacchi, Google, Switzerland
Session | MLSP-23: Applications in Music and Audio Processing
Location | Gather.Town
Session Time | Wednesday, 09 June, 16:30 - 17:15
Presentation Time | Wednesday, 09 June, 16:30 - 17:15
Presentation | Poster
Topic | Machine Learning for Signal Processing: [MLR-MUSAP] Applications in music and audio processing
Abstract | A crucial aspect of the successful deployment of audio-based models "in-the-wild" is robustness to the transformations introduced by heterogeneous acquisition conditions. In this work, we propose a method to perform one-shot microphone style transfer. Given only a few seconds of audio recorded by a target device, MicAugment identifies the transformations associated with the input acquisition pipeline and uses the learned transformations to synthesize audio as if it were recorded under the same conditions as the target audio. We show that our method can successfully apply the style transfer to real audio and that it significantly increases model robustness when used as data augmentation in downstream tasks.