Paper ID | AUD-2.1 |
Paper Title | SEMI-SUPERVISED SINGING VOICE SEPARATION WITH NOISY SELF-TRAINING |
Authors | Zhepei Wang, University of Illinois at Urbana-Champaign, United States; Ritwik Giri, Umut Isik, Jean-Marc Valin, Arvindh Krishnaswamy, Amazon Web Services, United States |
Session | AUD-2: Audio and Speech Source Separation 2: Music and Singing Voice Separation |
Location | Gather.Town |
Session Time | Tuesday, 08 June, 13:00 - 13:45 |
Presentation Time | Tuesday, 08 June, 13:00 - 13:45 |
Presentation | Poster |
Topic | Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation |
Abstract |
Recent progress in singing voice separation has primarily focused on supervised deep learning methods, but the scarcity of ground-truth data with clean musical sources has long been a limiting factor. Given a limited set of labeled data, we present a method that leverages a large volume of unlabeled data to improve separation performance. Following the noisy self-training framework, we first train a teacher network on the small labeled dataset and infer pseudo-labels from the large corpus of unlabeled mixtures. A larger student network is then trained on the combined ground-truth and self-labeled datasets. Empirical results show that the proposed self-training scheme, along with data augmentation methods, effectively leverages the large unlabeled corpus and achieves superior performance compared to supervised methods. |
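The three-step self-training recipe in the abstract (train a teacher on labeled data, pseudo-label the unlabeled mixtures, train a larger student on the combined set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy convolutional teacher/student networks, the random stand-in waveforms, the dataset sizes, and the L1 training objective are not the paper's actual models, data, or loss.

```python
# Minimal sketch of a noisy self-training loop for source separation.
# Architectures, data, and loss are hypothetical stand-ins, not the paper's setup.
import torch
import torch.nn as nn

def train(model, mixtures, targets, epochs=5, lr=1e-3):
    """Fit the model to map mixtures to target vocals with an L1 loss (illustrative)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(mixtures), targets)
        loss.backward()
        opt.step()
    return model

# Toy "waveforms": a small labeled set and a larger unlabeled corpus.
labeled_mix = torch.randn(16, 1, 4096)
labeled_vocals = torch.randn(16, 1, 4096)
unlabeled_mix = torch.randn(128, 1, 4096)

# 1) Train a teacher network on the small labeled dataset.
teacher = nn.Sequential(nn.Conv1d(1, 16, 15, padding=7), nn.ReLU(),
                        nn.Conv1d(16, 1, 15, padding=7))
teacher = train(teacher, labeled_mix, labeled_vocals)

# 2) Infer pseudo-labels (separated vocals) on the unlabeled mixtures.
with torch.no_grad():
    pseudo_vocals = teacher(unlabeled_mix)

# 3) Train a larger student on ground-truth plus self-labeled data.
#    (Data augmentation, e.g. remixing or gain perturbation, would be applied here.)
student = nn.Sequential(nn.Conv1d(1, 64, 15, padding=7), nn.ReLU(),
                        nn.Conv1d(64, 64, 15, padding=7), nn.ReLU(),
                        nn.Conv1d(64, 1, 15, padding=7))
student = train(student,
                torch.cat([labeled_mix, unlabeled_mix]),
                torch.cat([labeled_vocals, pseudo_vocals]))
```

In practice, steps 2 and 3 can be iterated, with the student serving as the next teacher; the sketch shows a single round only.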