Paper ID | AUD-24.3
Paper Title | LEARNING DISENTANGLED FEATURE REPRESENTATIONS FOR SPEECH ENHANCEMENT VIA ADVERSARIAL TRAINING
Authors | Nana Hou, Nanyang Technological University, Singapore; Chenglin Xu, National University of Singapore, Singapore; Eng Siong Chng, Nanyang Technological University, Singapore; Haizhou Li, National University of Singapore, Singapore
Session | AUD-24: Signal Enhancement and Restoration 1: Deep Learning |
Location | Gather.Town |
Session Time | Thursday, 10 June, 16:30 - 17:15
Presentation Time | Thursday, 10 June, 16:30 - 17:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration
Abstract | Deep learning based speech enhancement degrades significantly in the face of unseen noise. To address this mismatch, we propose to learn noise-agnostic feature representations through disentanglement learning, which removes the unspecified noise factor while keeping the specified factors of variation associated with the clean speech. Specifically, a discriminator module, referred to as the disentangler, is introduced to distinguish the type of noise. Under an adversarial training strategy, a gradient reversal layer disentangles the noise factor and removes it from the feature representation. Experimental results show that the proposed approach achieves 5.8% and 5.2% relative improvements over the best baseline in terms of perceptual evaluation of speech quality (PESQ) and segmental signal-to-noise ratio (SSNR), respectively. Furthermore, an ablation study indicates that the proposed disentangler module is also effective in other encoder-decoder structures. The scripts are available on GitHub.
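The core mechanism described in the abstract, a gradient reversal layer feeding a noise-type discriminator, can be illustrated with a minimal PyTorch sketch. This is not the authors' released code: the class names (`Disentangler`), the layer sizes, and the reversal scale `lambd` are illustrative assumptions; only the general gradient-reversal pattern follows the abstract.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows back into the encoder, pushing its
        # features to be uninformative about the noise type.
        return grad_output.neg() * ctx.lambd, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

class Disentangler(torch.nn.Module):
    """Hypothetical noise-type classifier placed on the encoder features."""
    def __init__(self, feat_dim, num_noise_types):
        super().__init__()
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(feat_dim, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, num_noise_types),
        )

    def forward(self, features, lambd=1.0):
        # The classifier still learns to predict the noise type, while the
        # reversed gradient removes that information from the shared features.
        return self.classifier(grad_reverse(features, lambd))
```

In training, the cross-entropy loss of this classifier would be added to the enhancement loss; because of the reversal, minimizing the joint objective drives the encoder toward noise-agnostic representations while the disentangler adversarially tries to recover the noise type.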