Paper ID | SPE-6.6 | ||
Paper Title | WEIGHTED MAGNITUDE-PHASE LOSS FOR SPEECH DEREVERBERATION | ||
Authors | Jingshu Zhang, Mark Plumbley, Wenwu Wang, University of Surrey, United Kingdom | ||
Session | SPE-6: Speech Enhancement 2: Speech Separation and Dereverberation | ||
Location | Gather.Town | ||
Session Time | Tuesday, 08 June, 14:00 - 14:45 | ||
Presentation Time | Tuesday, 08 June, 14:00 - 14:45 | ||
Presentation | Poster | ||
Topic | Speech Processing: [SPE-ENHA] Speech Enhancement and Separation | ||
Abstract | In real rooms, recorded speech usually contains reverberation, which degrades the quality and intelligibility of the speech. It has proven effective to use neural networks to estimate complex ideal ratio masks (cIRMs) using mean square error (MSE) loss for speech dereverberation. However, in some cases, when using MSE loss to estimate complex-valued masks, phase may have a disproportionate effect compared to magnitude. We propose a new weighted magnitude-phase loss function, which is divided into a magnitude component and a phase component, to train a neural network to estimate complex ideal ratio masks. A weight parameter is introduced to adjust the relative contribution of magnitude and phase to the overall loss. We find that our proposed loss function outperforms the regular MSE loss function for speech dereverberation. |
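
The abstract does not give the exact form of the proposed loss, so the following is only a minimal sketch of the general idea: a magnitude term and a phase term combined by a single weight. The function names, the `alpha` parameter, the convex combination `alpha * L_mag + (1 - alpha) * L_phase`, and the use of a squared wrapped phase difference are all assumptions made for illustration, not the authors' definition. A plain MSE over real and imaginary parts is included for comparison.

```python
import cmath
import math


def complex_mse(est, ref):
    """Baseline: MSE over the real and imaginary parts of complex masks."""
    total = 0.0
    for e, r in zip(est, ref):
        d = e - r
        total += d.real ** 2 + d.imag ** 2
    return total / (2 * len(ref))


def weighted_mag_phase_loss(est, ref, alpha=0.5):
    """Hypothetical weighted magnitude-phase loss (illustrative only).

    alpha = 1.0 keeps only the magnitude term, alpha = 0.0 only the phase
    term. This weighting scheme is an assumption, not taken from the paper.
    """
    mag_err = 0.0
    phase_err = 0.0
    for e, r in zip(est, ref):
        # Magnitude term: squared difference of the mask magnitudes.
        mag_err += (abs(e) - abs(r)) ** 2
        # Phase term: squared phase difference, wrapped into [-pi, pi].
        dphi = cmath.phase(e) - cmath.phase(r)
        dphi = math.atan2(math.sin(dphi), math.cos(dphi))
        phase_err += dphi ** 2
    n = len(ref)
    return alpha * mag_err / n + (1.0 - alpha) * phase_err / n


# Toy masks: the first estimate has a magnitude error only,
# the second has a phase error only (a 90-degree rotation).
ref = [1 + 0j, 0 + 1j]
est_mag_only = [2 + 0j, 0 + 1j]
est_phase_only = [0 + 1j, 0 + 1j]

loss_mag = weighted_mag_phase_loss(est_mag_only, ref, alpha=0.5)
loss_phase = weighted_mag_phase_loss(est_phase_only, ref, alpha=0.5)
```

In this sketch, moving `alpha` toward 1 makes the loss insensitive to the phase-only error in `est_phase_only`, while moving it toward 0 makes it insensitive to the magnitude-only error in `est_mag_only`. That is the kind of trade-off the weight parameter in the abstract is described as controlling.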