Paper ID | AUD-26.1 |
Paper Title |
SPEECH ENHANCEMENT WITH MIXTURE OF DEEP EXPERTS WITH CLEAN CLUSTERING PRE-TRAINING |
Authors |
Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot, Bar-Ilan University, Israel |
Session | AUD-26: Signal Enhancement and Restoration 3: Signal Enhancement |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation | Poster |
Topic |
Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration |
Abstract |
In this study, we present a mixture of deep experts (MoDE) neural network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an ‘expert’ in a different speech spectral pattern, such as a phoneme. A gating DNN estimates the latent variables, namely the weights assigned to each expert’s output given a speech segment. Each expert estimates a mask from the noisy input, and the final mask is obtained as a weighted average of the experts’ estimates, with the weights determined by the gating DNN. A soft spectral attenuation, based on the estimated mask, is then applied to enhance the noisy speech signal. As a byproduct, we gain a reduction in complexity at test time. We show that the experts’ specialization allows better robustness to unfamiliar noise types. |
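The mask combination described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-layer linear "experts" and "gate", the function names (`mode_mask`, `enhance`), the weight shapes, and the plain multiplicative attenuation in the magnitude domain are all assumptions made for illustration. The sketch only shows how gating weights that sum to one turn K per-expert masks into a single final mask.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def sigmoid(z):
    """Element-wise logistic function, keeps mask values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def mode_mask(noisy_frame, expert_weights, gate_weights):
    """Combine per-expert masks using gating weights.

    noisy_frame:    (F,) feature vector of one noisy frame.
    expert_weights: (K, F, F) hypothetical one-layer experts (assumption).
    gate_weights:   (K, F) hypothetical one-layer gate (assumption).
    Returns the final mask (F,) and the gating weights (K,).
    """
    # Each expert produces its own mask estimate from the same noisy input.
    expert_masks = sigmoid(expert_weights @ noisy_frame)  # shape (K, F)
    # The gate assigns one non-negative weight per expert, summing to 1.
    gate = softmax(gate_weights @ noisy_frame)            # shape (K,)
    # Final mask: weighted average of the experts' estimates.
    mask = gate @ expert_masks                            # shape (F,)
    return mask, gate


def enhance(noisy_magnitude, mask):
    """Soft attenuation: scale each frequency bin by its mask value.

    Plain multiplication is an assumption; the abstract does not state
    the exact attenuation rule.
    """
    return mask * noisy_magnitude


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, K = 8, 3  # illustrative sizes, not taken from the paper
    frame = rng.normal(size=F)
    mask, gate = mode_mask(
        frame, rng.normal(size=(K, F, F)), rng.normal(size=(K, F))
    )
    print("gating weights:", gate)
    print("enhanced magnitudes:", enhance(np.abs(frame), mask))
```

Because the gating weights form a convex combination, the final mask stays within the range of the individual expert masks, so the attenuation never amplifies a frequency bin in this sketch.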