Paper ID | AUD-26.1 | ||
Paper Title | SPEECH ENHANCEMENT WITH MIXTURE OF DEEP EXPERTS WITH CLEAN CLUSTERING PRE-TRAINING | ||
Authors | Shlomo E. Chazan, Jacob Goldberger, Sharon Gannot, Bar-Ilan University, Israel | ||
Session | AUD-26: Signal Enhancement and Restoration 3: Signal Enhancement | ||
Location | Gather.Town | ||
Session Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration | ||
Abstract | In this study, we present a mixture of deep experts (MoDE) neural network architecture for single-microphone speech enhancement. Our architecture comprises a set of deep neural networks (DNNs), each of which is an ‘expert’ in a different speech spectral pattern, such as a phoneme. A gating DNN is responsible for the latent variables, which are the weights assigned to each expert’s output given a speech segment. The experts estimate a mask from the noisy input, and the final mask is then obtained as a weighted average of the experts’ estimates, with the weights determined by the gating DNN. A soft spectral attenuation, based on the estimated mask, is then applied to enhance the noisy speech signal. As a byproduct, we gain a reduction in complexity at test time. We show that the experts’ specialization allows better robustness to unfamiliar noise types. |
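The mask combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: each expert and the gate are stood in for by a single random linear layer rather than a trained DNN, the sigmoid and softmax output activations are assumptions, and the names `mode_mask`, `expert_weights`, and `gate_weights` are hypothetical. The soft attenuation is sketched as an element-wise product of the mask with the noisy magnitude spectrum, which is one common choice and may differ from the paper's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mode_mask(noisy_log_spec, expert_weights, gate_weights):
    """Combine per-expert masks into one mask for a single frame.

    noisy_log_spec: (F,)      noisy log-spectrum of one frame
    expert_weights: (K, F, F) one linear layer per expert (stand-in for a DNN)
    gate_weights:   (K, F)    linear gating layer (stand-in for the gating DNN)
    """
    # Each expert estimates its own mask in [0, 1] from the noisy input.
    expert_masks = sigmoid(expert_weights @ noisy_log_spec)   # (K, F)
    # The gate assigns one non-negative weight per expert, summing to 1.
    gate = softmax(gate_weights @ noisy_log_spec)             # (K,)
    # Final mask: weighted average of the experts' estimates.
    final_mask = gate @ expert_masks                          # (F,)
    return final_mask, gate

# Toy dimensions and random parameters, for illustration only.
rng = np.random.default_rng(0)
K, F = 3, 8
noisy_log_spec = rng.normal(size=F)
expert_weights = rng.normal(size=(K, F, F))
gate_weights = rng.normal(size=(K, F))

mask, gate = mode_mask(noisy_log_spec, expert_weights, gate_weights)

# Assumed form of soft spectral attenuation: scale the noisy magnitude by the mask.
noisy_mag = np.exp(noisy_log_spec)
enhanced_mag = mask * noisy_mag
```

Because the gate weights are non-negative and sum to one, the final mask is a convex combination of the expert masks and therefore stays in [0, 1], so the enhanced magnitude never exceeds the noisy magnitude in any frequency bin.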