Paper ID | SPE-51.4 |
Paper Title |
NEURAL NOISE EMBEDDING FOR END-TO-END SPEECH ENHANCEMENT WITH CONDITIONAL LAYER NORMALIZATION |
Authors |
Zhihui Zhang, Xiaoqi Li, Yaxing Li, Yuanjie Dong, Dan Wang, Shengwu Xiong, Wuhan University of Technology, China |
Session | SPE-51: Speech Enhancement 7: Single-channel Processing |
Location | Gather.Town |
Session Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation Time: | Friday, 11 June, 13:00 - 13:45 |
Presentation | Poster |
Topic |
Speech Processing: [SPE-ENHA] Speech Enhancement and Separation |
Abstract |
Most deep-learning-based speech enhancement methods focus on modeling the complicated relationship between noisy speech and clean speech without considering noise information. To cope with various complex noise scenes, we introduce a novel enhancement architecture that integrates a deep autoencoder with neural noise embedding. In this study, a new normalization method, termed conditional layer normalization (CLN), is introduced to improve the generalization of deep-learning-based speech enhancement approaches to unseen environments. The noise embedding is passed through the CLN layers to regularize the network for the speech enhancement task. The proposed network can be adaptively adjusted according to the noise information extracted from the noisy speech input. The overall network is trained in an end-to-end manner, and the experimental results show that the proposed scheme produces satisfactory enhancement performance compared with other methods. The visualization shows that our proposed network captures noise information, which helps improve the robustness of speech enhancement to unseen environments. |
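
The abstract does not give the CLN formula, so the following is only a minimal sketch of one common way to condition layer normalization on an embedding: the scale and shift are predicted from the noise embedding by linear projections. The function name `conditional_layer_norm`, the projection matrices `w_gamma` and `w_beta`, and the `1 + W e` form of the scale are all assumptions made for illustration, not details taken from the paper.

```python
import math

def conditional_layer_norm(x, noise_emb, w_gamma, w_beta, eps=1e-5):
    """Layer-normalize x, then scale and shift it using the noise embedding.

    Assumed formulation (hypothetical, not from the paper):
        gamma = 1 + W_gamma @ e
        beta  = W_beta @ e
        y     = gamma * (x - mean) / sqrt(var + eps) + beta
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    # Standard layer normalization over the feature dimension.
    x_hat = [(v - mean) / math.sqrt(var + eps) for v in x]
    # Per-feature scale and shift, predicted linearly from the embedding.
    gamma = [1.0 + sum(w * e for w, e in zip(row, noise_emb)) for row in w_gamma]
    beta = [sum(w * e for w, e in zip(row, noise_emb)) for row in w_beta]
    return [g * h + b for g, h, b in zip(gamma, x_hat, beta)]

# Illustrative values only: 4 features, 2-dimensional noise embedding.
features = [1.0, 2.0, 3.0, 4.0]
w_gamma = [[0.0, 0.0] for _ in features]
w_beta = [[0.5, 0.0] for _ in features]

plain = conditional_layer_norm(features, [0.0, 0.0], w_gamma, w_beta)
shifted = conditional_layer_norm(features, [1.0, 0.0], w_gamma, w_beta)
```

With a zero embedding the sketch reduces to ordinary layer normalization; a non-zero embedding changes the scale and shift, which is one way a network could be "adaptively adjusted" to the noise estimated from the input.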