2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-51.4
Paper Title NEURAL NOISE EMBEDDING FOR END-TO-END SPEECH ENHANCEMENT WITH CONDITIONAL LAYER NORMALIZATION
Authors Zhihui Zhang, Xiaoqi Li, Yaxing Li, Yuanjie Dong, Dan Wang, Shengwu Xiong, Wuhan University of Technology, China
SessionSPE-51: Speech Enhancement 7: Single-channel Processing
LocationGather.Town
Session Time:Friday, 11 June, 13:00 - 13:45
Presentation Time:Friday, 11 June, 13:00 - 13:45
Presentation Poster
Topic Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Most of the deep learning based speech enhancement methods focus on the modeling of complicated relationship between the noisy speech and the clean speech without the consideration of noise information. In order to cope with various complex noise scenes, we introduce a novel enhancement architecture that integrates a deep autoencoder with neural noise embedding. In this study, a new normalization method, termed conditional layer normalization (CLN), is introduced to improve the generalization of deep learning based speech enhancement approaches for unseen environments. The noise embedding is passed through the CLN layers to regularize the network for speech enhancement task. The proposed network can be adaptively adjusted according to different noise information extracted from the noisy speech input. The network in overall is trained in an end-to-end manner and the experimental results show that the proposed scheme produces satisfactory enhancement performance comparing the other methods. The visualization shows that our proposed network captures noise information, which is helpful to improve robustness to unseen environments for speech enhancement.