2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	SPE-35.1
Paper Title	ICASSP 2021 DEEP NOISE SUPPRESSION CHALLENGE
Authors	Chandan Karadagur Ananda Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan, Microsoft, United States
Session	SPE-35: Speech Enhancement 5: DNS Challenge Task
Location	Gather.Town
Session Time	Thursday, 10 June, 14:00 - 14:45
Presentation Time	Thursday, 10 June, 14:00 - 14:45
Presentation	Poster
Topic	Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
Abstract	The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH 2020. We open-sourced training and test datasets for researchers to train their noise suppression models. We also open-sourced a subjective evaluation framework and used the tool to evaluate and pick the final winners. Many researchers from academia and industry made significant contributions to push the field forward. We also learned that, as a research community, we still have a long way to go in achieving excellent speech quality in challenging noisy real-time conditions. In this challenge, we are expanding both our training and test datasets. Clean speech in the training set has increased by 200% with the addition of singing voice, emotion data, and non-English languages. The test set has increased by 100% with the addition of singing, emotional, non-English (tonal and non-tonal) languages, and personalized DNS test clips. There are two tracks with a focus on (i) real-time denoising and (ii) real-time personalized DNS.