Paper ID | SPE-52.6
Paper Title | CASCADED TIME + TIME-FREQUENCY UNET FOR SPEECH ENHANCEMENT: JOINTLY ADDRESSING CLIPPING, CODEC DISTORTIONS, AND GAPS
Authors | Arun Asokan Nair, Johns Hopkins University, United States; Kazuhito Koishida, Microsoft Corporation, United States
Session | SPE-52: Speech Enhancement 8: Echo Cancellation and Other Tasks
Location | Gather.Town
Session Time | Friday, 11 June, 13:00 - 13:45
Presentation Time | Friday, 11 June, 13:00 - 13:45
Presentation | Poster
Topic | Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
Abstract | Speech enhancement aims to improve speech quality by removing noise and distortions. While most speech enhancement methods address signal-independent, additive sources of noise, several degradations of speech signals are signal-dependent and non-additive, such as speech clipping, codec distortions, and gaps in speech. In this work, we first systematically study each of these three distortions individually and achieve state-of-the-art results on each. Next, we demonstrate a neural network pipeline that cascades a time-domain convolutional neural network with a time-frequency-domain convolutional neural network to address all three distortions jointly. We observe that such a cascade achieves good performance while offering the added benefit of keeping the action of each neural network component interpretable.
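The abstract describes the two-stage structure only at a high level. As a rough illustration, the following is a minimal PyTorch sketch of such a cascade: a toy 1-D encoder-decoder operating on the raw waveform, whose output is converted to an STFT representation, refined by a toy 2-D encoder-decoder, and inverted back to a waveform. All module names, layer counts, channel widths, and STFT settings (n_fft=512, hop=128) here are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of a cascaded time-domain + time-frequency-domain
# enhancement pipeline, assuming PyTorch. All sizes are illustrative,
# not the paper's configuration.
import torch
import torch.nn as nn


class TimeUNet(nn.Module):
    """Toy 1-D encoder-decoder acting directly on the waveform."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, ch, 15, stride=2, padding=7), nn.ReLU(),
            nn.Conv1d(ch, 2 * ch, 15, stride=2, padding=7), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(2 * ch, ch, 15, stride=2, padding=7,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(ch, 1, 15, stride=2, padding=7,
                               output_padding=1),
        )

    def forward(self, x):              # x: (batch, 1, samples)
        return self.dec(self.enc(x))


class TFUNet(nn.Module):
    """Toy 2-D encoder-decoder on the STFT (real/imag as channels)."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(2, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 2, 3, stride=2, padding=1),
        )

    def forward(self, spec):           # spec: (batch, 2, freq, frames)
        return self.dec(self.enc(spec))


class Cascade(nn.Module):
    """Time-domain stage, then a time-frequency stage on its STFT."""

    def __init__(self, n_fft=512, hop=128):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.time_net = TimeUNet()
        self.tf_net = TFUNet()
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, wav):            # wav: (batch, samples)
        stage1 = self.time_net(wav.unsqueeze(1)).squeeze(1)
        spec = torch.stft(stage1, self.n_fft, self.hop,
                          window=self.window, return_complex=True)
        x = torch.view_as_real(spec).permute(0, 3, 1, 2)   # (B, 2, F, T)
        y = self.tf_net(x).permute(0, 2, 3, 1).contiguous()
        out = torch.istft(torch.view_as_complex(y), self.n_fft, self.hop,
                          window=self.window, length=wav.shape[-1])
        return stage1, out  # both stages' outputs stay inspectable


if __name__ == "__main__":
    wav = torch.randn(1, 16384)          # e.g. ~1 second at 16 kHz
    stage1, enhanced = Cascade()(wav)
    print(stage1.shape, enhanced.shape)  # torch.Size([1, 16384]) twice
```

Keeping the two stages as separate modules reflects the interpretability point made in the abstract: the intermediate waveform (stage1 above) can be listened to or visualized on its own, so the contribution of each network component can be inspected independently.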