Paper ID | SPE-5.2
Paper Title | TIME-DOMAIN LOSS MODULATION BASED ON OVERLAP RATIO FOR MONAURAL CONVERSATIONAL SPEAKER SEPARATION
Authors | Hassan Taherian, DeLiang Wang, The Ohio State University, United States
Session | SPE-5: Speech Enhancement 1: Speech Separation
Location | Gather.Town
Session Time | Tuesday, 08 June, 14:00 - 14:45
Presentation Time | Tuesday, 08 June, 14:00 - 14:45
Presentation | Poster
Topic | Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
Abstract | Existing speaker separation methods deliver excellent performance on fully overlapped signal mixtures. To apply these methods to daily conversations, which contain only occasional concurrent speakers, recent studies incorporate both overlapped and non-overlapped segments in the training data. However, such training data can degrade separation performance because non-overlapped segments are trivial: the model simply maps the input to the output. We propose a new loss function for speaker separation based on permutation invariant training that dynamically reweights losses using the segment overlap ratio. The new loss function emphasizes overlapped regions while deemphasizing segments with a single speaker. We demonstrate the effectiveness of the proposed loss function on an automatic speech recognition (ASR) task. Experiments on the recently introduced LibriCSS corpus show that the proposed single-channel method produces consistent improvements over baseline methods.
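The abstract describes a permutation invariant training (PIT) loss that is reweighted by the segment overlap ratio. Below is a minimal PyTorch sketch of this idea, assuming a negative SI-SNR time-domain loss and an illustrative power-law weighting w = rho**gamma of the overlap ratio rho; the exact modulation function, base loss, and hyperparameters used in the paper may differ.

```python
# Hedged sketch of overlap-ratio loss modulation for PIT-based speaker
# separation. The weighting w = rho**gamma is an assumption for illustration,
# not necessarily the paper's exact formulation.
import itertools
import torch


def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB; est and ref have shape (batch, time)."""
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    proj = (est * ref).sum(-1, keepdim=True) * ref / (ref.pow(2).sum(-1, keepdim=True) + eps)
    noise = est - proj
    return 10 * torch.log10(proj.pow(2).sum(-1) / (noise.pow(2).sum(-1) + eps) + eps)


def overlap_weighted_pit_loss(est, ref, overlap_ratio, gamma=1.0, eps=1e-8):
    """PIT loss reweighted by the segment overlap ratio.

    est, ref:       (batch, num_spk, time) estimated / reference signals
    overlap_ratio:  (batch,) fraction of each segment with concurrent speakers
    gamma:          exponent of the (assumed) power-law weighting
    """
    batch, n_spk, _ = est.shape
    losses = []
    for perm in itertools.permutations(range(n_spk)):
        # Negative SI-SNR averaged over speakers for this speaker permutation.
        loss_p = -si_snr(est[:, perm, :].reshape(batch * n_spk, -1),
                         ref.reshape(batch * n_spk, -1)).view(batch, n_spk).mean(-1)
        losses.append(loss_p)
    # Standard PIT: take the best permutation per utterance.
    pit_loss, _ = torch.stack(losses, dim=-1).min(dim=-1)

    # Emphasize overlapped segments and deemphasize single-speaker ones.
    weights = overlap_ratio.clamp(min=eps).pow(gamma)
    return (weights * pit_loss).sum() / (weights.sum() + eps)
```

In this sketch, a fully overlapped segment (overlap ratio near 1) contributes its full PIT loss, while a single-speaker segment (overlap ratio near 0) is strongly downweighted, which is the qualitative behavior the abstract describes.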