Paper ID | AUD-24.2 |
Paper Title | TEACHER-STUDENT LEARNING FOR LOW-LATENCY ONLINE SPEECH ENHANCEMENT USING WAVE-U-NET |
Authors | Sotaro Nakaoka, Li Li, Shota Inoue, Shoji Makino, University of Tsukuba, Japan |
Session | AUD-24: Signal Enhancement and Restoration 1: Deep Learning |
Location | Gather.Town |
Session Time | Thursday, 10 June, 16:30 - 17:15 |
Presentation Time | Thursday, 10 June, 16:30 - 17:15 |
Presentation | Poster |
Topic | Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration |
Abstract |
This paper proposes a low-latency online extension of Wave-U-Net for single-channel speech enhancement, which uses teacher-student learning to reduce system latency while maintaining high enhancement performance. Wave-U-Net is a recently proposed end-to-end source separation method that has achieved remarkable performance in singing voice separation and speech enhancement tasks. Because enhancement is performed in the time domain, Wave-U-Net can efficiently model phase information and avoids the limitations of the domain transformation incurred by methods that operate in the time-frequency domain. With the aim of applying Wave-U-Net to face-to-face applications such as hearing aids and in-car communication systems, where a strict latency of less than 10 ms is required, we investigate online versions of Wave-U-Net and propose using teacher-student learning to avoid the performance degradation caused by shortening the input segment length so that the system delay on a CPU is less than 10 ms. The experimental results show that the proposed model runs in real time with low latency while achieving a signal-to-distortion ratio improvement of about 8.35 dB. |
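
The abstract does not give the training objective or the sampling rate, so the following is only an illustrative sketch of the two ideas it names: a teacher-student loss and the link between input segment length and delay. The function names, the mean-squared-error terms, the weight `alpha`, and the 16 kHz rate in the usage lines are all assumptions made for illustration, not details taken from the paper.

```python
def mse(a, b):
    """Mean squared error between two equal-length sample sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_student_loss(student_out, teacher_out, clean, alpha=0.5):
    """Blend a ground-truth term with a teacher-matching term.

    Assumption: the student (short-segment model) is pulled toward both
    the clean speech and the output of a teacher (long-segment model).
    alpha is a hypothetical weight, not a value from the paper.
    """
    return (alpha * mse(student_out, clean)
            + (1.0 - alpha) * mse(student_out, teacher_out))


def buffering_delay_ms(segment_samples, sample_rate_hz):
    """Time needed to collect one input segment, in milliseconds."""
    return 1000.0 * segment_samples / sample_rate_hz


# Hypothetical usage: a 160-sample segment at an assumed 16 kHz rate
# already takes 10 ms to buffer, before any processing time is added.
delay = buffering_delay_ms(160, 16000)
loss = teacher_student_loss([1.0, 1.0], [0.0, 0.0], [1.0, 1.0], alpha=0.5)
```

Under these assumptions, the buffering term alone shows why the input segment has to be short to stay below a 10 ms budget, and the teacher term is one plausible way to recover the accuracy lost by shortening it.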