Paper ID | AUD-24.2 | ||
Paper Title | TEACHER-STUDENT LEARNING FOR LOW-LATENCY ONLINE SPEECH ENHANCEMENT USING WAVE-U-NET | ||
Authors | Sotaro Nakaoka, Li Li, Shota Inoue, Shoji Makino, University of Tsukuba, Japan | ||
Session | AUD-24: Signal Enhancement and Restoration 1: Deep Learning | ||
Location | Gather.Town | ||
Session Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation Time | Thursday, 10 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-SEN] Signal Enhancement and Restoration | ||
Abstract | This paper proposes a low-latency online extension of wave-U-net for single-channel speech enhancement, which uses teacher-student learning to reduce system latency while maintaining high enhancement performance. Wave-U-net is a recently proposed end-to-end source separation method that has achieved remarkable performance in singing voice separation and speech enhancement tasks. Because enhancement is performed in the time domain, wave-U-net can efficiently model phase information and avoid the limitations of the domain transformation required by methods that operate in the time-frequency domain. We aim to apply wave-U-net to face-to-face applications such as hearing aids and in-car communication systems, where a strict latency of less than 10 ms is required. To this end, we investigate online versions of wave-U-net and propose using teacher-student learning to avoid the performance degradation caused by reducing the input segment length so that the system delay on a CPU is less than 10 ms. The experimental results show that the proposed model can run in real time with low latency while achieving a signal-to-distortion ratio improvement of about 8.35 dB. |
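
The two ideas in the abstract, a latency budget tied to the input segment length and a student model trained with help from a teacher, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 16 kHz sampling rate, the mean-squared-error loss, the weighting parameter `alpha`, and the function names are all hypothetical.

```python
def segment_latency_ms(segment_samples: int, sample_rate_hz: int) -> float:
    """Buffering delay, in milliseconds, of collecting one input segment.

    This covers only the time spent waiting for the samples to arrive;
    computation time on the CPU adds to it.
    """
    return 1000.0 * segment_samples / sample_rate_hz


def mse(a, b):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def teacher_student_loss(student_out, teacher_out, clean, alpha=0.5):
    """Hypothetical teacher-student objective (not the paper's exact loss).

    Mixes the error against the clean reference with the error against the
    output of a teacher model; alpha is an assumed weighting between them.
    """
    ground_truth_term = mse(student_out, clean)
    imitation_term = mse(student_out, teacher_out)
    return (1.0 - alpha) * ground_truth_term + alpha * imitation_term


if __name__ == "__main__":
    # Assumed 16 kHz audio: a 128-sample segment takes 8 ms to buffer,
    # which leaves little headroom under a 10 ms budget.
    print(segment_latency_ms(128, 16000))

    # Toy waveforms, two samples each.
    print(teacher_student_loss([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], alpha=0.5))
```

Under these assumptions, shortening the segment is the only way to cut the buffering delay, which is why a short-input student may benefit from the guidance of a teacher that sees a longer input.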