Paper ID | AUD-29.6 | ||
Paper Title | Supervised direct-path relative transfer function learning for binaural sound source localization | ||
Authors | Bing Yang (Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University; Westlake University & Westlake Institute for Advanced Study, China); Xiaofei Li (Westlake University & Westlake Institute for Advanced Study, China); Hong Liu (Key Laboratory of Machine Perception, Shenzhen Graduate School, Peking University, China) | ||
Session | AUD-29: Acoustic Sensor Array Processing 3: Acoustic Sensor Arrays | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 11:30 - 12:15 | ||
Presentation Time | Friday, 11 June, 11:30 - 12:15 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-ASAP] Acoustic Sensor Array Processing | ||
Abstract | The direct-path relative transfer function (DP-RTF) is the ratio between the direct-path acoustic transfer functions of two channels. Although the DP-RTF fully encodes the directional cues of a sound and serves as a reliable localization feature, it is often estimated erroneously in the presence of noise and reverberation. This paper proposes a supervised DP-RTF learning method with deep neural networks for robust binaural sound source localization. To exploit the complementarity of single-channel spectrogram and dual-channel difference information, we first recover the direct-path magnitude spectrogram from the contaminated one using a monaural enhancement network, and then predict the DP-RTF from the dual-channel (enhanced) intensity and phase cues using a binaural enhancement network. In addition, a weighted-matching softmax training loss is designed to encourage the predicted DP-RTFs to be concentrated for the same direction and separated for different directions. Finally, the direction of arrival (DOA) of the source is estimated by matching the predicted DP-RTF with the ground-truth DP-RTFs of the candidate directions. Experimental results show the superiority of our method for DOA estimation in environments with various levels of noise and reverberation. |
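
The final matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified free-field two-microphone model, in which the DP-RTF of a direction is a pure inter-channel phase delay with no head shadowing, and it stands in a slightly noisy template for the network prediction. The function names `free_field_dp_rtf` and `estimate_doa`, the microphone spacing, and the normalized-correlation similarity measure are all hypothetical choices made for illustration.

```python
import numpy as np


def free_field_dp_rtf(angles_deg, freqs_hz, mic_distance_m=0.18, c=343.0):
    """Hypothetical DP-RTF templates under a free-field two-microphone model.

    Assumption: the direct-path ratio between the two channels is a pure
    delay, so the DP-RTF has unit magnitude and a phase linear in frequency.
    Returns a complex array of shape (n_directions, n_frequencies).
    """
    angles = np.deg2rad(np.asarray(angles_deg, dtype=float))
    freqs = np.asarray(freqs_hz, dtype=float)
    delays = mic_distance_m * np.sin(angles) / c  # inter-channel delay in seconds
    return np.exp(-2j * np.pi * np.outer(delays, freqs))


def estimate_doa(predicted, templates, angles_deg):
    """Return the candidate direction whose template best matches `predicted`.

    Similarity is the real part of the normalized inner product between the
    predicted DP-RTF vector and each candidate template (an assumed measure).
    """
    sims = np.real(templates.conj() @ predicted)
    sims = sims / (np.linalg.norm(templates, axis=1) * np.linalg.norm(predicted))
    return angles_deg[int(np.argmax(sims))], sims


if __name__ == "__main__":
    angles = np.arange(-90, 91, 5)            # candidate azimuths in degrees
    freqs = np.linspace(100.0, 4000.0, 64)    # analysis frequencies in Hz
    templates = free_field_dp_rtf(angles, freqs)

    # Stand-in for a network prediction: the 30-degree template plus noise.
    rng = np.random.default_rng(0)
    noise = 0.05 * (rng.standard_normal(64) + 1j * rng.standard_normal(64))
    predicted = templates[list(angles).index(30)] + noise

    doa, _ = estimate_doa(predicted, templates, angles)
    print("estimated DOA (degrees):", doa)
```

In a binaural setting the candidate templates would come from measured or simulated head-related responses rather than from this free-field formula; the lookup over candidate directions would stay the same.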