Paper ID | AUD-8.3 |
Paper Title | DBNET: DOA-DRIVEN BEAMFORMING NETWORK FOR END-TO-END REVERBERANT SOUND SOURCE SEPARATION |
Authors | Ali Aroudi, University of Oldenburg, Germany; Sebastian Braun, Microsoft Corporation, United States |
Session | AUD-8: Audio and Speech Source Separation 4: Multi-Channel Source Separation |
Location | Gather.Town |
Session Time: | Wednesday, 09 June, 13:00 - 13:45 |
Presentation Time: | Wednesday, 09 June, 13:00 - 13:45 |
Presentation | Poster |
Topic | Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation |
Abstract | Many deep learning techniques are available to perform source separation and reduce background noise. However, designing an end-to-end multi-channel source separation method that combines deep learning with conventional acoustic signal processing techniques remains challenging. In this paper, we propose a direction-of-arrival-driven beamforming network (DBnet) consisting of direction-of-arrival (DOA) estimation and beamforming layers for end-to-end source separation. We propose to train DBnet using loss functions that are based solely on the distances between the separated speech signals and the target speech signals, without the need for ground-truth speaker DOAs. To improve the source separation performance, we also propose end-to-end extensions of DBnet that incorporate post-masking networks. We evaluate the proposed DBnet and its extensions on a very challenging dataset, targeting realistic far-field sound source separation in reverberant and noisy environments. The experimental results show that the proposed extended DBnet using a convolutional-recurrent post-masking network outperforms state-of-the-art source separation methods. |
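The idea of supervising a DOA-driven beamformer with a signal-level loss only can be illustrated with a toy sketch. The abstract does not state which beamformer or distance measure DBnet uses, so everything here is an assumption chosen for illustration: a frequency-domain delay-and-sum beamformer, a four-microphone linear array with 5 cm spacing, an anechoic far-field source, a negative SI-SDR loss, and hypothetical function names. A coarse grid search over candidate angles stands in for the learned DOA estimation layers.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def steering_vector(doa_rad, mic_positions, freqs):
    """Far-field steering vector of a linear array, shape (F, M)."""
    delays = mic_positions * np.cos(doa_rad) / SPEED_OF_SOUND  # seconds per mic
    return np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])

def delay_and_sum(mic_spectra, doa_rad, mic_positions, freqs):
    """Phase-align the channels toward doa_rad and average them."""
    d = steering_vector(doa_rad, mic_positions, freqs)
    return np.mean(np.conj(d) * mic_spectra, axis=1)

def neg_si_sdr(estimate, target, eps=1e-12):
    """Negative scale-invariant SDR in dB (lower is better)."""
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.sum(projection**2) + eps) / (np.sum(residual**2) + eps)
    return -10.0 * np.log10(ratio)

# Toy scene (all values are assumptions): one white-noise source at 60 degrees.
rng = np.random.default_rng(0)
fs, n = 16000, 4096
mic_positions = 0.05 * np.arange(4)           # metres along the array axis
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
target = rng.standard_normal(n)
true_doa = np.deg2rad(60.0)
mic_spectra = np.fft.rfft(target)[:, None] * steering_vector(
    true_doa, mic_positions, freqs)

def separation_loss(doa_rad):
    """Loss computed from signals only; no DOA label is used."""
    spectrum = delay_and_sum(mic_spectra, doa_rad, mic_positions, freqs)
    return neg_si_sdr(np.fft.irfft(spectrum, n=n), target)

loss_true = separation_loss(true_doa)
loss_wrong = separation_loss(np.deg2rad(120.0))

# Grid search as a stand-in for a learned DOA estimator.
candidates_deg = np.arange(0, 181, 10)
losses = [separation_loss(np.deg2rad(a)) for a in candidates_deg]
best_deg = int(candidates_deg[int(np.argmin(losses))])
```

In this sketch the loss is smallest when the beamformer is steered toward the true source direction, which is why a loss on the separated signal alone can, in principle, guide a DOA estimate without DOA labels. In DBnet, per the abstract, the DOA comes from trainable layers rather than a grid search, and the scene is reverberant and noisy rather than anechoic.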