Paper ID | MLSP-24.3 |
Paper Title | Multi-Channel Speech Enhancement using Graph Neural Networks |
Authors | Panagiotis Tzirakis, Anurag Kumar, Jacob Donley, Facebook, United States |
Session | MLSP-24: Applications in Audio and Speech Processing |
Location | Gather.Town |
Session Time | Wednesday, 09 June, 16:30 - 17:15 |
Presentation Time | Wednesday, 09 June, 16:30 - 17:15 |
Presentation | Poster |
Topic | Machine Learning for Signal Processing: [MLR-SSEP] Source separation |
Abstract | Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by combining deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing each audio channel as a node lying in a non-Euclidean space, specifically a graph. This formulation allows us to apply graph neural networks (GNNs) to find spatial correlations among the different channels (nodes). We utilize graph convolutional networks (GCNs) by incorporating them in the embedding space of a U-Net architecture. We use the LibriSpeech dataset and simulated room acoustics data to experiment extensively with our approach using different array types and numbers of microphones. Results indicate the superiority of our approach when compared to a prior state-of-the-art method. |
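To make the channels-as-graph-nodes idea concrete, below is a minimal sketch in PyTorch, not the authors' code: each microphone channel contributes one node feature vector (e.g., a U-Net bottleneck embedding), and a single graph convolution mixes information across channels. The class name, tensor shapes, and the fully-connected adjacency are illustrative assumptions.

```python
# Sketch: one GCN layer over microphone channels, applied to per-channel
# embeddings such as those from a U-Net encoder bottleneck (assumed setting).
import torch
import torch.nn as nn


class ChannelGCNLayer(nn.Module):
    """One GCN layer over channels: H' = relu(A_hat @ H @ W)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, a_hat: torch.Tensor) -> torch.Tensor:
        # h: (batch, num_channels, in_dim) node features, one node per channel
        # a_hat: (num_channels, num_channels) normalized adjacency matrix
        return torch.relu(a_hat @ self.linear(h))


def normalized_adjacency(num_channels: int) -> torch.Tensor:
    # Fully-connected graph with self-loops (a matrix of ones), symmetrically
    # normalized as D^{-1/2} A D^{-1/2} in the style of Kipf & Welling's GCN.
    a = torch.ones(num_channels, num_channels)
    d_inv_sqrt = a.sum(dim=1).rsqrt()
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]


# Example: a 4-microphone array with 256-dim embeddings per channel
# (hypothetical sizes; in the paper these would come from the U-Net encoder).
batch, mics, feat = 2, 4, 256
embeddings = torch.randn(batch, mics, feat)
a_hat = normalized_adjacency(mics)
fused = ChannelGCNLayer(feat, feat)(embeddings, a_hat)  # (2, 4, 256)
```

With only a handful of microphones the graph is tiny, so a dense, fully-connected adjacency is cheap and lets every channel attend to every other; the GCN then supplies the cross-channel spatial mixing that a channel-wise encoder alone would miss.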