Paper ID: SPE-51.2
Paper Title: SELF-ATTENTION GENERATIVE ADVERSARIAL NETWORK FOR SPEECH ENHANCEMENT
Authors: Huy Phan, Queen Mary University of London, United Kingdom; Huy Le Nguyen, Ho Chi Minh City University of Technology, Vietnam; Oliver Chén, University of Oxford, United Kingdom; Philipp Koch, University of Lübeck, Germany; Ngoc Q. K. Duong, InterDigital R&D France, France; Ian McLoughlin, Singapore Institute of Technology, Singapore; Alfred Mertins, University of Lübeck, Germany
Session: SPE-51: Speech Enhancement 7: Single-channel Processing
Location: Gather.Town
Session Time: Friday, 11 June, 13:00 - 13:45
Presentation Time: Friday, 11 June, 13:00 - 13:45
Presentation: Poster
Topic: Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
IEEE Xplore Open Preview: available in IEEE Xplore
Virtual Presentation: available in the Virtual Conference
Abstract:
Existing generative adversarial networks (GANs) for speech enhancement rely solely on the convolution operation, which may obscure temporal dependencies across the input sequence. To remedy this issue, we propose a self-attention layer adapted from non-local attention, coupled with the convolutional and deconvolutional layers of a speech enhancement GAN (SEGAN) operating on raw signal input. Further, we empirically study the effect of placing the self-attention layer at (de)convolutional layers with varying layer indices, as well as at all of them when memory allows. Our experiments show that introducing self-attention to SEGAN leads to consistent improvement across the objective evaluation metrics of enhancement performance. Furthermore, applying it at different (de)convolutional layers does not significantly alter performance, suggesting that it can be conveniently applied at the highest-level (de)convolutional layer with the smallest memory overhead.
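The self-attention layer described in the abstract can be sketched as follows. This is a minimal NumPy illustration of SAGAN-style non-local attention on a (channels, time) feature map, not the authors' implementation: the paper's 1x1 convolutions reduce to plain matrix products here, and the weight names, the reduced query/key dimension `c_attn`, and the initialization are assumptions for illustration. The learnable scale `gamma` is typically initialized to zero so the layer starts as the identity and attention is blended in during training.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_1d(x, w_f, w_g, w_h, gamma=0.0):
    """Non-local self-attention over a 1D feature map (sketch).

    x          : (C, T) feature map from a (de)convolutional layer
    w_f, w_g   : (C_attn, C) query/key projections (1x1 convs in the paper)
    w_h        : (C, C) value projection
    gamma      : learnable residual scale; 0 at init -> identity layer
    """
    f = w_f @ x                        # (C_attn, T) "query"
    g = w_g @ x                        # (C_attn, T) "key"
    h = w_h @ x                        # (C, T) "value"
    # attn[i, j] = weight of position i when computing output position j;
    # each column is a distribution over all T time positions.
    attn = softmax(f.T @ g, axis=0)    # (T, T) attention map
    o = h @ attn                       # (C, T) attended features
    return gamma * o + x               # residual connection

# Usage with random weights (c_attn < C mimics the usual channel reduction)
rng = np.random.default_rng(0)
c, c_attn, t = 16, 2, 64
x = rng.standard_normal((c, t))
w_f = 0.1 * rng.standard_normal((c_attn, c))
w_g = 0.1 * rng.standard_normal((c_attn, c))
w_h = 0.1 * rng.standard_normal((c, c))
y = self_attention_1d(x, w_f, w_g, w_h, gamma=0.5)
```

Because every output position attends over the whole (T, T) map, memory grows quadratically with the feature-map length, which is why the abstract notes that attaching the layer at the highest-level (shortest) (de)convolutional feature map has the smallest memory overhead.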