Paper ID | AUD-1.4 | ||
Paper Title | ULTRA-LIGHTWEIGHT SPEECH SEPARATION VIA GROUP COMMUNICATION | ||
Authors | Yi Luo, Cong Han, Nima Mesgarani, Columbia University, United States | ||
Session | AUD-1: Audio and Speech Source Separation 1: Speech Separation | ||
Location | Gather.Town | ||
Session Time | Tuesday, 08 June, 13:00 - 13:45 | ||
Presentation Time | Tuesday, 08 June, 13:00 - 13:45 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation | ||
Abstract | Model size and complexity remain the biggest challenges in deploying speech enhancement and separation systems on low-resource devices such as earphones and hearing aids. Although methods such as compression, distillation and quantization can be applied to large models, they often come at a cost to model performance. In this paper, we provide a simple model design paradigm that explicitly designs ultra-lightweight models without sacrificing performance. Motivated by sub-band frequency-LSTM (F-LSTM) architectures, we introduce group communication (GroupComm), in which a feature vector is split into smaller groups and a small processing block performs inter-group communication. Unlike standard F-LSTM models, where the sub-band outputs are concatenated, an ultra-small module is applied to all the groups in parallel, which allows a significant decrease in model size. Experimental results show that, compared with a strong baseline model that is already lightweight, GroupComm achieves on-par performance with 35.6 times fewer parameters and 2.3 times fewer operations. |
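
The parameter saving described in the abstract can be illustrated with a rough sketch. This is not the authors' architecture: the layer sizes (128 features, 16 groups), the dense layers, and the mean-based `communicate` function are all assumptions chosen for illustration. The sketch only shows why sharing one small module across groups needs far fewer weights than one large layer over the full feature vector. A learned inter-group module would add its own parameters, which this sketch does not count.

```python
def dense_params(in_dim, out_dim):
    """Weights plus biases of one fully connected layer."""
    return in_dim * out_dim + out_dim


def shared_group_params(in_dim, out_dim, num_groups):
    """One small dense layer reused by every group, so its weights count once."""
    if in_dim % num_groups or out_dim % num_groups:
        raise ValueError("dimensions must be divisible by num_groups")
    return dense_params(in_dim // num_groups, out_dim // num_groups)


def split_into_groups(vector, num_groups):
    """Split a feature vector into equally sized contiguous groups."""
    size = len(vector) // num_groups
    return [vector[i * size:(i + 1) * size] for i in range(num_groups)]


def communicate(groups):
    """Hypothetical, parameter-free stand-in for inter-group communication:
    add the across-group mean to every group. A learned module would be
    used in practice; this only shows information flowing between groups."""
    num_groups = len(groups)
    size = len(groups[0])
    mean = [sum(g[j] for g in groups) / num_groups for j in range(size)]
    return [[g[j] + mean[j] for j in range(size)] for g in groups]


# Assumed sizes for illustration only (not taken from the paper).
full = dense_params(128, 128)
shared = shared_group_params(128, 128, 16)
print(f"full layer: {full} parameters, shared group layer: {shared} parameters")

groups = split_into_groups([float(i) for i in range(8)], 4)
print(communicate(groups))
```

The ratio between the two counts depends entirely on the assumed sizes and should not be compared with the 35.6 times figure reported in the abstract, which refers to complete models.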