2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-37.5
Paper Title CAM: Context-Aware Masking for Robust Speaker Verification
Authors Ya-Qi Yu, Nanjing University, China; Siqi Zheng, Hongbin Suo, Yun Lei, Alibaba Group, China; Wu-Jun Li, Nanjing University, China
SessionSPE-37: Speaker Recognition 5: Neural Embedding
LocationGather.Town
Session Time:Thursday, 10 June, 14:00 - 14:45
Presentation Time:Thursday, 10 June, 14:00 - 14:45
Presentation Poster
Topic Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Performance degradation caused by noise has been a long-standing challenge for speaker verification. Previous methods usually involve applying a denoising transformation to speaker embeddings or enhancing input features. Nevertheless, these methods are lossy and inefficient for speaker embedding. In this paper, we propose context-aware masking (CAM), a novel method to extract robust speaker embedding. CAM enables the speaker embedding network to "focus" on the speaker of interest and "blur" unrelated noise. The threshold of masking is dynamically controlled by an auxiliary context embedding that captures speaker and noise characteristics. Moreover, models adopting CAM can be trained in an end-to-end manner without using synthesized noisy-clean speech pairs. Our results show that CAM improves speaker verification performance in the wild by a large margin, compared to the baselines.