Paper ID | SPE-20.3 |
Paper Title |
ASV-SUBTOOLS: OPEN SOURCE TOOLKIT FOR AUTOMATIC SPEAKER VERIFICATION |
Authors |
Fuchuan Tong, Miao Zhao, Jianfeng Zhou, Hao Lu, Zheng Li, Lin Li, Qingyang Hong, Xiamen University, China |
Session | SPE-20: Speaker Recognition 4: Applications |
Location | Gather.Town |
Session Time | Wednesday, 09 June, 14:00 - 14:45 |
Presentation Time | Wednesday, 09 June, 14:00 - 14:45 |
Presentation |
Poster |
Topic |
Speech Processing: [SPE-SPKR] Speaker Recognition and Characterization |
Abstract |
In this paper, we introduce a new open source toolkit for automatic speaker verification (ASV), named ASV-Subtools. Adopting PyTorch as the main deep learning engine and the Kaldi toolkit for data processing, ASV-Subtools allows users to develop modern speaker recognizers flexibly and efficiently. The toolkit prioritizes efficiency, modularity, and extensibility, with the goal of supporting state-of-the-art approaches in speaker recognition. In addition to commonly used networks, such as time delay neural networks (TDNN), factorized TDNN (F-TDNN), and ResNet, ASV-Subtools integrates an upgraded version of the SpecAugment data augmentation method, named Inverted SpecAugment, with a focus on making it more appropriate for speaker recognition subtasks. Furthermore, to alleviate the domain mismatch between training and test data, ASV-Subtools provides multiple domain adaptation methods for Probabilistic Linear Discriminant Analysis (PLDA). Experimental results show that state-of-the-art techniques implemented in ASV-Subtools achieve competitive performance compared with other implementations. |