Paper ID | SPE-56.3 | ||
Paper Title | AN END-TO-END SPEECH ACCENT RECOGNITION METHOD BASED ON HYBRID CTC/ATTENTION TRANSFORMER ASR | ||
Authors | Qiang Gao, Haiwei Wu, Yanqing Sun, Yitao Duan, NetEase Youdao, China | ||
Session | SPE-56: Paralinguistics in Speech | ||
Location | Gather.Town | ||
Session Time | Friday, 11 June, 14:00 - 14:45 | ||
Presentation Time: | Friday, 11 June, 14:00 - 14:45 | ||
Presentation | Poster | ||
Topic | Speech Processing: [SPE-ANLS] Speech Analysis | ||
Abstract | This paper proposes a novel accent recognition system built on a transformer-based end-to-end speech recognition system. To incorporate pronunciation and linguistic knowledge into the network, we first pre-train an ASR model in a hybrid CTC/attention manner. Then, focusing on accent recognition, we extend the output token list by inserting accent labels into the transcripts and fine-tune the network parameters on an accented speech dataset. Our work is evaluated on the Interspeech 2020 Accented English Speech Recognition Challenge. Experiments show that our method achieves an accuracy of 72.39% on the test set and 80.98% on the development set, outperforming the baseline system by a large margin. Our submitted system ranked second in the accent recognition task of the challenge. |
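The label-insertion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the accent names (`ACCENT_A`, `ACCENT_B`), the toy vocabulary, the helper names, and the choice to place the accent token at the start of the transcript are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List


def extend_vocab(vocab: Dict[str, int], accent_labels: List[str]) -> Dict[str, int]:
    """Return a copy of the ASR vocabulary with one new token per accent label."""
    extended = dict(vocab)
    next_id = max(extended.values(), default=-1) + 1
    for label in accent_labels:
        token = f"<{label}>"
        if token not in extended:
            extended[token] = next_id
            next_id += 1
    return extended


def tag_transcript(tokens: List[str], accent: str) -> List[str]:
    """Insert the accent token into a transcript.

    Assumption: the token is prepended. The abstract only says labels are
    inserted into the transcripts, not where.
    """
    return [f"<{accent}>"] + tokens


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids so they can serve as fine-tuning targets."""
    return [vocab[t] for t in tokens]


# Toy vocabulary standing in for the pre-trained ASR model's output tokens.
base_vocab = {"<blank>": 0, "<unk>": 1, "hello": 2, "world": 3}
accents = ["ACCENT_A", "ACCENT_B"]  # hypothetical accent classes

vocab = extend_vocab(base_vocab, accents)
tagged = tag_transcript(["hello", "world"], "ACCENT_B")
ids = encode(tagged, vocab)

print(tagged)  # ['<ACCENT_B>', 'hello', 'world']
print(ids)     # [5, 2, 3]
```

Under these assumptions, the output layer of the pre-trained model would grow by one unit per accent token before fine-tuning, and the predicted accent could be read from the first decoded token.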