Paper ID | SPE-41.4 | ||
Paper Title | PREVENTING EARLY ENDPOINTING FOR ONLINE AUTOMATIC SPEECH RECOGNITION | ||
Authors | Yingzhu Zhao, Nanyang Technological University, Singapore; Chongjia Ni, Cheung-Chi Leung, Alibaba Group, Singapore; Shafiq Joty, Eng Siong Chng, Nanyang Technological University, Singapore; Bin Ma, Alibaba Group, Singapore | ||
Session | SPE-41: Voice Activity and Disfluency Detection | ||
Location | Gather.Town | ||
Session Time: | Thursday, 10 June, 15:30 - 16:15 | ||
Presentation Time: | Thursday, 10 June, 15:30 - 16:15 | ||
Presentation | Poster | ||
Topic | Speech Processing: [SPE-VAD] Voice Activity Detection and End-pointing | ||
Abstract | With the recent development of end-to-end models in speech recognition, there has been growing interest in adapting these models for online speech recognition. However, end-to-end models used for online speech recognition are known to suffer from an early endpointing problem, which causes many deletion errors. In this paper, we propose to address the early endpointing problem from the gradient perspective. Specifically, we leverage the recently proposed ScaleGrad technique, which was originally designed to mitigate the text degeneration issue. Unlike the original ScaleGrad, we adapt it to discourage the early generation of the end-of-sentence ( |
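The abstract names ScaleGrad but does not spell out the adapted formulation, so the following is only a minimal sketch of the original ScaleGrad-style rescaling as we understand it, not the paper's method. The probability of each token in a chosen set is multiplied by a factor `gamma`, the distribution is renormalised, and the training loss is the negative log-likelihood under that rescaled distribution. The function names, the `scaled` set, the `gamma` value and the `EOS` index in the usage line are all hypothetical illustrations; which tokens the paper scales for the end-of-sentence case, and in which direction, is not stated in this abstract.

```python
import math
from typing import List, Sequence, Set, Tuple


def scalegrad_probs(probs: Sequence[float], scaled: Set[int], gamma: float) -> List[float]:
    """Multiply the probability of every token index in `scaled` by `gamma`,
    then renormalise so the result is again a probability distribution."""
    if gamma <= 0.0:
        raise ValueError("gamma must be positive")
    weighted = [gamma * p if i in scaled else p for i, p in enumerate(probs)]
    total = sum(weighted)
    return [w / total for w in weighted]


def scalegrad_loss_and_grad(
    probs: Sequence[float], target: int, scaled: Set[int], gamma: float
) -> Tuple[float, List[float]]:
    """Negative log-likelihood of `target` under the rescaled distribution and
    its gradient with respect to the logits that produced `probs`.

    Scaling a softmax probability by gamma is equivalent to adding log(gamma)
    to that token's logit, so the gradient keeps the usual softmax form
    p_tilde - one_hot(target), evaluated at the rescaled distribution.
    """
    p_tilde = scalegrad_probs(probs, scaled, gamma)
    loss = -math.log(p_tilde[target])
    grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(p_tilde)]
    return loss, grad


if __name__ == "__main__":
    EOS = 2  # hypothetical vocabulary index of the end-of-sentence token
    loss, grad = scalegrad_loss_and_grad([0.5, 0.3, 0.2], target=0, scaled={EOS}, gamma=0.5)
    print(loss, grad)
```

With `gamma < 1`, the setting the original ScaleGrad uses for novel tokens, the gradient on a scaled token's logit is pushed toward raising that token: its probability is under-estimated during training, so the model compensates. Because the effect is expressed entirely through the gradient, inference is left unchanged, which is consistent with the abstract's framing of the fix as a gradient-level change.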