Paper ID | SPE-27.1 |
Paper Title |
IMPROVING IDENTIFICATION OF SYSTEM-DIRECTED SPEECH UTTERANCES BY DEEP LEARNING OF ASR-BASED WORD EMBEDDINGS AND CONFIDENCE METRICS |
Authors |
Vilayphone Vilaysouk, Mila, Université de Montréal, Canada; Amr Nour-Eldin, Dermot Connolly, Nuance Communications, Canada |
Session | SPE-27: Speech Recognition 9: Confidence Measures |
Location | Gather.Town |
Session Time | Wednesday, 09 June, 16:30 - 17:15 |
Presentation Time | Wednesday, 09 June, 16:30 - 17:15 |
Presentation |
Poster |
Topic |
Speech Processing: [SPE-GASR] General Topics in Speech Recognition |
Abstract |
In this paper, we extend our previous work on the detection of system-directed speech utterances. This type of binary classification can be used by virtual assistants to create a more natural and fluid interaction between the system and the user. We explore two methods that both improve the Equal-Error-Rate (EER) performance of the previous model. The first exploits the supplementary information independently captured by ASR models by integrating ASR decoder-based features as additional inputs to the final classification stage of the model. This improves EER performance by a relative 13%. The second proposed method further integrates word embeddings into the architecture and, when combined with the first method, achieves a significant relative EER performance improvement of 48% over the baseline. |
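The fusion described in the abstract can be illustrated with a minimal sketch: pooled word embeddings are concatenated with ASR confidence metrics and fed to a final binary output layer. All names, dimensions, and the pooling/output choices below are illustrative assumptions, not the authors' actual architecture.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper).
EMB_DIM = 300    # assumed word-embedding size
CONF_DIM = 4     # assumed number of ASR decoder confidence metrics

def pool_embeddings(word_embs):
    """Mean-pool per-word embeddings into a single utterance vector."""
    return np.mean(word_embs, axis=0)

def classify(word_embs, conf_feats, W, b):
    """Hypothetical final classification stage: concatenate the pooled
    embedding vector with ASR confidence features, then apply a
    logistic output unit to score P(system-directed)."""
    x = np.concatenate([pool_embeddings(word_embs), conf_feats])
    logit = x @ W + b
    return 1.0 / (1.0 + np.exp(-logit))

# Toy utterance: 5 words with random embeddings and confidence metrics.
rng = np.random.default_rng(0)
word_embs = rng.standard_normal((5, EMB_DIM))
conf_feats = rng.standard_normal(CONF_DIM)
W = rng.standard_normal(EMB_DIM + CONF_DIM) * 0.01
score = classify(word_embs, conf_feats, W, 0.0)
print(0.0 < score < 1.0)
```

In practice the output score would be thresholded, with the threshold tuned so that false-accept and false-reject rates are equal (the EER operating point the abstract reports).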