Paper ID | HLT-11.1 |
Paper Title |
ASR n-best Fusion Nets |
Authors |
Xinyue Liu, Mingda Li, Luoxin Chen, Prashan Wanigasekara, Weitong Ruan, Haidar Khan, Wael Hamza, Chengwei Su, Amazon, United States |
Session | HLT-11: Language Understanding 3: Speech Understanding - General Topics |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 13:00 - 13:45 |
Presentation Time: | Thursday, 10 June, 13:00 - 13:45 |
Presentation |
Poster |
Topic |
Human Language Technology: [HLT-UNDE] Spoken Language Understanding and Computational Semantics |
Abstract |
Current spoken language understanding (SLU) systems rely heavily on the best hypothesis (ASR 1-best) generated by automatic speech recognition (ASR), which is used as the input for downstream models such as natural language understanding (NLU) modules. However, potential errors and misrecognitions in the ASR 1-best pose challenges for NLU. It is usually difficult for NLU models to recover from ASR errors without additional signals, which leads to suboptimal SLU performance. This paper proposes a fusion network that jointly considers ASR n-best hypotheses for enhanced robustness to ASR errors. Our experiments on Alexa data show that our model achieved a 21.71% error reduction compared to a baseline trained on transcription for domain classification. |