2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID HLT-11.1
Paper Title ASR n-best Fusion Nets
Authors Xinyue Liu, Mingda Li, Luoxin Chen, Prashan Wanigasekara, Weitong Ruan, Haidar Khan, Wael Hamza, Chengwei Su, Amazon, United States
Session HLT-11: Language Understanding 3: Speech Understanding - General Topics
Location Gather.Town
Session Time: Thursday, 10 June, 13:00 - 13:45
Presentation Time: Thursday, 10 June, 13:00 - 13:45
Presentation Poster
Topic Human Language Technology: [HLT-UNDE] Spoken Language Understanding and Computational Semantics
Abstract Current spoken language understanding (SLU) systems rely heavily on the best hypothesis (ASR 1-best) generated by automatic speech recognition (ASR), which is used as the input to downstream models such as natural language understanding (NLU) modules. However, errors and misrecognitions in the ASR 1-best pose challenges for NLU. It is usually difficult for NLU models to recover from ASR errors without additional signals, which leads to suboptimal SLU performance. This paper proposes a fusion network that jointly considers ASR n-best hypotheses for enhanced robustness to ASR errors. Our experiments on Alexa data show that our model achieves a 21.71% error reduction on domain classification compared to a baseline trained on transcriptions.