2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID HLT-16.5
Paper Title Joint Alignment Learning-Attention based Model for Grapheme-to-Phoneme Conversion
Authors Yonghe Wang, Feilong Bao, Hui Zhang, Guanglai Gao, Inner Mongolia University, China
Session HLT-16: Applications in Natural Language
Location Gather.Town
Session Time: Thursday, 10 June, 16:30 - 17:15
Presentation Time: Thursday, 10 June, 16:30 - 17:15
Presentation Poster
Topic Speech Processing: [SPE-GASR] General Topics in Speech Recognition
Abstract Sequence-to-sequence attention-based models for grapheme-to-phoneme (G2P) conversion have attracted significant interest. The attention-based encoder-decoder framework learns the mapping from input to output tokens by selectively focusing on relevant information, and has been shown to perform well. However, the attention mechanism can produce non-monotonic alignments, which degrade G2P conversion performance. In this paper, we present a novel approach that optimizes the G2P conversion model by directly aligning grapheme and phoneme sequences, using alignment learning (AL) as the loss function. In addition, we propose a multi-task learning method that uses a joint alignment learning and attention model to predict proper alignments and thus improve the accuracy of G2P conversion. Evaluations on Mongolian and CMUDict tasks show that alignment learning as the loss function can effectively train a G2P conversion model. Furthermore, our multi-task method significantly outperforms both the alignment learning-based model and the attention-based model.