2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: HLT-16.1
Paper Title: Recent Advances in Arabic Syntactic Diacritics Restoration
Authors: Yasser Hifny, University of Helwan, Egypt
Session: HLT-16: Applications in Natural Language
Location: Gather.Town
Session Time: Thursday, 10 June, 16:30 - 17:15
Presentation Time: Thursday, 10 June, 16:30 - 17:15
Presentation: Poster
Topic: Human Language Technology: [HLT-STPA] Segmentation, Tagging, and Parsing
Abstract: Restoring Arabic syntactic diacritics with Long Short-Term Memory (LSTM) networks leads to state-of-the-art performance. These LSTM networks are commonly augmented with Maximum Entropy (MaxEnt) sparse direct connections between the input and output layers of the tagger. One way to improve the performance of such a tagger is to use an ensemble of taggers. However, an ensemble of taggers may require substantial computational and memory resources. In this paper, we implement a knowledge distillation technique in which an ensemble of teacher taggers is used to train a single student tagger. In addition, Arabic is a morphologically rich language with a high Out-Of-Vocabulary (OOV) rate. To address this problem, we propose to supplement word embeddings with character embeddings encoded by LSTMs for each word. On the Arabic Treebank task, our hybrid LSTM/MaxEnt tagger achieves a 1.0% absolute word error rate (WER) improvement over a strong baseline using the two proposed techniques.
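
The abstract does not give the exact distillation objective, so the following is only a minimal sketch of one common formulation of ensemble knowledge distillation, not the paper's implementation. It assumes that the teachers' temperature-softened output distributions are averaged into soft targets and that the student is trained on a weighted sum of a soft-target cross-entropy and the usual cross-entropy against the gold diacritic tag. The function names and the `alpha` and `temperature` parameters are hypothetical and chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_soft_targets(teacher_logits, temperature=1.0):
    """Average the softened distributions of all teachers (assumed combination rule)."""
    probs = [softmax(logits, temperature) for logits in teacher_logits]
    num_teachers = len(probs)
    num_tags = len(probs[0])
    return [sum(p[k] for p in probs) / num_teachers for k in range(num_tags)]


def distillation_loss(student_logits, teacher_logits, gold_index,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of soft-target and hard-target cross-entropy for one token.

    alpha and temperature are illustrative hyperparameters, not values from the paper.
    """
    soft_targets = ensemble_soft_targets(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Cross-entropy between the ensemble's soft targets and the student's softened output.
    soft_ce = -sum(t * math.log(s) for t, s in zip(soft_targets, student_soft))
    # Standard cross-entropy against the gold tag at temperature 1.
    hard_ce = -math.log(softmax(student_logits)[gold_index])
    return alpha * soft_ce + (1.0 - alpha) * hard_ce


# Two hypothetical teachers scoring three candidate diacritic tags for one word.
teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
agreeing_student = distillation_loss([2.0, 0.5, -1.0], teachers, gold_index=0)
disagreeing_student = distillation_loss([-1.0, 0.5, 2.0], teachers, gold_index=0)
print(agreeing_student < disagreeing_student)
```

In this sketch, a student whose scores agree with the ensemble and the gold tag receives a lower loss than one that ranks the tags in the opposite order, which is the signal that lets a single tagger approximate the ensemble at a fraction of its inference cost.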