2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-48.2
Paper Title END-TO-END MULTILINGUAL AUTOMATIC SPEECH RECOGNITION FOR LESS-RESOURCED LANGUAGES: THE CASE OF FOUR ETHIOPIAN LANGUAGES
Authors Solomon Teferra Abate, Martha Yifiru Tachbelie, Addis Ababa University, Ethiopia; Tanja Schultz, University of Bremen, Germany
SessionSPE-48: Speech Recognition 18: Low Resource ASR
LocationGather.Town
Session Time:Friday, 11 June, 11:30 - 12:15
Presentation Time:Friday, 11 June, 11:30 - 12:15
Presentation Poster
Topic Speech Processing: [SPE-GASR] General Topics in Speech Recognition
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract End-to-End (E2E) approach, which maps a sequence of input features into a sequence of grapheme or words, to Automatic Speech Recognition (ASR) is a hot research agenda. It is interesting for less-resourced languages since it avoids the use of pronunciation dictionary, which is one of the major components in the traditional ASR systems. However, like any deep neural network (DNN) approaches, E2E is data greedy. This makes the application of E2E to less-resourced languages questionable. However, using data from other languages in a multilingual (ML) setup is being applied to solve the problem of data scarcity. We have, therefore, conducted ML E2E ASR experiments for four less-resourced Ethiopian languages using different language and acoustic modelling units. The results of our experiments show that relative Word Error Rate (WER) reductions (over the monolingual E2E systems) of up to 29.83% can be achieved by just using data of two related languages in E2E ASR system training. Moreover, we have also noticed that the use of data from less related languages also leads to E2E ASR performance improvement over the use of monolingual data.