2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID SPE-29.4
Paper Title A Two-Stage Deep Modeling Approach to Articulatory Inversion
Authors Abdolreza Sabzi Shahrebabaki, Negar Olfati, Ali Shariq Imran, Magne Hallstein Johnsen, Norwegian University of Science and Technology, Norway; Sabato Marco Siniscalchi, Kore University of Enna, Italy; Torbjørn Karl Svendsen, Norwegian University of Science and Technology, Norway
Session SPE-29: Speech Processing 1: Production
Location Gather.Town
Session Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Poster
Topic Speech Processing: [SPE-SPRD] Speech Production
Abstract This paper proposes a two-stage deep feed-forward neural network (DNN) to tackle the acoustic-to-articulatory inversion (AAI) problem. DNNs are a viable solution for the AAI task, but the temporal continuity of the estimated articulatory values has not been properly exploited when a DNN is employed. In this work, we address the lack of temporal constraints while keeping the solution parameter-parsimonious by deploying a two-stage approach based only on DNNs: (i) articulatory trajectories are estimated in a first stage using a DNN, and (ii) a temporal window of the estimated trajectories is used in a follow-up DNN stage as a refinement. The first-stage estimate can be thought of as auxiliary information that imposes constraints on the inversion process. Experiments show an average RMSE reduction of 7.51% compared to the baseline, together with a 2.39% improvement in Pearson correlation. Finally, we point out that AAI is still a highly challenging problem, mainly because the acoustic-to-articulatory mapping is non-linear and one-to-many. It is thus promising that a significant improvement was attained with our simple yet elegant solution.
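The temporal-window step and the two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the symmetric context size, and the edge padding by frame repetition are all assumptions made for illustration, and the abstract does not say whether the second stage also receives acoustic features. The DNNs themselves are omitted; only the construction of the second-stage input from first-stage estimates, and the RMSE and Pearson correlation computations, are shown.

```python
import math

def stack_window(frames, context):
    """For each frame t, concatenate frames t-context .. t+context.

    `frames` is a list of per-frame articulatory estimates (lists of floats),
    e.g. the output of a first-stage network. Edges are padded by repeating
    the first/last frame (an assumption, not taken from the paper).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        row = []
        for k in range(t - context, t + context + 1):
            idx = min(max(k, 0), n - 1)  # clamp index to the valid range
            row.extend(frames[idx])
        stacked.append(row)
    return stacked

def rmse(ref, est):
    """Root-mean-square error between two equal-length trajectories."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))

def pearson(ref, est):
    """Pearson correlation coefficient between two equal-length trajectories."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_e = sum(est) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(ref, est))
    var_r = sum((r - mean_r) ** 2 for r in ref)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / math.sqrt(var_r * var_e)

# Hypothetical first-stage output: 4 frames, 2 articulatory channels each.
stage1 = [[0.0, 1.0], [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
windows = stack_window(stage1, context=1)  # each row now has 3 * 2 values
```

Each row of `windows` would be the input for one frame of a refinement network, so the second stage sees the neighbouring first-stage estimates rather than a single frame. That is one way to read the temporal constraint the abstract describes.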