Paper ID | SPE-30.3 | ||
Paper Title | GENERATIVE SPEECH CODING WITH PREDICTIVE VARIANCE REGULARIZATION | ||
Authors | W. Bastiaan Kleijn, Victoria University of Wellington, New Zealand; Andrew Storus, Michael Chinen, Tom Denton, Felicia S. C. Lim, Alejandro Luebs, Jan Skoglund, Hengchin Yeh, Google, United States | ||
Session | SPE-30: Speech Processing 2: General Topics | ||
Location | Gather.Town | ||
Session Time | Wednesday, 09 June, 16:30 - 17:15 | ||
Presentation Time | Wednesday, 09 June, 16:30 - 17:15 | ||
Presentation | Poster | ||
Topic | Speech Processing: [SPE-CODI] Speech Coding | ||
Abstract | The recent emergence of machine-learning-based generative models for speech suggests that a significant reduction in bit rate for speech codecs is possible. However, the performance of generative models deteriorates significantly with the distortions present in real-world input signals. We argue that this deterioration is due to the sensitivity of the maximum likelihood criterion to outliers and the ineffectiveness of modeling a sum of independent signals with a single autoregressive model. We introduce predictive-variance regularization to reduce the sensitivity to outliers, resulting in a significant increase in performance. We show that noise reduction to remove unwanted signals can significantly increase performance. We provide extensive subjective performance evaluations that show that our system based on generative modeling provides state-of-the-art coding performance at 3 kb/s for real-world speech signals at reasonable computational complexity. |