Paper ID | SPE-57.5 |
Paper Title | PAUSE-ENCODED LANGUAGE MODELS FOR RECOGNITION OF ALZHEIMER'S DISEASE AND EMOTION |
Authors | Jiahong Yuan, Xingyu Cai, Kenneth Church, Baidu Research, United States |
Session | SPE-57: Speech, Depression and Sleepiness |
Location | Gather.Town |
Session Time | Friday, 11 June, 14:00 - 14:45 |
Presentation Time | Friday, 11 June, 14:00 - 14:45 |
Presentation | Poster |
Topic | Speech Processing: [SPE-ANLS] Speech Analysis |
Abstract | We propose enhancing Transformer language models (BERT, RoBERTa) to take advantage of pauses, which play an important role in speech. In previous work we developed a method to encode pauses in transcripts for recognition of Alzheimer’s disease. In this study, we extend this idea to language models: we re-train BERT and RoBERTa on a large collection of pause-encoded transcripts, then fine-tune them on two downstream tasks, recognition of Alzheimer’s disease and recognition of emotion. Pause-encoded language models outperform text-only language models on both tasks. Pause augmentation by duration perturbation during training is shown to improve pause-encoded language models. |
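The abstract does not spell out the pause-encoding scheme or the augmentation procedure. As a rough illustration only, the Python sketch below shows one plausible way to turn a time-aligned transcript into a pause-encoded one and to perturb pause durations for augmentation; the PAUSE_BINS thresholds, the [PAUSE_*] token strings, the encode_pauses function, and the jitter parameter are illustrative assumptions, not the authors' actual method.

import random

# Hypothetical duration bins (seconds) mapped to pause tokens. The thresholds
# and token strings are assumptions for illustration, not taken from the paper.
PAUSE_BINS = [
    (0.05, None),               # gaps this short are treated as no pause
    (0.5, "[PAUSE_SHORT]"),
    (2.0, "[PAUSE_MED]"),
    (float("inf"), "[PAUSE_LONG]"),
]

def encode_pauses(aligned_words, jitter=0.0):
    """Insert pause tokens between words based on inter-word silence.

    aligned_words: list of (token, start_time, end_time) tuples, e.g. from a
    forced aligner. jitter: relative duration perturbation (e.g. 0.2 for
    +/-20%), a rough stand-in for pause augmentation during training.
    """
    out = []
    for i, (tok, start, _end) in enumerate(aligned_words):
        if i > 0:
            gap = start - aligned_words[i - 1][2]   # silence before this word
            if jitter:
                gap *= 1.0 + random.uniform(-jitter, jitter)
            for threshold, pause_tok in PAUSE_BINS:
                if gap < threshold:
                    if pause_tok:
                        out.append(pause_tok)
                    break
        out.append(tok)
    return " ".join(out)

# Toy aligned transcript: (word, start, end) in seconds.
aligned = [("the", 0.00, 0.20), ("cookie", 0.22, 0.70), ("jar", 1.90, 2.30)]
print(encode_pauses(aligned))               # -> "the cookie [PAUSE_MED] jar"
print(encode_pauses(aligned, jitter=0.2))   # pause durations perturbed before binning

Transcripts encoded this way could then be fed to standard BERT/RoBERTa pre-training and fine-tuning pipelines, presumably with the pause tokens registered as special tokens in the tokenizer vocabulary.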