Paper ID | SPE-57.3 |
Paper Title |
AUTOMATIC ELICITATION COMPLIANCE FOR SHORT-DURATION SPEECH BASED DEPRESSION DETECTION |
Authors |
Brian Stasak, Zhaocheng Huang, University of New South Wales, Australia; Dale Joachim, Sonde Health, United States; Julien Epps, University of New South Wales, Australia |
Session | SPE-57: Speech, Depression and Sleepiness |
Location | Gather.Town |
Session Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation Time: | Friday, 11 June, 14:00 - 14:45 |
Presentation | Poster |
Topic |
Speech Processing: [SPE-ANLS] Speech Analysis |
Abstract |
Detecting depression from the voice in naturalistic environments is challenging, particularly for short-duration audio recordings. This heightens the need to interpret and make optimal use of elicited speech. The rapid consonant-vowel syllable combination ‘pataka’ has frequently been selected as a clinical motor-speech task. However, there is significant variability in elicited recordings, which remains to be investigated. In this multi-corpus study of over 25,000 ‘pataka’ utterances, speech landmark-based features were found to be sensitive to the number of ‘pataka’ utterances per recording. This sensitivity was exploited, for the first time, to automatically estimate ‘pataka’ count and rate, achieving root mean square errors nearly three times lower than chance level. When count and rate knowledge of the elicited speech is leveraged for depression detection, results show that the estimated ‘pataka’ count and rate are important for normalizing evaluative ‘pataka’ speech data. Count- and/or rate-normalized ‘pataka’ models produced relative reductions in depression classification error of up to 26% compared with non-normalized models. |
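The two metrics quoted in the abstract, root mean square error and relative reduction in classification error, can be illustrated with a minimal Python sketch. All numbers below are hypothetical and are not taken from the paper. The mean-count predictor used as the "chance" baseline is also an assumption, since the abstract does not say how chance level was defined.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_error_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical 'pataka' counts per recording (illustrative only).
true_counts = [4, 6, 5, 7, 3]
estimated_counts = [4, 5, 5, 8, 3]

# Assumed chance baseline: always predict the mean count.
mean_count = sum(true_counts) / len(true_counts)
baseline_counts = [mean_count] * len(true_counts)

model_rmse = rmse(estimated_counts, true_counts)
chance_rmse = rmse(baseline_counts, true_counts)
print(f"model RMSE = {model_rmse:.3f}, chance RMSE = {chance_rmse:.3f}")

# Hypothetical classification error rates before and after normalization.
reduction = relative_error_reduction(0.50, 0.37)
print(f"relative error reduction = {reduction:.0%}")
```

With these made-up error rates, a drop from 0.50 to 0.37 corresponds to a 26% relative reduction, which shows how a figure of that size is computed rather than how the paper obtained it.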