Paper ID | AUD-14.4
Paper Title | NON-INTRUSIVE BINAURAL PREDICTION OF SPEECH INTELLIGIBILITY BASED ON PHONEME CLASSIFICATION
Authors | Jana Roßbach, Communication Acoustics and Cluster of Excellence Hearing4All, Carl-von-Ossietzky University Oldenburg, Germany; Saskia Röttges, Christopher F. Hauth, Thomas Brand, Medical Physics and Cluster of Excellence Hearing4All, Carl-von-Ossietzky University Oldenburg, Germany; Bernd T. Meyer, Communication Acoustics and Cluster of Excellence Hearing4All, Carl-von-Ossietzky University Oldenburg, Germany
Session | AUD-14: Quality and Intelligibility Measures
Location | Gather.Town
Session Time | Wednesday, 09 June, 15:30 - 16:15
Presentation Time | Wednesday, 09 June, 15:30 - 16:15
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-QIM] Quality and Intelligibility Measures
Abstract | In this study, we explore an approach for modeling speech intelligibility in spatial acoustic scenes. To this end, we combine a non-intrusive binaural frontend with a deep neural network (DNN) borrowed from a standard automatic speech recognition (ASR) system. The DNN estimates phoneme probabilities that degrade in the presence of noise and reverberation; this degradation is quantified with an entropy-based measure. The model output is used to predict speech recognition thresholds, i.e., the signal-to-noise ratio at which 50% word recognition accuracy is reached. Predictions are compared to measured data from eight normal-hearing listeners in acoustic scenarios with varying positions of localized maskers, different rooms, and different reverberation times. Although the model is non-intrusive, it produces a root mean squared error in the range of 0.6-2.1 dB, which is similar to results obtained with a reference model (0.3-1.8 dB) that uses oracle knowledge in both the frontend and the backend stage.
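The core quantity named in the abstract, an entropy-based measure of how strongly the DNN's phoneme probabilities are degraded by noise and reverberation, can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration under assumed conventions, not the authors' implementation: it computes the mean frame-wise entropy of a phoneme posteriorgram, which rises as noise and reverberation flatten the per-frame phoneme distributions. The mapping from this measure to a speech recognition threshold is not specified in the abstract and is therefore omitted here.

```python
import numpy as np

def mean_phoneme_entropy(posteriors, eps=1e-12):
    """Mean frame-wise entropy (in bits) of a phoneme posteriorgram.

    posteriors: (n_frames, n_phonemes) array, each row a probability
    distribution over phoneme classes produced by an ASR-style DNN.
    Noise and reverberation flatten these distributions, which raises
    the entropy; a measure of this kind is the degradation cue the
    abstract refers to (exact definition assumed, not taken from the paper).
    """
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0)
    p = p / p.sum(axis=1, keepdims=True)  # re-normalize after clipping
    return float(np.mean(-np.sum(p * np.log2(p), axis=1)))

# Illustrative use with synthetic posteriorgrams (not real model output):
rng = np.random.default_rng(0)
clean = rng.dirichlet(alpha=np.full(40, 0.1), size=200)  # peaked rows -> low entropy
noisy = rng.dirichlet(alpha=np.full(40, 5.0), size=200)  # flat rows   -> high entropy
print(mean_phoneme_entropy(clean), mean_phoneme_entropy(noisy))
```

In a prediction setting, such a measure would presumably be evaluated as a function of SNR and compared against a reference criterion to read off the predicted speech recognition threshold; the details of that mapping belong to the full paper.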