Paper ID | HLT-15.3 |
Paper Title | IMPROVING PRONUNCIATION ASSESSMENT VIA ORDINAL REGRESSION WITH ANCHORED REFERENCE SAMPLES |
Authors | Bin Su, Tsinghua University, China; Shaoguang Mao, Frank K. Soong, Yan Xia, Jonathan Tien, Microsoft Research Asia, China; Zhiyong Wu, Tsinghua University, China |
Session | HLT-15: Language Assessment |
Location | Gather.Town |
Session Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation Time: | Thursday, 10 June, 16:30 - 17:15 |
Presentation | Poster |
Topic | Human Language Technology: [HLT-LACL] Language Acquisition and Learning |
Abstract |
Sentence-level pronunciation assessment is important for Computer-Assisted Language Learning (CALL). Traditional pronunciation assessment based on the Goodness of Pronunciation (GOP) algorithm has two weaknesses when scoring a speech utterance: 1) phoneme-level GOP scores cannot be easily translated into a sentence-level score by simple averaging; 2) rank-ordering information is not well exploited in GOP scoring, which limits both the robustness of the assessment and its correlation with human raters' evaluations. In this paper, we propose two new statistical features, average GOP (aGOP) and confusion GOP (cGOP), and use them to train a binary classifier in Ordinal Regression with Anchored Reference Samples (ORARS). When tested on the Microsoft mTutor ESL dataset, the proposed approach achieves a 26.9% relative improvement in Pearson correlation coefficient over the conventional GOP-based approach. Its performance is on par with, or better than, that of human raters. |
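To make the reported metric concrete, the sketch below shows one way to compute a Pearson correlation coefficient between machine and human scores, and the relative improvement of one system over another. It is only an illustration: the function names `pearson` and `relative_improvement` and all scores are hypothetical toy values, not code or data from the paper or the mTutor dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, e.g. 0.2 means 20%."""
    return (proposed - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical toy scores, invented for illustration only.
    human = [1, 2, 3, 4, 5]            # human rater scores
    baseline_scores = [2, 1, 3, 5, 4]  # scores from a baseline system
    proposed_scores = [1, 2, 3, 5, 4]  # scores from an improved system

    r_base = pearson(human, baseline_scores)
    r_prop = pearson(human, proposed_scores)
    gain = relative_improvement(r_base, r_prop)
    print(f"baseline r = {r_base:.3f}, proposed r = {r_prop:.3f}")
    print(f"relative improvement = {gain:.1%}")
```

A relative improvement is measured against the baseline correlation, so the same absolute gain counts for more when the baseline is low.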