2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-44.6
Paper Title THE ACCENTED ENGLISH SPEECH RECOGNITION CHALLENGE 2020: OPEN DATASETS, TRACKS, BASELINES, RESULTS AND METHODS
Authors Xian Shi, Fan Yu, Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, China; Yizhou Lu, SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; Yuhao Liang, Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, China; Qiangze Feng, Daliang Wang, Datatang (Beijing) Technology Co., LTD, China; Yanmin Qian, SpeechLab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China; Lei Xie, Audio, Speech and Language Processing Group (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, China
SessionSPE-44: Speech Recognition 16: Robust Speech Recognition 2
LocationGather.Town
Session Time:Thursday, 10 June, 16:30 - 17:15
Presentation Time:Thursday, 10 June, 16:30 - 17:15
Presentation Poster
Topic Speech Processing: [SPE-MULT] Multilingual Recognition and Identification
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract The variety of accents has posed a big challenge to speech recognition. The Accented English Speech Recognition Challenge (AESRC2020) is designed for providing a common testbed and promoting accent-related research. Two tracks are set in the challenge -- English accent recognition (track 1) and accented English speech recognition (track 2). A set of 160 hours of accented English speech collected from 8 countries is released with labels as the training set. Another 20 hours of speech without labels is later released as the test set, including two unseen accents from another two countries used to test the model generalization ability in track 2. We also provide baseline systems for the participants. This paper first reviews the released dataset, track setups, baselines and then summarizes the challenge results and major techniques used in the submissions.