IEEE ICASSP 2021 || Toronto, Ontario, Canada || 6-11 June 2021

Paper Detail

Paper ID

SPE-32.2

Paper Title

A FURTHER STUDY OF UNSUPERVISED PRETRAINING FOR TRANSFORMER BASED SPEECH RECOGNITION

Authors

Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Kun Han, Xiangang Li; Didi Chuxing, China

Session

SPE-32: Speech Recognition 12: Self-supervised, Semi-supervised, Unsupervised Training

Location

Gather.Town

Session Time

Thursday, 10 June, 13:00 - 13:45

Presentation Time

Thursday, 10 June, 13:00 - 13:45

Presentation

Poster

Topic

Speech Processing: [SPE-GASR] General Topics in Speech Recognition

Abstract

Building an effective speech recognition system typically requires large amounts of transcribed data, which is expensive to collect. To overcome this problem, many unsupervised pretraining methods have been proposed. Among these methods, Masked Predictive Coding (MPC) achieved significant improvements on various speech recognition datasets with a BERT-like masked reconstruction loss and a Transformer backbone. However, many aspects of MPC have yet to be fully investigated. In this paper, we conduct a further study of MPC and focus on three important aspects: the effect of the speaking style of the pretraining data, the extension of MPC to streaming models, and strategies for better transferring learned knowledge from the pretraining stage to downstream tasks. The experimental results demonstrate that pretraining data with a matching speaking style is more useful for downstream recognition tasks. A unified training objective combining Autoregressive Predictive Coding (APC) and MPC provided an 8.46% relative error reduction on the streaming model trained on HKUST. Additionally, the combination of target data adaptation and layer-wise discriminative training facilitated the knowledge transfer of MPC, yielding a 3.99% relative error reduction on AISHELL over a strong baseline.
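As a rough illustration of two ideas the abstract relies on, the sketch below shows a masked reconstruction loss evaluated only on masked frames, and the arithmetic behind a relative error reduction. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the 15% mask ratio, and the use of an L1 distance are all assumptions made here for illustration, since the abstract does not specify them.

```python
import random


def mask_frames(num_frames, mask_ratio=0.15, seed=0):
    """Pick frame indices to mask (hypothetical helper; the 15% ratio is an assumption)."""
    rng = random.Random(seed)
    num_masked = max(1, round(num_frames * mask_ratio))
    return sorted(rng.sample(range(num_frames), num_masked))


def masked_l1_loss(targets, predictions, masked_idx):
    """Mean absolute error over masked frames only (L1 distance is an assumption).

    targets and predictions are lists of frames; each frame is a list of floats.
    Unmasked frames do not contribute to the loss.
    """
    total, count = 0.0, 0
    for t in masked_idx:
        for x, y in zip(targets[t], predictions[t]):
            total += abs(x - y)
            count += 1
    return total / count


def relative_error_reduction(baseline_err, new_err):
    """Relative error reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_err - new_err) / baseline_err


# Toy features, made up purely for illustration (not data from the paper).
targets = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
predictions = [[1.0, 2.0], [2.0, 4.5], [0.0, 0.0]]

# Only frame 1 is treated as masked, so the large error on frame 2 is ignored.
loss = masked_l1_loss(targets, predictions, masked_idx=[1])
```

In a real model the predictions would come from the Transformer encoder given the masked input; here they are fixed toy values so the loss computation can be followed by hand. The relative error reduction is always measured against the baseline error rate, so the same absolute gain counts for more when the baseline is already low.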

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information
