Paper ID | MLSP-40.5 | ||
Paper Title | SEQ-CPC: SEQUENTIAL CONTRASTIVE PREDICTIVE CODING FOR AUTOMATIC SPEECH RECOGNITION | ||
Authors | Yulong Chen, Jianping Zhao, Weiqi Wang, Ming Fang, Haimei Kang, Lu Wang, Tao Wei, Jun Ma, Shaojun Wang, Jing Xiao, Ping An Technology, China | ||
Session | MLSP-40: Contrastive Learning | ||
Location | Gather.Town | ||
Session Time: | Friday, 11 June, 11:30 - 12:15 | ||
Presentation Time: | Friday, 11 June, 11:30 - 12:15 | ||
Presentation | Poster | ||
Topic | Machine Learning for Signal Processing: [MLR-SSUP] Self-supervised and semi-supervised learning | ||
Abstract | Inspired by contrastive predictive coding (CPC), we propose a feature representation scheme for automatic speech recognition (ASR) that encodes sequential dependency information from raw audio signals. Following the original CPC, for a given frame we maximize a lower bound on the mutual information (MI) between the historical context and the future prediction. Building on the original CPC, we develop sequential CPC (SEQ-CPC), which takes the sequential information between frames into account when computing the MI lower bound. Because speech frames are not independent events, incorporating sequential information leads to better recognition performance. Experimental results on the WSJ corpus show that SEQ-CPC outperforms both CPC and NCE, the contrastive objective used in wav2vec. |
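
For readers unfamiliar with the MI lower bound mentioned in the abstract, the following is a minimal sketch of the InfoNCE objective from the original CPC, not of SEQ-CPC itself, whose sequential extension the abstract does not specify. It assumes one true future frame scored against a set of negatives; the function names and the score values are hypothetical and chosen only for illustration. In the original CPC the bound `log(N) - loss` holds in expectation, so the single-sample value below is only an estimate.

```python
import math


def info_nce_loss(scores, positive_index=0):
    """InfoNCE loss for one prediction: -log softmax(scores)[positive_index].

    `scores` are similarity scores between the context and each candidate
    frame; exactly one candidate (at `positive_index`) is the true future frame.
    """
    max_score = max(scores)  # subtract the max for numerical stability
    log_denominator = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    return log_denominator - scores[positive_index]


def mi_lower_bound(loss, num_candidates):
    """Single-sample estimate of the CPC bound I(x; c) >= log(N) - L_N."""
    return math.log(num_candidates) - loss


# Hypothetical scores: the first entry is the true future frame,
# the remaining three are negative samples.
scores = [4.0, 0.5, -1.0, 0.2]
loss = info_nce_loss(scores)
bound = mi_lower_bound(loss, len(scores))
print(f"loss={loss:.4f}, MI bound estimate={bound:.4f}")
```

Minimizing this loss raises the bound, which is capped at `log(N)` for `N` candidates. When all candidates score equally the loss equals `log(N)` and the bound estimate is zero.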