2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Technical Program

Paper ID	HLT-7.2
Paper Title	CASCADED MODELS WITH CYCLIC FEEDBACK FOR DIRECT SPEECH TRANSLATION
Authors	Tsz Kin Lam, Shigehiko Schamoni, Stefan Riezler, Heidelberg University, Germany
Session	HLT-7: Speech Translation 1: Models
Location	Gather.Town
Session Time	Wednesday, 09 June, 14:00 - 14:45
Presentation Time	Wednesday, 09 June, 14:00 - 14:45
Presentation	Poster
Topic	Human Language Technology: [HLT-MTSW] Machine Translation for Spoken and Written Language
Abstract	Direct speech translation describes a scenario where only speech inputs and corresponding translations are available. Such data are notoriously limited. We present a technique that allows cascades of automatic speech recognition (ASR) and machine translation (MT) to exploit in-domain direct speech translation data in addition to out-of-domain MT and ASR data. After pre-training MT and ASR, we use a feedback cycle where the downstream performance of the MT system is used as a signal to improve the ASR system by self-training, and the MT component is fine-tuned on multiple ASR outputs, making it more tolerant towards spelling variations. A comparison to end-to-end speech translation using components of identical architecture and the same data shows gains of up to 3.8 BLEU points on LibriVoxDeEn and up to 5.1 BLEU points on CoVoST for German-to-English speech translation.
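The feedback cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the MT signal selects ASR outputs, so the sketch assumes the ASR system produces an n-best list and that the transcript whose translation scores highest against the reference translation is kept as the self-training target. The names `overlap_f1`, `feedback_round`, `asr_nbest`, and `translate` are hypothetical, and a unigram F1 stands in for a real translation metric such as BLEU.

```python
from collections import Counter
from typing import Callable, List, Tuple


def overlap_f1(hyp: str, ref: str) -> float:
    """Unigram F1 between two strings (assumed stand-in for a metric like BLEU)."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    common = sum((h & r).values())
    if common == 0:
        return 0.0
    precision = common / sum(h.values())
    recall = common / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def feedback_round(
    speech: str,
    reference: str,
    asr_nbest: Callable[[str], List[str]],
    translate: Callable[[str], str],
    score: Callable[[str, str], float] = overlap_f1,
) -> Tuple[Tuple[str, str], List[Tuple[str, str]]]:
    """One pass over a single (speech, reference translation) pair.

    Returns a pseudo-labelled ASR example and a list of MT fine-tuning pairs.
    The selection rule (keep the best-scoring hypothesis) is an assumption.
    """
    hypotheses = asr_nbest(speech)
    # Downstream MT quality is the signal used to rank ASR hypotheses.
    scored = [(score(translate(t), reference), t) for t in hypotheses]
    best_transcript = max(scored, key=lambda pair: pair[0])[1]
    # ASR self-training target: the speech paired with the selected transcript.
    asr_example = (speech, best_transcript)
    # MT fine-tuning data: every ASR output paired with the reference, so the
    # MT model sees the spelling variations the ASR system actually produces.
    mt_examples = [(t, reference) for t in hypotheses]
    return asr_example, mt_examples


# Toy stand-ins for the ASR and MT systems (hypothetical data).
toy_hypotheses = ["das haus ist gros", "das haus ist groß", "das maus ist groß"]
toy_translations = {
    "das haus ist gros": "the house is gross",
    "das haus ist groß": "the house is big",
    "das maus ist groß": "the mouse is big",
}
asr_example, mt_examples = feedback_round(
    "utt-001",
    "the house is big",
    lambda speech: toy_hypotheses,
    lambda text: toy_translations[text],
)
print(asr_example)
print(len(mt_examples))
```

In a full system, the returned ASR example would feed a self-training update and the MT pairs a fine-tuning update, with the cycle repeated over the in-domain direct speech translation data.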