2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

SPE-36: Speech Enhancement 6: Multi-modal Processing

Session Type: Poster
Time: Thursday, 10 June, 14:00 - 14:45
Location: Gather.Town
Session Chair: Chandan K A Reddy, Microsoft
 
 SPE-36.1: AUDIO-VISUAL SPEECH INPAINTING WITH DEEP LEARNING
         Giovanni Morrone; University of Modena and Reggio Emilia
         Daniel Michelsanti; Aalborg University
         Zheng-Hua Tan; Aalborg University
         Jesper Jensen; Aalborg University
 
 SPE-36.2: VSET: A MULTIMODAL TRANSFORMER FOR VISUAL SPEECH ENHANCEMENT
         Karthik Ramesh; Huawei
         Chao Xing; Huawei
         Wupeng Wang; Huawei
         Dong Wang; Tsinghua University
         Xiao Chen; Huawei
 
 SPE-36.3: SWITCHING VARIATIONAL AUTO-ENCODERS FOR NOISE-AGNOSTIC AUDIO-VISUAL SPEECH ENHANCEMENT
         Mostafa Sadeghi; Inria, Grenoble Alpes
         Xavier Alameda-Pineda; Inria, Grenoble Alpes
 
 SPE-36.4: AUDIO-VISUAL SPEECH ENHANCEMENT METHOD CONDITIONED ON THE LIP MOTION AND SPEAKER-DISCRIMINATIVE EMBEDDINGS
         Koichiro Ito; Hitachi, Ltd.
         Masaaki Yamamoto; Hitachi, Ltd.
         Kenji Nagamatsu; Hitachi, Ltd.
 
 SPE-36.5: AUDIO-VISUAL SPEECH SEPARATION USING CROSS-MODAL CORRESPONDENCE LOSS
         Naoki Makishima; NTT Media Intelligence Laboratories, NTT Corporation
         Mana Ihori; NTT Media Intelligence Laboratories, NTT Corporation
         Akihiko Takashima; NTT Media Intelligence Laboratories, NTT Corporation
         Tomohiro Tanaka; NTT Media Intelligence Laboratories, NTT Corporation
         Shota Orihashi; NTT Media Intelligence Laboratories, NTT Corporation
         Ryo Masumura; NTT Media Intelligence Laboratories, NTT Corporation
 
 SPE-36.6: MUSE: MULTI-MODAL TARGET SPEAKER EXTRACTION WITH VISUAL CUES
         Zexu Pan; National University of Singapore
         Ruijie Tao; National University of Singapore
         Chenglin Xu; National University of Singapore
         Haizhou Li; National University of Singapore