2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

SPE-4: Speech Synthesis 2: Controllability

Session Type: Poster
Time: Tuesday, 8 June, 13:00 - 13:45
Location: Gather.Town
Session Chair: Yu Zhang, Google
 
SPE-4.1: PARALLEL TACOTRON: NON-AUTOREGRESSIVE AND CONTROLLABLE TTS
         Isaac Elias; Google
         Heiga Zen; Google
         Jonathan Shen; Google
         Yu Zhang; Google
         Ye Jia; Google
         Ron Weiss; Google
         Yonghui Wu; Google
 
SPE-4.2: FCL-TACO2: TOWARDS FAST, CONTROLLABLE AND LIGHTWEIGHT TEXT-TO-SPEECH SYNTHESIS
         Disong Wang; The Chinese University of Hong Kong
         Liqun Deng; Huawei Noah's Ark Lab
         Yang Zhang; Huawei Noah's Ark Lab
         Nianzu Zheng; Huawei Noah's Ark Lab
         Yu Ting Yeung; Huawei Noah's Ark Lab
         Xiao Chen; Huawei Noah's Ark Lab
         Xunying Liu; The Chinese University of Hong Kong
         Helen Meng; The Chinese University of Hong Kong
 
SPE-4.3: PROSODIC CLUSTERING FOR PHONEME-LEVEL PROSODY CONTROL IN END-TO-END SPEECH SYNTHESIS
         Alexandra Vioni; Innoetics, Samsung Electronics
         Myrsini Christidou; Innoetics, Samsung Electronics
         Nikolaos Ellinas; Innoetics, Samsung Electronics
         Georgios Vamvoukakis; Innoetics, Samsung Electronics
         Panos Kakoulidis; Innoetics, Samsung Electronics
         Taehoon Kim; Mobile Communications Business, Samsung Electronics
         June Sig Sung; Mobile Communications Business, Samsung Electronics
         Hyoungmin Park; Mobile Communications Business, Samsung Electronics
         Aimilios Chalamandaris; Innoetics, Samsung Electronics
         Pirros Tsiakoulis; Innoetics, Samsung Electronics
 
SPE-4.4: IMPROVING NATURALNESS AND CONTROLLABILITY OF SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS BY LEARNING LOCAL PROSODY REPRESENTATIONS
         Cheng Gong; Tianjin University
         Longbiao Wang; Tianjin University
         Zhenhua Ling; University of Science and Technology of China
         Shaotong Guo; Tianjin University
         Ju Zhang; Huiyan Technology (Tianjin) Co., Ltd
         Jianwu Dang; Japan Advanced Institute of Science and Technology
 
SPE-4.5: MULTI-SPEAKER EMOTIONAL SPEECH SYNTHESIS WITH FINE-GRAINED PROSODY MODELING
         Chunhui Lu; Samsung Research China-Beijing
         Xue Wen; Samsung Research China-Beijing
         Ruolan Liu; Samsung Research China-Beijing
         Xiao Chen; Samsung Research China-Beijing
 
SPE-4.6: EMOTION CONTROLLABLE SPEECH SYNTHESIS USING EMOTION-UNLABELED DATASET WITH THE ASSISTANCE OF CROSS-DOMAIN SPEECH EMOTION RECOGNITION
         Xiong Cai; Tsinghua University
         Dongyang Dai; Tsinghua University
         Zhiyong Wu; Tsinghua University
         Xiang Li; Tsinghua University
         Jingbei Li; Tsinghua University
         Helen Meng; The Chinese University of Hong Kong