2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: CHLG-3.6
Paper Title: THE MULTI-SPEAKER MULTI-STYLE VOICE CLONING CHALLENGE 2021
Authors: Qicong Xie, Northwestern Polytechnical University, China; Xiaohai Tian, National University of Singapore, Singapore; Guanghou Liu, Kun Song, Lei Xie, Northwestern Polytechnical University, China; Zhiyong Wu, Tsinghua University, China; Hai Li, Song Shi, iQIYI Inc, China; Haizhou Li, National University of Singapore, Singapore; Fen Hong, Originbeat Inc, China; Hui Bu, Xin Xu, Beijing Shell Shell Technology Co., Ltd, China
Session: CHLG-3: Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC)
Location: Zoom
Session Time: Monday, 07 June, 15:30 - 17:45
Presentation Time: Monday, 07 June, 15:30 - 17:45
Presentation: Poster
Topic: Grand Challenge: Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC)
Abstract: The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common, sizable dataset and a fair testbed for benchmarking the popular voice cloning task. Specifically, we formulate the challenge as adapting an average TTS model to a stylistic target voice using limited data from the target speaker, with evaluation based on speaker identity and style similarity. The challenge consists of two tracks, a few-shot track and a one-shot track, in which participants are required to clone multiple target voices with 100 and 5 samples, respectively. Each track has two sub-tracks. In sub-track a, to compare different strategies fairly, participants may use only the training data provided by the organizer. In sub-track b, participants may use any publicly available data. In this paper, we present a detailed explanation of the tasks and data used in the challenge, followed by a summary of the submitted systems and the evaluation results.