2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

IEEE Signal Processing Society

Institute of Electrical and Electronics Engineers (IEEE)

Technical Program

Paper Detail

Paper ID	CHLG-3.2
Paper Title	THE THINKIT SYSTEM FOR ICASSP2021 M2VOC CHALLENGE
Authors	Zengqiang Shang, Haozhe Zhang, Ziyi Chen, Bolin Zhou, Pengyuan Zhang, University of Chinese Academy of Sciences, China
Session	CHLG-3: Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC)
Location	Zoom
Session Time	Monday, 07 June, 15:30 - 17:45
Presentation Time	Monday, 07 June, 15:30 - 17:45
Presentation	Poster
Topic	Grand Challenge: Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC)
Abstract	In this paper, we introduce the low-resource text-to-speech system that the ThinkIT team submitted to the Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC). The challenge has two tracks: the few-shot Track 1 provides 100 samples per speaker, and the one-shot Track 2 provides only 5. Each track contains two sub-tracks, A and B. Unlike sub-track A, sub-track B allows the use of extra public data in addition to the released data. We participated in sub-track A only. We chose fine-tuning as our backbone strategy. Our submitted systems include a BERT-based prosody boundary prediction module, a FastSpeech-based acoustic model that generates acoustic features from text input, and a HiFi-GAN-based vocoder that generates waveforms from acoustic features. Of these, the acoustic model is particularly sensitive to the limited data available for low-resource speakers. To prevent over-fitting, we modified the acoustic model and split out a validation set to assist manual model selection. Evaluation results provided by the challenge organizers demonstrate the effectiveness of our system.
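The validation-set idea mentioned in the abstract can be illustrated with a minimal sketch: hold out a few utterances from a speaker's data, then prefer the fine-tuning checkpoint with the lowest loss on them. This is not the authors' code. The function names, the hold-out size of 10, and the loss values are hypothetical, and the abstract describes the selection as manual, so the automatic pick here is only an assumption made for illustration.

```python
import random


def split_train_val(utterances, n_val, seed=0):
    """Hold out n_val utterances for validation; the rest is used for fine-tuning."""
    if not 0 < n_val < len(utterances):
        raise ValueError("n_val must leave at least one training utterance")
    shuffled = list(utterances)
    # A fixed seed keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


def select_checkpoint(val_losses):
    """Return the name of the checkpoint with the lowest validation loss."""
    return min(val_losses, key=val_losses.get)


# 100 utterances per speaker, as in the few-shot track (IDs are made up).
utts = [f"utt_{i:03d}" for i in range(100)]
train, val = split_train_val(utts, n_val=10)

# Hypothetical validation losses recorded during fine-tuning.
losses = {"step_1000": 0.42, "step_2000": 0.35, "step_3000": 0.39}
best = select_checkpoint(losses)
```

With only 5 utterances per speaker, as in the one-shot track, any hold-out set is necessarily tiny, so a validation loss computed this way would be a noisy signal at best.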