2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information


Paper Detail

Paper ID: SPE-12.6
Paper Title: Multi-Task WaveRNN with an Integrated Architecture for Cross-lingual Voice Conversion
Authors: Yi Zhou, Xiaohai Tian, Haizhou Li; National University of Singapore, Singapore
Session: SPE-12: Voice Conversion 2: Low-Resource & Cross-Lingual Conversion
Location: Gather.Town
Session Time: Tuesday, 08 June, 16:30 - 17:15
Presentation Time: Tuesday, 08 June, 16:30 - 17:15
Presentation: Poster
Topic: Speech Processing: [SPE-SYNT] Speech Synthesis and Generation
Abstract: Spoken languages are phonetically similar because humans share a common vocal production system. However, each language has a unique phonetic repertoire and its own phonotactic rules. In cross-lingual voice conversion, the source speaker and the target speaker speak different languages. The challenge is how to project the speaker identity of the source speaker onto that of the target across two different phonetic systems. A typical voice conversion system employs a generator-vocoder pipeline, where the generator is responsible for conversion and the vocoder for waveform reconstruction. We propose a novel Multi-Task WaveRNN with an integrated architecture for cross-lingual voice conversion. The WaveRNN is trained on two sets of monolingual data via two-task learning. The integrated architecture takes linguistic features as input and outputs the speech waveform directly. Voice conversion experiments are conducted between English and Mandarin, confirming the effectiveness of the proposed method in terms of speech quality and speaker similarity.
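
The abstract gives only a high-level description, so the following is a minimal PyTorch sketch of what an integrated, two-task WaveRNN-style model could look like: a single shared recurrent backbone that maps linguistic features directly to quantized waveform samples (no separate generator and vocoder), with a language embedding acting as the task code so the same parameters can be trained on two monolingual corpora. All layer sizes, names, and the conditioning scheme are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class MultiTaskWaveRNN(nn.Module):
    """Sketch of an integrated, multi-task WaveRNN-style model.

    A shared GRU backbone is conditioned on linguistic features plus a
    language (task) embedding and predicts quantized waveform samples
    directly. Hyperparameters here are assumptions for illustration.
    """

    def __init__(self, ling_dim=256, num_langs=2, rnn_dim=512, n_classes=256):
        super().__init__()
        self.lang_emb = nn.Embedding(num_langs, 64)    # task (language) code
        self.cond = nn.Linear(ling_dim + 64, rnn_dim)  # conditioning network
        self.rnn = nn.GRU(rnn_dim + 1, rnn_dim, batch_first=True)
        self.head = nn.Linear(rnn_dim, n_classes)      # 8-bit mu-law classes

    def forward(self, ling_feats, prev_samples, lang_id):
        # ling_feats: (B, T, ling_dim) linguistic features upsampled to the
        # sample rate; prev_samples: (B, T, 1) previous waveform samples
        # (teacher forcing); lang_id: (B,) language index.
        lang = self.lang_emb(lang_id).unsqueeze(1).expand(-1, ling_feats.size(1), -1)
        cond = torch.tanh(self.cond(torch.cat([ling_feats, lang], dim=-1)))
        out, _ = self.rnn(torch.cat([cond, prev_samples], dim=-1))
        return self.head(out)  # logits over quantized sample values

# Two-task training sketch: alternate batches from the English and Mandarin
# corpora; both tasks update the same shared parameters.
model = MultiTaskWaveRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def step(ling, prev, target, lang_id):
    logits = model(ling, prev, lang_id)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), target.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Under this reading, "integrated" means the conditioning network and the waveform model are one network trained end to end, and "two-task learning" reduces to interleaving monolingual batches that share every weight except the language embedding; the actual paper may partition parameters differently.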