2021 IEEE International Conference on Acoustics, Speech and Signal Processing

Technical Program

Paper ID	AUD-2.3
Paper Title	COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION
Authors	Yixuan Zhang, Yuzhou Liu, DeLiang Wang, The Ohio State University, United States
Session	AUD-2: Audio and Speech Source Separation 2: Music and Singing Voice Separation
Location	Gather.Town
Session Time	Tuesday, 08 June, 13:00 - 13:45
Presentation Time	Tuesday, 08 June, 13:00 - 13:45
Presentation	Poster
Topic	Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation
Abstract	Music source separation is important for applications such as karaoke and remixing. Much previous research focuses on estimating the short-time Fourier transform (STFT) magnitude and discards phase information. We observe that, for singing voice separation, phase can considerably improve separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs a DenseUNet with self-attention to estimate the real and imaginary components of the STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment.
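
As a rough illustration of the complex ratio masking idea named in the abstract, the sketch below computes an ideal complex mask for one source from known STFTs and applies it to the mixture. It is not the authors' model: the DenseUNet, self-attention, and ensemble steps are omitted, the "STFTs" are random complex arrays standing in for real spectrograms, and the names `ideal_complex_ratio_mask`, `apply_complex_mask`, and `eps` are hypothetical. In practice a network would estimate the mask's real and imaginary parts rather than compute them from the clean source.

```python
import numpy as np


def ideal_complex_ratio_mask(mixture, source, eps=1e-12):
    """Complex mask M such that M * mixture ~= source in each time-frequency bin.

    Assumption: `mixture` and `source` are complex STFT arrays of equal shape.
    `eps` is a hypothetical stabiliser that avoids division by zero.
    """
    y_r, y_i = mixture.real, mixture.imag
    s_r, s_i = source.real, source.imag
    denom = y_r ** 2 + y_i ** 2 + eps
    # Real and imaginary parts of S / Y = S * conj(Y) / |Y|^2.
    m_r = (y_r * s_r + y_i * s_i) / denom
    m_i = (y_r * s_i - y_i * s_r) / denom
    return m_r + 1j * m_i


def apply_complex_mask(mask, mixture):
    """Complex multiplication changes both magnitude and phase of the mixture."""
    return mask * mixture


# Toy data: random complex arrays standing in for (frequency, frame) STFTs.
rng = np.random.default_rng(0)
shape = (5, 4)
voice = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accompaniment = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = voice + accompaniment

mask = ideal_complex_ratio_mask(mixture, voice)
voice_complex = apply_complex_mask(mask, mixture)

# Magnitude-only baseline: true voice magnitude combined with the mixture phase.
voice_magnitude_only = np.abs(voice) * np.exp(1j * np.angle(mixture))

err_complex = np.linalg.norm(voice_complex - voice)
err_magnitude_only = np.linalg.norm(voice_magnitude_only - voice)
```

In this toy setting the complex mask recovers the voice spectrogram up to numerical error. The magnitude-only baseline keeps a residual error even with a perfect magnitude estimate, because it reuses the mixture phase; this is the gap the abstract attributes to discarding phase.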