Paper ID | AUD-2.3 |
Paper Title |
COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION |
Authors |
Yixuan Zhang, Yuzhou Liu, DeLiang Wang, The Ohio State University, United States |
Session | AUD-2: Audio and Speech Source Separation 2: Music and Singing Voice Separation |
Location | Gather.Town |
Session Time: | Tuesday, 08 June, 13:00 - 13:45 |
Presentation Time: | Tuesday, 08 June, 13:00 - 13:45 |
Presentation |
Poster
|
Topic |
Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
Music source separation is important for applications such as karaoke and remixing. Much of previous research focuses on estimating short-time Fourier transform (STFT) magnitude and discarding phase information. We observe that, for singing voice separation, phase can make considerable improvement in separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs DenseUNet with self attention to estimate the real and imaginary components of STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment. |