Paper ID | SPE-17.3
Paper Title | MULTI-CHANNEL TARGET SPEECH EXTRACTION WITH CHANNEL DECORRELATION AND TARGET SPEAKER ADAPTATION
Authors | Jiangyu Han, Xinyuan Zhou, Yanhua Long, Shanghai Normal University, China; Yijie Li, Unisound AI Technology Co., Ltd., China
Session | SPE-17: Speech Enhancement 3: Target Speech Extraction
Location | Gather.Town
Session Time | Wednesday, 09 June, 14:00 - 14:45
Presentation Time | Wednesday, 09 June, 14:00 - 14:45
Presentation | Poster
Topic | Speech Processing: [SPE-ENHA] Speech Enhancement and Separation
Abstract | End-to-end approaches to single-channel target speech extraction have attracted widespread attention. However, studies on end-to-end multi-channel target speech extraction remain relatively limited. In this work, we propose two methods for exploiting multi-channel spatial information to extract the target speech. The first uses a target speaker adaptation layer in a parallel encoder architecture. The second designs a channel decorrelation mechanism that extracts inter-channel differential information to enhance the multi-channel encoder representation. We compare the proposed methods with two strong state-of-the-art baselines. Experimental results on the multi-channel reverberant WSJ0 2-mix dataset demonstrate that our proposed methods achieve up to 11.2% and 11.5% relative improvements in SDR and SiSDR, respectively.
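The SiSDR figure reported above is the scale-invariant signal-to-distortion ratio commonly used to evaluate speech separation and extraction. For reference, here is a minimal pure-Python sketch of the standard SiSDR computation; this is not code from the paper, and the function name and list-based signal representation are illustrative only:

```python
import math

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an estimated and a reference
    signal, given as equal-length lists of float samples.
    (Illustrative helper; assumes zero-mean signals.)"""
    # Optimal scaling factor: project the estimate onto the reference.
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy
    # Decompose the estimate into a scaled-reference (target) part
    # and a residual (noise) part.
    target = [alpha * r for r in reference]
    noise = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(target_energy / noise_energy)
```

Because the metric first rescales the reference to best match the estimate, multiplying the estimate by any nonzero constant leaves the score unchanged, which is what makes it "scale-invariant".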