2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID: IVMSP-18.2
Paper Title: EFFICIENT FACE MANIPULATION VIA DEEP FEATURE DISENTANGLEMENT AND REINTEGRATION NET
Authors: Bin Cheng, Tao Dai, Bin Chen, Shutao Xia, Tsinghua University, Peng Cheng Laboratory, China; Xiu Li, Tsinghua University, China
Session: IVMSP-18: Faces in Images & Videos
Location: Gather.Town
Session Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Time: Wednesday, 09 June, 16:30 - 17:15
Presentation: Poster
Topic: Image, Video, and Multidimensional Signal Processing: [IVARS] Image & Video Analysis, Synthesis, and Retrieval
Abstract: Deep neural networks (DNNs) have been widely used for facial manipulation. Because ground-truth face images for manipulated outputs are unavailable, existing methods train deep networks either with indirect supervision (e.g., feature constraints) or in unsupervised ways (e.g., cycle-consistency loss). However, such methods cannot synthesize realistic face images well and suffer from very high training overhead. To address this issue, we propose a novel Feature Disentanglement and Reintegration network (FDRNet), which employs ground-truth images as informative supervision and dynamically adapts the fusion of their informative features in a self-supervised way. FDRNet consists of a Feature Disentanglement (FD) Network, which encodes informative disentangled representations from the ground-truth images, and a Feature Reintegration (FR) Network, which fuses the disentangled representations to reconstruct the face images. By learning disentangled representations, our method can generate plausible faces conditioned on both landmarks and identities, supporting a variety of face manipulation tasks. Experiments on the CelebA-HQ and FFHQ datasets demonstrate the superiority of our method over state-of-the-art methods in terms of effectiveness and efficiency.
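The abstract describes a disentangle-then-reintegrate pipeline: an FD network splits a face into separate identity and landmark (geometry) codes, and an FR network fuses such codes back into an image, so that swapping codes between faces performs manipulation. The paper does not give implementation details here, so the following is only an illustrative NumPy sketch of that data flow; all dimensions, weight matrices, and function names are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
IMG_DIM, ID_DIM, LM_DIM = 64, 16, 8

# Stand-ins for the FD network's two encoders: one for identity,
# one for landmarks (geometry). Plain linear maps for illustration.
W_id = rng.standard_normal((ID_DIM, IMG_DIM)) * 0.1
W_lm = rng.standard_normal((LM_DIM, IMG_DIM)) * 0.1
# Stand-in for the FR network's decoder, which fuses both codes.
W_dec = rng.standard_normal((IMG_DIM, ID_DIM + LM_DIM)) * 0.1

def disentangle(face):
    """FD step: split a face vector into identity and landmark codes."""
    return W_id @ face, W_lm @ face

def reintegrate(id_code, lm_code):
    """FR step: fuse the two codes back into a face vector."""
    return W_dec @ np.concatenate([id_code, lm_code])

# Two source faces (random vectors standing in for images).
face_a = rng.standard_normal(IMG_DIM)
face_b = rng.standard_normal(IMG_DIM)

id_a, lm_a = disentangle(face_a)
id_b, lm_b = disentangle(face_b)

# Reconstruction: reintegrating a face's own codes.
recon_a = reintegrate(id_a, lm_a)

# Manipulation: A's identity combined with B's landmarks (pose/expression).
swapped = reintegrate(id_a, lm_b)

print(recon_a.shape, swapped.shape)  # (64,) (64,)
```

In this toy setup the manipulation is just a concatenation swap; the actual FDRNet would learn these mappings end-to-end with the self-supervised objectives the abstract mentions.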