2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper ID: IVMSP-18.2
Paper Title: EFFICIENT FACE MANIPULATION VIA DEEP FEATURE DISENTANGLEMENT AND REINTEGRATION NET
Authors: Bin Cheng, Tao Dai, Bin Chen, Shutao Xia, Tsinghua University, Peng Cheng Laboratory, China; Xiu Li, Tsinghua University, China
Session: IVMSP-18: Faces in Images & Videos
Location: Gather.Town
Session Time: Wednesday, 09 June, 16:30 - 17:15
Presentation Time: Wednesday, 09 June, 16:30 - 17:15
Presentation: Poster
Topic: Image, Video, and Multidimensional Signal Processing: [IVARS] Image & Video Analysis, Synthesis, and Retrieval
Abstract: Deep neural networks (DNNs) have been widely used for facial manipulation. Because ground-truth face images for manipulated outputs are unavailable, existing methods train deep networks either with indirect supervision (e.g., feature constraints) or in unsupervised ways (e.g., cycle-consistency loss). However, such methods cannot synthesize realistic face images well and suffer from very high training overhead. To address this issue, we propose a novel Feature Disentanglement and Reintegration network (FDRNet), which employs ground-truth images as informative supervision and dynamically adapts the fusion of their informative features in a self-supervised way. FDRNet consists of a Feature Disentanglement (FD) Network, which encodes informative disentangled representations from the ground-truth images, and a Feature Reintegration (FR) Network, which fuses the disentangled representations to reconstruct the face images. By learning disentangled representations, our method can generate plausible faces conditioned on both landmarks and identities, supporting a variety of face manipulation tasks. Experiments on the CelebA-HQ and FFHQ datasets demonstrate the superiority of our method over state-of-the-art methods in terms of effectiveness and efficiency.
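The abstract describes a disentangle-then-reintegrate pipeline: an FD network splits a face into separate identity and landmark (geometry) codes, and an FR network fuses such codes back into an image, so that swapping codes between faces performs manipulation. The paper does not give implementation details here, so the following is only an illustrative NumPy sketch of that data flow; all dimensions, weight matrices, and function names are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper).
IMG_DIM, ID_DIM, LM_DIM = 64, 16, 8

# Stand-ins for the FD network's two encoders: one for identity,
# one for landmarks (geometry). Plain linear maps for illustration.
W_id = rng.standard_normal((ID_DIM, IMG_DIM)) * 0.1
W_lm = rng.standard_normal((LM_DIM, IMG_DIM)) * 0.1
# Stand-in for the FR network's decoder, which fuses both codes.
W_dec = rng.standard_normal((IMG_DIM, ID_DIM + LM_DIM)) * 0.1

def disentangle(face):
    """FD step: split a face vector into identity and landmark codes."""
    return W_id @ face, W_lm @ face

def reintegrate(id_code, lm_code):
    """FR step: fuse the two codes back into a face vector."""
    return W_dec @ np.concatenate([id_code, lm_code])

# Two source faces (random vectors standing in for images).
face_a = rng.standard_normal(IMG_DIM)
face_b = rng.standard_normal(IMG_DIM)

id_a, lm_a = disentangle(face_a)
id_b, lm_b = disentangle(face_b)

# Reconstruction: reintegrating a face's own codes.
recon_a = reintegrate(id_a, lm_a)

# Manipulation: A's identity combined with B's landmarks (pose/expression).
swapped = reintegrate(id_a, lm_b)

print(recon_a.shape, swapped.shape)  # (64,) (64,)
```

In this toy setup the manipulation is just a concatenation swap; the actual FDRNet would learn these mappings end-to-end with the self-supervised objectives the abstract mentions.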