Abstract

This paper presents a generic face animator that is able to control the pose and expressions of a given face image. The animation is driven by human-interpretable control signals consisting of head pose angles and Action Unit (AU) values. The control information can be obtained from multiple sources, including external driving videos and manual controls. Due to the interpretable nature of the driving signal, one can easily mix information between multiple sources (e.g. pose from one image and expression from another) and apply selective post-production editing. The proposed face animator is implemented as a two-stage neural network model that is learned in a self-supervised manner using a large video collection. The proposed Interpretable and Controllable face reenactment network (ICface) is compared to state-of-the-art neural network based face animation techniques in multiple tasks. The results indicate that ICface produces better visual quality while being more versatile than most of the comparison methods. The introduced model could provide a lightweight and easy-to-use tool for a multitude of advanced image and video editing tasks.
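Because the driving signal is an explicit vector of pose angles and AU intensities, mixing sources reduces to slicing and concatenating vectors. A minimal sketch of this idea (the vector layout, dimensions, and function names below are illustrative assumptions, not the paper's actual interface):

```python
# Illustrative control-vector mixing for an ICface-style animator.
# Assumed layout: 3 head pose angles (pitch, yaw, roll) followed by
# 17 Action Unit intensities. The exact layout is an assumption here.

NUM_POSE = 3
NUM_AUS = 17

def make_control(pose, aus):
    """Build a single interpretable control vector FA."""
    assert len(pose) == NUM_POSE and len(aus) == NUM_AUS
    return list(pose) + list(aus)

def mix_controls(pose_source, expression_source):
    """Take head pose from one control vector and AUs from another."""
    return pose_source[:NUM_POSE] + expression_source[NUM_POSE:]

# Pose from image A, expression (AUs) from image B:
fa_a = make_control([0.1, -0.2, 0.0], [0.0] * 17)
fa_b = make_control([0.5, 0.3, 0.1], [0.6] + [0.0] * 16)
fa_mixed = mix_controls(fa_a, fa_b)  # pose of A, expression of B
```

Manual editing works the same way: overwrite an individual angle or AU entry before feeding the vector to the animator.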

ICface in action

The Architecture of ICface

The overall architecture of the proposed model (ICface) for face animation. In the training phase, we select two frames from the same video and denote them as the source and driving images. The generator G_N takes the encoded source image and neutral facial attributes (FA_N) as input and produces an image representing the source identity with central pose and neutral expression (the neutral image). In the second stage, the generator G_A takes the encoded neutral image and the attributes extracted from the driving image (FA_D) as input and produces an image representing the source identity with the desired attribute parameters FA_D. The generators are trained using multiple loss functions implemented using the discriminator D (see Section 3 for details). In addition, since the driving and source images share the same identity, a direct pixel-based reconstruction loss can also be utilized. Note that this is assumed to hold only during training; at test time the identities are likely to be different.
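The two-stage data flow described above can be sketched as follows. Here G_N and G_A are trivial stand-ins that only pass identity and attributes through; in the real model they are convolutional GAN generators, and the dictionary/list representations are purely illustrative assumptions:

```python
# Data flow of the two-stage ICface pipeline, with placeholder generators.

def g_n(source_image, fa_neutral):
    # Stage 1: neutralize the source face (central pose, neutral
    # expression) while preserving its identity.
    return {"identity": source_image["identity"], "attributes": fa_neutral}

def g_a(neutral_image, fa_driving):
    # Stage 2: re-animate the neutral face with the driving attributes.
    return {"identity": neutral_image["identity"], "attributes": fa_driving}

def reenact(source_image, fa_neutral, fa_driving):
    neutral = g_n(source_image, fa_neutral)
    return g_a(neutral, fa_driving)

source = {"identity": "person_A", "attributes": [0.4, -0.1]}
fa_n = [0.0, 0.0]   # neutral pose/expression code
fa_d = [0.7, 0.2]   # attributes extracted from the driving image
result = reenact(source, fa_n, fa_d)
# result keeps the source identity but carries the driving attributes
```

During training the driving frame comes from the same video as the source, so `result` can additionally be compared to the driving frame with a pixel-based reconstruction loss.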

Paper

ICface: Interpretable and Controllable Face Reenactment Using GANs


Code

:white_check_mark: Test code is out now!

Check my GitHub page for more updates.

Blade6570/icface


Citation

@article{tripathy+kannala+rahtu,
  title={ICface: Interpretable and Controllable Face Reenactment Using GANs},
  author={Tripathy, Soumya and Kannala, Juho and Rahtu, Esa},
  journal={arXiv preprint arXiv:1904.01909},
  year={2019}
}

Related Work
1. O. Wiles, A. S. Koepke, A. Zisserman "X2Face: A network for controlling face generation by using images, audio, and pose codes", in ECCV 2018.
2. Zhixin Shu, Mihir Sahasrabudhe, Alp Guler, Dimitris Samaras, Nikos Paragios, Iasonas Kokkinos "Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance", in ECCV 2018.
3. Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio "Generative Adversarial Networks", in NIPS 2014. 
