Finding Tiny Faces in the Wild with Generative Adversarial Network

Yancheng Bai, Yongqiang Zhang, Mingli Ding, Bernard Ghanem

Abstract

task: detecting small faces in unconstrained conditions
challenges: small faces lack detailed information and are blurry
solution: directly generate a clear high-resolution face from a small blurry one with a generative adversarial network (GAN)
traditional method: super-resolve and refine sequentially
solution: design a novel network that super-resolves and refines jointly
new training losses: guide the generator network to recover fine details, and push the discriminator network to distinguish real vs. fake and face vs. non-face simultaneously

Introduction

large and medium face detection: good
small face detection: far from satisfactory
difficulty: small faces lack sufficient detail to be distinguished from similar-looking background; modern CNN-based face detectors represent faces with down-sampled convolutional (conv) feature maps of stride 8, 16 or 32, which lose most spatial information and are too coarse to describe small faces
traditional solutions: (a) up-sample the images directly with bilinear interpolation and search exhaustively for faces on the up-sampled images, which increases both computation cost and inference time; (b) use intermediate conv feature maps to represent faces at specific scales, but the shallow, fine-grained maps lack discrimination and cause many false positives. Neither addresses the remaining challenges, such as blur.
our solution: use a GAN whose generator = SRN + RN. The super-resolution network (SRN) up-samples small faces to a fine scale, reducing artifacts and improving the quality of the up-sampled images at large upscaling factors. The refinement network (RN) recovers missing details in the up-sampled images and generates sharp high-resolution images for classification. The discriminator sub-network uses a new loss function that forces it to decide simultaneously whether an input is a real or a generated high-resolution image and whether it is a face or a non-face.
contribution:
(1) GAN: generator = SRN + RN, discriminator multi-task
(2) new loss: promote the discriminator network to distinguish the real/fake image and face/non-face simultaneously
(3) state-of-the-art performance
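
The stride argument in the difficulty item above can be checked with quick arithmetic. The sketch below divides a face's side length by the feature-map stride; the face sizes (10, 16 and 64 pixels) are illustrative assumptions, not figures from the paper, and `cells_covered` is a hypothetical helper.

```python
def cells_covered(face_size_px: int, stride: int) -> float:
    """Approximate side length of a face, measured in feature-map cells."""
    return face_size_px / stride


# Strides are the ones named in the notes; face sizes are assumed for illustration.
for face_size in (10, 16, 64):
    footprint = {stride: cells_covered(face_size, stride) for stride in (8, 16, 32)}
    print(f"{face_size}px face -> cells per side by stride: {footprint}")
```

Under these assumed sizes, a 10-pixel face spans barely more than one cell at stride 8 and less than one cell at strides 16 and 32, which is the sense in which such maps are too coarse for small faces.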

Related Work

Face Detection

hand-crafted-feature-based methods: operate at a single scale, which restricts detector performance
CNN-based methods + up-sampling (re-sizing input images to different scales during training and testing): inevitably increases memory and computation costs, and produces images with large structural distortions
our method: exploits the super-resolution and refinement networks to generate clear, fine, high-resolution faces
The results look almost too good... and in some places non-face regions are also classified as faces.

Super-resolution and Refinement Network

the first work to jointly super-resolve and refine small blurry faces in the wild

Generative Adversarial Networks

SRGAN (super-resolution): generated images are blurry and lack fine details, especially for low-resolution faces
this work: extends the discriminator network to classify fake vs. real and face vs. non-face simultaneously

Proposed Method

GAN

$I^{LR}$: low-resolution face candidates
$I^{HR}$: high-resolution face candidates
$y$: label, 1 for face, 0 for non-face
generator: $G: I^{LR} \to I^{HR}$
discriminator: $D$, jointly distinguishes generated vs. true high-resolution images and faces vs. non-faces

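$$\min_{\theta_G}\max_{\theta_D}\; \mathbb{E}_{(I^{HR},y)\sim p(I^{HR},y)}\big[\log D(I^{HR},y;\theta_D)\big] + \mathbb{E}_{(I^{LR},y)\sim p(I^{LR},y)}\big[\log\big(1-D(G(I^{LR},y;\theta_G);\theta_D)\big)\big]$$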
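
To make the min-max objective concrete, here is a minimal sketch that estimates its value from a handful of discriminator outputs. The function name `gan_value` and the scores are hypothetical; the two expectations are replaced by sample averages.

```python
import math


def gan_value(d_real, d_fake):
    """Sample estimate of E[log D(I_HR, y)] + E[log(1 - D(G(I_LR, y)))].

    d_real: discriminator outputs in (0, 1) on true high-resolution images
    d_fake: discriminator outputs in (0, 1) on generated images
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical scores: the discriminator either rejects or accepts the generated images.
confident = gan_value([0.9, 0.8], [0.1, 0.2])
fooled = gan_value([0.9, 0.8], [0.7, 0.6])
print(confident > fooled)
```

The value is higher when the discriminator scores generated images low, which is what the discriminator maximizes, and lower when the generator fools it, which is what the generator minimizes.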

Network Architecture

SRN: takes low-resolution images as input and outputs super-resolved images, which are usually blurry
RN: refines the super-resolved images
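
The two-stage generator can be sketched as a simple composition. This is only an illustration of the data flow, not the paper's architecture: `srn_stub` uses nearest-neighbour up-sampling and `rn_stub` just clamps pixel values, both as hypothetical stand-ins for the learned networks, and the up-scaling factor of 4 is an assumption not stated in these notes.

```python
def srn_stub(image, factor=4):
    """Stand-in for the SRN: nearest-neighbour up-sampling by `factor` (assumed)."""
    upsampled = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


def rn_stub(image):
    """Stand-in for the RN: keeps the resolution and clamps values to [0, 1]."""
    return [[min(1.0, max(0.0, pixel)) for pixel in row] for row in image]


def generator(image):
    """G = RN(SRN(.)); returns both stage outputs."""
    super_resolved = srn_stub(image)
    refined = rn_stub(super_resolved)
    return super_resolved, refined


low_res = [[0.2, 0.8], [0.5, 1.3]]  # hypothetical 2x2 low-resolution patch
sr, refined = generator(low_res)
print(len(sr), len(sr[0]), len(refined), len(refined[0]))
```

Returning both stage outputs is a design choice of this sketch: it lets each output be compared against the high-resolution target separately.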

Loss Function

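pixel-wise loss (generator): similar to an autoencoder loss, $L_{MSE} = \|G_1(I^{LR}) - I^{HR}\|^2 + \|G_2(G_1(I^{LR})) - I^{HR}\|^2$, where $G_1$ and $G_2$ denote the SRN and the RN respectively
adversarial loss (discriminator): $L_{adv} = \log(1 - D(G(I^{LR})))$
classification loss: $L_{clc} = \log()$; why not use a softmax loss?
the final loss is a weighted sum of the three losses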
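
The weighted combination of the three losses above can be sketched as follows. Several parts are assumptions: the weights `alpha` and `beta` are hypothetical, images are flattened to plain lists, and because the exact form of the classification loss is not given in these notes, `classification_loss` substitutes a standard binary cross-entropy as a stand-in.

```python
import math


def sq_dist(a, b):
    """Squared L2 distance between two equally sized flat pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pixel_loss(sr, refined, target):
    """L_MSE: SRN output vs. I_HR plus RN output vs. I_HR."""
    return sq_dist(sr, target) + sq_dist(refined, target)


def adversarial_loss(d_fake):
    """L_adv = log(1 - D(G(I_LR))) for one generated sample."""
    return math.log(1.0 - d_fake)


def classification_loss(p_face, y):
    """ASSUMPTION: binary cross-entropy stand-in, not the form used in the paper."""
    return -(y * math.log(p_face) + (1 - y) * math.log(1.0 - p_face))


def total_loss(sr, refined, target, d_fake, p_face, y, alpha=0.001, beta=0.01):
    """Weighted sum of the three losses; alpha and beta are hypothetical weights."""
    return (pixel_loss(sr, refined, target)
            + alpha * adversarial_loss(d_fake)
            + beta * classification_loss(p_face, y))


# Hypothetical two-pixel example for a face sample (y = 1).
loss = total_loss(sr=[0.5, 0.5], refined=[0.1, 0.9], target=[0.0, 1.0],
                  d_fake=0.5, p_face=0.9, y=1)
print(loss)
```

With small weights like these, the pixel-wise term dominates the total, and the adversarial and classification terms act as corrections on top of it.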
I recently did similar work on MNIST myself, though not for a super-resolution task; quite a coincidence!