Landmark Assisted CycleGAN for Cartoon Face Generation

3. Our Method

3.1. Review of CycleGAN

Given unpaired training samples from two domains, $x\in X$ and $y\in Y$, the adversarial loss for the mapping $G_{X\rightarrow Y}$ from $X$ to $Y$ and its discriminator $D_Y$ is defined as

$$
\begin{aligned}
\mathcal{L}_{GAN}&\left ( G_{X\rightarrow Y}, D_Y \right )=\mathbb{E}_y\left [ \log D_Y(y) \right ] \\
&+\mathbb{E}_x\left [ \log\left ( 1-D_Y\left ( G_{X\rightarrow Y}(x) \right ) \right ) \right ] \qquad(1)
\end{aligned}
$$
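To make Eq. (1) concrete, here is a minimal sketch that estimates the adversarial loss from a batch of discriminator outputs. It assumes the discriminator returns probabilities in (0, 1); the function name `adversarial_loss` and the small `eps` guard are illustrative choices, not part of the paper.

```python
import math

def adversarial_loss(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of Eq. (1).

    d_real: discriminator outputs D_Y(y) on real samples, each in (0, 1).
    d_fake: discriminator outputs D_Y(G(x)) on generated samples.
    eps:    assumed numerical guard against log(0), not from the paper.
    """
    real_term = sum(math.log(p + eps) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake scores higher than
# one that the generator has fooled.
confident = adversarial_loss([0.9, 0.8], [0.1, 0.2])
fooled = adversarial_loss([0.9, 0.8], [0.9, 0.8])
```

The discriminator tries to maximize this value while the generator tries to minimize it, which is why `confident` is larger than `fooled`.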

CycleGAN learns both the forward and the backward mapping. The cycle consistency loss is

$$
\begin{aligned}
\mathcal{L}_{cyc}=&\left \| G_{Y\rightarrow X}\left ( G_{X\rightarrow Y}(x) \right )-x \right \|_1+ \\
&\left \| G_{X\rightarrow Y}\left ( G_{Y\rightarrow X}(y) \right )-y \right \|_1 \qquad(2)
\end{aligned}
$$
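The cycle consistency loss in Eq. (2) can be sketched as follows. Images are flattened to plain lists of floats, and the generators are toy stand-in functions chosen only so the result is easy to check by hand; all names here are hypothetical.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two flattened images."""
    return sum(abs(p - q) for p, q in zip(a, b))

def cycle_consistency_loss(x, y, g_xy, g_yx):
    """Eq. (2): translate there and back in both directions."""
    forward = l1_distance(g_yx(g_xy(x)), x)   # X -> Y -> X
    backward = l1_distance(g_xy(g_yx(y)), y)  # Y -> X -> Y
    return forward + backward

# Toy generators (assumptions for illustration, not real networks).
g_xy = lambda v: [2.0 * p for p in v]
g_yx_exact = lambda v: [0.5 * p for p in v]         # exact inverse of g_xy
g_yx_biased = lambda v: [0.5 * p + 0.1 for p in v]  # slightly off

x = [1.0, 2.0]
y = [4.0, 6.0]
loss_exact = cycle_consistency_loss(x, y, g_xy, g_yx_exact)
loss_biased = cycle_consistency_loss(x, y, g_xy, g_yx_biased)
```

When the two generators invert each other exactly, the loss is zero; any mismatch in the round trip shows up as a positive penalty.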

The total objective function of CycleGAN is

$$
\begin{aligned}
\mathcal{L}\big ( G_{X\rightarrow Y}, &G_{Y\rightarrow X}, D_X, D_Y \big ) = \\
&\mathcal{L}_{GAN}\left ( G_{X\rightarrow Y}, D_Y \right ) + \\
&\mathcal{L}_{GAN}\left ( G_{Y\rightarrow X}, D_X \right ) + \mathcal{L}_{cyc} \qquad(3)
\end{aligned}
$$

In this paper, $X$ is the real face domain and $Y$ is the cartoon face domain.

3.2. Cartoon Face Landmark Assisted CycleGAN

3.2.1 Landmark Consistency Loss
$$
\begin{aligned}
\mathcal{L}_c\big ( &G_{(X,L)\rightarrow Y} \big )= \\
&\left \| R_Y\left ( G_{(X,L)\rightarrow Y}\left ( x,l \right ) \right )-l \right \|_2 \qquad(4)
\end{aligned}
$$

where $l\in L$ is the input landmark heatmap, $R$ is a pre-trained U-Net that predicts landmark heatmaps, and $R_Y$ denotes the landmark regressor for domain $Y$.

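Equation (4) says that a real face image $x$ and its landmark heatmap $l$ are fed to the generator $G_{(X,L)\rightarrow Y}$ to produce an image; the landmarks that $R_Y$ predicts on that generated image should be as close as possible to $l$.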
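A minimal sketch of the landmark consistency loss in Eq. (4) follows. The generator and the regressor $R_Y$ are not modeled here; instead, hand-made heatmaps stand in for the regressor's prediction. The Gaussian heatmap encoding and the helper `make_heatmap` are assumptions for illustration, since the notes do not specify how heatmaps are built.

```python
import math

def landmark_consistency_loss(pred_heatmap, input_heatmap):
    """Eq. (4): L2 norm of the difference between two flattened heatmaps."""
    return math.sqrt(
        sum((p - q) ** 2 for p, q in zip(pred_heatmap, input_heatmap))
    )

def make_heatmap(cx, cy, size=8, sigma=1.0):
    """Hypothetical Gaussian heatmap for one landmark at (cx, cy),
    flattened row by row."""
    return [
        math.exp(-((col - cx) ** 2 + (row - cy) ** 2) / (2 * sigma ** 2))
        for row in range(size)
        for col in range(size)
    ]

target = make_heatmap(3, 3)    # input landmark heatmap l
aligned = make_heatmap(3, 3)   # stand-in for R_Y(G(x, l)) when landmarks match
shifted = make_heatmap(5, 3)   # stand-in when the generated landmark drifts

loss_aligned = landmark_consistency_loss(aligned, target)
loss_shifted = landmark_consistency_loss(shifted, target)
```

The loss is zero when the predicted landmark sits exactly where the input heatmap puts it, and grows as the generated face's landmark drifts away.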
3.2.2 Landmark Matched Global Discriminator
As shown in Figure 2, for the translation $X\rightarrow Y$, the unconditional global discriminator $D_Y$ pushes the generator toward more realistic cartoon faces, while the conditional global discriminator $D_Y^{g_c}$ takes the landmark heatmap $l\in L$ as part of its input and pushes the generator toward cartoon faces whose landmarks match it. Its adversarial loss is

$$
\begin{aligned}
\mathcal{L}_{GAN}\big ( &G_{(X,L)\rightarrow Y}, D_Y^{g_c} \big )=\mathbb{E}_y\left [ \log D_Y^{g_c}\left ( y,l \right ) \right ] \\
&+\mathbb{E}_x\left [ \log\left ( 1-D_Y^{g_c}\left ( G_{(X,L)\rightarrow Y}\left ( x,l \right ),l \right ) \right ) \right ] \qquad(5)
\end{aligned}
$$
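One plausible way to give a discriminator the heatmap "as part of its input" is to stack the heatmap channels onto the image channels. The sketch below assumes that channel-wise concatenation; the notes do not state how the conditioning is implemented, and the function name and the channel counts are hypothetical.

```python
def conditional_input(image, heatmaps):
    """Stack image channels and landmark heatmap channels.

    image:    list of C channels, each an H x W nested list.
    heatmaps: list of K channels, each an H x W nested list.
    Returns a list of C + K channels (assumed conditioning scheme).
    """
    height, width = len(image[0]), len(image[0][0])
    for channel in heatmaps:
        if len(channel) != height or len(channel[0]) != width:
            raise ValueError("heatmap and image must share spatial size")
    return image + heatmaps

def blank(height, width, value=0.0):
    """Constant-valued H x W channel used as toy data."""
    return [[value] * width for _ in range(height)]

cartoon = [blank(4, 4, 0.5) for _ in range(3)]  # 3 color channels
landmarks = [blank(4, 4) for _ in range(5)]     # 5 hypothetical landmark channels
stacked = conditional_input(cartoon, landmarks)
```

With this layout the discriminator sees the face and the landmark positions together, so it can reject a realistic cartoon whose landmarks do not line up with $l$.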
3.2.3 Landmark Guided Local Discriminator
Three local discriminators are introduced for the eye, nose, and mouth regions. Their adversarial loss is defined as

$$
\begin{aligned}
&\mathcal{L}_{GAN_{local}^{X\rightarrow Y}}=\sum_{i=1}^{3}\lambda_{l_i}\cdot\mathcal{L}_{GAN_{patch}}\left ( G_{(X,L)\rightarrow Y}, D_Y^{l_i} \right ) \\
&=\sum_{i=1}^{3}\lambda_{l_i}\Big \{ \mathbb{E}_y\left [ \log D_Y^{l_i}\left ( y_p \right ) \right ] \\
&+\mathbb{E}_x\left [ \log\left ( 1-D_Y^{l_i}\left ( \left [ G_{(X,L)\rightarrow Y}(x) \right ]_p \right ) \right ) \right ]\Big \} \qquad(6)
\end{aligned}
$$

where $y_p$ and $\left [ G_{(X,L)\rightarrow Y}(x) \right ]_p$ denote local patches of the real cartoon image and the generated cartoon image, respectively.
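The two ingredients of Eq. (6), cropping a local patch and summing the weighted per-region losses, can be sketched as below. The patch size, the clamping at image borders, and the single-sample form of the loss are assumptions for illustration; the notes do not give these details.

```python
import math

def crop_patch(image, center, size):
    """Crop a size x size patch from a 2D image (list of rows),
    centered near `center` = (row, col) and clamped to stay inside
    the image. Assumes size <= image height and width."""
    height, width = len(image), len(image[0])
    half = size // 2
    top = min(max(center[0] - half, 0), height - size)
    left = min(max(center[1] - half, 0), width - size)
    return [row[left:left + size] for row in image[top:top + size]]

def local_adversarial_loss(patch_scores, weights, eps=1e-12):
    """Single-sample version of Eq. (6).

    patch_scores: one (d_real, d_fake) pair per local discriminator,
                  i.e. its outputs on a real and a generated patch.
    weights:      the lambda_{l_i} coefficients.
    """
    total = 0.0
    for (d_real, d_fake), lam in zip(patch_scores, weights):
        total += lam * (math.log(d_real + eps) + math.log(1.0 - d_fake + eps))
    return total

# Toy 6 x 6 image whose pixel value encodes its position (row * 10 + col).
image = [[row * 10 + col for col in range(6)] for row in range(6)]
center_patch = crop_patch(image, (3, 3), 2)
corner_patch = crop_patch(image, (5, 5), 2)

# Three regions (eyes, nose, mouth) with equal, hypothetical weights.
loss = local_adversarial_loss([(0.5, 0.5)] * 3, [1.0, 1.0, 1.0])
```

Clamping keeps the patch inside the image when a landmark lies near the border, which matters for faces that are cropped tightly.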

3.3. Network Training

3.3.1 Two Stage Training

Stage I: first train the framework without the local discriminators for 100K iterations to obtain coarse results.

Stage II: use the pre-trained landmark prediction network to predict landmarks on the coarse images, extract local patches around those landmarks, and feed the patches to the local discriminators to obtain more refined results.
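The two-stage schedule can be expressed as a small switch on the iteration counter. This is only a sketch: the loss names are hypothetical labels for the terms described above, iterations are assumed to be counted from zero, and the length of Stage II is not specified in these notes, so it is left open.

```python
STAGE_ONE_ITERATIONS = 100_000  # Stage I length stated in the text

def active_losses(iteration):
    """Return the loss terms used at a given (zero-based) iteration.

    Stage I trains without the local discriminators; Stage II adds them.
    """
    losses = ["global_adversarial", "cycle", "landmark_consistency"]
    if iteration >= STAGE_ONE_ITERATIONS:
        # Stage II: patches are cropped around predicted landmarks
        # and scored by the local discriminators.
        losses.append("local_adversarial")
    return losses

stage_one = active_losses(0)
stage_two = active_losses(STAGE_ONE_ITERATIONS)
```

Delaying the local term means the landmark predictor is applied to images that already look roughly like cartoon faces, so the extracted patches are meaningful.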