Landmark Assisted CycleGAN for Cartoon Face Generation

3. Our Method

3.1. Review of CycleGAN

Given unpaired training samples $x\in X$ and $y\in Y$ from two domains, the adversarial loss for the mapping $G_{X\rightarrow Y}$ from $X$ to $Y$ and its discriminator $D_Y$ is defined as
$\begin{aligned} \mathcal{L}_{GAN}&\left ( G_{X\rightarrow Y}, D_Y \right )=\mathbb{E}_y\left [ \log D_Y(y) \right ] \\ &+\mathbb{E}_x\left [ \log\left ( 1-D_Y\left ( G_{X\rightarrow Y}(x) \right ) \right ) \right ] \qquad(1) \end{aligned}$
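A minimal NumPy sketch of how Eq. (1) is estimated on a mini-batch, assuming the discriminator already outputs probabilities in (0, 1). The helper name `gan_loss` and the clipping constant `eps` are illustrative choices, not from the paper. $D_Y$ is trained to maximize this quantity and $G_{X\rightarrow Y}$ to minimize it.

```python
import numpy as np


def gan_loss(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    """Mini-batch estimate of Eq. (1) (hypothetical helper).

    d_real: D_Y(y) for real cartoon faces, probabilities in (0, 1).
    d_fake: D_Y(G_{X->Y}(x)) for translated real faces, probabilities in (0, 1).
    """
    # Clip so that log(0) never occurs for saturated discriminator outputs.
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    # E_y[log D_Y(y)] + E_x[log(1 - D_Y(G(x)))], expectations replaced by batch means.
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


# A perfect discriminator reaches the maximum value, approximately 0.
print(gan_loss(np.array([1.0, 1.0]), np.array([0.0, 0.0])))
# A discriminator that outputs 0.5 everywhere gives 2 * log(0.5) = -log 4, about -1.386.
print(gan_loss(np.array([0.5, 0.5]), np.array([0.5, 0.5])))
```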

CycleGAN learns both the forward mapping $G_{X\rightarrow Y}$ and the backward mapping $G_{Y\rightarrow X}$, coupled by the cycle consistency loss
$\begin{aligned} \mathcal{L}_{cyc}=&\left \| G_{Y\rightarrow X}\left ( G_{X\rightarrow Y}(x) \right )-x \right \|_1+ \\ &\left \| G_{X\rightarrow Y}\left ( G_{Y\rightarrow X}(y) \right )-y \right \|_1 \qquad(2) \end{aligned}$
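The cycle term can be sketched the same way. Here the L1 norm is taken per sample and averaged over the batch, which is an assumption about how the expectation is estimated; `g_xy` and `g_yx` are stand-ins for $G_{X\rightarrow Y}$ and $G_{Y\rightarrow X}$, and the toy generators in the usage lines are purely illustrative.

```python
import numpy as np
from typing import Callable

Array = np.ndarray


def cycle_loss(x: Array, y: Array,
               g_xy: Callable[[Array], Array],
               g_yx: Callable[[Array], Array]) -> float:
    """Eq. (2): L1 reconstruction error of x -> Y -> X and y -> X -> Y.

    x, y: batches shaped (N, H, W, C). The L1 norm is taken per sample and
    averaged over the batch (an empirical estimate of the expectation).
    """
    def l1_per_sample(a: Array, b: Array) -> Array:
        return np.abs(a - b).reshape(a.shape[0], -1).sum(axis=1)

    forward = l1_per_sample(g_yx(g_xy(x)), x)   # x -> G_XY -> G_YX -> reconstruction of x
    backward = l1_per_sample(g_xy(g_yx(y)), y)  # y -> G_YX -> G_XY -> reconstruction of y
    return float(forward.mean() + backward.mean())


rng = np.random.default_rng(0)
x = rng.random((2, 4, 4, 3))
y = rng.random((2, 4, 4, 3))
# Exact inverse pair: reconstructions match the inputs, so the loss is 0.0.
print(cycle_loss(x, y, lambda a: 2.0 * a, lambda b: b / 2.0))
# Non-inverse pair: every pixel is off by 1 in both directions.
print(cycle_loss(x, y, lambda a: a + 1.0, lambda b: b))
```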

The total objective function of CycleGAN is
$\begin{aligned} \mathcal{L}\big ( G_{X\rightarrow Y}, &G_{Y\rightarrow X}, D_X, D_Y \big ) = \\ &\mathcal{L}_{GAN}\left ( G_{X\rightarrow Y}, D_Y \right ) + \\ &\mathcal{L}_{GAN}\left ( G_{Y\rightarrow X}, D_X \right ) + \mathcal{L}_{cyc} \qquad(3) \end{aligned}$
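A sketch of Eq. (3) computed from precomputed discriminator outputs and reconstructions. The second adversarial term is $\mathcal{L}_{GAN}(G_{Y\rightarrow X}, D_X)$, the mirror image of Eq. (1). The weight `lambda_cyc` defaults to 1 to match Eq. (3) as written; the original CycleGAN paper weights the cycle term (with $\lambda = 10$ in its experiments). All helper and argument names are illustrative.

```python
import numpy as np


def _adv(d_real: np.ndarray, d_fake: np.ndarray, eps: float = 1e-12) -> float:
    # E[log D(real)] + E[log(1 - D(fake))], the form of Eq. (1).
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


def _l1(a: np.ndarray, b: np.ndarray) -> float:
    # Per-sample L1 norm, averaged over the batch dimension.
    return float(np.abs(a - b).reshape(a.shape[0], -1).sum(axis=1).mean())


def cyclegan_objective(d_y_real, d_y_fake, d_x_real, d_x_fake,
                       x, x_rec, y, y_rec, lambda_cyc: float = 1.0) -> float:
    """Eq. (3) from precomputed quantities (hypothetical helper).

    d_y_real = D_Y(y),  d_y_fake = D_Y(G_{X->Y}(x))
    d_x_real = D_X(x),  d_x_fake = D_X(G_{Y->X}(y))
    x_rec = G_{Y->X}(G_{X->Y}(x)),  y_rec = G_{X->Y}(G_{Y->X}(y))
    """
    l_gan_xy = _adv(d_y_real, d_y_fake)    # L_GAN(G_{X->Y}, D_Y)
    l_gan_yx = _adv(d_x_real, d_x_fake)    # L_GAN(G_{Y->X}, D_X)
    l_cyc = _l1(x_rec, x) + _l1(y_rec, y)  # Eq. (2)
    return l_gan_xy + l_gan_yx + lambda_cyc * l_cyc


half = np.full(4, 0.5)
x, y = np.zeros((4, 8)), np.ones((4, 8))
# Undecided discriminators and perfect reconstructions: -log 4 - log 4 = -log 16.
print(cyclegan_objective(half, half, half, half, x, x, y, y))
```

The generators minimize this value and the discriminators maximize it, so in practice the two players are updated with opposite signs of the adversarial terms.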

In this paper, $X$ denotes the real face domain and $Y$ the cartoon face domain.

3.2. Cartoon Face Landmark Assisted CycleGAN

3.2.1 Landmark Consistency Loss
$\begin{aligned} \mathcal{L}_c\big ( &G_{(X,L)\rightarrow Y} \big )= \\ &\left \| R_Y\left ( G_{(X,L)\rightarrow Y}\left ( x,l \right ) \right )-l \right \|_2 \qquad(4) \end{aligned}$
where $l\in L$ is the input landmark heatmap, $R$ is a pre-trained U-Net that predicts landmark heatmaps, and $R_Y$ denotes the landmark regressor for domain $Y$.

Eq. (4) states that when a real face image $x$ and its landmark heatmap $l$ are fed into the generator $G_{(X,L)\rightarrow Y}$, the landmarks that $R_Y$ predicts on the generated image should be as close as possible to $l$.
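A sketch of Eq. (4). The text does not say how landmarks are encoded as heatmaps; rendering one Gaussian channel per landmark, as the hypothetical `render_heatmaps` does, is a common choice and an assumption here, as is `sigma`. The array `pred` plays the role of $R_Y(G_{(X,L)\rightarrow Y}(x,l))$, i.e. the pre-trained regressor's output on the generated image; no real U-Net is run.

```python
import numpy as np
from typing import Sequence, Tuple


def render_heatmaps(points: Sequence[Tuple[float, float]],
                    size: Tuple[int, int], sigma: float = 2.0) -> np.ndarray:
    """Render K (x, y) landmarks as a (K, H, W) stack of Gaussian heatmaps (assumed encoding)."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]  # pixel row and column coordinates
    maps = [np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
            for px, py in points]
    return np.stack(maps).astype(np.float64)


def landmark_consistency_loss(pred: np.ndarray, l: np.ndarray) -> float:
    """Eq. (4): || R_Y(G(x, l)) - l ||_2, where pred = R_Y(G(x, l))."""
    return float(np.linalg.norm((pred - l).ravel()))


l = render_heatmaps([(5, 6), (10, 6), (8, 12)], size=(16, 16))
# Stand-in for landmarks predicted on the generated face, shifted one pixel right.
pred = render_heatmaps([(6, 6), (11, 6), (9, 12)], size=(16, 16))
print(landmark_consistency_loss(l, l))     # 0.0: landmarks preserved exactly
print(landmark_consistency_loss(pred, l))  # positive: misaligned landmarks are penalized
```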
3.2.2 Landmark Matched Global Discriminator
As shown in Figure 2, for the translation $X\rightarrow Y$, the unconditional global discriminator $D_Y$ pushes the generator toward more realistic cartoon faces, while the conditional global discriminator $D_Y^{g_c}$, which takes the landmark heatmap $l\in L$ as part of its input, pushes it toward cartoon faces whose landmarks match $l$:
$\begin{aligned} \mathcal{L}_{GAN}\big ( &G_{(X,L)\rightarrow Y}, D_Y^{g_c} \big )=\mathbb{E}_y\left [ \log D_Y^{g_c}\left ( y,l \right ) \right ] \\ &+\mathbb{E}_x\left [ \log\left ( 1-D_Y^{g_c}\left ( G_{(X,L)\rightarrow Y}\left ( x,l \right ),l \right ) \right ) \right ] \qquad(5) \end{aligned}$
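A sketch of how the condition could enter $D_Y^{g_c}$: the heatmap is stacked with the image along the channel axis. This is a common way to condition a discriminator and an assumption here, since the text only says $l$ is part of the input. Eq. (5) writes the same $l$ in both terms; the sketch keeps separate `l_y` and `l_x` arguments so the pairing for real cartoons is explicit (passing the same array to both reproduces the equation literally). `d_cond` is any callable standing in for the discriminator network; all names are hypothetical.

```python
import numpy as np
from typing import Callable

Array = np.ndarray


def condition_on_landmarks(image: Array, heatmap: Array) -> Array:
    """Concatenate a (C, H, W) image and a (K, H, W) heatmap along channels."""
    if image.shape[1:] != heatmap.shape[1:]:
        raise ValueError("image and heatmap must share the same spatial size")
    return np.concatenate([image, heatmap], axis=0)


def conditional_gan_loss(d_cond: Callable[[Array], float],
                         y: Array, l_y: Array,
                         fake: Array, l_x: Array, eps: float = 1e-12) -> float:
    """Single-sample estimate of Eq. (5) (hypothetical helper).

    d_cond: stand-in for D_Y^{g_c}; maps a (C + K, H, W) array to a probability.
    y, l_y: a real cartoon face and the heatmap it is paired with.
    fake, l_x: G_{(X,L)->Y}(x, l_x) and the heatmap l_x it was generated from.
    """
    p_real = np.clip(d_cond(condition_on_landmarks(y, l_y)), eps, 1.0 - eps)
    p_fake = np.clip(d_cond(condition_on_landmarks(fake, l_x)), eps, 1.0 - eps)
    return float(np.log(p_real) + np.log(1.0 - p_fake))


img = np.zeros((3, 16, 16))  # RGB image, channels first
hm = np.zeros((5, 16, 16))   # five landmark channels
print(condition_on_landmarks(img, hm).shape)  # (8, 16, 16)
print(conditional_gan_loss(lambda t: 0.5, img, hm, img, hm))  # -log 4 for an undecided D
```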
3.2.3 Landmark Guided Local Discriminator
Three local discriminators are introduced for the eye, nose, and mouth regions; their adversarial loss is defined as
$\begin{aligned} &\mathcal{L}_{GAN_{local}^{X\rightarrow Y}}=\sum_{i=1}^{3}\lambda_{l_i}\cdot\mathcal{L}_{GAN_{patch}}\left ( G_{(X,L)\rightarrow Y}, D_Y^{l_i} \right ) \\ &=\sum_{i=1}^{3}\lambda_{l_i}\Big \{ \mathbb{E}_y\left [ \log D_Y^{l_i}\left ( y_p \right ) \right ] \\ &+\mathbb{E}_x\left [ \log\left ( 1-D_Y^{l_i}\left ( \left [ G_{(X,L)\rightarrow Y}(x,l) \right ]_p \right ) \right ) \right ]\Big \} \qquad(6) \end{aligned}$
where $y_p$ and $\left [ G_{(X,L)\rightarrow Y}(x,l) \right ]_p$ denote a local patch of the real cartoon image and of the generated cartoon image, respectively, and $\lambda_{l_i}$ weights the $i$-th local discriminator $D_Y^{l_i}$.
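A sketch of Eq. (6): crop one patch per facial region around the centre of that region's landmarks and score it with the matching local discriminator. The 5-point landmark grouping in `REGIONS`, the patch size, the weights $\lambda_{l_i}$, and all helper names are hypothetical; the text does not specify them. In Stage II (Section 3.3.1) the landmarks of the generated image would come from the pre-trained landmark prediction network.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple

Array = np.ndarray

# Hypothetical grouping of a 5-point landmark layout into the three regions.
REGIONS = {"eyes": [0, 1], "nose": [2], "mouth": [3, 4]}


def crop_patch(image: Array, center: Tuple[float, float], size: int) -> Array:
    """Crop a size x size patch from a (C, H, W) image centred at (cx, cy),
    shifted inward to stay inside the image (assumes size <= H and size <= W)."""
    _, h, w = image.shape
    cx, cy = int(round(center[0])), int(round(center[1]))
    top = min(max(cy - size // 2, 0), h - size)
    left = min(max(cx - size // 2, 0), w - size)
    return image[:, top:top + size, left:left + size]


def region_center(points: Array, indices: Sequence[int]) -> Tuple[float, float]:
    """Mean (x, y) of the landmarks that belong to one region."""
    c = points[list(indices)].mean(axis=0)
    return float(c[0]), float(c[1])


def local_gan_loss(discriminators: List[Callable[[Array], float]],
                   weights: Sequence[float],
                   regions: Sequence[Sequence[int]],
                   real: Array, real_points: Array,
                   fake: Array, fake_points: Array,
                   size: int = 8, eps: float = 1e-12) -> float:
    """Single-sample estimate of Eq. (6): sum_i lambda_i * L_GAN_patch(G, D^{l_i})."""
    total = 0.0
    for d, lam, idx in zip(discriminators, weights, regions):
        y_p = crop_patch(real, region_center(real_points, idx), size)  # y_p
        g_p = crop_patch(fake, region_center(fake_points, idx), size)  # [G(x, l)]_p
        p_real = np.clip(d(y_p), eps, 1.0 - eps)
        p_fake = np.clip(d(g_p), eps, 1.0 - eps)
        total += lam * (np.log(p_real) + np.log(1.0 - p_fake))
    return float(total)


pts = np.array([[10.0, 12.0], [22.0, 12.0], [16.0, 18.0], [12.0, 25.0], [20.0, 25.0]])
img = np.zeros((3, 32, 32))
# Three undecided discriminators with weights summing to 2 give -2 * log 4.
print(local_gan_loss([lambda p: 0.5] * 3, [1.0, 0.5, 0.5],
                     list(REGIONS.values()), img, pts, img, pts))
```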

3.3. Network Training

3.3.1 Two Stage Training

Stage I: the framework is first trained without the local discriminators for 100K iterations, which yields coarse results.

Stage II: the pre-trained landmark prediction network predicts landmarks on the coarse images; these landmarks are used to crop local patches, which are fed to the local discriminators to refine the generated results.
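A sketch of the two-stage schedule as a function of the iteration counter. The 100K boundary comes from the text; keeping the global adversarial, cycle, and landmark consistency terms active in both stages is this sketch's reading of "without the local discriminators", and all names are hypothetical. The length of Stage II is not stated, so the function places no upper bound on it.

```python
from dataclasses import dataclass

# From the text: Stage I trains without the local discriminators for 100K iterations.
STAGE_ONE_ITERS = 100_000


@dataclass(frozen=True)
class ActiveLosses:
    """Which loss terms are used at a given iteration (assumed grouping)."""
    global_adversarial: bool    # D_Y and D_Y^{g_c}, Eqs. (1) and (5)
    cycle: bool                 # Eq. (2)
    landmark_consistency: bool  # Eq. (4)
    local_adversarial: bool     # eye/nose/mouth discriminators, Eq. (6)


def losses_for_iteration(it: int, stage_one_iters: int = STAGE_ONE_ITERS) -> ActiveLosses:
    """Stage I (it < stage_one_iters): everything except the local discriminators.
    Stage II: local discriminators switched on, fed with patches cropped around
    landmarks predicted on the current (initially coarse) outputs."""
    stage_two = it >= stage_one_iters
    return ActiveLosses(global_adversarial=True, cycle=True,
                        landmark_consistency=True, local_adversarial=stage_two)


for it in (0, 99_999, 100_000):
    print(it, losses_for_iteration(it).local_adversarial)
```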
