PSGAN: Pose and Expression Robust Spatial-Aware GAN for Customizable Makeup Transfer (CVPR20)

3. PSGAN

3.1. Formulation

source image domain $X$, reference image domain $Y$

$\left\{ x^n \right\}_{n=1,\cdots,N},\ x^n \in X$; $\left\{ y^m \right\}_{m=1,\cdots,M},\ y^m \in Y$

The distribution over domain $X$ is $\mathcal{P}_X$, and the distribution over domain $Y$ is $\mathcal{P}_Y$.

The learning objective is a transfer function $G: \left\{ x, y \right\} \rightarrow \tilde{x}$, where $\tilde{x}$ carries the makeup style of $y$ while preserving the identity of $x$.

3.2. Framework

Overall

The framework of PSGAN is shown in Fig. 2; it consists of three components:

  1. Makeup distill network (MDNet): extracts the makeup style from the reference image $y$ as two components $\gamma, \beta$, called the makeup matrices.
  2. Attentive makeup morphing module (AMM module): because the expression and pose may differ greatly between the source image $x$ and the reference image $y$, the AMM module is proposed to morph the two makeup matrices $\gamma, \beta$ into two new matrices $\gamma', \beta'$, which are adaptive to the source image, by considering the similarities between pixels of the source and the reference.
  3. Makeup apply network (MANet): applies $\gamma', \beta'$ to the bottleneck feature map of MANet (see the sketch after this list).
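
Putting the three modules together, the forward pass is roughly as follows. This is a minimal sketch: the `mdnet`, `amm`, and `manet` callables stand in for the modules detailed in the subsections below, and the exact signatures are my assumption, not the paper's API.

```python
def psgan_transfer(x, y, mdnet, amm, manet):
    """Sketch of G: {x, y} -> x_tilde, for x, y: (B, 3, H0, W0) image tensors."""
    v_y, gamma, beta = mdnet(y)                 # distill makeup matrices from the reference
    gamma_p, beta_p = amm(x, v_y, gamma, beta)  # morph into source-adaptive gamma', beta'
    return manet(x, gamma_p, beta_p)            # x_tilde: y's makeup style, x's identity
```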

Makeup distill network (MDNet)

The network structure of MDNet is the encoder-bottleneck part of StarGAN (the bottleneck refers to the residual blocks). It is responsible for extracting the makeup-related features (e.g., lip gloss, eye shadow), which are represented as two makeup matrices $\gamma, \beta$.

As shown in Fig. 2(B), the output of MDNet is a feature map $\mathbf{V}_\mathbf{y} \in \mathbb{R}^{C \times H \times W}$, followed by two parallel 1x1 conv layers that produce $\gamma \in \mathbb{R}^{1 \times H \times W}$ and $\beta \in \mathbb{R}^{1 \times H \times W}$.
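
Below is a minimal PyTorch sketch of this structure. The layer widths, instance normalization, and number of residual blocks follow the StarGAN generator and are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, 1, 1), nn.InstanceNorm2d(dim, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

class MDNet(nn.Module):
    """Encoder-bottleneck of StarGAN, plus two parallel 1x1 convs for gamma/beta."""
    def __init__(self, c=256, n_res=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, 1, 3), nn.InstanceNorm2d(64, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.InstanceNorm2d(128, affine=True), nn.ReLU(inplace=True),
            nn.Conv2d(128, c, 4, 2, 1), nn.InstanceNorm2d(c, affine=True), nn.ReLU(inplace=True),
            *[ResBlock(c) for _ in range(n_res)],  # the "bottleneck"
        )
        self.to_gamma = nn.Conv2d(c, 1, 1)  # gamma: (B, 1, H, W)
        self.to_beta = nn.Conv2d(c, 1, 1)   # beta:  (B, 1, H, W)

    def forward(self, y):
        v_y = self.encoder(y)               # V_y: (B, C, H, W)
        return v_y, self.to_gamma(v_y), self.to_beta(v_y)
```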

Attentive makeup morphing module (AMM module)

Because the expression and pose may differ greatly between the source image $x$ and the reference image $y$, $\gamma, \beta$ cannot be applied to the source image $x$ directly.
Q: does this mean $\gamma, \beta$ still contain information such as the expression and pose of the reference image $y$?

The AMM module computes an attentive matrix $A \in \mathbb{R}^{HW \times HW}$ to specify how a pixel in the source image $x$ is morphed from the pixels in the reference image $y$, where $A_{i,j}$ indicates the attentive value between the $i$-th pixel $x_i$ in image $x$ and the $j$-th pixel $y_j$ in image $y$.
Intuition: suppose position $i$ in $x$ is at the corner of the eye, and position $j$ in $y$ is also at the corner of the eye; then $A_{i,j}$ should be relatively large, meaning that the pixel at position $i$ of $\tilde{x}$ should draw on the pixel at position $j$ of $y$, which is what enables good eye-shadow transfer.
(One drawback: since $H$ and $W$ are flattened into a single dimension, some spatial information is inevitably lost.)
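
Given $A$, the morphing step itself reduces to a matrix product over the flattened spatial dimension; a minimal sketch (the batch dimension is added for concreteness, and how $A$ is built is covered next):

```python
import torch

def morph_makeup_matrices(A, gamma, beta):
    """Morph reference makeup matrices with the attentive matrix A.

    A:           (B, HW, HW)  attention from source pixels to reference pixels
    gamma, beta: (B, 1, H, W) makeup matrices of the reference
    Returns gamma', beta' of shape (B, 1, H, W), aligned with the source.
    """
    B, _, H, W = gamma.shape
    gamma_flat = gamma.view(B, 1, H * W).transpose(1, 2)  # (B, HW, 1)
    beta_flat = beta.view(B, 1, H * W).transpose(1, 2)    # (B, HW, 1)
    gamma_p = torch.bmm(A, gamma_flat)                    # weighted sum over reference pixels
    beta_p = torch.bmm(A, beta_flat)
    return (gamma_p.transpose(1, 2).view(B, 1, H, W),
            beta_p.transpose(1, 2).view(B, 1, H, W))
```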

68 facial landmarks are introduced as anchor points.
Take the landmark at the tip of the nose as an example: for every position $i$ of $x$, compute the displacement from position $i$ to the nose tip along the x and y axes (values can be positive or negative), giving a 2-dimensional vector; over all 68 landmarks this yields a 136-dimensional vector $\mathbf{p}_i \in \mathbb{R}^{136}, i = 1,\cdots,H \times W$, called the relative position features.
$$
\begin{aligned}
\mathbf{p}_i = \big[\, & f(x_i)-f(l_1),\ f(x_i)-f(l_2),\ \cdots,\ f(x_i)-f(l_{68}), \\
& g(x_i)-g(l_1),\ g(x_i)-g(l_2),\ \cdots,\ g(x_i)-g(l_{68}) \,\big] \qquad(1)
\end{aligned}
$$
where $f(\cdot)$ and $g(\cdot)$ indicate the coordinates on the x and y axes, and $l_i$ indicates the $i$-th facial landmark.
Thought: stacked over all positions, $\mathbf{p}$ should have dimensions $H \times W \times 136$.

Since landmarks are used, differences in face size inevitably exist; therefore $\mathbf{p}$ is normalized to unit length, i.e., $\frac{\mathbf{p}}{\left\| \mathbf{p} \right\|}$.
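
Below is a NumPy sketch of Eq. (1) plus the normalization. The landmark array layout is an assumption, and the per-pixel L2 norm is my reading of $\mathbf{p}/\left\|\mathbf{p}\right\|$.

```python
import numpy as np

def relative_position_features(landmarks, H, W):
    """Relative position features per Eq. (1), unit-normalized per pixel.

    landmarks: (68, 2) array of (x, y) landmark coordinates on the H x W grid.
    Returns p of shape (H, W, 136).
    """
    ys, xs = np.mgrid[0:H, 0:W]                        # per-pixel coordinates
    dx = xs[..., None] - landmarks[None, None, :, 0]   # f(x_i) - f(l_k): (H, W, 68)
    dy = ys[..., None] - landmarks[None, None, :, 1]   # g(x_i) - g(l_k): (H, W, 68)
    p = np.concatenate([dx, dy], axis=-1)              # (H, W, 136)
    norm = np.linalg.norm(p, axis=-1, keepdims=True)   # remove face-size differences
    return p / np.maximum(norm, 1e-8)
```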

Moreover, to avoid unreasonably sampling pixels that have similar relative positions but different semantics, the visual similarities between pixels are also taken into account.
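
One way to realize this, sketched below: concatenate each pixel's visual feature with its relative position feature, take dot products between source and reference pixels, and softmax over the reference pixels. The paper additionally masks the attention so that only pixels from the same facial region (obtained via face parsing) attend to each other; that masking is omitted here for brevity.

```python
import torch
import torch.nn.functional as F

def attentive_matrix(v_x, v_y, p_x, p_y):
    """Attention built from visual features plus relative position features.

    v_x, v_y: (B, C, H, W)   visual feature maps of source / reference
    p_x, p_y: (B, 136, H, W) relative position features of source / reference
    Returns A: (B, HW, HW), each row a softmax over reference pixels.
    """
    B, _, H, W = v_x.shape
    q = torch.cat([v_x, p_x], dim=1).view(B, -1, H * W)  # source: (B, C+136, HW)
    k = torch.cat([v_y, p_y], dim=1).view(B, -1, H * W)  # reference
    logits = torch.bmm(q.transpose(1, 2), k)             # (B, HW, HW) dot products
    return F.softmax(logits, dim=-1)
```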

Fig. 2(C) gives an example.