Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros

Abstract

Task: image-to-image translation (learn the mapping between an input image and an output image using a training set of aligned image pairs).
Difficulty: for many tasks, paired training data will not be available.
Approach: learn to translate an image from a source domain $X$ to a target domain $Y$ in the absence of paired examples.
Method: learn a mapping $G : X \to Y$ such that the distribution of images from $G (x)$ is indistinguishable from the distribution $Y$ using an adversarial loss. On its own this mapping is heavily under-constrained, so a second mapping $F : Y \to X$ is added and trained to be the inverse of $G$, i.e. $F (G (x)) \approx x$ and $G (F (y)) \approx y$. This is the basic idea of CycleGAN (a later ICLR 2018 paper that uses a GAN to generate adversarial examples in a latent space is quite similar).
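For reference, the two constraints above can be written as a single objective. The discriminators $D_X$, $D_Y$ and the weight $\lambda$ do not appear in these notes; they are the symbols used in the paper, and the form below is the standard one given there:

$$
\begin{aligned}
\mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) &= \mathbb{E}_{y}\left[\log D_Y(y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D_Y(G(x))\right)\right] \\
\mathcal{L}_{\text{cyc}}(G, F) &= \mathbb{E}_{x}\left[\lVert F(G(x)) - x \rVert_1\right] + \mathbb{E}_{y}\left[\lVert G(F(y)) - y \rVert_1\right] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\text{GAN}}(G, D_Y, X, Y) + \mathcal{L}_{\text{GAN}}(F, D_X, Y, X) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
\end{aligned}
$$

The first term pushes $G (x)$ toward the distribution of $Y$, the symmetric term does the same for $F (y)$ and $X$, and the cycle term ties each translated image back to its source.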

Introduction

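The first two paragraphs are elegantly written.
To avoid the difficulty of collecting labeled (paired) datasets for image translation, one can match at the level of sets rather than individual elements, i.e. domain mapping. Domain mapping means finding a $G$ such that, for all $x \in X$, $\hat{y} = G (x)$ cannot be told apart from real images in $Y$ by a classifier (the adversary) trained on $Y$. In theory this works, but it does not guarantee that the generated $\hat{y}$ corresponds to $x$ in a meaningful way. In practice this objective alone is also hard to train and often leads to mode collapse (all input images map to the same output image and the optimization fails to make progress).
More constraints are therefore needed during training. The authors use "cycle consistency": two image translators are trained to be inverses of each other (which means each mapping should itself be a bijection). The core of the paper is to train $G$ and $F$ simultaneously and add a cycle consistency loss that encourages $F (G (x)) \approx x$ and $G (F (y)) \approx y$.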
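A minimal sketch of the cycle consistency term, assuming NumPy arrays stand in for image batches and simple affine functions stand in for the two translators. The names `cycle_consistency_loss`, `F_inverse`, and `F_bad` are hypothetical and only serve the illustration; in the paper $G$ and $F$ are CNNs and the norm is L1, which is approximated here by a mean absolute error.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Mean absolute cycle error: |F(G(x)) - x| + |G(F(y)) - y|.

    Averaging over elements is a rescaled version of the L1 norm.
    """
    forward = np.mean(np.abs(F(G(x_batch)) - x_batch))   # x -> G(x) -> F(G(x))
    backward = np.mean(np.abs(G(F(y_batch)) - y_batch))  # y -> F(y) -> G(F(y))
    return forward + backward

# Toy stand-ins for the translators (assumption: real ones are neural networks).
def G(x):
    return 2.0 * x + 1.0

def F_inverse(y):
    # Exact inverse of G, so both cycles return to the starting point.
    return (y - 1.0) / 2.0

def F_bad(y):
    # Not an inverse of G, so the cycle error is large.
    return y

rng = np.random.default_rng(0)
x_batch = rng.normal(size=(4, 8))  # samples from domain X
y_batch = rng.normal(size=(5, 8))  # samples from domain Y, unpaired with x_batch

loss_good = cycle_consistency_loss(G, F_inverse, x_batch, y_batch)
loss_bad = cycle_consistency_loss(G, F_bad, x_batch, y_batch)
print(loss_good, loss_bad)
```

The loss is close to zero only when the two maps undo each other, which is the extra constraint the adversarial loss alone does not provide. Note that the two batches never need to be aligned or even the same size.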

Related work

GAN

Image-to-Image Translation

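Recent work mostly uses CNNs to learn the correspondence between paired images in two different styles. This paper builds on pix2pix: it likewise takes an image as input and outputs an image, but it learns a correspondence between sets and needs no element-level pairing.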
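The difference between element-level and set-level supervision can be sketched at the data-loading step. This is only an illustration: the string placeholders stand in for images, and `sample_paired` and `sample_unpaired` are hypothetical helper names, not functions from pix2pix or CycleGAN.

```python
import random

def sample_paired(pairs, batch_size, rng):
    """Paired setting (pix2pix-style): every x arrives with its matching y."""
    return rng.sample(pairs, batch_size)

def sample_unpaired(xs, ys, batch_size, rng):
    """Unpaired setting: draw from X and Y independently, with no correspondence."""
    return rng.sample(xs, batch_size), rng.sample(ys, batch_size)

rng = random.Random(0)

# Placeholder "images"; the two sets may even have different sizes.
xs = [f"x_{i}" for i in range(6)]
ys = [f"y_{j}" for j in range(9)]
pairs = [(f"x_{i}", f"y_{i}") for i in range(6)]

paired_batch = sample_paired(pairs, 3, rng)
x_batch, y_batch = sample_unpaired(xs, ys, 3, rng)
print(paired_batch)
print(x_batch, y_batch)
```

In the paired case the index of each `x` determines its `y`; in the unpaired case the two batches are unrelated, so any correspondence has to come from the model and its losses.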