Generative Adversarial Network 3: WGAN / EBGAN
Recap of prerequisites
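JS divergence (Jensen-Shannon divergence)
Measures the difference between two probability distributions (0 when they are identical). It is symmetric and bounded between 0 and log 2, or between 0 and 1 with base-2 logarithms.
However, when P and Q are far apart and do not overlap at all, the KL divergence is meaningless (it becomes infinite) and the JS divergence is a constant (log 2). This means the gradient at that point is 0.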
JS divergence is not suitable
- In most cases, P_G and P_data do not overlap:
1. P_G and P_data are low-dimensional manifolds in a high-dimensional space, so their overlap is negligible.
2. Even if P_G and P_data do overlap, without enough samples the sampled distributions may still not overlap.
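What is the problem with JS divergence?
JS divergence is log 2 whenever two distributions do not overlap, no matter how far apart they are.
So the objective value is the same: a P_G that is close to P_data scores the same as one that is far away, and the generator gets no signal about which direction to move.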
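A minimal sketch of the problem, assuming discrete distributions on 10 bins where each distribution puts all its mass on a single bin (the helper names are hypothetical). P_data sits at bin 9 and P_G is moved closer step by step:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions (natural log); inf if p has mass where q has none."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """JS(p, q) = 0.5 KL(p || m) + 0.5 KL(q || m) with m = (p + q) / 2; lies in [0, log 2]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def point_mass(position, size=10):
    """Hypothetical toy distribution with all probability on one bin."""
    d = [0.0] * size
    d[position] = 1.0
    return d

p_data = point_mass(9)
for g in (0, 4, 8, 9):
    p_g = point_mass(g)
    print(g, kl_divergence(p_g, p_data), js_divergence(p_g, p_data))
# g = 0, 4 and 8 all give KL = inf and JS = log 2 (about 0.693); only g = 9 gives 0.
```

Moving P_G from bin 0 to bin 8 does not change the JS divergence at all, which is exactly the "same objective value" issue.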
I. Wasserstein GAN (WGAN)
Earth Mover’s Distance
- View P as a pile of earth and Q as the target shape; there are many possible “moving plans” that reshape P into Q.
- The earth mover’s distance is the average moving distance of the plan with the smallest average distance.
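In general, finding the best moving plan is an optimization over all plans. For the special case of distributions on a 1-D line with unit-spaced bins, the optimal plan has a closed form: its cost is the sum over bins of |CDF_P − CDF_Q|. A minimal sketch under that 1-D assumption (helper names are hypothetical):

```python
def earth_mover_1d(p, q):
    """EMD between two discrete distributions on bins 0..n-1 with unit spacing.

    In 1-D, the cheapest moving plan shifts earth monotonically, and its cost
    equals the sum over bins of |CDF_p - CDF_q|.
    """
    cdf_p = cdf_q = 0.0
    cost = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        cost += abs(cdf_p - cdf_q)
    return cost

def point_mass(position, size=10):
    """Hypothetical toy distribution with all probability on one bin."""
    d = [0.0] * size
    d[position] = 1.0
    return d

p_data = point_mass(9)
for g in (0, 4, 8, 9):
    print(g, earth_mover_1d(point_mass(g), p_data))
# Prints distances 9.0, 5.0, 1.0, 0.0: they shrink as P_G moves toward P_data.

# Split mass: move 0.5 from bin 0 to bin 2 and 0.5 from bin 1 to bin 3 -> cost 2.0
print(earth_mover_1d([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))
```

Unlike the JS divergence on the same non-overlapping point masses, this distance decreases steadily as P_G approaches P_data.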
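Why earth mover’s distance? As P_G moves toward P_data, the earth mover’s distance keeps decreasing even when the two distributions do not overlap, whereas the JS divergence stays at log 2. This gives the generator a signal to improve.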
Evaluating the Wasserstein distance between P_G and P_data
The discriminator must be smooth (1-Lipschitz),
so that D cannot simply go to +∞ on real data and −∞ on generated data.
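In the standard WGAN formulation, the Wasserstein distance between P_G and P_data is evaluated by maximizing over discriminators restricted to be 1-Lipschitz:

```latex
V(G, D) = \max_{D \in \text{1-Lipschitz}}
  \Big\{ \mathbb{E}_{x \sim P_{data}}\big[D(x)\big] - \mathbb{E}_{x \sim P_G}\big[D(x)\big] \Big\}
```

Without the 1-Lipschitz restriction, when P_G and P_data do not overlap, D could push D(x) toward +∞ on real samples and −∞ on generated ones, so the maximum would be unbounded and training would not converge.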
Lipschitz Function
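- ‖f(x1) − f(x2)‖ ≤ K‖x1 − x2‖: the change in output is at most K times the change in input, so outputs cannot differ too much.
- K = 1 gives a “1-Lipschitz” function.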
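A small numerical illustration of the definition, assuming scalar functions on [−5, 5]. The helper `empirical_lipschitz` (hypothetical) takes the largest ratio |f(a) − f(b)| / |a − b| over random pairs, which is a lower bound on the true Lipschitz constant K:

```python
import math
import random

def empirical_lipschitz(f, low, high, n_pairs=10_000, seed=0):
    """Largest |f(a) - f(b)| / |a - b| over random pairs: a lower bound on the true K."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        a, b = rng.uniform(low, high), rng.uniform(low, high)
        if a != b:
            best = max(best, abs(f(a) - f(b)) / abs(a - b))
    return best

print(empirical_lipschitz(math.sin, -5, 5))          # at most 1: sin is 1-Lipschitz
print(empirical_lipschitz(lambda x: 3 * x, -5, 5))   # about 3: 3-Lipschitz, not 1-Lipschitz
print(empirical_lipschitz(lambda x: x ** 2, -5, 5))  # close to 10, the bound |a + b| < 10 on [-5, 5]
```

Sampling can only show that a function violates a given K; it cannot prove that a function satisfies it, which is one reason the constraint is enforced through the model itself in the methods that follow.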
How to fulfill this constraint
1. Original WGAN: weight clipping. After each update, force every parameter w of D into [−c, c].
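The original WGAN enforces the constraint by weight clipping. A minimal sketch, assuming a hypothetical linear critic D(x) = w · x on 16-dimensional inputs and c = 0.01. It also shows why clipping only approximates the constraint: the Lipschitz constant of a linear critic is ‖w‖₂, which clipping bounds by c·√dim rather than fixing it at 1:

```python
import numpy as np

def clip_weights(weights, c):
    """Weight clipping: force every parameter into [-c, c] after each update."""
    return [np.clip(w, -c, c) for w in weights]

rng = np.random.default_rng(0)
w = rng.normal(size=16) * 5          # hypothetical critic weights after an update
(w_clipped,) = clip_weights([w], c=0.01)

# For a linear critic D(x) = w . x the Lipschitz constant is ||w||_2.
# After clipping it is at most c * sqrt(dim): bounded, but not exactly 1.
lipschitz = np.linalg.norm(w_clipped)
print(lipschitz, 0.01 * np.sqrt(16))
```

Clipping makes the critic K-Lipschitz for some unknown K, which is enough to keep D from going to ±∞. The shortcomings of this approach motivate the gradient-penalty and spectral-normalization alternatives.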
Improved WGAN (WGAN-GP)
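- For differentiable D: D is 1-Lipschitz if and only if ‖∇_x D(x)‖ ≤ 1 for all x.
- Compromise: we cannot require the gradient norm to be at most 1 for every x, so we add a penalty term that enforces it only for x sampled from a penalty distribution P_penalty.
- Only the region between P_G and P_data gets the gradient constraint, because that region influences how P_G moves toward P_data. In practice, x is sampled on the line between a point from P_data and a point from P_G.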
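A minimal NumPy sketch of the gradient penalty, assuming a hypothetical toy critic D(x) = v · tanh(Wx) whose input gradient can be written analytically (a real implementation would use a framework's autograd instead). Penalty points x̂ are sampled uniformly on the segment between paired real and generated samples. The two-sided term λ(‖∇D(x̂)‖ − 1)² follows the WGAN-GP paper, with λ = 10 as the commonly used value; a one-sided max(0, ‖∇D(x̂)‖ − 1) would match "smaller than 1" more literally:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy critic D(x) = v . tanh(W x) on 2-D inputs.
W = rng.normal(size=(8, 2))
v = rng.normal(size=8)

def critic(x):
    """x: (batch, 2) -> scores: (batch,)"""
    return np.tanh(x @ W.T) @ v

def critic_grad(x):
    """Analytic gradient of D with respect to each input row: (batch, 2)."""
    h = np.tanh(x @ W.T)                  # (batch, 8)
    return ((1.0 - h ** 2) * v) @ W       # chain rule through tanh

def gradient_penalty(x_real, x_fake, lam=10.0):
    # Sample x_hat uniformly on the line between each real/fake pair.
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    grad_norm = np.linalg.norm(critic_grad(x_hat), axis=1)
    return lam * np.mean((grad_norm - 1.0) ** 2)

x_real = rng.normal(loc=3.0, size=(64, 2))   # stand-in for samples from P_data
x_fake = rng.normal(loc=-3.0, size=(64, 2))  # stand-in for samples from P_G
# The critic minimizes -(E_real[D] - E_fake[D]) + penalty.
loss_d = -(critic(x_real).mean() - critic(x_fake).mean()) + gradient_penalty(x_real, x_fake)
print(loss_d)
```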
2. Spectral normalization (SN)
- Keeps the gradient norm smaller than 1 everywhere.
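A minimal sketch of the core step, assuming a single weight matrix W (the helper names are hypothetical). The largest singular value σ(W) is estimated by power iteration, and W is divided by it, so x ↦ Wx becomes 1-Lipschitz in the L2 norm. Stacking such layers with 1-Lipschitz activations such as ReLU keeps the whole network 1-Lipschitz:

```python
import numpy as np

def spectral_norm(W, n_iters=200, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return float(u @ W @ v)

def spectrally_normalize(W, **kwargs):
    """Divide W by its spectral norm so that x -> W x is 1-Lipschitz (L2 norm)."""
    return W / spectral_norm(W, **kwargs)

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))       # hypothetical layer weights
W_sn = spectrally_normalize(W)
print(spectral_norm(W), np.linalg.norm(W, 2))   # power iteration vs exact largest singular value
print(np.linalg.norm(W_sn, 2))                  # approximately 1
```

Running many iterations from scratch keeps the sketch simple. In SN-GAN-style training, the vector u is instead kept between steps and a single power iteration is done per update, which is much cheaper.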
The algorithm of WGAN
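A toy sketch of the WGAN training loop with weight clipping, assuming a 1-D problem: P_data is N(3, 1), the generator is G(z) = z + θ with z ~ N(0, 1), and the critic is linear, D(x) = w·x. All hyperparameters are hypothetical, and plain gradient steps stand in for an optimizer such as RMSProp. It shows the structural differences from the original GAN: several critic steps per generator step, a critic output with no sigmoid, and clipping after every critic update:

```python
import numpy as np

rng = np.random.default_rng(0)
c = 0.1                 # clipping bound (hypothetical value)
lr_d, lr_g = 0.05, 0.5  # step sizes (hypothetical values)
n_critic = 5            # critic updates per generator update
w = 0.0                 # critic D(x) = w * x, no sigmoid on the output
theta = -4.0            # generator G(z) = z + theta
real_mean = 3.0

for step in range(200):
    for _ in range(n_critic):
        x_real = rng.normal(real_mean, 1.0, size=64)
        x_fake = rng.normal(0.0, 1.0, size=64) + theta
        # Gradient ascent on E_real[D(x)] - E_fake[D(x)] with respect to w.
        grad_w = x_real.mean() - x_fake.mean()
        w = float(np.clip(w + lr_d * grad_w, -c, c))   # weight clipping
    # Generator minimizes -E_z[D(G(z))] = -w * E_z[z + theta];
    # for this linear critic, the gradient with respect to theta is -w.
    theta -= lr_g * (-w)

print(theta)  # ends up near the real mean of 3
```

With the critic clipped, the generator moves toward P_data at a bounded speed instead of being pushed by an unbounded D. This toy setup is linear, so it only illustrates the loop, not the behavior of deep networks.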
II. Energy-based GAN (EBGAN)
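- The discriminator is an auto-encoder (its score is the negative reconstruction error), so it can be pre-trained using only positive (real) examples.
- Generated examples do not have to get very negative scores: in practice, driving the score down further is hard, so it is enough to set a threshold (margin).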
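A minimal sketch of the EBGAN losses, assuming a hypothetical tiny auto-encoder (weights `enc` and `dec`) and using mean squared reconstruction error as the energy (energy = negative score; the exact error norm is an assumption here). The discriminator lowers the energy of real data. Generated data only needs energy above a margin m, after which it contributes no loss. The first term alone can be trained on real data, which is why pre-training with positive examples works:

```python
import numpy as np

def energy(x, enc, dec):
    """Energy of each row of x: mean squared reconstruction error of a tiny auto-encoder."""
    recon = np.tanh(x @ enc) @ dec
    return np.mean((recon - x) ** 2, axis=1)

def ebgan_losses(x_real, x_fake, enc, dec, margin=1.0):
    e_real = energy(x_real, enc, dec)
    e_fake = energy(x_fake, enc, dec)
    # Discriminator: low energy (good reconstruction) on real data; generated
    # data only has to exceed the margin, beyond which it adds no loss.
    loss_d = e_real.mean() + np.maximum(0.0, margin - e_fake).mean()
    # Generator: make its samples easy to reconstruct (low energy).
    loss_g = e_fake.mean()
    return loss_d, loss_g

rng = np.random.default_rng(0)
enc = rng.normal(scale=0.5, size=(4, 2))   # hypothetical encoder weights
dec = rng.normal(scale=0.5, size=(2, 4))   # hypothetical decoder weights
x_real = rng.normal(size=(32, 4))          # stand-in for real samples
x_fake = rng.normal(loc=2.0, size=(32, 4)) # stand-in for generated samples
print(ebgan_losses(x_real, x_fake, enc, dec))
```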