Deep Learning with Python: study notes and code (Chapter 8: 8.4, 8.5, generative models)
Deep Learning with Python, Chapter 8
8.4 Generating images with variational autoencoders (VAEs)
A variational autoencoder (VAE) is, like a GAN, a generative model.
A VAE works as follows:
1. An encoder module turns an input sample into two parameters in a latent space: a mean and a (log) variance.
2. We assume the latent normal distribution defined by these parameters is capable of generating the input image, and we sample a random point z from it: z = z_mean + exp(z_log_variance) * epsilon, where epsilon is a random tensor of small values.
3. A decoder module maps this latent-space point back to the original image.
The parameters of a VAE are trained via two losses: a reconstruction loss, which forces the decoded samples to match the initial inputs, and a regularization loss, which helps learn a well-structured latent space and reduces overfitting to the training data.
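Step 2 above is the "reparameterization trick." Here is a minimal NumPy sketch of it (my own illustration, not book code; the `z_mean` and `z_log_var` values are hypothetical stand-ins for encoder outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

z_mean = np.array([[0.5, -1.0]])     # hypothetical encoder output: mean
z_log_var = np.array([[0.0, -2.0]])  # hypothetical encoder output: log-variance

# epsilon carries all the randomness, drawn from a standard normal
epsilon = rng.standard_normal(z_mean.shape)

# z is random, but is a deterministic, differentiable function of
# z_mean and z_log_var, so gradients can flow through both parameters
z = z_mean + np.exp(0.5 * z_log_var) * epsilon

print(z.shape)  # (1, 2): one sampled point in the 2D latent space
```

Isolating the randomness in `epsilon` is what makes the sampling step trainable by backpropagation.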
# VAE encoder network
import keras
from keras import layers
from keras import backend as K
from keras.models import Model
import numpy as np

# Encoder
img_shape = (28, 28, 1)
batch_size = 16
latent_dim = 2

input_img = keras.Input(shape=img_shape)
x = layers.Conv2D(32, 3, padding='same', activation='relu')(input_img)
x = layers.Conv2D(64, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
shape_before_flattening = K.int_shape(x)
x = layers.Flatten()(x)
x = layers.Dense(32, activation='relu')(x)

# The input image ends up encoded as these two parameters
z_mean = layers.Dense(latent_dim)(x)
z_log_var = layers.Dense(latent_dim)(x)

# Use the mean and log-variance to sample a latent-space point, via a Lambda layer
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim), mean=0., stddev=1.)
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

z = layers.Lambda(sampling)([z_mean, z_log_var])
# VAE decoder network: maps latent-space points back to images
# z is fed in here
decoder_input = layers.Input(K.int_shape(z)[1:])
# Upsample the input
x = layers.Dense(np.prod(shape_before_flattening[1:]), activation='relu')(decoder_input)
# Reshape z into a feature map of the same shape as the feature map just
# before the last Flatten layer in the encoder
x = layers.Reshape(shape_before_flattening[1:])(x)
# These two layers decode z into a feature map the same size as the original image
x = layers.Conv2DTranspose(32, 3, padding='same', activation='relu', strides=(2, 2))(x)
x = layers.Conv2D(1, 3, padding='same', activation='sigmoid')(x)
# Instantiate the decoder model, which turns decoder_input into a decoded image
decoder = Model(decoder_input, x)
# Apply it to z to obtain the decoded z
z_decoded = decoder(z)
# How the losses are set up: write a custom layer and, inside it, use the
# built-in add_loss method to create the loss you want
class CustomVariationalLayer(keras.layers.Layer):

    def vae_loss(self, x, z_decoded):
        x = K.flatten(x)
        z_decoded = K.flatten(z_decoded)
        xent_loss = keras.metrics.binary_crossentropy(x, z_decoded)
        kl_loss = -5e-4 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        z_decoded = inputs[1]
        loss = self.vae_loss(x, z_decoded)
        self.add_loss(loss, inputs=inputs)
        return x

# Call the custom layer on the input and the decoded output to obtain the final model output
y = CustomVariationalLayer()([input_img, z_decoded])
# Training the VAE
from keras.datasets import mnist

vae = Model(input_img, y)
vae.compile(optimizer='rmsprop', loss=None)
vae.summary()
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) (None, 28, 28, 1) 0
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 28, 28, 32) 320 input_3[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 14, 14, 64) 18496 conv2d_9[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D) (None, 14, 14, 64) 36928 conv2d_10[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D) (None, 14, 14, 64) 36928 conv2d_11[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 12544) 0 conv2d_12[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 32) 401440 flatten_2[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 2) 66 dense_4[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) (None, 2) 66 dense_4[0][0]
__________________________________________________________________________________________________
lambda_3 (Lambda) (None, 2) 0 dense_5[0][0]
dense_6[0][0]
__________________________________________________________________________________________________
model_1 (Model) (None, 28, 28, 1) 56385 lambda_3[0][0]
__________________________________________________________________________________________________
custom_variational_layer_2 (Cus [(None, 28, 28, 1), 0 input_3[0][0]
model_1[1][0]
==================================================================================================
Total params: 550,629
Trainable params: 550,629
Non-trainable params: 0
__________________________________________________________________________________________________
(x_train, _), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255
x_train = x_train.reshape(x_train.shape + (1,))
x_test = x_test.astype('float32') / 255
x_test = x_test.reshape(x_test.shape + (1,))

vae.fit(x=x_train, y=None, shuffle=True, epochs=10, batch_size=batch_size,
        validation_data=(x_test, None))
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 119s 2ms/step - loss: 15.5746 - val_loss: 0.1977
Epoch 2/10
60000/60000 [==============================] - 99s 2ms/step - loss: 0.1945 - val_loss: 0.1930
Epoch 3/10
60000/60000 [==============================] - 99s 2ms/step - loss: 0.1903 - val_loss: 0.1893
Epoch 4/10
60000/60000 [==============================] - 96s 2ms/step - loss: 0.1879 - val_loss: 0.1870
Epoch 5/10
60000/60000 [==============================] - 95s 2ms/step - loss: 0.1863 - val_loss: 0.1859
Epoch 6/10
60000/60000 [==============================] - 95s 2ms/step - loss: 0.1852 - val_loss: 0.1857
Epoch 7/10
60000/60000 [==============================] - 98s 2ms/step - loss: 0.1842 - val_loss: 0.1845
Epoch 8/10
60000/60000 [==============================] - 96s 2ms/step - loss: 0.1834 - val_loss: 0.1834
Epoch 9/10
60000/60000 [==============================] - 95s 2ms/step - loss: 0.1829 - val_loss: 0.1835
Epoch 10/10
60000/60000 [==============================] - 103s 2ms/step - loss: 0.1823 - val_loss: 0.1857
import matplotlib.pyplot as plt
from scipy.stats import norm
import numpy as np

# Display a 15x15 grid of digits
n = 15
digit_size = 28
figure = np.zeros((digit_size * n, digit_size * n))
# Map linearly spaced probabilities through the inverse Gaussian CDF so the
# grid covers most of the mass of the latent prior
grid_x = norm.ppf(np.linspace(0.05, 0.95, n))
grid_y = norm.ppf(np.linspace(0.05, 0.95, n))

for i, yi in enumerate(grid_x):
    for j, xi in enumerate(grid_y):
        z_sample = np.array([[xi, yi]])
        # Repeat z to form a complete batch
        z_sample = np.tile(z_sample, batch_size).reshape(batch_size, 2)
        x_decoded = decoder.predict(z_sample, batch_size=batch_size)
        # Reshape the first decoded sample in the batch into a digit image
        digit = x_decoded[0].reshape(digit_size, digit_size)
        figure[i * digit_size:(i + 1) * digit_size,
               j * digit_size:(j + 1) * digit_size] = digit

plt.figure(figsize=(10, 10))
plt.imshow(figure, cmap='Greys_r')
plt.show()
Summary
1. Image generation with deep learning works by learning a latent space that captures statistical information about a dataset of images. By sampling points from this latent space and decoding them, we can generate never-before-seen images. There are two major tools for this: variational autoencoders (VAEs) and generative adversarial networks (GANs).
2. VAEs produce highly structured, continuous latent representations, so they work well for all kinds of image editing in latent space, such as face swapping or adding a smile. They also work well for latent-space animations, such as moving along a cross-section of the latent space to show a starting image slowly morphing into a different one in a continuous way.
3. GANs can generate realistic single images, but the latent space they induce may not have a nice structure.
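The latent-space animation mentioned in point 2 boils down to linear interpolation between two latent points. A small NumPy sketch (my own illustration; the latent points are hypothetical, and `decoder` is assumed to be a trained VAE decoder like the one above):

```python
import numpy as np

z_start = np.array([0.0, -2.0])  # hypothetical latent points in a 2D latent space
z_end = np.array([1.5, 1.0])

n_steps = 8
alphas = np.linspace(0.0, 1.0, n_steps)
# One latent point per animation frame, moving from z_start to z_end
path = np.array([(1 - a) * z_start + a * z_end for a in alphas])

print(path.shape)  # (8, 2)
# Each row could then be decoded into one frame of the morphing animation:
# frames = decoder.predict(path)
```

Because the VAE's latent space is continuous, adjacent frames decode to visually similar images, which is what makes the morph look smooth.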
8.5 Generative adversarial networks (GANs)
A GAN is made up of two networks:
1. Generator network: takes a random vector (a random point in latent space) as input and decodes it into a synthetic image.
2. Discriminator network: takes an image (real or synthetic) as input and predicts whether the image came from the training set or was created by the generator network.
A bag of tricks for training GANs:
1. Use tanh as the last activation in the generator, instead of sigmoid, which is more common in other types of models.
2. Sample points from the latent space using a normal (Gaussian) distribution, not a uniform distribution.
3. Stochasticity is good for inducing robustness. Because GAN training results in a dynamic equilibrium, GANs are likely to get stuck in all sorts of ways. Introducing randomness during training helps prevent this. We introduce randomness in two ways: 1) using dropout in the discriminator, and 2) adding some random noise to the labels for the discriminator.
4. Sparse gradients can hinder GAN training. In deep learning, sparsity is often a desirable property, but not in GANs. Two things can induce gradient sparsity: 1) max-pooling operations and 2) ReLU activations. Instead of max pooling, use strided convolutions for downsampling, and use a LeakyReLU layer instead of a ReLU activation; it's similar to ReLU, but it relaxes the sparsity constraint by allowing small negative activation values.
5. In generated images, it's common to see "checkerboard artifacts" caused by unequal coverage of the pixel space by the generator. To fix this, whenever we use a strided Conv2DTranspose or Conv2D in either the generator or the discriminator, we use a kernel size that's divisible by the stride size.
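Tip 5 can be captured in a one-line check. This helper is my own sketch (not from the book), shown just to make the divisibility rule concrete:

```python
def covers_evenly(kernel_size, stride):
    """True if each output pixel receives the same number of kernel contributions,
    which is the condition that avoids checkerboard artifacts."""
    return kernel_size % stride == 0

# The generator below uses Conv2DTranspose(256, 4, strides=2): kernel 4, stride 2
print(covers_evenly(4, 2))  # True  -> even pixel coverage
print(covers_evenly(5, 2))  # False -> would risk checkerboard artifacts
```

This is why the upsampling layer in the generator uses a kernel size of 4 with a stride of 2, while the stride-1 convolutions are free to use kernel sizes of 5 or 7.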
# The GAN generator network
import keras
from keras import layers
import numpy as np

latent_dim = 32
height = 32
width = 32
channels = 3

generator_input = keras.Input(shape=(latent_dim,))
# Transform the input into a 16x16, 128-channel feature map
x = layers.Dense(128 * 16 * 16)(generator_input)
x = layers.LeakyReLU()(x)
x = layers.Reshape((16, 16, 128))(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
# Upsample to 32x32
x = layers.Conv2DTranspose(256, 4, strides=2, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(256, 5, padding='same')(x)
x = layers.LeakyReLU()(x)
# Produce a 32x32, 3-channel image with values in [-1, 1]
x = layers.Conv2D(channels, 7, activation='tanh', padding='same')(x)
generator = keras.models.Model(generator_input, x)
generator.summary()
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) (None, 32) 0
_________________________________________________________________
dense_4 (Dense) (None, 32768) 1081344
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU) (None, 32768) 0
_________________________________________________________________
reshape_3 (Reshape) (None, 16, 16, 128) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 16, 16, 256) 819456
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU) (None, 16, 16, 256) 0
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 32, 32, 256) 1048832
_________________________________________________________________
leaky_re_lu_8 (LeakyReLU) (None, 32, 32, 256) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 32, 32, 256) 1638656
_________________________________________________________________
leaky_re_lu_9 (LeakyReLU) (None, 32, 32, 256) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 32, 32, 256) 1638656
_________________________________________________________________
leaky_re_lu_10 (LeakyReLU) (None, 32, 32, 256) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 32, 32, 3) 37635
=================================================================
Total params: 6,264,579
Trainable params: 6,264,579
Non-trainable params: 0
_________________________________________________________________
# The GAN discriminator network
discriminator_input = layers.Input(shape=(height, width, channels))
x = layers.Conv2D(128, 3)(discriminator_input)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Conv2D(128, 4, strides=2)(x)
x = layers.LeakyReLU()(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(1, activation='sigmoid')(x)
discriminator = keras.models.Model(discriminator_input, x)
# Use gradient clipping and learning-rate decay in the optimizer
discriminator_optimizer = keras.optimizers.RMSprop(lr=0.0008, clipvalue=1.0, decay=1e-8)
discriminator.compile(optimizer=discriminator_optimizer, loss='binary_crossentropy')
discriminator.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_5 (InputLayer)         (None, 32, 32, 3)         0
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 30, 30, 128)       3584
_________________________________________________________________
leaky_re_lu_11 (LeakyReLU)   (None, 30, 30, 128)       0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 14, 14, 128)       262272
_________________________________________________________________
leaky_re_lu_12 (LeakyReLU)   (None, 14, 14, 128)       0
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 6, 6, 128)         262272
_________________________________________________________________
leaky_re_lu_13 (LeakyReLU)   (None, 6, 6, 128)         0
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 2, 2, 128)         262272
_________________________________________________________________
leaky_re_lu_14 (LeakyReLU)   (None, 2, 2, 128)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 512)               0
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 513
=================================================================
Total params: 790,913
Trainable params: 790,913
Non-trainable params: 0
_________________________________________________________________
# The adversarial network
# Set the discriminator weights to non-trainable (this applies only to the gan model)
discriminator.trainable = False

gan_input = keras.Input(shape=(latent_dim,))
gan_output = discriminator(generator(gan_input))
gan = keras.models.Model(gan_input, gan_output)
gan_optimizer = keras.optimizers.RMSprop(lr=0.0004, clipvalue=1.0, decay=1e-8)
gan.compile(optimizer=gan_optimizer, loss='binary_crossentropy')
How to train the DCGAN:
1. Draw random points from the latent space (random noise).
2. Use the generator to generate images from this random noise.
3. Mix the generated images with real ones.
4. Train the discriminator on these mixed images, with the corresponding labels.
5. Draw new random points in the latent space.
6. Train gan on these random vectors, with labels that all say "these are real images." This updates the weights of the generator in a direction that makes the discriminator more likely to predict "real" for generated images; in other words, it trains the generator to fool the discriminator.
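Steps 3, 4, and 6 hinge on how the batches and labels are assembled. A NumPy-only sketch of that data preparation (my own illustration; the random arrays are stand-ins for generator output and a real training batch):

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4
generated_images = rng.random((batch_size, 32, 32, 3))  # stand-in for generator.predict(...)
real_images = rng.random((batch_size, 32, 32, 3))       # stand-in for a slice of x_train

# Step 3: mix generated and real images into one batch
combined_images = np.concatenate([generated_images, real_images])

# Step 4: generated images get label 1, real images label 0 (the convention
# used by the training loop below), plus a little noise per trick 3
labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
labels += 0.05 * rng.random(labels.shape)

# Step 6: when training the generator through the frozen discriminator,
# every label claims "real" (0 in this convention); this lie is what
# drives the generator's weight updates
misleading_targets = np.zeros((batch_size, 1))

print(combined_images.shape, labels.shape, misleading_targets.shape)
```

Note the label convention is arbitrary as long as it is consistent: here "fake" is 1 and "real" is 0, matching the full training loop that follows.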
# Implementing GAN training
import os
from keras.preprocessing import image

(x_train, y_train), (_, _) = keras.datasets.cifar10.load_data()
# flatten() collapses the label array to one dimension; class 6 is "frog"
x_train = x_train[y_train.flatten() == 6]
x_train = x_train.reshape((x_train.shape[0],) + (height, width, channels)).astype('float32') / 255.

iterations = 10000
batch_size = 20
save_dir = r'D:\study\Python\Deeplearning\Untitled Folder'

start = 0
for step in range(iterations):
    # Sample random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Decode the sampled points into fake images
    generated_images = generator.predict(random_latent_vectors)
    # Combine the fake images with real ones
    stop = start + batch_size
    real_images = x_train[start:stop]
    combined_images = np.concatenate([generated_images, real_images])
    # Assemble labels for the real and fake images, and add noise to them
    labels = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    labels += 0.05 * np.random.random(labels.shape)
    # Train the discriminator
    d_loss = discriminator.train_on_batch(combined_images, labels)
    # Sample new random points in the latent space
    random_latent_vectors = np.random.normal(size=(batch_size, latent_dim))
    # Assemble labels that all say "real images" (a lie!)
    misleading_targets = np.zeros((batch_size, 1))
    # Train the generator via the gan model, in which the discriminator is frozen
    a_loss = gan.train_on_batch(random_latent_vectors, misleading_targets)

    start += batch_size
    if start > len(x_train) - batch_size:
        start = 0
    # Every 100 steps, save the weights and some images
    if step % 100 == 0:
        gan.save_weights('gan.h5')
        print('d_loss', d_loss)
        print('a_loss', a_loss)
        img = image.array_to_img(generated_images[0] * 255, scale=False)
        img.save(os.path.join(save_dir, 'generated_frog' + str(step) + '.png'))
        img = image.array_to_img(real_images[0] * 255, scale=False)
        img.save(os.path.join(save_dir, 'real_frog' + str(step) + '.png'))
Summary
1. A GAN consists of a generator network and a discriminator network. The discriminator is trained to distinguish the generator's output from real images in the training set, and the generator is trained to fool the discriminator. Remarkably, the generator never sees images from the training set directly; everything it knows about the data comes from the discriminator.
2. GANs are hard to train, because training a GAN is a dynamic process rather than simple gradient descent on a fixed loss. Getting a GAN to train correctly requires a number of heuristic tricks, as well as extensive tuning.
3. GANs can produce highly realistic images, but unlike VAEs, the latent space they learn doesn't have a neat continuous structure, so they may be unsuitable for certain practical applications, such as image editing via latent-space concept vectors.
Chapter summary
1. How to generate sequence data, one timestep at a time. This applies to text generation, as well as note-by-note music generation or any other kind of timeseries data.
2. How DeepDream works: by maximizing the activations of all the layers of a convnet via gradient ascent in input space.
3. How to do style transfer, i.e., combining a content image with a style image to produce interesting results.
4. What generative adversarial networks (GANs) and variational autoencoders (VAEs) are, how they can be used to create new images, and how latent-space concept vectors can be used for image editing.