TensorFlow Mini-Programs (4): MNIST Digit Recognition (Optimizing the Neural Network)

This post makes some simple optimizations to the neural network from several different angles.

1. Cost Function

In the previous post we used the simple quadratic cost (mean squared error) as the cost function:

loss=tf.reduce_mean(tf.square(y-prediction))  # mean squared prediction error

In this post we improve on that by using cross-entropy as the cost function. Cross-entropy measures the distance between two probability distributions and is commonly used in classification problems; it is usually paired with softmax regression.
Softmax regression can itself be trained as a learning algorithm to optimize classification results, but in TensorFlow its parameters are stripped away: it is just an extra processing layer that turns the network's outputs into a probability distribution.

loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=wx_plus_b))

Note that `tf.nn.softmax_cross_entropy_with_logits` applies softmax internally, so `logits` must be the unnormalized layer output (`wx_plus_b` in the code below), not the already-softmaxed `prediction`.
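To make the formula concrete, here is a small numpy sketch (my own illustration, not from the original code) of what `tf.nn.softmax_cross_entropy_with_logits` computes per example, and of why feeding it an already-softmaxed tensor distorts the loss:

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability; softmax is shift-invariant.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    # Per-example loss: softmax the raw logits, then take -sum(labels * log(probs)).
    # This mirrors what tf.nn.softmax_cross_entropy_with_logits does internally.
    probs = softmax(logits)
    return -np.sum(labels * np.log(probs), axis=-1)

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])   # one-hot true class

loss_ok = softmax_cross_entropy(labels, logits)               # correct usage
loss_double = softmax_cross_entropy(labels, softmax(logits))  # double-softmax mistake
```

Because the second softmax flattens the distribution, `loss_double` comes out larger than `loss_ok` here, which is one way the double-softmax mistake slows down training.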

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets("MNIST_data",one_hot=True)  # load the MNIST dataset; "MNIST_data" is a folder in the current directory (any path works)
batch_size=100  # size of each batch
n_batch=mnist.train.num_examples//batch_size  # number of batches per epoch
x=tf.placeholder(tf.float32,[None,784])  # input placeholder, 28*28=784 pixels per image
y=tf.placeholder(tf.float32,[None,10])  # label placeholder, 10 digit classes
# Build a simple neural network
W=tf.Variable(tf.zeros([784,10]),name='W')  # 784 inputs, 10 outputs
b=tf.Variable(tf.zeros([10]),name='b')
wx_plus_b=tf.matmul(x,W)+b
prediction=tf.nn.softmax(wx_plus_b)  # turn the raw scores into predicted probabilities
# Cross-entropy cost function; logits must be the pre-softmax output
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=wx_plus_b))
#loss=tf.reduce_mean(tf.square(y-prediction))  # quadratic cost (mean squared error) alternative
# Use gradient descent with a learning rate of 0.3 to minimize the loss
train_step=tf.train.GradientDescentOptimizer(0.3).minimize(loss)
init=tf.global_variables_initializer()  # variable initialization
# argmax returns the index of the largest entry along a dimension;
# comparing predicted and true class indices gives a boolean vector
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# cast the booleans to floats and average them to get the accuracy
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20):  # 20 epochs: every training image is seen 20 times
        for batch in range(n_batch):  # iterate over all batches in one epoch
            batch_x,batch_y=mnist.train.next_batch(batch_size)  # fetch a batch of 100 images (batch_x) and labels (batch_y)
            sess.run(train_step,feed_dict={x:batch_x,y:batch_y})  # run one training step
        acc=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels})  # compute test accuracy
        print("Iter"+str(epoch)+",Testing acc"+str(acc))

Output:
Iter0,Testing acc0.831
Iter1,Testing acc0.8994
Iter2,Testing acc0.9058
Iter3,Testing acc0.9108
Iter4,Testing acc0.9124
Iter5,Testing acc0.9139
Iter6,Testing acc0.9174
Iter7,Testing acc0.9186
Iter8,Testing acc0.9181
Iter9,Testing acc0.9194
Iter10,Testing acc0.92
Iter11,Testing acc0.921
Iter12,Testing acc0.9206
Iter13,Testing acc0.9221
Iter14,Testing acc0.9222
Iter15,Testing acc0.9228
Iter16,Testing acc0.9227
Iter17,Testing acc0.9231
Iter18,Testing acc0.9233
Iter19,Testing acc0.9233
Compared with the results in the previous post, the accuracy improves somewhat, but not by much.
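As an aside, the argmax, equal, mean accuracy metric used in the script can be sketched in plain numpy (an illustration only; the names are mine):

```python
import numpy as np

def accuracy(labels_onehot, probs):
    # tf.argmax picks the index of the largest entry in each row; tf.equal
    # compares predicted and true class indices; tf.cast plus tf.reduce_mean
    # turn the resulting booleans into the fraction of correct predictions.
    correct = np.argmax(labels_onehot, axis=1) == np.argmax(probs, axis=1)
    return correct.astype(np.float32).mean()

labels = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
probs  = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3]])
acc = accuracy(labels, probs)  # 2 of the 3 rows are classified correctly
```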

2. Optimizer

The goal of deep learning is to keep adjusting the network's parameters so that they apply the nonlinear transformations that fit the outputs from the inputs; in essence it is a search for a function's optimum, so how to update the parameters is a central question in deep-learning research. The algorithm that updates the parameters is called the optimizer, that is, the algorithm used to optimize the model's parameters. The most common optimizer is gradient descent:
train_step=tf.train.GradientDescentOptimizer(0.3).minimize(loss)
This time the program uses AdamOptimizer instead: train_step=tf.train.AdamOptimizer(0.01).minimize(loss)
The rest of the code barely changes:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist=input_data.read_data_sets("MNIST_data",one_hot=True)  # load the MNIST dataset; "MNIST_data" is a folder in the current directory (any path works)
batch_size=100  # size of each batch
n_batch=mnist.train.num_examples//batch_size  # number of batches per epoch
x=tf.placeholder(tf.float32,[None,784])  # input placeholder, 28*28=784 pixels per image
y=tf.placeholder(tf.float32,[None,10])  # label placeholder, 10 digit classes
# Build a simple neural network
w=tf.Variable(tf.zeros([784,10]))  # 784 inputs, 10 outputs
b=tf.Variable(tf.zeros([10]))
wx_plus_b=tf.matmul(x,w)+b
prediction=tf.nn.softmax(wx_plus_b)  # turn the raw scores into predicted probabilities
# Quadratic cost function (mean squared error)
loss=tf.reduce_mean(tf.square(y-prediction))
# Cross-entropy alternative; logits must be the pre-softmax output:
#loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=wx_plus_b))
# Gradient descent alternative, learning rate 0.3:
#train_step=tf.train.GradientDescentOptimizer(0.3).minimize(loss)
train_step=tf.train.AdamOptimizer(0.01).minimize(loss)
init=tf.global_variables_initializer()  # variable initialization
# argmax returns the index of the largest entry along a dimension;
# comparing predicted and true class indices gives a boolean vector
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# cast the booleans to floats and average them to get the accuracy
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(20):  # 20 epochs: every training image is seen 20 times
        for batch in range(n_batch):  # iterate over all batches in one epoch
            batch_x,batch_y=mnist.train.next_batch(batch_size)  # fetch a batch of 100 images (batch_x) and labels (batch_y)
            sess.run(train_step,feed_dict={x:batch_x,y:batch_y})  # run one training step
        acc=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels})  # compute test accuracy
        print("Iter"+str(epoch)+",Testing acc"+str(acc))

Output:
AdamOptimizer results (with the quadratic cost function):
Iter0,Testing acc0.9204
Iter1,Testing acc0.9249
Iter2,Testing acc0.9255
Iter3,Testing acc0.9272
Iter4,Testing acc0.9309
Iter5,Testing acc0.9262
Iter6,Testing acc0.928
Iter7,Testing acc0.9308
Iter8,Testing acc0.9306
Iter9,Testing acc0.927
Iter10,Testing acc0.9274
Iter11,Testing acc0.93
Iter12,Testing acc0.929
Iter13,Testing acc0.9307
Iter14,Testing acc0.9305
Iter15,Testing acc0.9264
Iter16,Testing acc0.9299
Iter17,Testing acc0.9306
Iter18,Testing acc0.9264
Iter19,Testing acc0.9245
Compared with the previous post, the accuracy again improves somewhat.
For an introduction to optimizers, see: https://blog.csdn.net/jinxiaonian11/article/details/83141916
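For intuition, the Adam update used above can be sketched for a single parameter in plain numpy (my own illustration; the defaults beta1=0.9, beta2=0.999 and epsilon=1e-8 match those of tf.train.AdamOptimizer):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and of its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step size adapts per parameter to the gradient's magnitude.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 (gradient 2x), starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
```

Unlike plain gradient descent, the effective step size is roughly lr regardless of the gradient's scale, which is one reason Adam often makes fast progress early in training.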

3. Fitting Problems

Model training in a neural network can end up in three situations: underfitting, a good fit, and overfitting (illustrative figure omitted).
Ways to prevent overfitting include enlarging the dataset, regularization, and Dropout.
The basic idea of Dropout is to let only some of the neurons work on each pass while the rest are switched off.
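The idea can be sketched in numpy (an illustration of inverted dropout, which is what tf.nn.dropout implements; the names here are mine):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob):
    # Each neuron survives with probability keep_prob and is zeroed otherwise.
    # Survivors are scaled by 1/keep_prob so the expected activation is
    # unchanged, which is why no extra rescaling is needed at test time.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.ones(1000)
dropped = dropout(a, keep_prob=0.7)  # roughly 70% of entries survive, scaled up
```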
This section uses Dropout for optimization; the code is as follows:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist=input_data.read_data_sets("MNIST_data",one_hot=True)  # load the MNIST dataset; "MNIST_data" is a folder in the current directory (any path works)
batch_size=100  # size of each batch
n_batch=mnist.train.num_examples//batch_size  # number of batches per epoch

x=tf.placeholder(tf.float32,[None,784])  # input placeholder, 28*28=784 pixels per image
y=tf.placeholder(tf.float32,[None,10])  # label placeholder, 10 digit classes
keep_prob=tf.placeholder(tf.float32)  # probability that each neuron is kept by dropout

# Build a somewhat more complex neural network
# Hidden layer 1
W1 = tf.Variable(tf.truncated_normal([784,2000],stddev=0.1))
b1 = tf.Variable(tf.zeros([2000])+0.1)
L1 = tf.nn.tanh(tf.matmul(x,W1)+b1)
L1_drop = tf.nn.dropout(L1,keep_prob)

# Hidden layer 2
W2 = tf.Variable(tf.truncated_normal([2000,2000],stddev=0.1))
b2 = tf.Variable(tf.zeros([2000])+0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop,W2)+b2)
L2_drop = tf.nn.dropout(L2,keep_prob)

# Hidden layer 3
W3 = tf.Variable(tf.truncated_normal([2000,1000],stddev=0.1))
b3 = tf.Variable(tf.zeros([1000])+0.1)
L3 = tf.nn.tanh(tf.matmul(L2_drop,W3)+b3)
L3_drop = tf.nn.dropout(L3,keep_prob)

# Output layer
W4 = tf.Variable(tf.truncated_normal([1000,10],stddev=0.1))
b4 = tf.Variable(tf.zeros([10])+0.1)
logits = tf.matmul(L3_drop,W4)+b4
prediction = tf.nn.softmax(logits)  # turn the raw scores into predicted probabilities

# Quadratic cost (mean squared error) alternative:
#loss=tf.reduce_mean(tf.square(y-prediction))
# Cross-entropy cost function; logits must be the pre-softmax output
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))
# Use gradient descent with a learning rate of 0.3
train_step=tf.train.GradientDescentOptimizer(0.3).minimize(loss)
init=tf.global_variables_initializer()  # variable initialization
# argmax returns the index of the largest entry along a dimension;
# comparing predicted and true class indices gives a boolean vector
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# cast the booleans to floats and average them to get the accuracy
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(21):  # 21 epochs, matching the Iter0..Iter20 output below
        for batch in range(n_batch):
            batch_x,batch_y=mnist.train.next_batch(batch_size)  # fetch a batch of 100 images (batch_x) and labels (batch_y)
            sess.run(train_step,feed_dict={x:batch_x,y:batch_y,keep_prob:0.7})  # keep 70% of the neurons active during training (1.0 would keep them all)

        acc1=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels,keep_prob:0.7})  # test accuracy (also evaluated at keep_prob 0.7 here; 1.0 is the usual choice at test time)
        acc2=sess.run(accuracy,feed_dict={x:mnist.train.images,y:mnist.train.labels,keep_prob:0.7})  # training accuracy
        print("Iter"+str(epoch)+",Testing acc"+str(acc1)+",Training acc"+str(acc2))

Output:
Iter0,Testing acc0.8917,Training acc0.88763636
Iter1,Testing acc0.9107,Training acc0.90636367
Iter2,Testing acc0.9155,Training acc0.9148909
Iter3,Testing acc0.9261,Training acc0.9232909
Iter4,Testing acc0.9314,Training acc0.9303273
Iter5,Testing acc0.9322,Training acc0.93305457
Iter6,Testing acc0.9359,Training acc0.9376909
Iter7,Testing acc0.9399,Training acc0.9396
Iter8,Testing acc0.9389,Training acc0.94134545
Iter9,Testing acc0.9425,Training acc0.94512725
Iter10,Testing acc0.9427,Training acc0.9465273
Iter11,Testing acc0.9449,Training acc0.9489091
Iter12,Testing acc0.945,Training acc0.95016366
Iter13,Testing acc0.9472,Training acc0.9520182
Iter14,Testing acc0.9494,Training acc0.95263636
Iter15,Testing acc0.9495,Training acc0.95427275
Iter16,Testing acc0.9551,Training acc0.9572545
Iter17,Testing acc0.9524,Training acc0.95754546
Iter18,Testing acc0.9528,Training acc0.9580182
Iter19,Testing acc0.9547,Training acc0.96041816
Iter20,Testing acc0.9568,Training acc0.9604909

This code uses a somewhat more complex neural network. For a simple problem there is really no need for such a network, and it can even hurt the results. Still, compared with the earlier runs, using Dropout improves the accuracy considerably.
4. Learning Rate
The learning rate controls how fast the parameters are updated at each step. If it is too large, the parameters may bounce back and forth around an optimum and never converge to a minimum, no matter how many iterations are run. If it is too small, convergence is guaranteed but optimization becomes very slow. In TensorFlow the learning rate is commonly set with exponential decay.
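The schedule used in the code below multiplies the initial rate by 0.95 each epoch; as a plain-Python sketch:

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    # lr(epoch) = initial_lr * decay_rate ** epoch; the script below realizes
    # this with sess.run(tf.assign(lr, 0.001 * (0.95 ** epoch))) each epoch.
    return initial_lr * decay_rate ** epoch

schedule = [exponential_decay(0.001, 0.95, e) for e in range(31)]
```

TensorFlow also ships tf.train.exponential_decay, which builds the same kind of schedule as a graph op instead of reassigning a variable by hand.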

Summary: combining the methods above, the code can be optimized a bit further, as follows:

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the dataset
mnist=input_data.read_data_sets("MNIST_data",one_hot=True)

# Size of each batch
batch_size=100
# Total number of batches per epoch
n_batch=mnist.train.num_examples//batch_size

# Define the placeholders
x=tf.placeholder(tf.float32,[None,784])
y=tf.placeholder(tf.float32,[None,10])
keep_prob=tf.placeholder(tf.float32)
lr=tf.Variable(0.001,dtype=tf.float32)  # initial learning rate 0.001; decayed during training

# Build the neural network
W1=tf.Variable(tf.truncated_normal([784,500],stddev=0.1))  # truncated normal, standard deviation 0.1
b1=tf.Variable(tf.zeros([500])+0.1)
L1=tf.nn.tanh(tf.matmul(x,W1)+b1)
L1_drop=tf.nn.dropout(L1,keep_prob)

W2=tf.Variable(tf.truncated_normal([500,300],stddev=0.1))
b2=tf.Variable(tf.zeros([300])+0.1)
L2=tf.nn.tanh(tf.matmul(L1_drop,W2)+b2)
L2_drop=tf.nn.dropout(L2,keep_prob)

W3=tf.Variable(tf.truncated_normal([300,10],stddev=0.1))
b3=tf.Variable(tf.zeros([10])+0.1)
logits=tf.matmul(L2_drop,W3)+b3
prediction=tf.nn.softmax(logits)

# Cross-entropy cost function; logits must be the pre-softmax output
loss=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=logits))
# Training step: Adam optimizer driven by the decaying learning rate lr
train_step=tf.train.AdamOptimizer(lr).minimize(loss)

# Initialize the variables
init=tf.global_variables_initializer()

# Boolean vector of per-example correctness
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Compute the accuracy
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(31):
        sess.run(tf.assign(lr,0.001*(0.95**epoch)))  # exponential decay: multiply the initial rate by 0.95^epoch
        for batch in range(n_batch):
            batch_xs,batch_ys=mnist.train.next_batch(batch_size)
            sess.run(train_step,feed_dict={x:batch_xs,y:batch_ys,keep_prob:1.0})

        learning_rate=sess.run(lr)
        acc=sess.run(accuracy,feed_dict={x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
        print("Iter"+str(epoch)+",Testing Accuracy="+str(acc)+",Learning Rate="+str(learning_rate))

Output:

Iter0,Testing Accuracy=0.9497,Learning Rate=0.001
Iter1,Testing Accuracy=0.9597,Learning Rate=0.00095
Iter2,Testing Accuracy=0.9686,Learning Rate=0.0009025
Iter3,Testing Accuracy=0.9667,Learning Rate=0.000857375
Iter4,Testing Accuracy=0.974,Learning Rate=0.00081450626
Iter5,Testing Accuracy=0.9751,Learning Rate=0.0007737809
Iter6,Testing Accuracy=0.9767,Learning Rate=0.0007350919
Iter7,Testing Accuracy=0.9756,Learning Rate=0.0006983373
Iter8,Testing Accuracy=0.9752,Learning Rate=0.0006634204
Iter9,Testing Accuracy=0.9777,Learning Rate=0.0006302494
Iter10,Testing Accuracy=0.9787,Learning Rate=0.0005987369
Iter11,Testing Accuracy=0.9794,Learning Rate=0.0005688001
Iter12,Testing Accuracy=0.9796,Learning Rate=0.0005403601
Iter13,Testing Accuracy=0.9792,Learning Rate=0.0005133421
Iter14,Testing Accuracy=0.9779,Learning Rate=0.000487675
Iter15,Testing Accuracy=0.9798,Learning Rate=0.00046329122
Iter16,Testing Accuracy=0.9795,Learning Rate=0.00044012666
Iter17,Testing Accuracy=0.9809,Learning Rate=0.00041812033
Iter18,Testing Accuracy=0.9807,Learning Rate=0.00039721432
Iter19,Testing Accuracy=0.9811,Learning Rate=0.0003773536
Iter20,Testing Accuracy=0.9822,Learning Rate=0.00035848594
Iter21,Testing Accuracy=0.9813,Learning Rate=0.00034056162
Iter22,Testing Accuracy=0.9821,Learning Rate=0.00032353355
Iter23,Testing Accuracy=0.9819,Learning Rate=0.00030735688
Iter24,Testing Accuracy=0.9817,Learning Rate=0.000291989
Iter25,Testing Accuracy=0.9814,Learning Rate=0.00027738957
Iter26,Testing Accuracy=0.9798,Learning Rate=0.0002635201
Iter27,Testing Accuracy=0.9821,Learning Rate=0.00025034408
Iter28,Testing Accuracy=0.9828,Learning Rate=0.00023782688
Iter29,Testing Accuracy=0.9822,Learning Rate=0.00022593554
Iter30,Testing Accuracy=0.9817,Learning Rate=0.00021463877
After 31 iterations the accuracy reaches roughly 98%; you could run more iterations and compare the results.
Code adapted from the video: https://www.bilibili.com/video/av20542427/?p=15