A Simple and Clear LSTM/GRU Example (TensorFlow Edition)
This article uses a simple classification task to demonstrate the basics of using an RNN in TensorFlow, more precisely an LSTM (which you can think of as a special kind of RNN). Since this is only an introductory tutorial, you do not need to worry much about the internal structure of the LSTM, but a basic understanding of RNNs is still necessary. The source code of this example comes mainly from reference [1] (with some modifications by the author), whose author in turn drew on reference [2].
If you feel you still know nothing about RNNs, you can refer to an earlier article in this series, 《传说中的RNN到底是何方神圣?》. Note that when we say RNN here we really mean LSTM (the code below also includes a GRU example in the commented-out part, because in TensorFlow all you have to do is swap one function).
Of course, basic familiarity with TensorFlow is also required; for example, you should know what placeholders and sessions are and how to use them correctly. If these are still unfamiliar to you, refer to the earlier article in this series, 《TensorFlow简明入门宝典》.
As our example, the task is to classify the ten handwritten digits 0 to 9. The dataset is the famous MNIST; for an introduction to the dataset and an explanation of the data-loading code, see the earlier article 《基于Softmax实现手写数字识别》.
The first thing to do, of course, is to import TensorFlow and read in the data. Note that each input image is a 28x28 matrix (as shown in the figure below), so it can also be viewed as a 28x28 = 784-dimensional vector:
- import tensorflow as tf
- from tensorflow.examples.tutorials.mnist import input_data
- mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
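If you want to verify the shapes just described, here is a minimal check using the mnist object loaded above (the exact sample counts depend on the standard MNIST split, so treat the numbers in the comments as indicative):
- print(mnist.train.images.shape)  # e.g. (55000, 784): each row is one flattened 28x28 image
- print(mnist.train.labels.shape)  # e.g. (55000, 10): one-hot labels, because one_hot=True
- first_image = mnist.train.images[0].reshape((28, 28))  # back to a 28x28 matrix
- print(first_image.shape)  # (28, 28)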
A quick aside that you can skip when actually running the code: if you want to see what these handwritten digits look like, you can run the following snippet:
- import numpy as np
- import matplotlib.pyplot as plt
- mnist = input_data.read_data_sets('MNIST_data')
- examples_n = 100  # display some images
- indexes = np.random.choice(range(mnist.train.images.shape[0]), examples_n, replace=False)
- fig = plt.figure(figsize=(5, 5))
- grid_size = int(np.sqrt(examples_n))  # 10x10 grid for 100 images
- for i in range(1, examples_n + 1):
-     a = fig.add_subplot(grid_size, grid_size, i)
-     a.axis('off')
-     image = mnist.train.images[indexes[i-1]].reshape((28, 28))
-     a.imshow(image, cmap='Greys_r')
- plt.show()
Next, the program needs to set a few parameters (related to the network we are about to build):
- lr = 0.001  # learning rate for gradient descent
- training_iters = 100000  # upper bound on the total number of training samples to process
- batch_size = 128  # mini-batch size for stochastic gradient descent
- n_inputs = 28  # MNIST data input (img shape: 28*28)
- n_steps = 28  # time steps
- n_hidden_units = 128  # neurons in hidden layer, also the number of units in the LSTM cell
- n_classes = 10  # MNIST classes (0-9 digits)
Next, build the input of the TensorFlow graph (that is, the input of the neural network). Here tf.float32 is the element type and [None, n_steps, n_inputs] is the tensor's shape: None means the first dimension (the batch size) can be of any length, the second dimension has size n_steps, and the third dimension has size n_inputs.
- # tf Graph input
- x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
- y = tf.placeholder(tf.float32, [None, n_classes])
The figure below shows the basic architecture of the whole RNN. At every training step, 128 (batch_size) images are drawn from the training set for stochastic gradient descent. Because a hidden layer (which you can think of as a fully connected layer) is added on both the input side and the output side, a reshape is needed before and after these two hidden layers. The hidden layer has 128 units (you can pick another value; it does not matter much for this example). The RNN cell at the core "loops" 28 times, so if you unroll the cells you get a chain of length 28. In effect, each cell "looks at" one row of the image (there are 28 rows in total), and the memory of earlier rows influences the processing of later rows.
Next, define the weight matrices W and bias vectors b used by the two hidden layers. As shown in the figure above, after the first reshape X has shape 3584×28 (128 batches times 28 steps gives 3584 rows), so the weight matrix of the first (input-side) hidden layer has shape 28×128 and the matrix product has shape 3584×128. After another reshape, X_in has shape 128×28×128, i.e. 128 batches, 28 steps and 128 hidden units (the small numpy sketch after the definitions below walks through this shape arithmetic).
- # Define weights
- weights = {
- # (28, 128)
- 'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_units])),
- # (128, 10)
- 'out': tf.Variable(tf.random_normal([n_hidden_units, n_classes]))
- }
- biases = {
- # (128, )
- 'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_units, ])),
- # (10, )
- 'out': tf.Variable(tf.constant(0.1, shape=[n_classes, ]))
- }
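As a sanity check of the shape arithmetic described above, the following standalone numpy sketch (using random data in place of real images, purely for illustration) mirrors what the input-side hidden layer will do inside RNN() below:
- import numpy as np
- batch = np.random.rand(128, 28, 28)  # one batch: 128 images, 28 steps, 28 inputs per step
- W_in = np.random.rand(28, 128)  # input-side weight matrix (n_inputs, n_hidden_units)
- flat = batch.reshape(-1, 28)  # (128*28, 28) = (3584, 28)
- hidden = flat.dot(W_in)  # (3584, 28) x (28, 128) -> (3584, 128)
- back = hidden.reshape(-1, 28, 128)  # (128, 28, 128): batches, steps, hidden units
- print(flat.shape, hidden.shape, back.shape)  # (3584, 28) (3584, 128) (128, 28, 128)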
Now we define the structure of the RNN along the lines described above; it consists of an input-side hidden layer, the LSTM cell, and an output-side hidden layer:
- def RNN(X, weights, biases):
-     # hidden layer for input to cell
-     ########################################
-     # reshape the inputs from
-     # X ==> (128 batch * 28 steps, 28 inputs)
-     # for reshape, one shape dimension can be -1;
-     # in that case the value is inferred from the length of
-     # the array and the remaining dimensions.
-     X = tf.reshape(X, [-1, n_inputs])
-     # into hidden
-     # X_in = (128 batch * 28 steps, 128 hidden)
-     X_in = tf.matmul(X, weights['in']) + biases['in']
-     # X_in ==> (128 batch, 28 steps, 128 hidden)
-     X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])
-     # cell
-     ##########################################
-     # BasicLSTMCell is the basic LSTM recurrent network cell.
-     # forget_bias (default: 1) is added to the biases of the forget gate
-     # to reduce the scale of forgetting at the beginning of training.
-     # It does not allow cell clipping or a projection layer,
-     # and does not use peephole connections: it is the basic baseline.
-     # For advanced models, please use the full tf.nn.rnn_cell.LSTMCell.
-     cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units)  # num_units: int, the number of units in the LSTM cell
-     # cell = tf.contrib.rnn.GRUCell(n_hidden_units)
-     # the LSTM cell state is divided into two parts (c_state, h_state);
-     # for a basic RNN the state is only h_state.
-     # zero_state returns zero-filled state tensor(s).
-     init_state = cell.zero_state(batch_size, dtype=tf.float32)
-     # You have 2 options for the following step:
-     # 1: tf.nn.rnn(cell, inputs);
-     # 2: tf.nn.dynamic_rnn(cell, inputs).
-     # If you use option 1, you have to modify the shape of X_in; see [2].
-     # Here we go for option 2 (which is recommended).
-     # dynamic_rnn receives a Tensor of shape (batch, steps, inputs) or (steps, batch, inputs) as X_in;
-     # make sure time_major is set accordingly.
-     # outputs stores the output of every step; final_state is the state after the last step.
-     outputs, final_state = tf.nn.dynamic_rnn(cell, X_in, initial_state=init_state, time_major=False)
-     # hidden layer for output as the final results
-     #############################################
-     # results = tf.matmul(final_state[1], weights['out']) + biases['out']
-     # # or
-     # unpack to list [(batch, outputs)..] * steps
-     # analogously to tf.pack and tf.unpack,
-     # TensorArray.pack and TensorArray.unpack have been renamed
-     # to TensorArray.stack and TensorArray.unstack
-     outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
-     results = tf.matmul(outputs[-1], weights['out']) + biases['out']  # shape = (128, 10)
-     return results
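As mentioned at the beginning, switching from LSTM to GRU only requires swapping the cell constructor; everything else can stay as it is. Here is a minimal sketch of that variant (the function name RNN_gru is our own; note that a GRU keeps a single state tensor rather than the (c_state, h_state) pair, so the final_state shortcut uses final_state directly instead of final_state[1]):
- def RNN_gru(X, weights, biases):
-     X = tf.reshape(X, [-1, n_inputs])
-     X_in = tf.matmul(X, weights['in']) + biases['in']
-     X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])
-     cell = tf.contrib.rnn.GRUCell(n_hidden_units)  # the only line that really changes
-     init_state = cell.zero_state(batch_size, dtype=tf.float32)
-     outputs, final_state = tf.nn.dynamic_rnn(cell, X_in, initial_state=init_state, time_major=False)
-     # for a GRU, final_state is a single (batch, n_hidden_units) tensor equal to the last step's output
-     results = tf.matmul(final_state, weights['out']) + biases['out']
-     return results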
Then, much as for an ordinary deep neural network, we define the loss function and the optimization method (gradient descent and the like; here we use AdamOptimizer). This part is essentially the same as in 《基于Softmax实现手写数字识别》:
- pred = RNN(x, weights, biases)
- cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
- train_op = tf.train.AdamOptimizer(lr).minimize(cost)
When the program runs we will compute and print the prediction accuracy, so we define it here:
- correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
- accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
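To make the accuracy definition concrete, here is a tiny numpy illustration with made-up numbers (three samples, two of which are predicted correctly):
- import numpy as np
- pred_toy = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # fake logits for 3 samples, 2 classes
- y_toy = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])  # one-hot labels
- correct = np.argmax(pred_toy, 1) == np.argmax(y_toy, 1)  # [True, True, False]
- print(correct.astype(np.float32).mean())  # 0.6666667, i.e. the accuracy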
- # Initialize the variables (i.e. assign their default value)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     step = 0
-     while step * batch_size < training_iters:
-         batch_xs, batch_ys = mnist.train.next_batch(batch_size)
-         batch_xs = batch_xs.reshape([batch_size, n_steps, n_inputs])
-         sess.run([train_op], feed_dict={
-             x: batch_xs,
-             y: batch_ys,
-         })
-         if step % 20 == 0:
-             print(sess.run(accuracy, feed_dict={
-                 x: batch_xs,
-                 y: batch_ys,
-             }))
-         step += 1
Run it and take a look: the accuracy rises somewhat unevenly and generally stays above 95%. Note that in this example the accuracy is computed on the training set; a better practice is to evaluate the model on the test set, for example as sketched below.
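A minimal sketch of such an evaluation, which could be appended inside the Session block after the training loop (because the graph above fixes the initial state to batch_size, we feed the test images in chunks of the same size; the variable names below are our own):
-     # evaluate on the test set, one batch_size-sized chunk at a time
-     test_batches = mnist.test.images.shape[0] // batch_size
-     test_acc = 0.0
-     for i in range(test_batches):
-         test_xs = mnist.test.images[i*batch_size:(i+1)*batch_size].reshape([batch_size, n_steps, n_inputs])
-         test_ys = mnist.test.labels[i*batch_size:(i+1)*batch_size]
-         test_acc += sess.run(accuracy, feed_dict={x: test_xs, y: test_ys})
-     print('test accuracy: %.4f' % (test_acc / test_batches))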
References
[2] https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/recurrent_network.py
(End of article)