A Simple and Clear LSTM/GRU Example (TensorFlow Edition)
This article uses a simple classification task to demonstrate the basics of using an RNN in TensorFlow, more precisely an LSTM (which you can think of as a special kind of RNN). Since this is only an introductory tutorial, you do not need to worry much about the internal structure of the LSTM, but a basic understanding of RNNs is still necessary. The source code of this example comes mainly from reference [1] (with some modifications by the author), whose author in turn drew on reference [2].
If you feel you still know nothing about RNNs, you can refer to an earlier article in this series, 《传说中的RNN到底是何方神圣?》. Note that when we say RNN here we really mean LSTM (the code below also includes a GRU example in the commented-out part, because in TensorFlow all you have to do is swap one function).
Of course, basic familiarity with TensorFlow is also required; for example, you should know what placeholders and sessions are and how to use them correctly. If these are still unfamiliar to you, refer to the earlier article in this series, 《TensorFlow简明入门宝典》.
As our example, the task is to classify the ten handwritten digits 0 to 9. The dataset is the famous MNIST; for an introduction to the dataset and an explanation of the data-loading code, see the earlier article 《基于Softmax实现手写数字识别》.
The first thing to do, of course, is to import TensorFlow and read in the data. Note that each input image is a 28x28 matrix (as shown in the figure below), so it can also be viewed as a 28x28 = 784-dimensional vector:
- import tensorflow as tf
- from tensorflow.examples.tutorials.mnist import input_data
- mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
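If you want to verify the shapes just described, here is a minimal check using the mnist object loaded above (the exact sample counts depend on the standard MNIST split, so treat the numbers in the comments as indicative):
- print(mnist.train.images.shape)  # e.g. (55000, 784): each row is one flattened 28x28 image
- print(mnist.train.labels.shape)  # e.g. (55000, 10): one-hot labels, because one_hot=True
- first_image = mnist.train.images[0].reshape((28, 28))  # back to a 28x28 matrix
- print(first_image.shape)  # (28, 28)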
A quick aside that you can skip when actually running the code: if you want to see what these handwritten digits look like, you can run the following snippet:
- import numpy as np
- import matplotlib.pyplot as plt
- mnist = input_data.read_data_sets('MNIST_data')
- examples_n = 100  # display some images
- indexes = np.random.choice(range(mnist.train.images.shape[0]), examples_n, replace=False)
- fig = plt.figure(figsize=(5, 5))
- grid_size = int(np.sqrt(examples_n))  # 10x10 grid for 100 images
- for i in range(1, examples_n + 1):
-     a = fig.add_subplot(grid_size, grid_size, i)
-     a.axis('off')
-     image = mnist.train.images[indexes[i-1]].reshape((28, 28))
-     a.imshow(image, cmap='Greys_r')
- plt.show()
Next, the program needs to set a few parameters (related to the network we are about to build):
- lr = 0.001  # learning rate for gradient descent
- training_iters = 100000  # upper bound on the total number of training samples to process
- batch_size = 128  # mini-batch size for stochastic gradient descent
- n_inputs = 28  # MNIST data input (img shape: 28*28)
- n_steps = 28  # time steps
- n_hidden_units = 128  # neurons in hidden layer, also the number of units in the LSTM cell
- n_classes = 10  # MNIST classes (0-9 digits)
Next, build the input of the TensorFlow graph (that is, the input of the neural network). Here tf.float32 is the element type and [None, n_steps, n_inputs] is the tensor's shape: None means the first dimension (the batch size) can be of any length, the second dimension has size n_steps, and the third dimension has size n_inputs.
- # tf Graph input
- x = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
- y = tf.placeholder(tf.float32, [None, n_classes])
The figure below shows the basic architecture of the whole RNN. At every training step, 128 (batch_size) images are drawn from the training set for stochastic gradient descent. Because a hidden layer (which you can think of as a fully connected layer) is added on both the input side and the output side, a reshape is needed before and after these two hidden layers. The hidden layer has 128 units (you can pick another value; it does not matter much for this example). The RNN cell at the core "loops" 28 times, so if you unroll the cells you get a chain of length 28. In effect, each cell "looks at" one row of the image (there are 28 rows in total), and the memory of earlier rows influences the processing of later rows.
Next, define the weight matrices W and bias vectors b used by the two hidden layers. As shown in the figure above, after the first reshape X has shape 3584×28 (128 batches times 28 steps gives 3584 rows), so the weight matrix of the first (input-side) hidden layer has shape 28×128 and the matrix product has shape 3584×128. After another reshape, X_in has shape 128×28×128, i.e. 128 batches, 28 steps and 128 hidden units (the small numpy sketch after the definitions below walks through this shape arithmetic).
- # Define weights
- weights = {
- # (28, 128)
- 'in': tf.Variable(tf.random_normal([n_inputs, n_hidden_units])),
- # (128, 10)
- 'out': tf.Variable(tf.random_normal([n_hidden_units, n_classes]))
- }
- biases = {
- # (128, )
- 'in': tf.Variable(tf.constant(0.1, shape=[n_hidden_units, ])),
- # (10, )
- 'out': tf.Variable(tf.constant(0.1, shape=[n_classes, ]))
- }
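As a sanity check of the shape arithmetic described above, the following standalone numpy sketch (using random data in place of real images, purely for illustration) mirrors what the input-side hidden layer will do inside RNN() below:
- import numpy as np
- batch = np.random.rand(128, 28, 28)  # one batch: 128 images, 28 steps, 28 inputs per step
- W_in = np.random.rand(28, 128)  # input-side weight matrix (n_inputs, n_hidden_units)
- flat = batch.reshape(-1, 28)  # (128*28, 28) = (3584, 28)
- hidden = flat.dot(W_in)  # (3584, 28) x (28, 128) -> (3584, 128)
- back = hidden.reshape(-1, 28, 128)  # (128, 28, 128): batches, steps, hidden units
- print(flat.shape, hidden.shape, back.shape)  # (3584, 28) (3584, 128) (128, 28, 128)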
Now we define the structure of the RNN along the lines described above; it consists of an input-side hidden layer, the LSTM cell, and an output-side hidden layer:
- def RNN(X, weights, biases):
-     # hidden layer for input to cell
-     ########################################
-     # reshape the inputs from
-     # X ==> (128 batch * 28 steps, 28 inputs)
-     # for reshape, one shape dimension can be -1;
-     # in that case the value is inferred from the length of
-     # the array and the remaining dimensions.
-     X = tf.reshape(X, [-1, n_inputs])
-     # into hidden
-     # X_in = (128 batch * 28 steps, 128 hidden)
-     X_in = tf.matmul(X, weights['in']) + biases['in']
-     # X_in ==> (128 batch, 28 steps, 128 hidden)
-     X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])
-     # cell
-     ##########################################
-     # BasicLSTMCell is the basic LSTM recurrent network cell.
-     # forget_bias (default: 1) is added to the biases of the forget gate
-     # to reduce the scale of forgetting at the beginning of training.
-     # It does not allow cell clipping or a projection layer,
-     # and does not use peephole connections: it is the basic baseline.
-     # For advanced models, please use the full tf.nn.rnn_cell.LSTMCell.
-     cell = tf.contrib.rnn.BasicLSTMCell(n_hidden_units)  # num_units: int, the number of units in the LSTM cell
-     # cell = tf.contrib.rnn.GRUCell(n_hidden_units)
-     # the LSTM cell state is divided into two parts (c_state, h_state);
-     # for a basic RNN the state is only h_state.
-     # zero_state returns zero-filled state tensor(s).
-     init_state = cell.zero_state(batch_size, dtype=tf.float32)
-     # You have 2 options for the following step:
-     # 1: tf.nn.rnn(cell, inputs);
-     # 2: tf.nn.dynamic_rnn(cell, inputs).
-     # If you use option 1, you have to modify the shape of X_in; see [2].
-     # Here we go for option 2 (which is recommended).
-     # dynamic_rnn receives a Tensor of shape (batch, steps, inputs) or (steps, batch, inputs) as X_in;
-     # make sure time_major is set accordingly.
-     # outputs stores the output of every step; final_state is the state after the last step.
-     outputs, final_state = tf.nn.dynamic_rnn(cell, X_in, initial_state=init_state, time_major=False)
-     # hidden layer for output as the final results
-     #############################################
-     # results = tf.matmul(final_state[1], weights['out']) + biases['out']
-     # # or
-     # unpack to list [(batch, outputs)..] * steps
-     # analogously to tf.pack and tf.unpack,
-     # TensorArray.pack and TensorArray.unpack have been renamed
-     # to TensorArray.stack and TensorArray.unstack
-     outputs = tf.unstack(tf.transpose(outputs, [1, 0, 2]))
-     results = tf.matmul(outputs[-1], weights['out']) + biases['out']  # shape = (128, 10)
-     return results
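As mentioned at the beginning, switching from LSTM to GRU only requires swapping the cell constructor; everything else can stay as it is. Here is a minimal sketch of that variant (the function name RNN_gru is our own; note that a GRU keeps a single state tensor rather than the (c_state, h_state) pair, so the final_state shortcut uses final_state directly instead of final_state[1]):
- def RNN_gru(X, weights, biases):
-     X = tf.reshape(X, [-1, n_inputs])
-     X_in = tf.matmul(X, weights['in']) + biases['in']
-     X_in = tf.reshape(X_in, [-1, n_steps, n_hidden_units])
-     cell = tf.contrib.rnn.GRUCell(n_hidden_units)  # the only line that really changes
-     init_state = cell.zero_state(batch_size, dtype=tf.float32)
-     outputs, final_state = tf.nn.dynamic_rnn(cell, X_in, initial_state=init_state, time_major=False)
-     # for a GRU, final_state is a single (batch, n_hidden_units) tensor equal to the last step's output
-     results = tf.matmul(final_state, weights['out']) + biases['out']
-     return results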
Then, much as for an ordinary deep neural network, we define the loss function and the optimization method (gradient descent and the like; here we use AdamOptimizer). This part is essentially the same as in 《基于Softmax实现手写数字识别》:
- pred = RNN(x, weights, biases)
- cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
- train_op = tf.train.AdamOptimizer(lr).minimize(cost)
When the program runs we will compute and print the prediction accuracy, so we define it here:
- correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
- accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
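To make the accuracy definition concrete, here is a tiny numpy illustration with made-up numbers (three samples, two of which are predicted correctly):
- import numpy as np
- pred_toy = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # fake logits for 3 samples, 2 classes
- y_toy = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])  # one-hot labels
- correct = np.argmax(pred_toy, 1) == np.argmax(y_toy, 1)  # [True, True, False]
- print(correct.astype(np.float32).mean())  # 0.6666667, i.e. the accuracy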
- # Initialize the variables (i.e. assign their default value)
- init = tf.global_variables_initializer()
- with tf.Session() as sess:
-     sess.run(init)
-     step = 0
-     while step * batch_size < training_iters:
-         batch_xs, batch_ys = mnist.train.next_batch(batch_size)
-         batch_xs = batch_xs.reshape([batch_size, n_steps, n_inputs])
-         sess.run([train_op], feed_dict={
-             x: batch_xs,
-             y: batch_ys,
-         })
-         if step % 20 == 0:
-             print(sess.run(accuracy, feed_dict={
-                 x: batch_xs,
-                 y: batch_ys,
-             }))
-         step += 1
Run it and take a look: the accuracy rises somewhat unevenly and generally stays above 95%. Note that in this example the accuracy is computed on the training set; a better practice is to evaluate the model on the test set, for example as sketched below.
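A minimal sketch of such an evaluation, which could be appended inside the Session block after the training loop (because the graph above fixes the initial state to batch_size, we feed the test images in chunks of the same size; the variable names below are our own):
-     # evaluate on the test set, one batch_size-sized chunk at a time
-     test_batches = mnist.test.images.shape[0] // batch_size
-     test_acc = 0.0
-     for i in range(test_batches):
-         test_xs = mnist.test.images[i*batch_size:(i+1)*batch_size].reshape([batch_size, n_steps, n_inputs])
-         test_ys = mnist.test.labels[i*batch_size:(i+1)*batch_size]
-         test_acc += sess.run(accuracy, feed_dict={x: test_xs, y: test_ys})
-     print('test accuracy: %.4f' % (test_acc / test_batches))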
References
[2] https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/recurrent_network.py
(End of article)