Value Iteration Networks

Paper Information

Paper: http://papers.nips.cc/paper/6046-value-iteration-networks.pdf
Slide: https://daiwk.github.io/assets/value-iteration-networks-slide.pdf
Code: https://github.com/TheAbhiKumar/tensorflow-value-iteration-networks

1. Paper Introduction

The paper cleverly reuses the structure of a CNN to implement a trainable value-iteration (VI) module, which then aids transfer across RL planning tasks.
Understanding the formulas:

  1. VI iteration formula
     $V_{n+1}(s) = \max_a Q_n(s,a), \qquad Q_n(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V_n(s')$

  2. VI Module
     input: reward image $\bar{R}$ with dimensions $l \times m \times n$
     Each iteration convolves the input with one kernel per action $\bar{a}$ and then takes a max over the output channels:
     $\bar{Q}_{\bar{a},i',j'} = \sum_{l,i,j} W^{\bar{a}}_{l,i,j}\, \bar{R}_{l,i'-i,j'-j}, \qquad \bar{V}_{i',j'} = \max_{\bar{a}} \bar{Q}_{\bar{a},i',j'}$
     The resulting $\bar{V}$ is stacked with $\bar{R}$ and fed back in, so k iterations unroll into k convolution + channel-max layers.
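
To see the tabular iteration in action before the convolutional version, here is a minimal value-iteration sketch in numpy; the tiny MDP (random R and P) is a made-up placeholder, not the gridworld data used later:

import numpy as np

nS, nA, gamma = 4, 2, 0.9
R = np.random.rand(nS, nA)         # R[s, a]
P = np.random.rand(nA, nS, nS)     # P[a, s, s'] = P(s' | s, a)
P /= P.sum(axis=2, keepdims=True)  # normalize into valid transition probabilities

V = np.zeros(nS)
for _ in range(50):                               # k iterations
    Q = R + gamma * np.einsum('ast,t->sa', P, V)  # Q_n(s,a) = R + gamma * sum_s' P * V_n
    V = Q.max(axis=1)                             # V_{n+1}(s) = max_a Q_n(s,a), the "channel max" of the VI module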

Understanding multi-channel convolution: https://www.cnblogs.com/nsnow/p/4562308.html
The number of input channels does not determine the number of output channels: each kernel is multiplied with all input channels and the results are summed, so the number of output channels equals the number of kernels.
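
A quick check of this in TensorFlow (a sketch with random values, not code from the repo): a 2-channel input convolved with 10 kernels yields 10 output channels.

import numpy as np
import tensorflow as tf

x = tf.constant(np.random.randn(1, 8, 8, 2).astype(np.float32))   # NHWC, 2 input channels
k = tf.constant(np.random.randn(3, 3, 2, 10).astype(np.float32))  # HWIO, 10 kernels
y = tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (1, 8, 8, 10): output channels = number of kernels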

2. Code Implementation

2.1 Basics

  1. tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
    input: [batch, in_height, in_width, in_channels], a 4-D tensor
    [number of images per training batch, image height, image width, number of image channels]
    filter: [filter_height, filter_width, in_channels, out_channels]
    [kernel height, kernel width, number of image channels, number of kernels]
    output: feature map [batch, height, width, channels]
    padding: 'SAME' means the kernel may rest on the image border (the input is zero-padded as needed)

The feature-map size is given by: output_h = (originalSize_h + padding*2 - kernelSize_h)/stride + 1

The size after pooling is: pool1_h = (conv1_h - kernelSize_h)/stride + 1
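
The two formulas as a tiny helper (a sketch; assumes the division is exact):

def conv_out_size(in_size, pad, kernel, stride):
    # output_h = (originalSize_h + padding*2 - kernelSize_h) / stride + 1
    return (in_size + 2 * pad - kernel) // stride + 1

def pool_out_size(in_size, kernel, stride):
    # pool1_h = (conv1_h - kernelSize_h) / stride + 1
    return (in_size - kernel) // stride + 1

print(conv_out_size(8, 1, 3, 1))  # 8: a 'SAME'-style 3x3 conv keeps the 8x8 gridworld size
print(conv_out_size(8, 0, 3, 1))  # 6: a 'VALID' 3x3 conv shrinks it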

  2. tf.reduce_max(input_tensor, axis=None, keepdims=False, name=None)
    Computes the maximum of elements across dimensions of a tensor.
    axis: The dimensions to reduce. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).
    keepdims: If true, retains reduced dimensions with length 1.

  3. tf.concat(values, axis, name=None)
    values: a list or tuple of tensors
    axis: the dimension along which to concatenate
    returns: the concatenated tensor

  4. tf.transpose(input, [dimension1, ...])
    Permutes the tensor's dimensions; the 2-D case is the familiar matrix transpose.

  5. tf.shape and tf.reshape
    tf.reshape(tensor, shape, name=None)

  6. tf.stack and tf.tile
    tf.stack is similar to tf.concat(), but stacks along a new dimension: ab = tf.stack([a, b], axis=0)
    tf.tile(input, multiples, name=None) tiles a tensor
    tf.cast(x, dtype, name=None) converts the data type

    tf.gather(x, index) indexes along a single dimension
    tf.gather_nd(x, index) indexes across multiple dimensions (see the sketch after this list)

  7. tf.nn.softmax

    tf.nn.softmax(
        logits,
        axis=None,
        name=None,
        dim=None)
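
VI_Block below combines tf.transpose, tf.stack, and tf.gather_nd to select, for every sample, the Q values at the state positions (S1, S2). A minimal sketch of that indexing pattern with toy sizes (batch of 2, ch_q = 10, 8x8 grid; the values are placeholders):

import numpy as np
import tensorflow as tf

q = tf.constant(np.random.randn(2, 10, 8, 8).astype(np.float32))  # NCHW: batch 2, ch_q 10
s1 = tf.constant([1, 3])   # vertical state position, one per sample
s2 = tf.constant([2, 5])   # horizontal state position, one per sample
batch_idx = tf.range(2)    # [0, 1]

# Transpose to (H, W, N, C) so that one index triple [s1, s2, n] grabs all channels at once
idx = tf.stack([s1, s2, batch_idx], axis=1)               # shape (2, 3)
q_out = tf.gather_nd(tf.transpose(q, [2, 3, 0, 1]), idx)  # shape (2, 10): one row of action values per state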

Model.py

import numpy as np
import tensorflow as tf
from utils import conv2d_flipkernel

def VI_Block(X, S1, S2, config):
    k    = config.k    # Number of value iterations performed
    ch_i = config.ch_i # Channels in input layer (channels of input X; see data.py)
    ch_h = config.ch_h # Channels in initial hidden layer
    ch_q = config.ch_q # Channels in q layer (~actions)
    state_batch_size = config.statebatchsize # k+1 state inputs for each channel

    bias  = tf.Variable(np.random.randn(1, 1, 1, ch_h)    * 0.01, dtype=tf.float32)
    # weights from inputs to q layer (~reward in Bellman equation)
    w0    = tf.Variable(np.random.randn(3, 3, ch_i, ch_h) * 0.01, dtype=tf.float32)  # implements x -> r
    w1    = tf.Variable(np.random.randn(1, 1, ch_h, 1)    * 0.01, dtype=tf.float32)  # implements x -> r
    w     = tf.Variable(np.random.randn(3, 3, 1, ch_q)    * 0.01, dtype=tf.float32)  # implements r -> Q
    # feedback weights from v layer into q layer (~transition probabilities in Bellman equation)
    w_fb  = tf.Variable(np.random.randn(3, 3, 1, ch_q)    * 0.01, dtype=tf.float32)  # implements v -> Q
    w_o   = tf.Variable(np.random.randn(ch_q, 8)          * 0.01, dtype=tf.float32)  # output weights

    # initial conv layer over image+reward prior
    h = conv2d_flipkernel(X, w0, name="h0") + bias

    r = conv2d_flipkernel(h, w1, name="r")
    q = conv2d_flipkernel(r, w, name="q")
    v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")  # channel-wise max ("max pooling" over axis 3, the channel dimension of q)

    for i in range(0, k-1):
        rv = tf.concat([r, v], 3)
        wwfb = tf.concat([w, w_fb], 2)
        q = conv2d_flipkernel(rv, wwfb, name="q")
        v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    # do one last convolution
    q = conv2d_flipkernel(tf.concat([r, v], 3),
                          tf.concat([w, w_fb], 2), name="q")

    # CHANGE TO THEANO ORDERING
    # Since we are selecting over channels, it becomes easier to work with
    # the tensor when it is in NCHW format vs NHWC
    q = tf.transpose(q, perm=[0, 3, 1, 2])  # convert the tensor from NHWC to NCHW

    # Select the conv-net channels at the state position (S1,S2).
    # This intuitively corresponds to each channel representing an action, and the convnet the Q function.
    # The tricky thing is we want to select the same (S1,S2) position *for each* channel and for each sample
    # TODO: performance can be improved here by substituting expensive
    #       transpose calls with better indexing for gather_nd
    bs = tf.shape(q)[0]  # tf.shape returns the shape as a 1-D tensor; element 0 is the batch size
    rprn = tf.reshape(tf.tile(tf.reshape(tf.range(bs), [-1, 1]), [1, state_batch_size]), [-1])  # each batch index repeated state_batch_size times
    ins1 = tf.cast(tf.reshape(S1, [-1]), tf.int32)
    ins2 = tf.cast(tf.reshape(S2, [-1]), tf.int32)
    idx_in = tf.transpose(tf.stack([ins1, ins2, rprn]), [1, 0])
    q_out = tf.gather_nd(tf.transpose(q, [2, 3, 0, 1]), idx_in, name="q_out")  # index position (S1, S2) in every channel; one row of action values per state

    # add logits
    logits = tf.matmul(q_out, w_o)
    # softmax output weights
    output = tf.nn.softmax(logits, name="output")
    return logits, output

# similar to the normal VI_Block except there are separate weights for each q layer
def VI_Untied_Block(X, S1, S2, config):
    k    = config.k    # Number of value iterations performed
    ch_i = config.ch_i # Channels in input layer
    ch_h = config.ch_h # Channels in initial hidden layer
    ch_q = config.ch_q # Channels in q layer (~actions)
    state_batch_size = config.statebatchsize # k+1 state inputs for each channel

    bias   = tf.Variable(np.random.randn(1, 1, 1, ch_h)    * 0.01, dtype=tf.float32)
    # weights from inputs to q layer (~reward in Bellman equation)
    w0     = tf.Variable(np.random.randn(3, 3, ch_i, ch_h) * 0.01, dtype=tf.float32)
    w1     = tf.Variable(np.random.randn(1, 1, ch_h, 1)    * 0.01, dtype=tf.float32)
    w_l    = [tf.Variable(np.random.randn(3, 3, 1, ch_q)   * 0.01, dtype=tf.float32) for i in range(0, k+1)]
    # feedback weights from v layer into q layer (~transition probabilities in Bellman equation)
    w_fb_l = [tf.Variable(np.random.randn(3, 3, 1, ch_q)   * 0.01, dtype=tf.float32) for i in range(0,k)]
    w_o    = tf.Variable(np.random.randn(ch_q, 8)          * 0.01, dtype=tf.float32)

    # initial conv layer over image+reward prior
    h = conv2d_flipkernel(X, w0, name="h0") + bias

    r = conv2d_flipkernel(h, w1, name="r")
    q = conv2d_flipkernel(r, w_l[0], name="q")
    v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    for i in range(0, k-1):
        rv = tf.concat([r, v], 3)
        wwfb = tf.concat([w_l[i+1], w_fb_l[i]], 2)
        q = conv2d_flipkernel(rv, wwfb, name="q")
        v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    # do one last convolution
    q = conv2d_flipkernel(tf.concat([r, v], 3),
                          tf.concat([w_l[k], w_fb_l[k-1]], 2), name="q")

    # CHANGE TO THEANO ORDERING
    # Since we are selecting over channels, it becomes easier to work with
    # the tensor when it is in NCHW format vs NHWC
    q = tf.transpose(q, perm=[0, 3, 1, 2])

    # Select the conv-net channels at the state position (S1,S2).
    # This intuitively corresponds to each channel representing an action, and the convnet the Q function.
    # The tricky thing is we want to select the same (S1,S2) position *for each* channel and for each sample
    # TODO: performance can be improved here by substituting expensive
    #       transpose calls with better indexing for gather_nd
    bs = tf.shape(q)[0] 
    rprn = tf.reshape(tf.tile(tf.reshape(tf.range(bs), [-1, 1]), [1, state_batch_size]), [-1])
    ins1 = tf.cast(tf.reshape(S1, [-1]), tf.int32)
    ins2 = tf.cast(tf.reshape(S2, [-1]), tf.int32)
    idx_in = tf.transpose(tf.stack([ins1, ins2, rprn]), [1, 0])
    q_out = tf.gather_nd(tf.transpose(q, [2, 3, 0, 1]), idx_in, name="q_out")  # these are the Q values at the points on the optimal trajectory

    # add logits
    logits = tf.matmul(q_out, w_o)
    # softmax output weights
    output = tf.nn.softmax(logits, name="output")
    return logits, output

What does the tormenting state_batch_size = 10 actually mean?
The code comment describes it as "k+1 state inputs for each channel".
Looking at the data-generation code: for each full 8x8 map fed in, state_batch_size picks 10 points along the optimal trajectory, each contributing one (state, action) training pair.
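
Concretely, with the default flags below (batchsize = 12, statebatchsize = 10, imsize = 8, ch_i = 2), one feed_dict in Train.py carries these shapes (a sketch of the relationship, not repo code):

import numpy as np

batchsize, statebatchsize, imsize, ch_i = 12, 10, 8, 2

X  = np.zeros((batchsize, imsize, imsize, ch_i), dtype=np.float32)  # 12 maps per batch
S1 = np.zeros((batchsize, statebatchsize), dtype=np.int32)          # 10 trajectory rows per map
S2 = np.zeros((batchsize, statebatchsize), dtype=np.int32)          # 10 trajectory columns per map
y  = np.zeros((batchsize * statebatchsize,), dtype=np.int32)        # one action label per (map, state) pair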

Train.py

import time
import numpy as np
import tensorflow as tf
from data  import process_gridworld_data
from model import VI_Block, VI_Untied_Block
from utils import fmt_row

# Data
tf.app.flags.DEFINE_string('input',           'data/gridworld_8.mat', 'Path to data')
tf.app.flags.DEFINE_integer('imsize',         8,                      'Size of input image')
# Parameters
tf.app.flags.DEFINE_float('lr',               0.001,                  'Learning rate for RMSProp')
tf.app.flags.DEFINE_integer('epochs',         30,                     'Maximum epochs to train for')
tf.app.flags.DEFINE_integer('k',              10,                     'Number of value iterations')
tf.app.flags.DEFINE_integer('ch_i',           2,                      'Channels in input layer')
tf.app.flags.DEFINE_integer('ch_h',           150,                    'Channels in initial hidden layer')
tf.app.flags.DEFINE_integer('ch_q',           10,                     'Channels in q layer (~actions)')
tf.app.flags.DEFINE_integer('batchsize',      12,                     'Batch size')
tf.app.flags.DEFINE_integer('statebatchsize', 10,                     'Number of state inputs for each sample (real number, technically is k+1)') # unlike batchsize (maps per batch), this counts states sampled per map
tf.app.flags.DEFINE_boolean('untied_weights', False,                  'Untie weights of VI network')
# Misc.
tf.app.flags.DEFINE_integer('seed',           0,                      'Random seed for numpy')
tf.app.flags.DEFINE_integer('display_step',   1,                      'Print summary output every n epochs')
tf.app.flags.DEFINE_boolean('log',            True,                  'Enable for tensorboard summary')
tf.app.flags.DEFINE_string('logdir',          '/tmp/vintf/',          'Directory to store tensorboard summary')

config = tf.app.flags.FLAGS

np.random.seed(config.seed)

# symbolic input image tensor where typically first channel is image, second is the reward prior
X  = tf.placeholder(tf.float32, name="X",  shape=[None, config.imsize, config.imsize, config.ch_i])  # [None, 8, 8, 2]
# symbolic input batches of vertical positions
S1 = tf.placeholder(tf.int32,   name="S1", shape=[None, config.statebatchsize])
# symbolic input batches of horizontal positions
S2 = tf.placeholder(tf.int32,   name="S2", shape=[None, config.statebatchsize])
y  = tf.placeholder(tf.int32,   name="y",  shape=[None])

# Construct model (Value Iteration Network)
if (config.untied_weights):
    logits, nn = VI_Untied_Block(X, S1, S2, config)
else:
    logits, nn = VI_Block(X, S1, S2, config)
# logits = tf.matmul(q_out, w_o): the action scores at the (S1, S2) positions
# Define loss and optimizer
y_ = tf.cast(y, tf.int64)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=y_, name='cross_entropy')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy_mean')
tf.add_to_collection('losses', cross_entropy_mean)

cost = tf.add_n(tf.get_collection('losses'), name='total_loss')
optimizer = tf.train.RMSPropOptimizer(learning_rate=config.lr, epsilon=1e-6, centered=True).minimize(cost)

# Test model & calculate accuracy
cp = tf.cast(tf.argmax(nn, 1), tf.int32)  # the selected action
err = tf.reduce_mean(tf.cast(tf.not_equal(cp, y), dtype=tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
saver = tf.train.Saver()

Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest = process_gridworld_data(input=config.input, imsize=config.imsize)

# Launch the graph
with tf.Session() as sess:
    if config.log:
        for var in tf.trainable_variables():
            tf.summary.histogram(var.op.name, var)
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(config.logdir, sess.graph)
    sess.run(init)

    batch_size = config.batchsize
    print(fmt_row(10, ["Epoch", "Train Cost", "Train Err", "Epoch Time"]))
    for epoch in range(int(config.epochs)):
        tstart = time.time()
        avg_err, avg_cost = 0.0, 0.0
        num_batches = int(Xtrain.shape[0]/batch_size)
        # Loop over all batches
        for i in range(0, Xtrain.shape[0], batch_size):
            j = i + batch_size
            if j <= Xtrain.shape[0]:
                # Run optimization op (backprop) and cost op (to get loss value)
                fd = {X: Xtrain[i:j], S1: S1train[i:j], S2: S2train[i:j],
                    y: ytrain[i * config.statebatchsize:j * config.statebatchsize]}
                _, e_, c_ = sess.run([optimizer, err, cost], feed_dict=fd)
                avg_err += e_
                avg_cost += c_
        # Display logs per epoch step
        if epoch % config.display_step == 0:
            elapsed = time.time() - tstart
            print(fmt_row(10, [epoch, avg_cost/num_batches, avg_err/num_batches, elapsed]))
        if config.log:
            summary = tf.Summary()
            summary.ParseFromString(sess.run(summary_op))
            summary.value.add(tag='Average error', simple_value=float(avg_err/num_batches))
            summary.value.add(tag='Average cost', simple_value=float(avg_cost/num_batches))
            summary_writer.add_summary(summary, epoch)
    print("Finished training!")

    # Test model
    correct_prediction = tf.cast(tf.argmax(nn, 1), tf.int32)
    # Calculate accuracy (note: the op below actually computes the error rate, hence 1 - acc when printing)
    accuracy = tf.reduce_mean(tf.cast(tf.not_equal(correct_prediction, y), dtype=tf.float32))
    acc = accuracy.eval({X: Xtest, S1: S1test, S2: S2test, y: ytest})
    print(f'Accuracy: {100 * (1 - acc)}%')

data.py

The data parsing that nearly drove me crazy, using the 8x8 observation grid as an example:
for each state s = (i, j), the observation input is a (2, 8, 8) image, where 2 is the channel count: the obstacle image and the goal (value-prior) image
the label is the optimal-trajectory action at that state (8 directions)
im_data encodes the 8x8 map (obstacle = 1, free zone = 0)
value_data encodes the goal (the paper says goal = 1, non-goal = 0; in the .mat file the values are 10 and 0)

6/7 of the data is the training set: 7776 training samples
x_data, s1_data, s2_data, y_data

import numpy as np
import scipy.io as sio

def process_gridworld_data(input, imsize):
    # run training from input matlab data file, and save test data prediction in output file
    # load data from Matlab file, including
    # im_data: flattened images
    # state_data: concatenated one-hot vectors for each state variable
    # state_xy_data: state variable (x,y position)
    # label_data: one-hot vector for action (state difference)
    im_size=[imsize, imsize]
    matlab_data = sio.loadmat(input)
    im_data = matlab_data["batch_im_data"]  # shape: 9072 x 64
    im_data = (im_data - 1)/255  # obstacles = 1, free zone = 0
    value_data = matlab_data["batch_value_data"]
    state1_data = matlab_data["state_x_data"]
    state2_data = matlab_data["state_y_data"]
    label_data = matlab_data["batch_label_data"]
    ydata = label_data.astype('int8')
    Xim_data = im_data.astype('float32')
    Xim_data = Xim_data.reshape(-1, 1, im_size[0], im_size[1])   # -> (9072, 1, 8, 8)
    Xval_data = value_data.astype('float32')
    Xval_data = Xval_data.reshape(-1, 1, im_size[0], im_size[1]) # -> (9072, 1, 8, 8)
    Xdata = np.append(Xim_data, Xval_data, axis=1)  # input data: image channel + value channel
    # Need to transpose because Theano is NCHW, while TensorFlow is NHWC
    Xdata = np.transpose(Xdata,  (0, 2, 3, 1))
    S1data = state1_data.astype('int8') # 9072 x 10
    S2data = state2_data.astype('int8') # 9072 x 10; 10 columns because state_batch_size = 10

    all_training_samples = int(6/7.0*Xdata.shape[0])
    training_samples = all_training_samples
    Xtrain = Xdata[0:training_samples]
    S1train = S1data[0:training_samples]
    S2train = S2data[0:training_samples]
    ytrain = ydata[0:training_samples]

    Xtest = Xdata[all_training_samples:]
    S1test = S1data[all_training_samples:]
    S2test = S2data[all_training_samples:]
    ytest = ydata[all_training_samples:]
    ytest = ytest.flatten()

    sortinds = np.random.permutation(training_samples)  # a random permutation of the training indices
    Xtrain = Xtrain[sortinds]
    S1train = S1train[sortinds]
    S2train = S2train[sortinds]
    ytrain = ytrain[sortinds]
    ytrain = ytrain.flatten()
    return Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest
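
Putting the shapes together (a sketch; assumes label_data has 10 columns like the state arrays, so the flattened labels line up with S1/S2):

from data import process_gridworld_data

Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest = \
    process_gridworld_data(input='data/gridworld_8.mat', imsize=8)

print(Xtrain.shape)   # (7776, 8, 8, 2): 6/7 of the 9072 maps, in NHWC order
print(S1train.shape)  # (7776, 10): 10 trajectory states per map
print(ytrain.shape)   # (77760,): flattened, one action label per (map, state)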
