Value Iteration Networks
Paper Info
Paper: http://papers.nips.cc/paper/6046-value-iteration-networks.pdf
Slide: https://daiwk.github.io/assets/value-iteration-networks-slide.pdf
Code: https://github.com/TheAbhiKumar/tensorflow-value-iteration-networks
1. Paper Overview
The paper cleverly uses a CNN structure to implement a trainable value-iteration (VI) module, so the planning computation itself is learned end to end and helps RL policies transfer across planning tasks.
Understanding the formulas:
- The VI iteration formula (a LaTeX sketch follows below).
- The VI model: the input is a reward image R with dimensions l, m, n.

Understanding multi-channel convolution: https://www.cnblogs.com/nsnow/p/4562308.html
The number of input channels does not by itself determine the number of output channels: each kernel multiplies all input channels and sums the products, so the number of output channels equals the number of kernels.
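For reference, here is a sketch (my own transcription, not copied verbatim from the paper) of the standard value-iteration update and the convolutional form the VIN module uses to approximate it:

```latex
% Standard value iteration (Bellman backup):
V_{n+1}(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_n(s') \right]

% VIN module: each action channel \bar{a} of the Q feature map is a
% convolution over the channels l of the reward image \bar{R} (with the
% previous value image \bar{V} stacked on as an extra channel, matching
% tf.concat([r, v], 3) in the code), followed by a max over action channels:
\bar{Q}_{\bar{a}}(i,j) = \sum_{l,\,i',\,j'} W^{\bar{a}}_{l,i',j'}\, \bar{R}_{l,\,i-i',\,j-j'},
\qquad \bar{V}(i,j) = \max_{\bar{a}} \bar{Q}_{\bar{a}}(i,j)
```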
2. Code Walkthrough
2.1 TensorFlow Basics
- tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, name=None)
  - input: a 4-D tensor [batch, in_height, in_width, in_channels]
    (images per training batch, image height, image width, number of image channels)
  - filter: [filter_height, filter_width, in_channels, out_channels]
    (kernel height, kernel width, number of input channels, number of kernels)
  - output: a feature map of shape [batch, height, width, channels]
  - padding: 'SAME' zero-pads the input so the kernel can be centered on border pixels (at stride 1 the output keeps the input's spatial size)
  - Feature-map size: output_h = (originalSize_h + padding*2 - kernelSize_h)/stride + 1; e.g., height 8, a 3x3 kernel, padding 1, stride 1 gives (8 + 2 - 3)/1 + 1 = 8.
  - Size after pooling: pool1_h = (conv1_h - kernelSize_h)/stride + 1
- tf.reduce_max(input_tensor, axis=None, keepdims=False, name=None)
  Computes the maximum over the given dimensions of a tensor.
  - axis: the dimensions to reduce. If None (the default), reduces all dimensions. Must be in the range [-rank(input_tensor), rank(input_tensor)).
  - keepdims: if true, retains reduced dimensions with length 1.
- tf.concat(values, axis, name='concat')
  - values: a list or tuple of tensors
  - axis: the dimension along which to concatenate
  - returns: the concatenated tensor
- tf.transpose(input, perm=[...])
  Permutes the dimensions of a tensor; for a 2-D tensor this is the ordinary transpose.
- tf.shape and tf.reshape
  tf.reshape(tensor, shape, name=None)
- tf.stack and tf.tile
  tf.stack is similar to tf.concat, but stacks along a new dimension, e.g. ab = tf.stack([a, b], axis=0).
  tf.tile(input, multiples, name=None) tiles a tensor by replication.
  tf.cast(x, dtype, name=None) converts the data type.
  tf.gather(x, index) indexes along a single dimension; tf.gather_nd(x, index) indexes over multiple dimensions.
- tf.nn.softmax(logits, axis=None, name=None, dim=None)

A short shape walkthrough of these ops is sketched below.
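To make the shape rules concrete, here is a minimal sketch (TF 1.x, which this repo targets); the sizes mirror the 8x8 gridworld used later in this post:

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 8, 8, 2])       # NHWC input
w = tf.Variable(np.random.randn(3, 3, 2, 10).astype(np.float32))
q = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
print(q.shape)                                         # (?, 8, 8, 10): out_channels = #kernels
v = tf.reduce_max(q, axis=3, keep_dims=True)           # max over the channel dimension
print(v.shape)                                         # (?, 8, 8, 1)
qv = tf.concat([q, v], 3)                              # concatenate along channels
print(qv.shape)                                        # (?, 8, 8, 11)
q_nchw = tf.transpose(q, perm=[0, 3, 1, 2])            # NHWC -> NCHW
print(q_nchw.shape)                                    # (?, 10, 8, 8)
```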
2.2 Model.py
import numpy as np
import tensorflow as tf
from utils import conv2d_flipkernel

def VI_Block(X, S1, S2, config):
    k = config.k        # Number of value iterations performed
    ch_i = config.ch_i  # Channels in input layer (for the input X, see data.py)
    ch_h = config.ch_h  # Channels in initial hidden layer
    ch_q = config.ch_q  # Channels in q layer (~actions)
    state_batch_size = config.statebatchsize  # k+1 state inputs for each channel
    bias = tf.Variable(np.random.randn(1, 1, 1, ch_h) * 0.01, dtype=tf.float32)
    # weights from inputs to q layer (~reward in Bellman equation)
    w0 = tf.Variable(np.random.randn(3, 3, ch_i, ch_h) * 0.01, dtype=tf.float32)  # x -> h (toward r)
    w1 = tf.Variable(np.random.randn(1, 1, ch_h, 1) * 0.01, dtype=tf.float32)     # h -> r
    w = tf.Variable(np.random.randn(3, 3, 1, ch_q) * 0.01, dtype=tf.float32)      # r -> Q
    # feedback weights from v layer into q layer (~transition probabilities in Bellman equation)
    w_fb = tf.Variable(np.random.randn(3, 3, 1, ch_q) * 0.01, dtype=tf.float32)   # v -> Q
    w_o = tf.Variable(np.random.randn(ch_q, 8) * 0.01, dtype=tf.float32)          # output weights

    # initial conv layer over image+reward prior
    h = conv2d_flipkernel(X, w0, name="h0") + bias
    r = conv2d_flipkernel(h, w1, name="r")
    q = conv2d_flipkernel(r, w, name="q")
    # max over the 4th dimension of q (the channel/action dimension), i.e. V = max_a Q(s, a)
    v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    for i in range(0, k-1):
        rv = tf.concat([r, v], 3)
        wwfb = tf.concat([w, w_fb], 2)
        q = conv2d_flipkernel(rv, wwfb, name="q")
        v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    # do one last convolution
    q = conv2d_flipkernel(tf.concat([r, v], 3),
                          tf.concat([w, w_fb], 2), name="q")

    # CHANGE TO THEANO ORDERING
    # Since we are selecting over channels, it becomes easier to work with
    # the tensor when it is in NCHW format vs NHWC
    q = tf.transpose(q, perm=[0, 3, 1, 2])  # NHWC -> NCHW

    # Select the conv-net channels at the state position (S1,S2).
    # This intuitively corresponds to each channel representing an action, and the convnet the Q function.
    # The tricky thing is we want to select the same (S1,S2) position *for each* channel and for each sample
    # TODO: performance can be improved here by substituting expensive
    # transpose calls with better indexing for gather_nd
    bs = tf.shape(q)[0]  # tf.shape returns the shape as a 1-D tensor; [0] is the batch size
    # sample index repeated state_batch_size times: [0,...,0, 1,...,1, ...]
    rprn = tf.reshape(tf.tile(tf.reshape(tf.range(bs), [-1, 1]), [1, state_batch_size]), [-1])
    ins1 = tf.cast(tf.reshape(S1, [-1]), tf.int32)
    ins2 = tf.cast(tf.reshape(S2, [-1]), tf.int32)
    idx_in = tf.transpose(tf.stack([ins1, ins2, rprn]), [1, 0])
    # index position (S1,S2) of every channel: one row of Q-values per queried state
    q_out = tf.gather_nd(tf.transpose(q, [2, 3, 0, 1]), idx_in, name="q_out")

    # add logits
    logits = tf.matmul(q_out, w_o)
    # softmax output weights
    output = tf.nn.softmax(logits, name="output")
    return logits, output
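A minimal smoke test (a sketch, assuming it runs in the same module as the code above); the Config class here is a hypothetical stand-in for the tf.app.flags config that train.py passes in:

```python
class Config:
    k = 10                # number of VI iterations
    ch_i = 2              # input channels (image + reward prior)
    ch_h = 150            # hidden channels
    ch_q = 10             # Q channels (~actions)
    statebatchsize = 10   # queried trajectory states per map

X = tf.placeholder(tf.float32, shape=[None, 8, 8, 2])
S1 = tf.placeholder(tf.int32, shape=[None, 10])
S2 = tf.placeholder(tf.int32, shape=[None, 10])
logits, prob = VI_Block(X, S1, S2, Config())
print(logits.shape)  # (?, 8): one row of 8 action scores per queried state
```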
# similar to the normal VI_Block except there are separate weights for each q layer
def VI_Untied_Block(X, S1, S2, config):
    k = config.k        # Number of value iterations performed
    ch_i = config.ch_i  # Channels in input layer
    ch_h = config.ch_h  # Channels in initial hidden layer
    ch_q = config.ch_q  # Channels in q layer (~actions)
    state_batch_size = config.statebatchsize  # k+1 state inputs for each channel
    bias = tf.Variable(np.random.randn(1, 1, 1, ch_h) * 0.01, dtype=tf.float32)
    # weights from inputs to q layer (~reward in Bellman equation)
    w0 = tf.Variable(np.random.randn(3, 3, ch_i, ch_h) * 0.01, dtype=tf.float32)
    w1 = tf.Variable(np.random.randn(1, 1, ch_h, 1) * 0.01, dtype=tf.float32)
    w_l = [tf.Variable(np.random.randn(3, 3, 1, ch_q) * 0.01, dtype=tf.float32) for i in range(0, k + 1)]
    # feedback weights from v layer into q layer (~transition probabilities in Bellman equation)
    w_fb_l = [tf.Variable(np.random.randn(3, 3, 1, ch_q) * 0.01, dtype=tf.float32) for i in range(0, k)]
    w_o = tf.Variable(np.random.randn(ch_q, 8) * 0.01, dtype=tf.float32)

    # initial conv layer over image+reward prior
    h = conv2d_flipkernel(X, w0, name="h0") + bias
    r = conv2d_flipkernel(h, w1, name="r")
    q = conv2d_flipkernel(r, w_l[0], name="q")
    v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    for i in range(0, k - 1):
        rv = tf.concat([r, v], 3)
        wwfb = tf.concat([w_l[i + 1], w_fb_l[i]], 2)
        q = conv2d_flipkernel(rv, wwfb, name="q")
        v = tf.reduce_max(q, axis=3, keep_dims=True, name="v")

    # do one last convolution
    q = conv2d_flipkernel(tf.concat([r, v], 3),
                          tf.concat([w_l[k], w_fb_l[k - 1]], 2), name="q")

    # CHANGE TO THEANO ORDERING
    # Since we are selecting over channels, it becomes easier to work with
    # the tensor when it is in NCHW format vs NHWC
    q = tf.transpose(q, perm=[0, 3, 1, 2])

    # Select the conv-net channels at the state position (S1,S2).
    # This intuitively corresponds to each channel representing an action, and the convnet the Q function.
    # The tricky thing is we want to select the same (S1,S2) position *for each* channel and for each sample
    # TODO: performance can be improved here by substituting expensive
    # transpose calls with better indexing for gather_nd
    bs = tf.shape(q)[0]
    rprn = tf.reshape(tf.tile(tf.reshape(tf.range(bs), [-1, 1]), [1, state_batch_size]), [-1])
    ins1 = tf.cast(tf.reshape(S1, [-1]), tf.int32)
    ins2 = tf.cast(tf.reshape(S2, [-1]), tf.int32)
    idx_in = tf.transpose(tf.stack([ins1, ins2, rprn]), [1, 0])
    # these are the Q-values at the states along the optimal trajectory
    q_out = tf.gather_nd(tf.transpose(q, [2, 3, 0, 1]), idx_in, name="q_out")

    # add logits
    logits = tf.matmul(q_out, w_o)
    # softmax output weights
    output = tf.nn.softmax(logits, name="output")
    return logits, output
What does the maddening state_batch_size = 10 actually mean?
The in-code comment describes it as "k+1 state inputs for each channel", but the data-generation code makes it clearer: the full 8x8 map is the input, and state_batch_size selects 10 points along the optimal trajectory, at each of which the network is queried and supervised. A toy NumPy illustration of this gather follows.
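This sketch uses hypothetical sizes (batch = 2 maps, state_batch_size = 3 states per map) to mimic the tf.tile / tf.stack / tf.gather_nd selection in VI_Block:

```python
import numpy as np

bs, sbs, ch_q = 2, 3, 10
q = np.random.randn(bs, ch_q, 8, 8)        # NCHW, like q after the transpose
S1 = np.array([[1, 2, 3], [4, 5, 6]])      # rows of the queried states
S2 = np.array([[0, 1, 2], [3, 4, 5]])      # columns of the queried states

rprn = np.repeat(np.arange(bs), sbs)       # [0 0 0 1 1 1]: map index per state
q_hwbc = q.transpose(2, 3, 0, 1)           # [H, W, batch, ch_q]
q_out = q_hwbc[S1.reshape(-1), S2.reshape(-1), rprn]
print(q_out.shape)                         # (6, 10): ch_q Q-values per queried state
```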
2.3 Train.py
import time
import numpy as np
import tensorflow as tf

from data import process_gridworld_data
from model import VI_Block, VI_Untied_Block
from utils import fmt_row

# Data
tf.app.flags.DEFINE_string('input', 'data/gridworld_8.mat', 'Path to data')
tf.app.flags.DEFINE_integer('imsize', 8, 'Size of input image')
# Parameters
tf.app.flags.DEFINE_float('lr', 0.001, 'Learning rate for RMSProp')
tf.app.flags.DEFINE_integer('epochs', 30, 'Maximum epochs to train for')
tf.app.flags.DEFINE_integer('k', 10, 'Number of value iterations')
tf.app.flags.DEFINE_integer('ch_i', 2, 'Channels in input layer')
tf.app.flags.DEFINE_integer('ch_h', 150, 'Channels in initial hidden layer')
tf.app.flags.DEFINE_integer('ch_q', 10, 'Channels in q layer (~actions)')
tf.app.flags.DEFINE_integer('batchsize', 12, 'Batch size')
# Unlike batchsize (maps per gradient step), statebatchsize is the number of
# queried trajectory states per map
tf.app.flags.DEFINE_integer('statebatchsize', 10, 'Number of state inputs for each sample (real number, technically is k+1)')
tf.app.flags.DEFINE_boolean('untied_weights', False, 'Untie weights of VI network')
# Misc.
tf.app.flags.DEFINE_integer('seed', 0, 'Random seed for numpy')
tf.app.flags.DEFINE_integer('display_step', 1, 'Print summary output every n epochs')
tf.app.flags.DEFINE_boolean('log', True, 'Enable for tensorboard summary')
tf.app.flags.DEFINE_string('logdir', '/tmp/vintf/', 'Directory to store tensorboard summary')

config = tf.app.flags.FLAGS
np.random.seed(config.seed)

# symbolic input image tensor where typically first channel is image, second is the reward prior
X = tf.placeholder(tf.float32, name="X", shape=[None, config.imsize, config.imsize, config.ch_i])  # [None, 8, 8, 2]
# symbolic input batches of vertical positions
S1 = tf.placeholder(tf.int32, name="S1", shape=[None, config.statebatchsize])
# symbolic input batches of horizontal positions
S2 = tf.placeholder(tf.int32, name="S2", shape=[None, config.statebatchsize])
y = tf.placeholder(tf.int32, name="y", shape=[None])

# Construct model (Value Iteration Network)
if config.untied_weights:
    logits, nn = VI_Untied_Block(X, S1, S2, config)
else:
    logits, nn = VI_Block(X, S1, S2, config)
# logits = tf.matmul(q_out, w_o): action scores at the queried (S1, S2) states

# Define loss and optimizer
y_ = tf.cast(y, tf.int64)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=y_, name='cross_entropy')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy_mean')
tf.add_to_collection('losses', cross_entropy_mean)

cost = tf.add_n(tf.get_collection('losses'), name='total_loss')
optimizer = tf.train.RMSPropOptimizer(learning_rate=config.lr, epsilon=1e-6, centered=True).minimize(cost)

# Test model & calculate accuracy
cp = tf.cast(tf.argmax(nn, 1), tf.int32)  # the selected action
err = tf.reduce_mean(tf.cast(tf.not_equal(cp, y), dtype=tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()
saver = tf.train.Saver()

Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest = process_gridworld_data(input=config.input, imsize=config.imsize)

# Launch the graph
with tf.Session() as sess:
    if config.log:
        for var in tf.trainable_variables():
            tf.summary.histogram(var.op.name, var)
        summary_op = tf.summary.merge_all()
        summary_writer = tf.summary.FileWriter(config.logdir, sess.graph)
    sess.run(init)

    batch_size = config.batchsize
    print(fmt_row(10, ["Epoch", "Train Cost", "Train Err", "Epoch Time"]))
    for epoch in range(int(config.epochs)):
        tstart = time.time()
        avg_err, avg_cost = 0.0, 0.0
        num_batches = int(Xtrain.shape[0] / batch_size)
        # Loop over all batches
        for i in range(0, Xtrain.shape[0], batch_size):
            j = i + batch_size
            if j <= Xtrain.shape[0]:
                # Run optimization op (backprop) and cost op (to get loss value)
                # y holds statebatchsize labels per map, hence the index scaling
                fd = {X: Xtrain[i:j], S1: S1train[i:j], S2: S2train[i:j],
                      y: ytrain[i * config.statebatchsize:j * config.statebatchsize]}
                _, e_, c_ = sess.run([optimizer, err, cost], feed_dict=fd)
                avg_err += e_
                avg_cost += c_
        # Display logs per epoch step
        if epoch % config.display_step == 0:
            elapsed = time.time() - tstart
            print(fmt_row(10, [epoch, avg_cost/num_batches, avg_err/num_batches, elapsed]))
        if config.log:
            summary = tf.Summary()
            summary.ParseFromString(sess.run(summary_op))
            summary.value.add(tag='Average error', simple_value=float(avg_err/num_batches))
            summary.value.add(tag='Average cost', simple_value=float(avg_cost/num_batches))
            summary_writer.add_summary(summary, epoch)
    print("Finished training!")

    # Test model
    correct_prediction = tf.cast(tf.argmax(nn, 1), tf.int32)
    # Calculate accuracy (this op actually computes the error rate; it is inverted at print time)
    accuracy = tf.reduce_mean(tf.cast(tf.not_equal(correct_prediction, y), dtype=tf.float32))
    acc = accuracy.eval({X: Xtest, S1: S1test, S2: S2test, y: ytest})
    print(f'Accuracy: {100 * (1 - acc)}%')
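Assuming the gridworld .mat file from the reference repo is in place, training can be launched with the flags defined above, e.g. `python train.py --input data/gridworld_8.mat --epochs 30 --k 10`.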
2.4 data.py
The data parsing (which nearly drove me mad), using the 8x8 observation grid as the example:
- For each state s = (i, j), the observed input is a 2x8x8 image, where 2 is the channel count (the obstacle image plus the goal/value prior).
- The label is the action taken by the optimal trajectory at that state (8 directions).
- im_data encodes the 8x8 map (obstacles = 1, free zone = 0).
- value_data encodes the goal (the paper says goal = 1, non-goal = 0; in the .mat file the values are actually 10 and 0).
- A 6/7 train split yields 7776 training samples.
- Fields: x_data, s1_data, s2_data, y_data.
import numpy as np
import scipy.io as sio

def process_gridworld_data(input, imsize):
    # run training from input matlab data file, and save test data prediction in output file
    # load data from Matlab file, including
    # im_data: flattened images
    # state_data: concatenated one-hot vectors for each state variable
    # state_xy_data: state variable (x,y position)
    # label_data: one-hot vector for action (state difference)
    im_size = [imsize, imsize]
    matlab_data = sio.loadmat(input)
    im_data = matlab_data["batch_im_data"]  # 9072 x 64
    im_data = (im_data - 1) / 255  # obstacles = 1, free zone = 0
    value_data = matlab_data["batch_value_data"]
    state1_data = matlab_data["state_x_data"]
    state2_data = matlab_data["state_y_data"]
    label_data = matlab_data["batch_label_data"]
    ydata = label_data.astype('int8')
    Xim_data = im_data.astype('float32')
    Xim_data = Xim_data.reshape(-1, 1, im_size[0], im_size[1])    # 9072 x 1 x 8 x 8
    Xval_data = value_data.astype('float32')
    Xval_data = Xval_data.reshape(-1, 1, im_size[0], im_size[1])  # 9072 x 1 x 8 x 8
    Xdata = np.append(Xim_data, Xval_data, axis=1)  # input data: 9072 x 2 x 8 x 8
    # Need to transpose because Theano is NCHW, while TensorFlow is NHWC
    Xdata = np.transpose(Xdata, (0, 2, 3, 1))
    S1data = state1_data.astype('int8')  # 9072 x 10
    S2data = state2_data.astype('int8')  # 9072 x 10; 10 columns because state_batch_size = 10
    all_training_samples = int(6/7.0 * Xdata.shape[0])
    training_samples = all_training_samples
    Xtrain = Xdata[0:training_samples]
    S1train = S1data[0:training_samples]
    S2train = S2data[0:training_samples]
    ytrain = ydata[0:training_samples]

    Xtest = Xdata[all_training_samples:]
    S1test = S1data[all_training_samples:]
    S2test = S2data[all_training_samples:]
    ytest = ydata[all_training_samples:]
    ytest = ytest.flatten()

    sortinds = np.random.permutation(training_samples)  # random permutation to shuffle the training set
    Xtrain = Xtrain[sortinds]
    S1train = S1train[sortinds]
    S2train = S2train[sortinds]
    ytrain = ytrain[sortinds]
    ytrain = ytrain.flatten()
    return Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest
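A quick shape check (a sketch; it assumes data/gridworld_8.mat from the reference repo is present) confirms the numbers above:

```python
Xtrain, S1train, S2train, ytrain, Xtest, S1test, S2test, ytest = \
    process_gridworld_data('data/gridworld_8.mat', imsize=8)
print(Xtrain.shape)   # (7776, 8, 8, 2): 6/7 of the 9072 samples
print(S1train.shape)  # (7776, 10): 10 trajectory states per map
print(ytrain.shape)   # (77760,): one action label per queried state
```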