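TensorFlow autoencoder code clarification and custom test data
Question:
I have a question about TensorFlow input queues, which I do not fully understand. I have created a TensorFlow module that builds batches of data using the code below.
The code: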
import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
    """ filenames is the list of files you want to read from.
    Each file holds fixed-length binary records, one image per record.
    """
    record_bytes = 29**2  # 29x29 images per record (note: overrides the argument)
    filename_queue = tf.train.string_input_producer(filenames)
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)  # reads one fixed-length record at a time
    _, value = reader.read(filename_queue)
    print(value)  # prints the symbolic tensor, not the record contents

    # decode the raw bytes of the record into a uint8 vector
    content = tf.decode_raw(value, out_type=tf.uint8)

    # The bytes read represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(
        tf.strided_slice(content, [0],
                         [record_bytes]),
        [1, 29, 29])
    # Convert from [depth, height, width] to [height, width, depth].
    uint8image = tf.transpose(depth_major, [1, 2, 0])
    uint8image = tf.reshape(uint8image, [29**2])  # reshape it to a single-dimensional vector
    uint8image = tf.cast(uint8image, tf.float32)
    uint8image = tf.nn.l2_normalize(uint8image, dim=0)  # scale the vector to unit L2 norm

    # minimum number of elements in the queue after a dequeue, used to ensure
    # that the samples are sufficiently mixed
    # I think 10 times the BATCH_SIZE is sufficient
    min_after_dequeue = 10 * BATCH_SIZE

    # the maximum number of elements in the queue
    capacity = 20 * BATCH_SIZE

    # shuffle the data to generate batches of BATCH_SIZE samples
    data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                        capacity=capacity, min_after_dequeue=min_after_dequeue)

    return data_batch
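My question is: do I get exactly 128 records every time I evaluate the result of this function? For example:
batch_xs = sess.run(data_batch)
1) In this case, what does batch_xs contain?
2) The example I am following uses the code below to evaluate how well the training worked:
encode_decode = sess.run(
    y_pred, feed_dict={X: mnist.test.images[:examples_to_show]})
How would I go about feeding my own test data, which I have stored in another binary file? This question is related to my earlier post, "Tensorflow Autoencoder with custom training examples from binary file".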
Answer:
To solve the problem above, I created a data_reader module, shown below:
import tensorflow as tf

# various initialization variables
BATCH_SIZE = 128
N_FEATURES = 9

def batch_generator(filenames, record_bytes):
    """ filenames is the list of files you want to read from.
    Each file holds fixed-length binary records, one image per record.
    """
    record_bytes = 29**2  # 29x29 images per record (note: overrides the argument)
    filename_queue = tf.train.string_input_producer(filenames)
    reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)  # reads one fixed-length record at a time
    _, value = reader.read(filename_queue)
    print(value)  # prints the symbolic tensor, not the record contents

    # The commented-out lines in this function are leftovers from a CSV
    # example and are not used for the binary image data:
    # record_defaults are the default values in case some of our columns are empty
    # This is also to tell tensorflow the format of our data (the type of the decode result)
    # for this dataset, out of 9 feature columns,
    # 8 of them are floats (some are integers, but to make our features homogeneous,
    # we consider them floats), and 1 is string (at position 5)
    # the last column, which corresponds to the label, is an integer
    #record_defaults = [[1.0] for _ in range(N_FEATURES)]
    #record_defaults[4] = ['']
    #record_defaults.append([1])

    # decode the raw bytes of the record into a uint8 vector
    content = tf.decode_raw(value, out_type=tf.uint8)
    #print(content)

    # convert the 5th column (present/absent) to the binary value 0 and 1
    #condition = tf.equal(content[4], tf.constant('Present'))
    #content[4] = tf.where(condition, tf.constant(1.0), tf.constant(0.0))

    # pack all UINT8 values into a tensor (not used further below)
    features = tf.stack(content)
    #print(features)

    # assign the last column to label
    #label = content[-1]

    # The bytes read represent the image, which we reshape
    # from [depth * height * width] to [depth, height, width].
    depth_major = tf.reshape(
        tf.strided_slice(content, [0],
                         [record_bytes]),
        [1, 29, 29])
    # Convert from [depth, height, width] to [height, width, depth].
    uint8image = tf.transpose(depth_major, [1, 2, 0])
    uint8image = tf.reshape(uint8image, [29**2])  # reshape it to a single-dimensional vector
    uint8image = tf.cast(uint8image, tf.float32)
    uint8image = tf.nn.l2_normalize(uint8image, dim=0)  # scale the vector to unit L2 norm

    # minimum number of elements in the queue after a dequeue, used to ensure
    # that the samples are sufficiently mixed
    # I think 10 times the BATCH_SIZE is sufficient
    min_after_dequeue = 10 * BATCH_SIZE

    # the maximum number of elements in the queue
    capacity = 20 * BATCH_SIZE

    # shuffle the data to generate batches of BATCH_SIZE samples
    data_batch = tf.train.shuffle_batch([uint8image], batch_size=BATCH_SIZE,
                                        capacity=capacity, min_after_dequeue=min_after_dequeue)

    return data_batch
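To make concrete what a single sess.run(data_batch) hands back, here is a minimal NumPy-only sketch of the same record pipeline. It is not the TensorFlow queue code itself: decode_records and shuffled_batch are hypothetical helper names, the "file" is synthetic random bytes, and random sampling with replacement only stands in for the shuffle queue. With the default settings of tf.train.shuffle_batch, each dequeue should yield exactly BATCH_SIZE rows, so the array has shape (128, 841) and dtype float32.

```python
import numpy as np

BATCH_SIZE = 128
RECORD_BYTES = 29 ** 2  # one 29x29 single-channel uint8 image per record


def decode_records(raw_bytes, record_bytes=RECORD_BYTES):
    """Split a byte string into fixed-length records and L2-normalize each one."""
    data = np.frombuffer(raw_bytes, dtype=np.uint8)
    n_records = data.size // record_bytes  # ignore a trailing partial record
    images = data[: n_records * record_bytes].reshape(n_records, record_bytes)
    images = images.astype(np.float32)
    norms = np.linalg.norm(images, axis=1, keepdims=True)
    # Guard against all-zero records, similar to the epsilon in l2_normalize.
    return images / np.maximum(norms, 1e-6)


def shuffled_batch(images, batch_size=BATCH_SIZE, rng=None):
    """Draw batch_size rows at random (a stand-in for one shuffle_batch dequeue)."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(images), size=batch_size)
    return images[idx]


# Synthetic stand-in for the binary file: 1000 random records (assumption).
rng = np.random.default_rng(0)
fake_file = rng.integers(0, 256, size=1000 * RECORD_BYTES, dtype=np.uint8).tobytes()

batch_xs = shuffled_batch(decode_records(fake_file), rng=rng)
print(batch_xs.shape, batch_xs.dtype)
```

Each row of batch_xs is one flattened 29x29 image scaled to unit L2 norm, so the individual pixel values are small fractions rather than 0 to 255.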
I then create a new data_batch_eval as follows:
data_batch_eval = data_reader.batch_generator([DATA_PATH_EVAL], 29**2)  # eval set
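As an alternative to a second shuffling queue, the evaluation file could be read straight into an array and fed through feed_dict in a fixed order. The sketch below is an assumption-laden illustration rather than the approach used in this post: load_eval_images is a hypothetical helper, the file is a throwaway demo file with synthetic bytes, and the record layout (29*29 uint8 values per image, no header) is taken from the reader code above.

```python
import os
import tempfile

import numpy as np

RECORD_BYTES = 29 ** 2  # assumed layout: one 29x29 uint8 image per record


def load_eval_images(path, record_bytes=RECORD_BYTES):
    """Read every fixed-length uint8 record in `path` as one normalized row."""
    raw = np.fromfile(path, dtype=np.uint8)
    n_records = raw.size // record_bytes  # ignore a trailing partial record
    images = raw[: n_records * record_bytes].reshape(n_records, record_bytes)
    images = images.astype(np.float32)
    norms = np.linalg.norm(images, axis=1, keepdims=True)
    return images / np.maximum(norms, 1e-6)


# Demo with a throwaway file of 20 synthetic records (not real data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "eval_demo.bin")
    (np.arange(20 * RECORD_BYTES) % 256).astype(np.uint8).tofile(demo_path)
    test_images = load_eval_images(demo_path)

examples_to_show = 10
batch_ys = test_images[:examples_to_show]
print(batch_ys.shape)
```

An array built this way could be passed as feed_dict={X: batch_ys}, mirroring the mnist.test.images[:examples_to_show] call from the question. Because the order is fixed, the same images are shown on every run.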
This is the test code:
# batch_ys: a batch of evaluation images of shape [N, 29*29]
# (its definition is not shown in this post)
encode_decode = sess.run(
    y_pred, feed_dict={X: batch_ys[:examples_to_show]})
# Compare original images with their reconstructions
f, a = plt.subplots(2, 10, figsize=(10, 2))
for i in range(examples_to_show):
    #a[0][i].imshow(np.reshape(mnist.test.images[i], (28, 28)))
    a[0][i].imshow(np.reshape(batch_ys[i], (29, 29)), cmap='gray')
    a[1][i].imshow(np.reshape(encode_decode[i], (29, 29)), cmap='gray')
f.show()
plt.draw()
plt.waitforbuttonpress()
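My problem now is that I believe the encode_decode images all show the same image. Could this be caused by an error somewhere in the autoencoder training code, as shown above?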
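Before looking for a bug in the training code, it may help to confirm numerically whether the reconstructions really are identical or only look alike in the plot. The following is a minimal diagnostic sketch. The two helper functions are hypothetical names, and the arrays are synthetic stand-ins for the real encode_decode output.

```python
import numpy as np


def reconstructions_are_identical(encode_decode, atol=1e-6):
    """Return True if every row equals the first row within atol."""
    encode_decode = np.asarray(encode_decode)
    return bool(np.allclose(encode_decode, encode_decode[0], atol=atol))


def per_pixel_spread(encode_decode):
    """Largest max-minus-min range of any single pixel across the examples."""
    return float(np.ptp(np.asarray(encode_decode), axis=0).max())


# Synthetic demo: ten copies of one image versus ten different images.
rng = np.random.default_rng(0)
same = np.tile(rng.random(29 * 29), (10, 1))
different = rng.random((10, 29 * 29))

print(reconstructions_are_identical(same), per_pixel_spread(same))
print(reconstructions_are_identical(different), per_pixel_spread(different))
```

Running the same two checks on batch_ys would also show whether the inputs themselves differ. If the inputs vary but the outputs do not, that would point toward the model or its training rather than the data reader.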