Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files

Problem description:

I am working with the CIFAR10 example. I trained the network with the provided code, and training completed successfully. Since I want to evaluate each example in the dataset only once, I modified inputs in cifar10_input.py to the following.

def inputs(eval_data, data_dir, batch_size): 
    filename = os.path.join(data_dir, TEST_FILE) 
    filename_queue = tf.train.string_input_producer([filename],num_epochs=1) 
    image, label = read_and_decode(filename_queue) 
    float_image = tf.image.per_image_whitening(image) 
    min_fraction_of_examples_in_queue = 0.4 
    min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_EVAL * 
          min_fraction_of_examples_in_queue) 
    images, label_batch = tf.train.batch(
     [image, label], 
     batch_size=batch_size, 
     num_threads=1, 
     capacity=min_queue_examples + 3 * batch_size) 

    tf.image_summary('images', images) 
    return images, tf.reshape(label_batch, [batch_size]) 

I have isolated the problem to the following call:

tf.train.string_input_producer([filename], num_epochs=1)

If I do not set num_epochs=1, everything works as it is. If I do set it, I get the following error.

0x2cf2700 Compute status: Not found: Tensor name "input_producer/limit_epochs/epochs" not found in checkpoint files /home/jkschin/tensorflow/my_code/data/svhn/train/model.ckpt-8000 

Thanks for your help!

EDIT 3 @mrry:

It still fails. Here is the trace.

Traceback (most recent call last): 
    File "cnn_eval.py", line 148, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run 
    sys.exit(main(sys.argv)) 
    File "cnn_eval.py", line 144, in main 
    evaluate() 
    File "cnn_eval.py", line 119, in evaluate 
    saver = tf.train.Saver([v for v in variables_to_restore if v.name != "input_producer/limit_epochs/epochs"]) 
AttributeError: 'unicode' object has no attribute 'name' 

EDIT 4 @mrry:

softmax_linear/biases/ExponentialMovingAverage
conv2/biases/ExponentialMovingAverage 
local4/biases/ExponentialMovingAverage 
local3/biases/ExponentialMovingAverage 
softmax_linear/weights/ExponentialMovingAverage 
conv1/biases/ExponentialMovingAverage 
local4/weights/ExponentialMovingAverage 
conv2/weights/ExponentialMovingAverage 
input_producer/limit_epochs/epochs 
local3/weights/ExponentialMovingAverage 
conv1/weights/ExponentialMovingAverage 

Traceback (most recent call last): 
    File "cnn_eval.py", line 148, in <module> 
    tf.app.run() 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/default/_app.py", line 30, in run 
    sys.exit(main(sys.argv)) 
    File "cnn_eval.py", line 144, in main 
    evaluate() 
    File "cnn_eval.py", line 119, in evaluate 
    saver = tf.train.Saver([v for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"]) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 784, in __init__ 
    restore_sequentially=restore_sequentially) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 437, in build 
    vars_to_save = self._ValidateAndSliceInputs(names_to_variables) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 340, in _ValidateAndSliceInputs 
    names_to_variables = self._VarListToDict(names_to_variables) 
    File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 314, in _VarListToDict 
    raise TypeError("Variable to save is not a Variable: %s" % var) 
TypeError: Variable to save is not a Variable: Tensor("Const:0", shape=(), dtype=string) 

EDIT 5 @mrry:

saver = tf.train.Saver([tf.Variable(0.0,validate_shape=False,name=v) for v in variables_to_restore if v != "input_producer/limit_epochs/epochs"]) 

0x21d0cb0 Compute status: Invalid argument: Assign requires shapes of both tensors to match. lhs shape= [] rhs shape= [10] 
    [[Node: save/Assign_8 = Assign[T=DT_FLOAT, use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/gpu:0"](softmax_linear/biases/ExponentialMovingAverage, save/restore_slice_8/_20)]] 

TL;DR: In cifar10_eval.py, change the saver constructor so that it reads:

saver = tf.train.Saver([v for v in variables_to_restore 
         if v != "input_producer/limit_epochs/epochs"]) 

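The problem arises because tf.train.string_input_producer() internally creates a variable (called "input_producer/limit_epochs/epochs") when its num_epochs argument is not None. When a tf.train.Saver is created in cifar10_eval.py, it uses tf.all_variables(), which includes the implicitly created variable from tf.train.string_input_producer(). That list of variables determines the set of names that TensorFlow looks up in the checkpoint file.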
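To see why the lookup fails, it may help to model the restore step as a comparison between the names the Saver asks for and the names the checkpoint holds. The sketch below is plain Python, not the TensorFlow API; `in_checkpoint` and `requested` are illustrative stand-ins, filled with an abbreviated subset of the names printed in the question, and it assumes training ran with num_epochs left at None.

```python
# Plain-Python model of the name lookup (illustrative, not the TensorFlow API).
EPOCHS_VAR = "input_producer/limit_epochs/epochs"

# Names written at training time. Assumption: num_epochs was None during
# training, so no epoch counter variable existed when the checkpoint was saved.
in_checkpoint = {
    "conv1/weights/ExponentialMovingAverage",
    "conv1/biases/ExponentialMovingAverage",
    "softmax_linear/weights/ExponentialMovingAverage",
    "softmax_linear/biases/ExponentialMovingAverage",
}

# Names requested at evaluation time: the same variables plus the counter
# that string_input_producer creates when num_epochs=1.
requested = in_checkpoint | {EPOCHS_VAR}

# Any requested name that the checkpoint lacks triggers a "Not found" error.
missing = sorted(requested - in_checkpoint)
print(missing)
```

The only name left over is the epoch counter, which matches the tensor named in the error message.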

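There is currently no good way to refer to implicitly created variables other than by their names. Therefore, the best workaround is to exclude the variable by name in the Saver constructor.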
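The `AttributeError: 'unicode' object has no attribute 'name'` in the question suggests that, in that version of the code, `variables_to_restore` is a dict keyed by checkpoint name rather than a list of variables. Under that assumption, a minimal sketch of the exclusion filters the dict by key and keeps the original values. `FakeVariable` and `exclude_by_name` are hypothetical names introduced here so the snippet runs without TensorFlow; in the real script the filtered dict would presumably be passed to `tf.train.Saver`, which accepts a name-to-variable mapping.

```python
EPOCHS_VAR = "input_producer/limit_epochs/epochs"


class FakeVariable(object):
    """Hypothetical stand-in for a tf.Variable, used only for illustration."""

    def __init__(self, name):
        self.name = name


def exclude_by_name(variables_to_restore, excluded=EPOCHS_VAR):
    """Return a copy of a name -> variable dict without the excluded key."""
    return {name: var
            for name, var in variables_to_restore.items()
            if name != excluded}


# Assumed shape of the mapping: checkpoint name -> variable object.
variables_to_restore = {
    "conv1/weights/ExponentialMovingAverage": FakeVariable("conv1/weights:0"),
    "conv1/biases/ExponentialMovingAverage": FakeVariable("conv1/biases:0"),
    EPOCHS_VAR: FakeVariable("input_producer/limit_epochs/epochs:0"),
}

filtered = exclude_by_name(variables_to_restore)
# In the evaluation script (assumption): saver = tf.train.Saver(filtered)
print(sorted(filtered))
```

Keeping the original variable objects as values, instead of constructing fresh `tf.Variable(0.0, ...)` placeholders as in EDIT 5, should avoid the shape mismatch reported there, since each checkpoint name stays paired with the variable of the correct shape.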

I have changed tf.train.Saver accordingly, but it does not work. Is there more to it? Or am I missing something? – jkschin

Ah, sorry! Can you try making the corresponding change in [`cifar10_eval.py`](https://github.com/tensorflow/tensorflow/blob/77c2042e77a11ee442ecc7e369cd91d91e4a98c3/tensorflow/models/image/cifar10/cifar10_eval.py#L134)? You have to exclude the implicitly created variable from `variables_to_restore`. – mrry

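Could you elaborate on excluding the implicitly created variable? I used the line above and it does not seem to work. It may be worth mentioning that the cifar10 code (and TensorFlow installation) I am using has not been updated to the code on master. – jkschin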

Another way to get rid of the implicit variable "input_producer/limit_epochs/epochs" is to load only the trainable variables:

saver = tf.train.Saver(tf.trainable_variables())