Fixing RuntimeError: cuda runtime error (710) : device-side assert triggered
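I found someone's code on GitHub and got it running with their dataset. Happily, I swapped in my own dataset, made a few changes, and then ran into a baffling error: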
Traceback (most recent call last):
File "train_discriminator.py", line 167, in <module>
main()
File "train_discriminator.py", line 105, in main
loss_D.backward()
File "//anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 166, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (710) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cuda/SoftMax.cu:647
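This log does not seem to contain any useful information. After some searching, I read that running on the CPU gives a more informative message.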
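To illustrate why the CPU run is more informative, here is a minimal sketch. It is not the code from the repository: the tensor shapes and label values are made up, and the exact exception type and message depend on the PyTorch version. On the CPU the loss function checks the targets synchronously, so the error is raised right at the `F.cross_entropy` call instead of surfacing later in `backward()`.

```python
import torch
import torch.nn.functional as F

# Force the CPU so the range check on the targets runs synchronously.
device = torch.device("cpu")

n_classes = 3  # assumed number of output classes for this toy example
logits = torch.randn(4, n_classes, device=device)
# The last label (3) is out of range: valid labels are 0 .. n_classes - 1.
labels = torch.tensor([0, 1, 2, 3], device=device)

try:
    F.cross_entropy(logits, labels)
    error_message = None
except (IndexError, RuntimeError) as exc:
    # Older releases raise RuntimeError with the assertion text,
    # newer ones raise IndexError; both name the offending target.
    error_message = str(exc)

print(error_message)
```

If moving everything to the CPU is inconvenient, setting the environment variable `CUDA_LAUNCH_BLOCKING=1` before launching the script is another commonly suggested way to make the traceback point at the failing call while staying on the GPU.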
After switching to the CPU, I got the following error:
File "train_discriminator.py", line 167, in <module>
main()
File "train_discriminator.py", line 100, in main
loss_s = F.cross_entropy(y_disc_real, labels)
File "/python3.7/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1838, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/THNN/generic/ClassNLLCriterion.c:97
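That made the cause clear. I was loading the dataset with torchtext for PyTorch, and the test data contained a label that never appears in the training set. Its index therefore fell outside the range of classes the model outputs, which is exactly what the assertion `cur_target >= 0 && cur_target < n_classes` checks. The lesson is to use stratified sampling when splitting the data, especially when some class has very few examples.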
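The check and the split can be sketched in plain Python as follows. This is only an illustration under my own assumptions, not the torchtext pipeline from the original code: `unseen_labels` and `stratified_split` are hypothetical helper names, and the toy label list stands in for a real dataset. The split is done per class and always leaves at least one example of each class in the training set.

```python
import random
from collections import defaultdict


def unseen_labels(train_labels, test_labels):
    """Return labels that occur in the test set but never in the training set."""
    return sorted(set(test_labels) - set(train_labels))


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Split indices class by class, keeping at least one training example per class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Never move every example of a class into the test set.
        n_test = min(int(len(indices) * test_ratio), len(indices) - 1)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data: one class ("rare") has only two examples.
labels = ["pos"] * 10 + ["neg"] * 10 + ["rare"] * 2
train_idx, test_idx = stratified_split(labels)
train_labels = [labels[i] for i in train_idx]
test_labels = [labels[i] for i in test_idx]

# An empty list means every test label was also seen during training.
print(unseen_labels(train_labels, test_labels))
```

Running a check like `unseen_labels` before training gives a readable message up front instead of a device-side assert in the middle of a training step.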