Fixing RuntimeError: cuda runtime error (710) : device-side assert triggered

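I found someone else's code on GitHub and got it running with their dataset. Pleased with that, I swapped in my own dataset, made a few changes, and ran into this baffling error: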

Traceback (most recent call last):
File "train_discriminator.py", line 167, in <module>
    main()
File "train_discriminator.py", line 105, in main
    loss_D.backward()
File "//anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 166, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuda runtime error (710) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/ATen/native/cuda/SoftMax.cu:647

The log doesn't seem to contain anything useful. A quick search suggested switching to the CPU, where the error message is more informative.

After switching to the CPU, I got the following error:

File "train_discriminator.py", line 167, in <module>
    main()
File "train_discriminator.py", line 100, in main
    loss_s = F.cross_entropy(y_disc_real, labels)
File "/python3.7/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
File "/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py", line 1838, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /opt/conda/conda-bld/pytorch_1573049306803/work/aten/src/THNN/generic/ClassNLLCriterion.c:97
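
The assertion in the last line of the traceback requires every target label to lie in the range [0, n_classes). Below is a minimal sketch of checking that condition by hand before the loss is computed. The helper name `find_invalid_labels` and the sample batch are hypothetical and not part of the original code.

```python
def find_invalid_labels(labels, n_classes):
    """Return (index, label) pairs for labels outside [0, n_classes)."""
    return [(i, y) for i, y in enumerate(labels) if not 0 <= y < n_classes]


# Hypothetical batch: the model has 3 output classes, but one label is 3.
batch_labels = [0, 2, 1, 3, 1]
bad = find_invalid_labels(batch_labels, n_classes=3)
print(bad)  # [(3, 3)]
```

For a PyTorch tensor of labels, `labels.tolist()` gives the plain list this helper expects.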

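This made the cause clear. I was loading the dataset with PyTorch's torchtext, and at test time the data contained a label that never appears in the training set, so the target fell outside the class range the loss expects and the assertion failed. The lesson is to use stratified sampling when splitting the data, especially when some class has very few examples.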
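
As a rough illustration of stratified sampling, here is a standard-library sketch that splits per class and then checks for labels that appear only in the test set. It is not torchtext's API. The function names, the `test_ratio` default, and the sample data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(examples, label_of, test_ratio=0.2, seed=0):
    """Split examples so every label keeps at least one example in train."""
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_label.items():
        rng.shuffle(group)
        # Never move a whole class into the test set: cap at len(group) - 1.
        n_test = min(int(round(len(group) * test_ratio)), len(group) - 1)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def unseen_labels(train, test, label_of):
    """Labels that appear in test but never in train."""
    return {label_of(ex) for ex in test} - {label_of(ex) for ex in train}


# Hypothetical data: (text, label) pairs with one rare class.
data = [(f"doc{i}", "common") for i in range(10)] + [("rare0", "rare"), ("rare1", "rare")]
train, test = stratified_split(data, label_of=lambda ex: ex[1])
print(unseen_labels(train, test, label_of=lambda ex: ex[1]))  # set()
```

Running `unseen_labels` on an existing split is a cheap way to confirm whether this problem applies before training on the GPU.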
