Paper reading: Intriguing Properties of Neural Networks

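This paper introduced the open problem of adversarial examples. An adversarial example is a sample obtained by adding a subtle perturbation to an original sample so that the model misclassifies it. The idea also helped inspire generative adversarial networks.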



Adversarial examples expose fundamental blind spots in our training algorithms.



1. There is no distinction between individual high-level units (activated) and random linear combinations of high-level units.

[It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.]

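It's the entire space of activations, rather than the individual units, that contains the bulk of the semantic information. [The explanation here uses word embeddings as an example: a single word vector on its own cannot express the semantic relations it carries. What really encodes a word's meaning is its spatial relationship to the other word vectors, similar in principle to PageRank. Rotating the embedding space still preserves the original meaning, even though the vectors themselves have changed. Once data has been re-encoded, its information can only be understood by placing it within the encoding space.]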
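
The rotation argument can be checked numerically. Below is a minimal sketch, assuming a toy embedding matrix `E` filled with random numbers (not real word vectors) and a random orthogonal matrix `Q` standing in for the rotation. The coordinates of every vector change, but the pairwise cosine similarities, which carry the relational information, do not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "embeddings": 5 words, 4 dimensions (random, for illustration only).
E = rng.normal(size=(5, 4))

# Random orthogonal matrix from a QR decomposition (a rotation, possibly with a reflection).
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))

# Apply the same orthogonal map to every embedding.
E_rot = E @ Q


def cosine_matrix(M):
    """Pairwise cosine similarity between the rows of M."""
    unit = M / np.linalg.norm(M, axis=1, keepdims=True)
    return unit @ unit.T


# The individual vectors are different after the rotation...
coords_changed = not np.allclose(E, E_rot)
# ...but every pairwise relation between them is unchanged.
relations_kept = np.allclose(cosine_matrix(E), cosine_matrix(E_rot))

print("coordinates changed:", coords_changed)
print("pairwise similarities preserved:", relations_kept)
```

This only illustrates the geometric point made in the note; it says nothing about how a trained network actually arranges its activations.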



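One caveat: the semantic analysis here works by feeding a set of images through the network, finding the images that maximize the activation of a given unit, and then analyzing what those images have in common. The unit is then said to have learned those features. This approach is called a unit-level inspection method.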
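
The inspection procedure can be sketched as follows. This is an illustrative sketch only: `features` is a hypothetical stand-in for the hidden-layer activations of a set of images (random numbers here, not a real network), and `top_k_along_direction` is a helper name made up for this example. Inspecting a single unit means projecting onto a basis vector, while the paper's comparison projects onto a random direction instead.

```python
import numpy as np


def top_k_along_direction(features, direction, k=3):
    """Indices of the k inputs whose features project most strongly onto `direction`."""
    scores = features @ direction
    # argsort is ascending, so reverse it and keep the first k indices.
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)

# Hypothetical stand-in for hidden activations: 100 images, 16 hidden units.
features = rng.normal(size=(100, 16))

# Natural basis direction for unit 3: this inspects a single unit.
e3 = np.zeros(16)
e3[3] = 1.0
top_unit = top_k_along_direction(features, e3)

# Random direction: this inspects a random linear combination of units.
v = rng.normal(size=16)
top_random = top_k_along_direction(features, v)

print("images that most activate unit 3:", top_unit)
print("images that score highest along a random direction:", top_random)
```

With a real network, one would then look at the selected images in both cases and compare how semantically coherent they are.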

2. Deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. A certain perturbation can cause the network to misclassify an image. Notably, the same perturbation still has this effect on a model trained on a different subset of the data.

The output layer unit of a neural network is a highly nonlinear function of its input.


The smoothness assumption (small perturbations have little effect on the model's classification result) that underlies many kernel methods does not hold. In most random cases the assumption actually does hold: adversarial examples are hard to find efficiently by simply randomly sampling the input around a given example. (The data do not state explicitly how small a perturbation has to be for local generalization to apply. It has been argued that the deep stack of non-linear layers in between the input and the output unit of a neural network is a way for the model to encode a non-local generalization prior over the input space [Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009].) Some models even employ input deformations during training to increase the robustness and convergence speed of the model [9, 13].


Because most perturbations do not act as adversarial examples, that is, they do not make the network misclassify, an efficient method is needed to find adversarial examples.
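
The paper poses this search as an optimization problem: find a small perturbation r such that x + r stays a valid image in [0, 1]^m and is classified as a chosen wrong label, which it approximates by minimizing c|r| plus the classification loss with box-constrained L-BFGS. The sketch below is a simplified stand-in, not the paper's implementation: it assumes a hand-picked linear softmax model `W`, swaps L-BFGS for plain gradient descent with clipping, and uses a squared L2 penalty. All names (`find_adversarial`, `W`, `s`) are hypothetical. It also compares the result against random perturbations of the same norm.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def find_adversarial(W, x, target, c=1.0, lr=0.05, steps=200):
    """Search for a small r so that the linear softmax model W predicts `target` on x + r.

    Minimizes c * ||r||^2 + cross_entropy(softmax(W @ (x + r)), target) by
    gradient descent, clipping x + r back into [0, 1] after every step.
    (Assumption: this replaces the box-constrained L-BFGS used in the paper.)
    """
    y = np.zeros(W.shape[0])
    y[target] = 1.0
    r = np.zeros_like(x)
    for _ in range(steps):
        p = softmax(W @ (x + r))
        grad = 2.0 * c * r + W.T @ (p - y)  # gradient of the objective w.r.t. r
        r = r - lr * grad
        r = np.clip(x + r, 0.0, 1.0) - x    # keep x + r a valid "image"
    return r


# Hypothetical toy model: 16 "pixels", 2 classes, hand-picked weights.
s = np.tile([1.0, -1.0], 8)
W = np.stack([s, -s])
x = 0.5 + 0.05 * s
original_label = int(np.argmax(W @ x))

# Directed search for a perturbation that pushes x to the other class.
r = find_adversarial(W, x, target=1)
adversarial_label = int(np.argmax(W @ (x + r)))

# Random perturbations with the same L2 norm as r, for comparison.
rng = np.random.default_rng(0)
noise = rng.normal(size=(1000, x.size))
noise *= np.linalg.norm(r) / np.linalg.norm(noise, axis=1, keepdims=True)
random_labels = np.argmax(np.clip(x + noise, 0.0, 1.0) @ W.T, axis=1)
random_flip_rate = float(np.mean(random_labels != original_label))

print("label before / after:", original_label, adversarial_label)
print("perturbation norm:", np.linalg.norm(r))
print("flip rate of random noise with the same norm:", random_flip_rate)
```

In this toy setup the directed search changes the label, while random noise of the same size does so only occasionally, which is the contrast the notes describe. A linear model is far simpler than the deep networks in the paper, so this shows the mechanics of the search rather than the paper's findings.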



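  1. For all the networks we studied (MNIST, QuocNet, AlexNet), we always managed to generate adversarial examples.
  2. Cross model generalization: a considerable fraction of adversarial examples are still misclassified by networks with different hyperparameters (number of layers, regularization, or initial weights, so generalization techniques such as dropout do not solve the problem).
  3. Cross training-set generalization: a considerable fraction of adversarial examples are still misclassified by networks trained on a different training set.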
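
One way to quantify points 2 and 3 is a transfer rate: among the adversarial examples that fool the model they were generated for, what fraction also fools a second model? The sketch below is a toy illustration under stated assumptions. `transfer_rate` and `make_linear_predictor` are hypothetical helpers, the two linear classifiers stand in for networks with different hyperparameters or training sets, and the three inputs are hand-written rather than produced by a real attack.

```python
import numpy as np


def make_linear_predictor(W):
    """Return a function mapping a batch of inputs to predicted class indices."""
    def predict(X):
        return np.argmax(X @ W.T, axis=1)
    return predict


def transfer_rate(predict_a, predict_b, x_adv, y_true):
    """Among inputs that fool model A, the fraction that also fool model B."""
    fooled_a = predict_a(x_adv) != y_true
    fooled_b = predict_b(x_adv) != y_true
    if not fooled_a.any():
        return 0.0
    return float(np.mean(fooled_b[fooled_a]))


# Hypothetical stand-ins: two linear classifiers with similar but not identical weights.
predict_a = make_linear_predictor(np.array([[1.0, -1.0], [-1.0, 1.0]]))
predict_b = make_linear_predictor(np.array([[1.0, -0.6], [-1.0, 0.6]]))

# Three hand-written inputs whose true label is 0, all of which model A gets wrong.
x_adv = np.array([[0.40, 0.60], [0.45, 0.50], [0.30, 0.70]])
y_true = np.array([0, 0, 0])

rate = transfer_rate(predict_a, predict_b, x_adv, y_true)
print("fraction of A's adversarial examples that also fool B:", rate)
```

The paper reports this kind of measurement as error rates across pairs of trained networks. The helper above only shows the bookkeeping, not those results.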

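These three properties show that adversarial examples are not the product of one particular model overfitting or of one particular dataset: adversarial examples are somewhat universal and not just the results of overfitting to a particular model or to the specific selection of the training set.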


They also suggest that back-feeding adversarial examples to training might improve generalization of the resulting models.

A subtle, but essential detail is that we only got improvements by generating adversarial examples for each layer's outputs, which were used to train all the layers above.




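The paper concludes that the existence of adversarial examples means current neural networks have flaws in their generalization, although adversarial examples are very unlikely to appear in a test set or even in real life. It also notes that there is still no good explanation for why adversarial examples occur, and leaves this as an open question: “However, we don’t have a deep understanding of how often adversarial negatives appears, and thus this issue should be addressed in a future research.”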

