
5 ADVERSARIAL TRAINING OF LINEAR MODELS VERSUS WEIGHT DECAY
Perhaps the simplest possible model we can consider is logistic regression. In this case, the fast gradient sign method is exact. We can use this case to gain some intuition for how adversarial examples are generated in a simple setting. See Fig. 2 for instructive images.

If we train a single model to recognize labels y ∈ {−1, 1} with P(y = 1) = σ(w⊤x + b), where σ(z) is the logistic sigmoid function, then training consists of gradient descent on

\mathbb{E}_{x, y \sim p_{\text{data}}} \zeta(-y(w^\top x + b))
where ζ(z) = log(1 + exp(z)) is the softplus function. We can derive a simple analytical form for training on the worst-case adversarial perturbation of x rather than x itself, based on gradient sign perturbation.
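To make the exactness claim concrete, the following numpy sketch (an illustration added here, not code from the paper) constructs the gradient-sign perturbation for a toy logistic regression model and spot-checks that no other sign perturbation inside the ε max-norm ball yields a larger softplus loss; the dimensions, weights, and ε = 0.25 are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    return np.logaddexp(0.0, z)            # zeta(z) = log(1 + exp(z)), numerically stable

def loss(x, y, w, b):
    return softplus(-y * (w @ x + b))      # zeta(-y(w^T x + b))

# Toy logistic regression model and input (hypothetical values).
n = 50
w = rng.normal(size=n)
b = 0.1
x = rng.normal(size=n)
y = 1.0
eps = 0.25

# Fast gradient sign perturbation: eta = eps * sign(grad_x loss) = -eps * y * sign(w).
eta_fgsm = -eps * y * np.sign(w)
worst = loss(x + eta_fgsm, y, w, b)

# Because the model is linear in x, no perturbation in the eps max-norm ball
# does more damage; spot-check against random sign perturbations.
for _ in range(1000):
    eta = eps * rng.choice([-1.0, 1.0], size=n)
    assert loss(x + eta, y, w, b) <= worst + 1e-9
print("worst-case loss within the eps ball:", worst)
```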


1 This is using MNIST pixel values in the interval [0, 1]. MNIST data does contain values other than 0 or 1, but the images are essentially binary. Each pixel roughly encodes “ink” or “no ink”. This justifies expecting the classifier to be able to handle perturbations within a range of width 0.5, and indeed human observers can read such images without difficulty.
2 See https://github.com/lisalab/pylearn2/tree/master/pylearn2/scripts/papers/maxout for the preprocessing code, which yields a standard deviation of roughly 0.5.


Figure 2: The fast gradient sign method applied to logistic regression (where it is not an approximation, but truly the most damaging adversarial example in the max norm box). a) The weights of a logistic regression model trained on MNIST. b) The sign of the weights of a logistic regression model trained on MNIST. This is the optimal perturbation. Even though the model has low capacity and is fit well, this perturbation is not readily recognizable to a human observer as having anything to do with the relationship between 3s and 7s. c) MNIST 3s and 7s. The logistic regression model has a 1.6% error rate on the 3 versus 7 discrimination task on these examples. d) Fast gradient sign adversarial examples for the logistic regression model with ε = .25. The logistic regression model has an error rate of 99% on these examples.

Note that the sign of the gradient is just −sign(w), and that w⊤sign(w) = ||w||₁. The adversarial version of logistic regression is therefore to minimize

\mathbb{E}_{x, y \sim p_{\text{data}}} \zeta(y(\epsilon \|w\|_1 - w^\top x - b)).
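As a sanity check on this closed form, here is a short numpy sketch (added for illustration; not from the paper) that minimizes the adversarial objective directly with gradient descent, using the fact that ζ′(z) = σ(z). The toy dataset, learning rate, and ε = 0.25 are arbitrary placeholders.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)                       # zeta(z) = log(1 + exp(z))

def adversarial_loss(w, b, X, y, eps):
    # Closed-form worst-case objective: E zeta(y(eps*||w||_1 - w^T x - b))
    return softplus(y * (eps * np.abs(w).sum() - X @ w - b)).mean()

def adversarial_grad(w, b, X, y, eps):
    z = y * (eps * np.abs(w).sum() - X @ w - b)
    s = np.exp(-softplus(-z))                         # zeta'(z) = sigma(z), computed stably
    gw = (s * y) @ (eps * np.sign(w) - X) / len(y)    # d/dw, with sign(w) as the ||w||_1 subgradient
    gb = (s * -y).mean()                              # d/db
    return gw, gb

# Hypothetical toy data with labels in {-1, +1} and features in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = np.where(X[:, 0] + X[:, 1] > 1.0, 1.0, -1.0)

w, b, eps, lr = np.zeros(30), 0.0, 0.25, 0.5
for _ in range(1000):
    gw, gb = adversarial_grad(w, b, X, y, eps)
    w -= lr * gw
    b -= lr * gb
print("adversarial training loss:", adversarial_loss(w, b, X, y, eps))
```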
This is somewhat similar to L1 regularization. However, there are some important differences. Most significantly, the L1 penalty is subtracted off the model’s activation during training, rather than added to the training cost. This means that the penalty can eventually start to disappear if the model
learns to make confident enough predictions that ζ saturates. This is not guaranteed to happen—in the underfitting regime, adversarial training will simply worsen underfitting. We can thus view L1 weight decay as being more “worst case” than adversarial training, because it fails to deactivate in
the case of good margin.
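The deactivation effect is easy to see numerically. In the minimal comparison below (an added illustration; the margin of 10 and the value ε||w||₁ = 1 are made up), the adversarial penalty sits inside ζ and contributes almost nothing once the margin is large, whereas an L1 weight decay term added to the cost contributes its full value regardless.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

# Assume a confidently classified example: y * (w^T x + b) = 10, and eps * ||w||_1 = 1.
margin, penalty = 10.0, 0.25 * 4.0

# Adversarial training puts the penalty inside zeta, so once the margin comfortably
# exceeds eps * ||w||_1 the extra cost over the clean loss vanishes.
clean = softplus(-margin)
adversarial = softplus(penalty - margin)
print(adversarial - clean)                 # less than 1e-4: effectively deactivated

# L1 weight decay adds the penalty outside the loss; it never deactivates.
print((clean + penalty) - clean)           # the full eps * ||w||_1 = 1.0
```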

If we move beyond logistic regression to multiclass softmax regression, L1 weight decay becomes even more pessimistic, because it treats each of the softmax’s outputs as independently perturbable, when in fact it is usually not possible to find a single η that aligns with all of the classes’ weight vectors. Weight decay overestimates the damage achievable with perturbation even more in the case of a deep network with multiple hidden units. Because L1 weight decay overestimates the amount of damage an adversary can do, it is necessary to use a smaller L1 weight decay coefficient than
the ε associated with the precision of our features. When training maxout networks on MNIST, we obtained good results using adversarial training with ε = .25. When applying L1 weight decay to the first layer, we found that even a coefficient of .0025 was too large, and caused the model to get stuck with over 5% error on the training set. Smaller weight decay coefficients permitted successful training but conferred no regularization benefit.
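This pessimism can be checked numerically. The sketch below (an added illustration; the random weight matrix, ε = 0.25, and the dimensions are made up) shows that a single sign perturbation reaches the ε||w_k||₁ bound for the class it is aligned with, but achieves only a small fraction of that bound for every other class.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n_features, n_classes = 0.25, 100, 10

# Hypothetical softmax weight matrix, one row per class.
W = rng.normal(size=(n_classes, n_features))

# L1 weight decay implicitly budgets each logit a full eps * ||w_k||_1 shift.
l1_bounds = eps * np.abs(W).sum(axis=1)

# A single perturbation aligned with class 0's weights attains class 0's bound...
eta = eps * np.sign(W[0])
shifts = np.abs(W @ eta)
print(shifts[0] / l1_bounds[0])              # 1.0: the bound is tight for the targeted class
# ...but achieves only a small fraction of the bound for the other classes.
print((shifts[1:] / l1_bounds[1:]).mean())   # well below 1 for random weight vectors
```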