KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors

On the Robustness of Deep K-Nearest Neighbors

1. 摘要

近年来有许多关于网络攻击的模型，但是对于如何有效地防御攻击却鲜有研究。Deep k-Nearest Neighbor （DkNN）是一个防御模型，它结合了 KNN 与 deeping learning 的方法。关于 KNN ，当 K 很大或者数据维度很高时，由于缺少有效的攻击方法，很难去评估 DkNN 模型的鲁棒性。本文提出了一种启发式攻击算法，它通过梯度下降来寻找对抗样本，并用这些样本来攻击 DkNN 模型。实验结果表明，在 KNN 的攻击上，本文的攻击模型比其他 baseline 较好，在 DkNN 的攻击上，本文的攻击模型比其他的 baseline 好出一大截儿。

2. Introduction

深度学习在诸多领域中都取得了巨大的成功，但是许多研究表明，神经网络在攻击样本面前非常脆弱。我们急需更加鲁棒且可解释的神经网络模型。
We develop a new gradient-based attack on KNN and DkNN. gradient-based attack 在神经网络的攻击中很常见，应用在 KNN 上的难点是 KNN 是不可导的。本文的思想是用一个可导函数来近似原来的目标函数，然后利用梯度下降法求导。
本文的贡献如下：

提出了 a gradient-based attack on KNN and DkNN。
对 KNN 与 DkNN 模型进行了攻击，发现本文的攻击模型优于一些 baseline 模型。
DkNN 模型提出的 credibility scores 并不能够识别出咱们的攻击样本。

3. 背景

1) 什么是攻击样本？
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
2) KNN 的鲁棒性
没啥内容
3) Deep k-Nearest Neighbors
DkNN 适应于任意的深度学习模型，offering inter- pretability and robustness through a nearest neighbor search in each of the deep representation layers. 模型利用 credibility score 描述 how likely its prediction is to be correct. 对抗样本 credibility score 很低，因此很容易被检测出来。

4. 模型

文本的攻击为白盒攻击：对于 DkNN，可以看到训练集以及模型参数；对于 KNN，可以看到训练集、 K、距离度量。

4.1 KNN 的攻击

Notation：
$z$ ：target sample。
$y_z$ ：z’s ground-truth label。
$\hat{z}$ ：perturbed version of $z$ 。
$(X,Y)$ : training set。 $X \in R^{n \times d}$ 。
$y_{adv}$ : class of $\hat{z}$ 。
Baselines：
mean attack： First search for the class $y_{adv} \neq y_z$ such that the mean of training samples with that class is closest to $z$ in Euclidean distance. Let $m$ denote the corresponding mean. Then use binary search to find the smallest $c > 0$ such that $(1 − c)z + cm$ is misclassified by the kNN. 缺点：对攻击样本的扰动太明显。
naive attack： 对于每一个训练样本 $z'(y_{z'} \neq y_z)$ ，使 $\hat{z} = (1 − c)z + cz'$ ，找到最小的 $c$ 使得 $knn(\hat{z} )\neq y_z$ ，由此可产生 $n$ 个对抗样本，与 $z$ 距离的，作为最终的对抗样本。论文的作者说，对于 1-NN 模型可找到全局最优解，实际上不是这样的。一个反例如下图：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
对于 K(>1)-NN，使用下面的算法：
(1) 找到距离 $z$ 最近的点 $z'(y_{z'} \neq y_z)$ ；(2) 将 $z$ 添加到集合 $S$ 中；(3) 对于所有类别 $y_{z'}$ 下的点，迭代地找出距离 $S$ 的均值最近的点，并加入到 $S$ 中。算法直到 $|S| = \frac{k}{2}$ 时终止。
Gradient-Based Attack：
算法的总体流程：首先选择距离 $z$ 较近的 $m$ 个点。然后使用基于梯度的优化算法 move $z$ closer to the ones with the target class $y_{adv}$ and further from the ones with the original class $y_z$ 。
如何选择 $m$ 个点：两种方法：(1) 选择属于同一类的 $m$ 个样本（这个类别下的样本均值距离 $z$ 比较近）；(2) 选择距离 $z$ 最近的 $m$ 个样本。
要解决的问题是：对 $z$ 添加扰动 $\delta$ ，使得 $knn (z + \delta) = y_{adv}$ 。因此优化目标可以是 $z$ 的 K 个近邻都属于类别 $y_{adv}$ ，或者说对于 $m$ 个点， $z$ 要尽量与类别 $y_{adv}$ 下的点距离近，而与其他类别下的点距离远。
优化目标：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
$\hat{z} = z + \delta$ 是对抗样本。 $w_i = 1$ if $y_{x_i}=y_{adv}$ else $0$ 。第一个约束条件限制了扰动 $\delta$ 的 norm，第二个约束条件是对于图片的像素点来说取值要在 $[0,1]$ 之间。
但是上述优化目标有一些缺点。(1) 将所有的 $x_i$ 同等地看待。(2) 对于 KNN 模型来说，重要的是 K 近邻，其他的样本点并不重要。(3) K 近邻重要，而距离 K 近邻的距离并不重要。
本文提出的改进优化目标：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors

$\sigma(x)$ 函数用来近似 $y=1$ if $x \ge 0$ else $0$ 函数。当 $\alpha \rightarrow \infty$ 时， $\sigma(x)$ 函趋向于 Heaviside step function。 $\eta$ 是 $z+\delta$ 到第 k 个邻居的距离。 $\eta$ 需要每个优化步骤中都计算一次，为了简化计算，使用所有样本到其第 k 个近邻的距离的均值作为 $\eta$ 。
改进后的优化目标：只要使得 $x \in y_{adv}$ 在 $z + \delta$ 的 k 近邻中就行了，其他的点如果也在 k 近邻中，就要使它们和 $z$ 的距离越大越好。
p-norm 的选择：本文另 $p=2$ 或 $p=\infty$ 。
$p = \infty$ ： 限制条件 $z + \delta \in [0,1]$ 可转化为：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
$b_u$ 和 $b_l$ 分别是像素点值的上界和下界。
$p = 2$ ：

将约束条件作为惩罚项加入到目标函数中。（ $c$ 值的确定：通过二分法，如果攻击成功则增加 $c$ 值，如果攻击失败，则减小 $c$ 值。）

4.2 DkNN 的攻击

Notation：
$f_{(x)}$ ：DkNN 的预测输出。
$f_\lambda(x)$ ：DkNN 模型 $\lambda$ 层的输出。
mean attack： 与 KNN 的攻击模型相同。
baseline attack：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
$x_g$ is a sample from a different class whose representation is closest to $f_1(z)$ 。
Gradient-Based Attack：

5. 实验

KNN 模型的攻击结果：
KNN鲁棒性训练之On the Robustness of Deep K-Nearest Neighbors
DkNN 模型的攻击结果：