Paper Reading: Prototypical Networks for Few-shot Learning

Abstract

We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.

1 Introduction

Prototypical networks are based on the idea that there exists an embedding in which points cluster around a single prototype representation for each class. In order to do this, we learn a non-linear mapping of the input into an embedding space using a neural network and take a class’s prototype to be the mean of its support set in the embedding space. Classification is then performed for an embedded query point by simply finding the nearest class prototype.

In short: prototypical networks learn a metric space in which a classifier assigns a sample to a class according to its distance to each class prototype. Each prototype is obtained by averaging the embedded vectors of that class's support samples, and Euclidean distance is used to decide which class a sample belongs to.

We follow the same approach to tackle zero-shot learning; here each class comes with meta-data giving a high-level description of the class rather than a small number of labeled examples. We therefore learn an embedding of the meta-data into a shared space to serve as the prototype for each class.

In short: the same approach handles zero-shot learning. Here each class comes with meta-data giving a high-level description of the class rather than a small number of labeled examples, so an embedding of the meta-data into the shared space is learned and used as that class's prototype.
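A minimal sketch of the zero-shot variant (my own NumPy illustration, not the paper's code; the linear map `W` standing in for the learned meta-data embedding, and the unit-normalization step, are simplifying assumptions):

```python
import numpy as np

def zero_shot_prototype(meta_vec, W):
    # g(v) = W v: a stand-in for the learned embedding that maps a class's
    # meta-data vector into the shared embedding space; its output plays
    # the role the support-set mean plays in the few-shot setting.
    p = W @ meta_vec
    return p / np.linalg.norm(p)  # unit-normalize (a simplifying assumption)
```

Query points are then classified against these meta-data-derived prototypes exactly as in the few-shot case.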

In particular, we relate prototypical networks to clustering in order to justify the use of class means as prototypes when distances are computed with a Bregman divergence, such as squared Euclidean distance. We find empirically that the choice of distance is vital, as Euclidean distance greatly outperforms the more commonly used cosine similarity.

In short: the authors connect prototypical networks to clustering to justify using class means as prototypes when distances are computed with a Bregman divergence (such as squared Euclidean distance). Empirically, the choice of distance is vital: Euclidean distance substantially outperforms the more commonly used cosine similarity.

2 Prototypical Networks

2.1 Algorithm

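The core computation can be sketched as follows (my own NumPy illustration, not the paper's PyTorch code; the inputs are assumed to have already been mapped into the embedding space by the learned network):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def prototypes(support, labels, n_classes):
    # support: (N, D) embedded support points; labels: (N,) integer class ids.
    # Each prototype is the mean of its class's embedded support points.
    return np.stack([support[labels == k].mean(axis=0) for k in range(n_classes)])

def classify(queries, protos):
    # squared Euclidean distance from each query to each prototype
    d2 = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # distribution over classes: softmax over negative distances
    return softmax(-d2)
```

Training minimizes the negative log-probability of each query point's true class under this softmax, computed episode by episode.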

2.2 Choices for Distance Metric

A regular Bregman divergence dφ is defined as:

dφ(z, z′) = φ(z) − φ(z′) − (z − z′)⊤∇φ(z′),

where φ is a differentiable, strictly convex function of the Legendre type. Examples of Bregman divergences include squared Euclidean distance ∥z − z′∥² and Mahalanobis distance.

It has been shown for Bregman divergences that the cluster representative achieving minimal distance to its assigned points is the cluster mean. Thus the prototype computation yields optimal cluster representatives given the support set labels when a Bregman divergence is used.

For a Bregman divergence, the cluster representative with minimal total distance to the cluster's assigned points is exactly the cluster mean. The authors therefore compute prototypes under distances that are Bregman divergences (squared Euclidean distance, Mahalanobis distance, or the divergence induced by any regular exponential family), which guarantees that the cluster mean is the optimal cluster representative. This paper uses squared Euclidean distance.
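A quick numerical check of this property for squared Euclidean distance (my own illustration): the cluster mean achieves a total squared distance no larger than any perturbed candidate representative.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))  # a toy "cluster" of embedded points

def total_sq_dist(rep, pts):
    # sum of squared Euclidean distances from a candidate representative
    return ((pts - rep) ** 2).sum()

mean = points.mean(axis=0)
# the mean beats (or ties) every random perturbation of itself
for _ in range(200):
    other = mean + rng.normal(scale=0.1, size=3)
    assert total_sq_dist(mean, points) <= total_sq_dist(other, points)
```

This is why averaging the support embeddings is not an arbitrary choice: under a Bregman divergence it is provably the best single-point summary of the class.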

2.3 Choices for Episodic Learning Parameters

Episode composition
A straightforward way to construct episodes is to choose Nc classes and NS support points per class in order to match the expected situation at test-time. That is, if we expect at test-time to perform 5-way classification and 1-shot learning, then training episodes could be comprised of Nc = 5, NS = 1. We have found, however, that it can be extremely beneficial to train with a higher Nc, or “way”, than will be used at test-time. In our experiments, we tune the training Nc on a held-out validation set. Another consideration is whether to match NS , or “shot”, at train and test-time. For prototypical networks, we found that it is usually best to train and test with the same “shot” number.

When constructing an episode, one can simply sample Nc classes and NS support points per class to match the expected test-time setting. However, the authors found that training with a larger Nc (a higher "way") gives noticeably better results, so they tune the training Nc on a held-out validation set. Another consideration is whether NS (the "shot") should match between training and testing; for prototypical networks, the authors found that training and testing with the same shot usually works best.
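The episode construction above can be sketched as follows (my own illustration; the function name and the dict-based data layout are assumptions, not the paper's code):

```python
import numpy as np

def sample_episode(data, n_way, n_shot, n_query, rng):
    # data: dict mapping class id -> (M, D) array of examples for that class
    classes = rng.choice(sorted(data), size=n_way, replace=False)
    support, query = [], []
    for k in classes:
        idx = rng.permutation(len(data[k]))[: n_shot + n_query]
        support.append(data[k][idx[:n_shot]])  # N_S support points per class
        query.append(data[k][idx[n_shot:]])    # query points used for the loss
    return np.stack(support), np.stack(query)
```

Tuning the training n_way then just means calling this sampler with a larger n_way during training than the n_way used at test time.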

3 Experiments

3.1 Omniglot Few-shot Classification


3.2 miniImageNet Few-shot Classification


3.3 CUB Zero-shot Classification


4 Conclusion

We have proposed a simple method called prototypical networks for few-shot learning based on the idea that we can represent each class by the mean of its examples in a representation space learned by a neural network. We train these networks to specifically perform well in the few-shot setting by using episodic training. The approach is far simpler and more efficient than recent meta-learning approaches, and produces state-of-the-art results even without sophisticated extensions developed for matching networks (although these can be applied to prototypical nets as well). We show how performance can be greatly improved by carefully considering the chosen distance metric, and by modifying the episodic learning procedure. We further demonstrate how to generalize prototypical networks to the zero-shot setting, and achieve state-of-the-art results on the CUB-200 dataset. A natural direction for future work is to utilize Bregman divergences other than squared Euclidean distance, corresponding to class-conditional distributions beyond spherical Gaussians. We conducted preliminary explorations of this, including learning a variance per dimension for each class. This did not lead to any empirical gains, suggesting that the embedding network has enough flexibility on its own without requiring additional fitted parameters per class. Overall, the simplicity and effectiveness of prototypical networks makes it a promising approach for few-shot learning.

The simplicity and effectiveness of prototypical networks make them a promising approach to few-shot learning, and a suitable distance metric together with well-chosen episodic parameters (the number of ways and shots) can greatly improve performance.

Reference

[1]: Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning. In: Advances in Neural Information Processing Systems (NIPS 2017). 2017: 4077–4087.
[2]: Paper Reading (6): Prototypical Networks for Few-shot Learning

Code

[1]: jakesnell/prototypical-networks