FaceNet Paper Translation Study, Part 4

Original paper: FaceNet: A Unified Embedding for Face Recognition and Clustering

6. Summary

We provide a method to directly learn an embedding into a Euclidean space for face verification. This sets it apart from other methods [15, 17] that use the CNN bottleneck layer or require additional post-processing such as concatenation of multiple models and PCA, as well as SVM classification. Our end-to-end training both simplifies the setup and shows that directly optimizing a loss relevant to the task at hand improves performance.

Another strength of our model is that it requires only minimal alignment (a tight crop around the face area). Other models differ here: [17], for example, performs a complex 3D alignment. We also experimented with a similarity transform alignment and noticed that this can actually improve performance slightly. It is not clear whether the gain is worth the extra complexity.

Future work will focus on better understanding the error cases, further improving the model, and reducing model size and CPU requirements. We will also look into ways of shortening the currently extremely long training times, e.g. variations of our curriculum learning with smaller batch sizes, and offline as well as online positive and negative mining.

7. Appendix: Harmonic Embedding

In this section we introduce the concept of harmonic embeddings. By this we denote a set of embeddings that are generated by different models v1 and v2 but are compatible in the sense that they can be compared to each other.

This compatibility greatly simplifies upgrade paths. For example, in a scenario where embedding v1 was computed across a large set of images and a new embedding model v2 is being rolled out, this compatibility ensures a smooth transition without the need to worry about version incompatibilities. Figure 8 shows results on our 3G dataset. It can be seen that the improved model NN2 significantly outperforms NN1, while the comparison of NN2 embeddings to NN1 embeddings performs at an intermediate level.
Figure 8. Harmonic Embedding Compatibility.
These ROCs show the compatibility of the harmonic embeddings of NN2 with the embeddings of NN1. NN2 is an improved model that performs much better than NN1. When comparing embeddings generated by NN1 to the harmonic ones generated by NN2, we can see the compatibility between the two. In fact, the mixed-mode performance is still better than that of NN1 by itself.
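
To make the "mixed mode" comparison concrete, here is a minimal sketch of a verification check that treats a stored v1 embedding and a fresh v2 embedding exactly like two embeddings from the same model. The function names, the toy 3-D vectors, and the threshold value are illustrative assumptions, not values taken from the text; the unit-length normalization follows the convention of the FaceNet paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def same_identity(emb_a, emb_b, threshold=1.1):
    """Decide whether two embeddings show the same person.

    With harmonic embeddings the same rule applies whether the inputs
    come from v1, from v2, or one from each. The threshold is a
    placeholder and would be tuned on validation data.
    """
    return squared_distance(emb_a, emb_b) <= threshold


# Toy data (hypothetical): one embedding stored from the old model,
# two queries produced by the new model.
stored_v1 = l2_normalize([0.9, 0.1, 0.0])
query_v2_same = l2_normalize([0.8, 0.2, 0.1])
query_v2_other = l2_normalize([-0.1, 0.2, 0.9])

print(same_identity(stored_v1, query_v2_same))
print(same_identity(stored_v1, query_v2_other))
```

The point of the sketch is that no translation layer sits between the two model versions: a gallery computed with v1 can stay in place while queries are already served by v2.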

7.1. Harmonic Triplet Loss

In order to learn the harmonic embedding, we mix the embeddings of v1 with the embeddings of v2, which are being learned. This is done inside the triplet loss and results in additionally generated triplets that encourage compatibility between the different embedding versions. Figure 9 visualizes the different combinations of triplets that contribute to the triplet loss.
Figure 9. Learning the Harmonic Embedding.
In order to learn a harmonic embedding, we generate triplets that mix the v1 embeddings with the v2 embeddings that are being trained. The semi-hard negatives are selected from the whole set of both v1 and v2 embeddings.
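
The mixing described above can be sketched in a few lines of plain Python. This is a hedged illustration, not the authors' implementation: the exact triplet combinations of Figure 9 are not reproduced in the text, so the sketch assumes every anchor/positive/negative version mix is used except the all-v1 triplet (v1 is fixed, so that triplet could not train v2). The function names, the margin of 0.2, and the toy 2-D vectors are likewise assumptions.

```python
from itertools import product


def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def is_semi_hard(anchor, positive, negative, margin=0.2):
    """True if the negative is farther than the positive but inside the margin."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return d_ap < d_an < d_ap + margin


def harmonic_triplet_loss(anchor, positive, negatives, margin=0.2):
    """Mean triplet loss over mixed-version triplets.

    anchor, positive: dicts mapping "v1"/"v2" to that model's embedding.
    negatives: list of (version, embedding) pairs pooled from both models.
    """
    losses = []
    for va, vp in product(("v1", "v2"), repeat=2):
        a, p = anchor[va], positive[vp]
        for vn, n in negatives:
            if (va, vp, vn) == ("v1", "v1", "v1"):
                continue  # assumption: an all-v1 triplet cannot update v2
            if is_semi_hard(a, p, n, margin):
                losses.append(triplet_loss(a, p, n, margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy 2-D embeddings (hypothetical values).
anchor = {"v1": [1.0, 0.0], "v2": [0.9, 0.1]}
positive = {"v1": [0.9, 0.2], "v2": [1.0, 0.1]}
negatives = [("v1", [0.7, 0.3]), ("v2", [0.0, 1.0])]

print(harmonic_triplet_loss(anchor, positive, negatives))
```

In a real training loop the v1 embeddings would be precomputed constants and only the v2 network would receive gradients; an automatic-differentiation framework would replace the plain-float arithmetic used here.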

We initialized the v2 embedding from an independently trained NN2 and retrained the last layer (the embedding layer) from random initialization with the compatibility-encouraging triplet loss. First only the last layer is retrained; then we continue training the whole v2 network with the harmonic loss.
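
The two-stage schedule can be written down as a small helper that states which layers receive updates in each stage. This is only a sketch of the schedule's logic: the helper name and the layer names are hypothetical, and the actual freezing mechanism depends on the training framework in use.

```python
def trainable_layers(layer_names, stage):
    """Return the layers that are updated in the given training stage.

    Stage 1: only the final (embedding) layer, which starts from
             random initialization.
    Stage 2: the whole v2 network, trained with the harmonic loss.
    """
    if stage == 1:
        return layer_names[-1:]
    if stage == 2:
        return list(layer_names)
    raise ValueError("stage must be 1 or 2")


# Hypothetical layer names for the v2 network, ordered input to output.
v2_layers = ["conv1", "inception_3a", "inception_4a", "fc_embedding"]

for stage in (1, 2):
    print(stage, trainable_layers(v2_layers, stage))
```

Retraining only the embedding layer first lets the new output space settle near the v1 space before the pretrained lower layers are allowed to change.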

Figure 10 shows a possible interpretation of how this compatibility may work in practice. The vast majority of v2 embeddings may be embedded near the corresponding v1 embedding. However, incorrectly placed v1 embeddings can be perturbed slightly such that their new location in embedding space improves verification accuracy.
Figure 10. Harmonic Embedding Space.
This visualization sketches a possible interpretation of how harmonic embeddings are able to improve verification accuracy while maintaining compatibility with less accurate embeddings. In this scenario there is one misclassified face, whose embedding is perturbed to the “correct” location in v2.

7.2. Summary

These are very interesting findings, and it is somewhat surprising that the approach works so well. Future work can explore how far this idea can be extended. Presumably there is a limit to how much the v2 embedding can improve over v1 while still being compatible. Additionally, it would be interesting to train small networks that can run on a mobile phone and are compatible with a larger server-side model.

Acknowledgments

We would like to thank Johannes Steffens for his discussions and great insights on face recognition, and Christian Szegedy for providing new network architectures like [16] and discussing network design choices. We are also indebted to the DistBelief [4] team for their support, especially to Rajat Monga for help in setting up efficient training schemes.

Our work also would not have been possible without the support of Chuck Rosenberg, Hartwig Adam, and Simon Han.