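Abstract

AFM stands for Attentional Factorization Machine, proposed in the paper "Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks". It shares authors with NFM; for a walkthrough of NFM, see this blog post: https://blog.csdn.net/weixin_45459911/article/details/105702981
AFM is an improvement on FM. Its defining feature is an attention network that learns the importance of each feature interaction.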

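Algorithm

1. Basic idea

The basic idea is simple: FM plus an attention mechanism.
In plain terms: give every pairwise interaction of FM embedding vectors its own weight.
The original FM formula is:
$$\hat{y}_{FM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$$
The AFM formula is:
$$\hat{y}_{AFM}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + p^T \sum_{i=1}^{n}\sum_{j=i+1}^{n} a_{ij} (v_i \odot v_j) x_i x_j$$
Here v_i is the k-dimensional embedding vector of feature i, \odot is the element-wise product, a_{ij} is the attention score of the feature pair (i, j), and p is a k-dimensional projection vector.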
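To make the relationship between the two models concrete, here is a minimal sketch of only the second-order terms: FM sums the inner products of embedding pairs, while AFM weights each element-wise product by an attention score and projects the result with a vector p. With every score fixed to 1 and p set to all ones, the AFM term falls back to the FM term. The function names and toy numbers are illustrative assumptions, not code from the paper.

```python
from itertools import combinations

def fm_pairwise(x, V):
    """Second-order FM term: sum over i<j of <v_i, v_j> * x_i * x_j."""
    total = 0.0
    for i, j in combinations(range(len(x)), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        total += dot * x[i] * x[j]
    return total

def afm_pairwise(x, V, p, attention):
    """AFM term: p^T * sum over i<j of a_ij * (v_i elementwise v_j) * x_i * x_j."""
    k = len(p)
    pooled = [0.0] * k
    for i, j in combinations(range(len(x)), 2):
        a_ij = attention[(i, j)]
        for d in range(k):
            pooled[d] += a_ij * V[i][d] * V[j][d] * x[i] * x[j]
    return sum(p_d * s for p_d, s in zip(p, pooled))

# Toy input (made-up numbers): 3 features, embedding size k = 2.
x = [1.0, 0.0, 1.0]
V = [[0.5, -1.0], [0.3, 0.8], [2.0, 0.25]]

# Assumption for the demo: uniform attention and p = 1, which recovers FM.
uniform = {pair: 1.0 for pair in combinations(range(len(x)), 2)}
fm_value = fm_pairwise(x, V)
afm_value = afm_pairwise(x, V, p=[1.0, 1.0], attention=uniform)
```

In a trained AFM the scores in `attention` are not uniform; they come from the attention network, so the two values generally differ.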

2. Network architecture

[Figure: AFM network architecture]
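The core idea of attention is that when several parts are compressed into one representation, each part should contribute to a different degree. AFM implements this by applying a weighted sum to the interacted vectors:
$$f_{Att}(f_{PI}(\mathcal{E})) = \sum_{i=1}^{n}\sum_{j=i+1}^{n} a_{ij} (v_i \odot v_j) x_i x_j$$
where f_{PI}(\mathcal{E}) denotes the output of the Pair-wise Interaction Layer.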

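In this formula, a_ij is the attention score: how much the feature interaction (i, j) contributes to the final prediction. Note that:

The input of the Attention-based Pooling Layer is the output of the Pair-wise Interaction Layer: m(m-1)/2 vectors, each of dimension k (k is the embedding dimension and m is the number of embedding vectors in the Embedding Layer).
The output of the Attention-based Pooling Layer is a single k-dimensional vector, obtained by weighted sum pooling of the interacted vectors with the attention scores.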
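The shapes described above can be checked with a small sketch. It assumes each embedding vector has already been scaled by its feature value, and it uses uniform placeholder scores in place of learned attention scores; both function names are hypothetical.

```python
from itertools import combinations

def pairwise_interactions(embeddings):
    """Element-wise product of every pair of embedding vectors (i < j)."""
    return [
        [a * b for a, b in zip(embeddings[i], embeddings[j])]
        for i, j in combinations(range(len(embeddings)), 2)
    ]

def attention_pooling(interactions, scores):
    """Weighted sum of the interacted vectors; returns one k-dimensional vector."""
    k = len(interactions[0])
    return [
        sum(s * vec[d] for s, vec in zip(scores, interactions))
        for d in range(k)
    ]

# Toy example (made-up values): m = 4 embedding vectors of size k = 3.
embeddings = [
    [1.0, 2.0, 3.0],
    [0.5, 0.5, 0.5],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
interactions = pairwise_interactions(embeddings)  # m(m-1)/2 = 6 vectors of size k

# Placeholder scores: uniform weights stand in for learned attention scores.
scores = [1.0 / len(interactions)] * len(interactions)
pooled = attention_pooling(interactions, scores)  # one vector of size k
```

Whatever the scores are, the pooling step always collapses the m(m-1)/2 interacted vectors into a single vector of size k.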

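So how is a_ij learned?

Learning the attention scores is a problem in its own right. The obvious approach is to treat each a_ij as a free parameter and learn it by minimizing the loss, but then the score of a feature pair that never co-occurs in the training set cannot be estimated.
AFM therefore uses an attention network to produce the scores.

The attention network is a one-layer MLP with ReLU as the activation function. Its size is given by the attention factor, i.e., the number of hidden units.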

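The input of the attention network is the element-wise product of two embedding vectors (the interacted vector, which encodes the feature interaction in the embedding space); its output is the attention score of that interaction. The scores are then normalized with softmax. Formally:

$$a'_{ij} = h^T \mathrm{ReLU}(W (v_i \odot v_j) x_i x_j + b)$$
$$a_{ij} = \frac{\exp(a'_{ij})}{\sum_{i' < j'} \exp(a'_{i'j'})}$$
where W \in \mathbb{R}^{t \times k}, b \in \mathbb{R}^{t} and h \in \mathbb{R}^{t} are the parameters of the attention network, and t is the attention factor.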
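A minimal sketch of such an attention network: one ReLU hidden layer of size t (the attention factor), a projection to a scalar, then softmax over all feature pairs. The parameter names W, b, h and the toy values are assumptions chosen for illustration; a real model would learn them by gradient descent.

```python
import math

def relu(z):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in z]

def attention_scores(interactions, W, b, h):
    """One-hidden-layer attention network followed by softmax.

    interactions: list of k-dimensional interacted vectors
    W: t x k weight matrix, b: bias of length t, h: projection of length t
    """
    raw = []
    for vec in interactions:
        hidden = relu([
            sum(w * v for w, v in zip(row, vec)) + bias
            for row, bias in zip(W, b)
        ])
        raw.append(sum(h_t * z for h_t, z in zip(h, hidden)))
    # Softmax over all pairs; subtracting the max keeps exp() numerically stable.
    peak = max(raw)
    exps = [math.exp(r - peak) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]

# Toy parameters (made up): embedding size k = 2, attention factor t = 3.
interactions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
b = [0.0, 0.1, 0.0]
h = [1.0, 1.0, 1.0]
scores = attention_scores(interactions, W, b, h)
```

Because the scores are computed from the interacted vector rather than stored per pair, a pair that never co-occurred in training still receives a score, as long as both embeddings have been trained.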

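3. Overfitting

Common ways to prevent overfitting are dropout and L1/L2 regularization. AFM does the following:

Apply dropout to the output of the Pair-wise Interaction Layer
Apply L2 regularization to the weight matrix of the attention network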
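Both measures can be sketched in a few lines. The dropout below is the common "inverted" variant (kept entries are rescaled during training), which is an assumption about the implementation rather than something stated here; `lam` is a hypothetical name for the regularization strength.

```python
import random

def dropout(vectors, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [
        [v / keep if rng.random() < keep else 0.0 for v in vec]
        for vec in vectors
    ]

def l2_penalty(W, lam):
    """lam times the squared Frobenius norm of the attention weight matrix W."""
    return lam * sum(w * w for row in W for w in row)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Made-up output of the Pair-wise Interaction Layer: 3 vectors of size 2.
interactions = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
dropped = dropout(interactions, rate=0.5, rng=rng)

# Penalty term that would be added to the training loss.
penalty = l2_penalty([[1.0, -1.0], [0.5, 0.5]], lam=0.01)
```

Dropout is applied only during training; at prediction time the interacted vectors are passed through unchanged.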

Reference:
https://blog.csdn.net/u010352603/article/details/82670349
