#Paper Reading# TabNet: Attentive Interpretable Tabular Learning

Paper title: TabNet: Attentive Interpretable Tabular Learning
Paper link: https://arxiv.org/abs/1908.07442
Published: arXiv 2019

Paper overview:
This paper proposes TabNet, a model that performs classification/regression on tabular data efficiently while remaining interpretable. The idea is to use a DNN to obtain the interpretability of tree models while surpassing their performance.

Motivation:
Tabular data is usually handled with tree models. Getting a DNN to match the performance of tree models while also providing interpretability is one way to break open the DNN black box.

Contribution:
TabNet has the following properties:
① It works directly on raw features without manual feature selection, and is trained end-to-end;
② It uses sequential attention for feature selection, and the selection is instance-wise (different for every sample);
③ TabNet outperforms other models on classification and regression tasks and is interpretable, providing both each feature's importance for the current sample (local interpretability) and each feature's influence on the label overall (global interpretability);
④ It is the first model to apply self-supervised learning to tabular data;


1. How a DNN can reproduce what a tree model does
① Feature selection via masks;
② FC + ReLU layers whose outputs are summed and passed through a softmax, which yields decision-tree-like, axis-aligned boundaries as in the paper's figure; a minimal sketch follows below.
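Below is a minimal sketch (my own toy example, not the paper's code) of that idea: each branch masks out all but one feature, an FC + ReLU acts as a soft threshold on it, and the summed outputs go through a softmax to produce tree-like regions. All weights and biases here are made up purely for illustration.

```python
import torch
import torch.nn.functional as F

# One sample with 2 features.
x = torch.tensor([[0.3, -1.2]])

# Each "branch" only sees one feature, like a tree node splitting on it.
mask1 = torch.tensor([1.0, 0.0])
mask2 = torch.tensor([0.0, 1.0])

# FC weights/biases chosen so ReLU acts like a threshold test on that feature.
W1 = torch.tensor([[ 1.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
b1 = torch.tensor([-0.5, 0.5, -1.0, -1.0])
W2 = torch.tensor([[0.0, 0.0], [0.0, 0.0], [0.0,  1.0], [0.0, -1.0]])
b2 = torch.tensor([-1.0, -1.0, -0.5, 0.5])

out = F.relu(F.linear(x * mask1, W1, b1)) + F.relu(F.linear(x * mask2, W2, b2))
probs = torch.softmax(out, dim=-1)   # region membership, analogous to tree leaves
print(probs)
```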

2. The overall TabNet architecture
(architecture diagram from the paper)

3. Input features: f ∈ R^(B×D), where B is the batch size and D is the feature dimension;
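A quick shape check (B, D, and the tensors below are made up); TabNet first normalizes the raw features with batch norm before the decision steps:

```python
import torch
import torch.nn as nn

B, D = 32, 16                 # made-up batch size and feature dimension
f = torch.randn(B, D)         # raw tabular features, no manual preprocessing
bn = nn.BatchNorm1d(D)        # initial batch norm applied to the raw features
features = bn(f)              # shape stays (B, D)
print(features.shape)         # torch.Size([32, 16])
```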

4. Feature transformer: 4 layers in total, 2 shared across all decision steps and 2 specific to each step;
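A rough sketch of one feature transformer, assuming GLU (FC + BN + gated linear unit) blocks and the sqrt(0.5)-scaled residual connections described in the paper; the sizes D and H are made up and ghost batch norm is omitted for brevity.

```python
import math
import torch
import torch.nn as nn

class GLUBlock(nn.Module):
    """FC -> BN -> gated linear unit, the basic unit of the feature transformer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2 * out_dim)
        self.bn = nn.BatchNorm1d(2 * out_dim)

    def forward(self, x):
        x = self.bn(self.fc(x))
        h, g = x.chunk(2, dim=-1)
        return h * torch.sigmoid(g)

D, H = 16, 8                                    # made-up feature / hidden sizes
# 2 blocks shared by all decision steps + 2 blocks owned by the current step
shared   = nn.ModuleList([GLUBlock(D, H), GLUBlock(H, H)])
specific = nn.ModuleList([GLUBlock(H, H), GLUBlock(H, H)])

def feature_transformer(x):
    scale = math.sqrt(0.5)                      # residual scaling from the paper
    out = shared[0](x)                          # first block has no residual
    for blk in list(shared[1:]) + list(specific):
        out = (out + blk(out)) * scale
    return out

x = torch.randn(32, D)
print(feature_transformer(x).shape)             # torch.Size([32, 8])
```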

5. Split: produces two outputs, d[i] and a[i]; d[i] is used to produce the prediction, while a[i] is the input to the subsequent attention (the next step's attentive transformer);
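The split itself is just a slice of the feature transformer output into an n_d-sized part and an n_a-sized part (the sizes here are made up):

```python
import torch

n_d, n_a = 8, 8                      # made-up widths for the two halves
out = torch.randn(32, n_d + n_a)     # output of the feature transformer
d_i, a_i = out.split([n_d, n_a], dim=-1)
# d_i feeds the prediction head (after ReLU); a_i feeds the next attentive transformer
print(d_i.shape, a_i.shape)
```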

6. Attentive transformer: produces M[i], the feature mask for the current step;
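A simplified sketch of the attentive transformer: FC + BN applied to a[i-1], multiplied element-wise by the prior scales, then normalized into a mask. The paper uses sparsemax so most mask entries become exactly zero; softmax is substituted here only to keep the sketch dependency-free, and gamma is the relaxation parameter that controls how often a feature can be reused across steps.

```python
import torch
import torch.nn as nn

class AttentiveTransformer(nn.Module):
    """Produces the mask M[i] over the D features for the current decision step."""
    def __init__(self, n_a, D):
        super().__init__()
        self.fc = nn.Linear(n_a, D)
        self.bn = nn.BatchNorm1d(D)

    def forward(self, a_prev, prior):
        logits = self.bn(self.fc(a_prev)) * prior
        # The paper applies sparsemax here; softmax is a stand-in for this sketch.
        return torch.softmax(logits, dim=-1)

B, D, n_a, gamma = 32, 16, 8, 1.3
prior = torch.ones(B, D)                    # prior scales: how "available" each feature still is
att = AttentiveTransformer(n_a, D)
a_prev = torch.randn(B, n_a)

M = att(a_prev, prior)                      # mask for this step, rows sum to 1
prior = prior * (gamma - M)                 # features used now are penalized in later steps
masked_features = M * torch.randn(B, D)     # mask applied to the (batch-normalized) inputs
```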

7. Final output: the d[i] from all decision steps are passed through ReLU and summed, and a final FC layer maps the sum to the prediction;
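A sketch of this aggregation (shapes are made up; `final_fc` is just a placeholder name for the last linear mapping):

```python
import torch
import torch.nn as nn

n_d, n_steps, n_classes = 8, 3, 2
d_list = [torch.randn(32, n_d) for _ in range(n_steps)]   # d[i] from each decision step

# Sum of ReLU(d[i]) over all steps, then one final FC layer.
d_out = torch.stack([torch.relu(d) for d in d_list]).sum(dim=0)
final_fc = nn.Linear(n_d, n_classes)                       # placeholder for the last mapping
logits = final_fc(d_out)
print(logits.shape)                                        # torch.Size([32, 2])
```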

8. Self-supervised learning: mask out some feature values and train the model to reconstruct them, then fine-tune on the supervised task (analogous to first producing embeddings with word2vec and then reusing them);
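A sketch of the pre-training idea: the encoder/decoder below are stand-in modules (not the actual TabNet encoder) and the masking ratio is arbitrary. The point is that the reconstruction loss is computed only on the entries that were hidden.

```python
import torch
import torch.nn as nn

# Stand-in encoder/decoder; in practice these would be the TabNet encoder
# and a small reconstruction decoder.
encoder = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
decoder = nn.Linear(8, 16)

x = torch.randn(32, 16)
mask = (torch.rand_like(x) < 0.2).float()       # randomly hide ~20% of the entries
x_masked = x * (1 - mask)

recon = decoder(encoder(x_masked))
loss = ((recon - x) * mask).pow(2).mean()       # loss only on the masked entries
loss.backward()
# After pre-training, the encoder's weights initialize the supervised model.
```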

9. Computing interpretability (feature importance)
η_b measures how much each decision step's output matters for the b-th sample, and the masks M measure feature importance;
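Following the paper's definition, η_b[i] = Σ_c ReLU(d_{b,c}[i]) weights how much step i contributes for sample b, and the aggregate importance of feature j is the η-weighted sum of the masks M_{b,j}[i], normalized over features. A small sketch with random tensors:

```python
import torch

B, D, n_d, n_steps = 32, 16, 8, 3
d_list = [torch.randn(B, n_d) for _ in range(n_steps)]   # d[i] per decision step
M_list = [torch.rand(B, D) for _ in range(n_steps)]      # mask M[i] per decision step

agg = torch.zeros(B, D)
for d_i, M_i in zip(d_list, M_list):
    eta = torch.relu(d_i).sum(dim=1, keepdim=True)        # eta_b[i], shape (B, 1)
    agg = agg + eta * M_i                                  # weight each step's mask

M_agg = agg / agg.sum(dim=1, keepdim=True)                 # importances per sample sum to 1
print(M_agg.shape)                                         # torch.Size([32, 16])
```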


Experiments
The paper runs a fairly large set of experiments; the main questions to look at are:
① Does it perform better than the other models?
② Does the feature selection actually work?
③ How well does it do with self-supervised learning?

10. Dataset
Synthetic datasets;
Poker Hand;
Higgs Boson;
KDD datasets;
Adult Census Income;
etc.;

11. Baseline
Lasso;
XGBoost;
LightGBM;
CatBoost;
MLP;
etc.;

12. Metric
ACC;
MSE;
AUC;
etc.;

13. Experimental results (see the result tables and figures in the paper)


References:
[1] https://github.com/dreamquark-ai/tabnet
[2] https://github.com/google-research/google-research/tree/master/tabnet
[3] https://zhuanlan.zhihu.com/p/126755362


The above is only my personal understanding; given my limited knowledge, please point out any mistakes or omissions. Thanks!