big code: Deep Learning On Code with an Unbounded Vocabulary [EasyChair 2018]

Paper: Deep Learning On Code with an Unbounded Vocabulary

Author: Milan Cvitkovic

Affiliation: California Institute of Technology (Caltech), Amazon AI

Venue: EasyChair 2018

Model

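Convert the source code into an AST.
Add various extra edges on top of the AST, such as data-flow and control-flow edges.
(The focus of this paper) Add edges between variable nodes and their subtokens.
Train with a GGNN.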
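The first three steps above can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: it parses Python with the standard `ast` module instead of the languages used in the paper, the `LAST_USE` edge is a crude stand-in for real data-flow edges, and the names `split_subtokens`, `build_graph`, `CHILD`, `LAST_USE` and `SUBTOKEN` are all made up for this example. The GGNN training step is not shown.

```python
import ast
import re


def split_subtokens(name):
    """Split an identifier on underscores and camelCase into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def build_graph(source):
    """Return (nodes, edges): nodes are labels, edges are (src, type, dst) triples."""
    tree = ast.parse(source)
    # Drop Load/Store context markers and deduplicate by object identity,
    # since CPython may share some marker nodes between parents.
    unique = {id(n): n for n in ast.walk(tree)
              if not isinstance(n, ast.expr_context)}
    ast_nodes = list(unique.values())
    ids = {id(n): i for i, n in enumerate(ast_nodes)}
    nodes = [type(n).__name__ for n in ast_nodes]
    edges = []

    # Step 1: plain AST structure (parent -> child).
    for n in ast_nodes:
        for child in ast.iter_child_nodes(n):
            if id(child) in ids:
                edges.append((ids[id(n)], "CHILD", ids[id(child)]))

    # Step 2: one hypothetical extra edge type, linking each variable
    # occurrence to the previous occurrence of the same name.
    names = sorted((n for n in ast_nodes if isinstance(n, ast.Name)),
                   key=lambda n: (n.lineno, n.col_offset))
    last_seen = {}
    for n in names:
        if n.id in last_seen:
            edges.append((ids[id(n)], "LAST_USE", last_seen[n.id]))
        last_seen[n.id] = ids[id(n)]

    # Step 3: one shared node per subtoken, connected to every
    # identifier node whose name contains that subtoken.
    subtoken_ids = {}
    for n in names:
        for sub in split_subtokens(n.id):
            if sub not in subtoken_ids:
                subtoken_ids[sub] = len(nodes)
                nodes.append("subtoken:" + sub)
            edges.append((subtoken_ids[sub], "SUBTOKEN", ids[id(n)]))
    return nodes, edges


nodes, edges = build_graph("user_name = fetch()\nprint(user_name)\n")
```

Because both occurrences of `user_name` connect to the same `subtoken:user` and `subtoken:name` nodes, information can flow between them through the vocabulary nodes as well as through the AST.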

Results

FILL-IN-THE-BLANK

$\begin{array}{ll|ccc} \hline & & \text{Fixed Vocab} & \text{CharCNN Only} & \text{Graph Vocab (ours)}\\ \hline \text{Unseen files from seen repos} & \text{AST} & 0.58 & 0.60 & 0.89\\ & \text{Augmented AST} & 0.80 & 0.90 & \mathbf{0.97}\\ \hline \text{Entirely unseen repos} & \text{AST} & 0.36 & 0.48 & 0.80\\ & \text{Augmented AST} & 0.59 & 0.84 & \mathbf{0.92}\\ \hline \end{array}$

Variable Naming

$\begin{array}{ll|ccc} \hline & & \text{Fixed Vocab} & \text{CharCNN Only} & \text{Graph Vocab (ours)}\\ \hline \text{Unseen files from seen repos} & \text{AST} & 0.23\,(7.22) & 0.22\,(8.67) & 0.49\,(3.87)\\ & \text{Augmented AST} & 0.19\,(7.64) & 0.20\,(7.46) & \mathbf{0.53\,(3.68)}\\ \hline \text{Entirely unseen repos} & \text{AST} & 0.05\,(8.66) & 0.06\,(8.82) & 0.38\,(4.81)\\ & \text{Augmented AST} & 0.04\,(8.34) & 0.06\,(8.16) & \mathbf{0.41\,(4.28)}\\ \hline \end{array}$

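Summary

Yet another edge-adding fanatic.

The GitHub link given in the paper leads nowhere, so the code cannot be inspected.

Adding edges between subtokens and AST nodes is also a nice idea.

It would have been better if this paper had used Microsoft's java-small dataset, which would allow a proper comparison; unfortunately, the authors crawled their own data.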
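One way to see why subtoken nodes help with an unbounded vocabulary is the toy comparison below. It is only an illustration of the intuition under made-up assumptions, not the paper's model: the training names, the `<UNK>` symbol and the helpers `encode_fixed` and `encode_subtokens` are hypothetical.

```python
import re


def split_subtokens(name):
    """Split an identifier on underscores and camelCase into lowercase subtokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


# Hypothetical identifiers seen at training time.
train_names = ["getUserName", "file_path", "parseConfig"]

# A fixed vocabulary stores whole identifiers.
fixed_vocab = set(train_names)
# A subtoken vocabulary stores the pieces they are made of.
subtoken_vocab = {s for n in train_names for s in split_subtokens(n)}


def encode_fixed(name):
    """Whole-name lookup: any unseen identifier collapses to <UNK>."""
    return name if name in fixed_vocab else "<UNK>"


def encode_subtokens(name):
    """Subtoken lookup: an unseen identifier is still covered by known pieces."""
    return [s if s in subtoken_vocab else "<UNK>" for s in split_subtokens(name)]


unseen = "getFilePath"               # never appeared during training
whole = encode_fixed(unseen)         # "<UNK>"
pieces = encode_subtokens(unseen)    # ["get", "file", "path"]
```

The identifier `getFilePath` is new as a whole, but every one of its subtokens was seen during training, so a model with subtoken nodes still has something meaningful to attach it to.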

References

Dataset
big code: Deep Learning On Code with an Unbounded Vocabulary [EasyChair 2018]
