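Software Engineering Researcher Baishakhi Ray and a Reading of Her Top-Conference Paper

Preface

This post introduces software engineering heavyweight Baishakhi Ray and her FSE 2012 paper “A Case Study of Cross-system Porting in Forked Projects”.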

I. Paper and Author Information

Paper title: A Case Study of Cross-system Porting in Forked Projects
Authors: Baishakhi Ray (first author), Miryung Kim
Affiliation at the time (2012): The University of Texas at Austin

About Baishakhi Ray:
Homepage: https://engineering.virginia.edu/faculty/baishakhi-ray

Education:

B.S. University of Calcutta, India, 2004
M.S. University of Colorado, Boulder, USA, 2009
Ph.D. University of Texas at Austin, USA, 2013
Post-Doc University of California, Davis, USA, 2013-2015

Current contact: phone 434-982-2212

Research interests:

“I focus on improving software quality and the efficiency of the software development process to ensure a sustainable growth of the software industry.”
1) Detecting & Fixing Bugs Using Code Similarity
2) Testing Machine Learning-Based Systems
3) Natural Language Models for Source Code
4) Analytical Support for Improving Software Reliability
5) Software Engineering
6) Machine Learning, Text Mining, Information Retrieval

Worth noting here!!!

Her very first listed research interest is detecting and fixing bugs using code similarity.

Publications:
A top-conference paper every year. Seriously impressive.

1. “Automatically Diagnosing and Repairing Error Handling Bugs in C”. In Proceedings of the ACM SIGSOFT 25th International Symposium on the Foundations of Software Engineering (FSE 2017).
Y. TIAN, B. RAY.
2. “APEx: Automated inference of error specifications for C APIs”. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, pp. 472–482 (ASE 2016).
3. “Assert Use in GitHub Projects”. In Proceedings of the International Conference on Software Engineering, IEEE, 2015, pp. 1–10 (ICSE 2015).
4. “A large scale study of programming languages and code quality in github”. In Proceedings of the ACM SIGSOFT 22nd International Symposium on the Foundations of Software Engineering, pp. 155–165 (FSE 2014).
B. RAY, D. POSNETT, V. FILKOV, AND P. T. DEVANBU.
5. “Detecting and characterizing semantic inconsistencies in ported code”. In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, pp. 367–377 (ASE 2013).
B. RAY, M. KIM, S. PERSON, AND N. RUNGTA.
6. “A Case Study of Cross-system Porting in Forked Projects”. In Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering, pp. 53.1–53.11 (FSE 2012).
B. RAY AND M. KIM.

[Photo: Baishakhi Ray]

II. Paper Content

Background:

It has become increasingly common to create a variant software product or to introduce a new feature by copying code fragments from similar software products. For example, FreeBSD, OpenBSD, and NetBSD evolved from the same codebase, OpenSSH originated from SSH, LibreOffice originated from OpenOffice.org, etc. As copying code fragments across products is common, there are names referring to this process: forking (copying an existing product to create a slightly different product) and porting (copying an existing feature implementation or bug fix to another member of the same product family). Software forking is often considered an ad hoc, low-cost alternative to principled product line development [26].

Forking copies a whole product; porting transplants a single feature or bug fix into a sibling product.

Problem:

Forking has negative implications during software maintenance. It duplicates development effort and requires developers to port similar bug fixes and feature implementations across forked projects [26].
[26] E. S. Raymond. The Cathedral and the Bazaar. O’Reilly & Associates, Inc., Sebastopol, CA, USA, 1999.

In short: the same bugs end up in every fork.

The authors’ work (I haven’t fully understood it yet, so I’m noting it down for now):

For this analysis, we develop a tool called Repertoire that compares the content and edit operations of program patches to identify ported edits. Repertoire takes diff-based program patches at the release granularity as input. It then uses CCFinderX [15] to identify similar edit content in the patches and determines similar edit operation sequences using N-gram matching [1]. To evaluate the accuracy of Repertoire, we manually construct the ground truth of ported edits on a sampled data set. We inspect code changes whose commit messages indicate cross-system porting activities and individual ported edits reported by Repertoire.
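To get a feel for the second step, “determines similar edit operation sequences using N-gram matching”, here is a minimal Python sketch of N-gram matching over edit sequences. It is not Repertoire’s implementation and it skips the CCFinderX clone-detection step entirely. Everything in it is my own assumption for illustration: the hypothetical helpers edit_ops, ngrams and ngram_similarity, the token encoding (one token per added or deleted line, sign plus whitespace-normalized content), the default N = 3, the overlap score, and the two made-up patches.

```python
from collections import Counter


def edit_ops(unified_diff: str) -> list[str]:
    """Turn a unified diff into a sequence of edit-operation tokens.

    Hypothetical encoding: every added ('+') or deleted ('-') line inside a
    hunk becomes one token, the sign followed by the line content with its
    whitespace normalized. Context lines and file headers are skipped.
    Assumes each file section starts with a 'diff ' line (as in git output).
    """
    ops = []
    in_hunk = False
    for line in unified_diff.splitlines():
        if line.startswith("@@"):
            in_hunk = True   # hunk header: edit lines follow
        elif line.startswith("diff "):
            in_hunk = False  # a new file's '---'/'+++' headers follow
        elif in_hunk and line[:1] in ("+", "-"):
            ops.append(line[0] + " ".join(line[1:].split()))
    return ops


def ngrams(seq: list[str], n: int) -> Counter:
    """Multiset of all contiguous length-n windows of the sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def ngram_similarity(a: list[str], b: list[str], n: int = 3) -> float:
    """Fraction of shared n-grams, from 0.0 to 1.0.

    Shared n-grams (multiset intersection) are divided by the n-gram count
    of the shorter sequence, so a port that adds a few extra edits can still
    score high. Sequences shorter than n have no n-grams and score 0.0.
    This scoring rule is an assumption, not Repertoire's.
    """
    ga, gb = ngrams(a, n), ngrams(b, n)
    smaller = min(sum(ga.values()), sum(gb.values()))
    if smaller == 0:
        return 0.0
    return sum((ga & gb).values()) / smaller


# Two made-up patches to sibling projects (hypothetical content).
patch_a = """\
--- a/src/if.c
+++ b/src/if.c
@@ -10,2 +10,4 @@
-    if (ifp == NULL)
+    if (ifp == NULL || ifp->flags == 0)
+        return EINVAL;
     lock(ifp);
+    count++;
"""
patch_b = """\
--- a/net/if.c
+++ b/net/if.c
@@ -42,2 +42,3 @@
-  if (ifp == NULL)
+  if (ifp == NULL || ifp->flags == 0)
+      return EINVAL;
   lock(ifp);
"""

ops_a, ops_b = edit_ops(patch_a), edit_ops(patch_b)
print(ngram_similarity(ops_a, ops_b))  # → 1.0 (same edits, different indentation)
print(ngram_similarity(ops_a, ["+foo();", "+bar();", "+baz();"]))  # → 0.0
```

Normalizing whitespace lets the same fix match across forks that indent differently, and comparing sequences of edits rather than single lines is what makes this “operation sequence” matching rather than plain clone detection.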

One of the results found by the authors’ tool in the experiments:

[Figure: an experimental result]

III. Highlights of the Paper

1) Question:

our tool Repertoire is the first automated tool for detecting ported edits with high accuracy of 94% precision and 84% recall.

What exactly do precision and recall mean here? I’ve run into them many times and never quite understood them.
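Both numbers compare what the tool reports against the manually built ground truth. Precision = TP / (TP + FP): of all the edits Repertoire flags as ported, the share that really are ported, so 94% precision means roughly 6 in every 100 reported edits are false alarms. Recall = TP / (TP + FN): of all the truly ported edits in the ground truth, the share Repertoire finds, so 84% recall means it finds about 84 of every 100. A small Python sketch; the function name precision_recall and the edit IDs are made up for illustration and are not data from the paper.

```python
def precision_recall(reported: set[str], truth: set[str]) -> tuple[float, float]:
    """Precision and recall of a detector's output against a ground truth.

    reported: items the tool flagged (e.g. edits it claims were ported)
    truth:    items that really are positive (e.g. manually confirmed ports)
    """
    tp = len(reported & truth)  # flagged and really ported (true positives)
    fp = len(reported - truth)  # flagged but not ported (false positives)
    fn = len(truth - reported)  # ported but missed by the tool (false negatives)
    precision = tp / (tp + fp) if reported else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Made-up edit IDs, purely for illustration (not data from the paper).
reported = {"e1", "e2", "e3", "e4", "e5"}
truth = {"e1", "e2", "e3", "e4", "e6", "e7", "e8", "e9"}
p, r = precision_recall(reported, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.80 recall=0.50
```

Here 4 of the 5 reported edits are real ports (precision 4/5) and 4 of the 8 real ports were found (recall 4/8). The two pull against each other: a tool that flags everything gets high recall but low precision, and an overly cautious one gets the opposite, which is why a paper reports both.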

2) Definitions: software forking, forked products, cross-system porting
Fork:

In software engineering, a project fork happens when developers take a copy of source code from one software package and start independent development on it, creating a distinct and separate piece of software. The term often implies not merely a development branch, but also a split in the developer community, a form of schism. [1]

Cross-system porting: porting across programs, i.e., carrying a change from one software system over to another.

Porting:

In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally designed for (e.g. different CPU, operating system, or third party library). The term is also used when software/hardware is changed to make them usable in different environments.
Software is portable when the cost of porting it to a new platform is significantly less than the cost of writing it from scratch. The lower the cost of porting software, relative to its implementation cost, the more portable it is said to be. [2]

3) Important: work related to program porting

As copying code fragments across products is common, there are names referring to this process: forking (copying an existing product to create a slightly different product) and porting (copying an existing feature implementation or bug fix to another member of the same product family). Software forking is often considered an ad hoc, low-cost alternative to principled product line development [26].

People are already working on this; it probably has a lot of potential.

IV. New Vocabulary

spatial
UK [ˈspeɪʃl] US [ˈspeʃəl]
adj.
of space; existing in space; constrained by space; taking up a lot of space
(formal or technical term) spatial
relating to space and the position, size, shape, etc. of things in it

schism
UK [ˈskɪzəm] US [ˈsɪzəm, ˈskɪz-]
n.
church schism; split, division
strong disagreement within an organization, especially a religious one, that makes its members divide into separate groups

upkeep
UK [ˈʌpki:p] US [ˈʌpˌkip]
n. maintenance; care and repair; maintenance or repair costs
the cost or process of keeping sth in good condition

repertoire
UK [ˈrepətwɑ:(r)] US [ˈrepərtwɑ:(r)]
n.
full program of works; full range of skills; (computing) instruction set
The repertoire of a person or thing is all the things of a particular kind that the person or thing is capable of doing.
A performer’s repertoire is all the plays or pieces of music that he or she has learned and can perform.
You can refer to all the plays or music of a particular kind as, for example, the classical repertoire or the jazz repertoire.

V. Sentences Worth Noting

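Section 3.3 describes how we measure the accuracy of Repertoire through a manual inspection of change logs and program patches and how we tune the input threshold for CCFinderX for our study.
Note the two “for”s: the first (“for CCFinderX”) says which tool the threshold belongs to, the second (“for our study”) says what the tuning serves.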

References

[1] Fork (software development). https://en.wikipedia.org/wiki/Fork_(software_development)
[2] Porting. https://en.wikipedia.org/wiki/Porting




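A Closing Poem


How can I bear to sit alone by the dim blue lamp, thinking of my lost homeland, its high terraces under the bright moon. The splendor of the capital, the years in the mountains, the feelings upon the sea.
                         Liu Chenweng, “Liu Shao Qing: Feelings in Spring”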