Squeezed out a PAKDD 2018 paper: Topic-specific Retweet Count Ranking for Weibo

The title says what the work is about: Topic-specific Retweet Count Ranking for Weibo

Abstract:

In this paper, we study the topic-specific retweet count ranking problem in Weibo. Two challenges make this task nontrivial. First, traditional methods cannot derive effective features for tweets, because in the topic-specific setting tweets usually share so much content that they are hard to distinguish. We propose an LSTM-embedded autoencoder to generate tweet features, based on the insight that any distinct prefix of a tweet's text is a possible distinctive feature. Second, it is critical to fully capture the meaning of the topic in the topic-specific setting, but Weibo provides little information about topics. We leverage real-time news information from Toutiao to enrich the meaning of topics, as more than 85% of topics are headline news. We evaluate the proposed components through ablation experiments, and compare the overall solution with a recently proposed tensor factorization model. Extensive experiments on real Weibo data show the effectiveness and flexibility of our methods.
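The prefix insight in the abstract can be illustrated with a minimal sketch, assuming a single-layer LSTM encoder with random, untrained weights and hypothetical word vectors. `TinyLSTMEncoder`, `prefix_states`, and `build_embeddings` are illustrative names, not the paper's code. It only shows the encoder half: the hidden state after each prefix is a candidate feature, and two near-duplicate tweets share states until the first token where they differ. Training a decoder to reconstruct the tweet, which is what makes the whole model an autoencoder, is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMEncoder:
    """Hypothetical single-layer LSTM encoder with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        cols = input_dim + hidden_dim
        # One weight matrix per gate (i=input, f=forget, o=output, g=candidate),
        # applied to the concatenation [x_t; h_{t-1}].
        self.W = {gate: [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                         for _ in range(hidden_dim)]
                  for gate in "ifog"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifog"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def prefix_states(self, vectors):
        """Hidden state after reading each prefix x_1..x_t of the sequence."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        states = []
        for x in vectors:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def build_embeddings(vocab, dim, seed=1):
    """Hypothetical random word vectors; a real system would use trained embeddings."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in sorted(vocab)}


tweet_a = ["rt", "@user", "great", "news"]
tweet_b = ["rt", "@user", "fake", "news"]
emb = build_embeddings(set(tweet_a) | set(tweet_b), dim=4)
encoder = TinyLSTMEncoder(input_dim=4, hidden_dim=3)
states_a = encoder.prefix_states([emb[w] for w in tweet_a])
states_b = encoder.prefix_states([emb[w] for w in tweet_b])
# The shared prefix ["rt", "@user"] yields identical states; the states
# diverge from the first token where the two tweets differ.
```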

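As the abstract shows, the paper's main contribution lies in how it extracts topic, tweet, and user features. User features are readily available and need little extra processing. For topic features, since Weibo itself provides little information about a topic, the paper pulls related information from news sites such as Toutiao (more than 85% of topics are headline news, which fits Toutiao's content well) and then uses a denoising autoencoder (DAE) to extract topic features. For tweet features, the main difficulty is that tweets under the same topic are largely alike: many are verbatim retweets, a few add a sentence or two of personal opinion, and they are short texts. The paper uses an LSTM-embedded autoencoder. Its main difference from the encoder-decoder models used in machine translation is that it cares about feature extraction (the encoder's output) rather than the mapping between two languages (the decoder's output).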
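A minimal denoising-autoencoder sketch for the topic-feature step, assuming each topic's enriched text (the Weibo topic plus matched Toutiao news) has already been turned into a binary bag-of-words vector. The vocabulary size, hidden size, masking-noise rate, learning rate, and toy vectors are all hypothetical, and `DenoisingAutoencoder` is an illustrative name rather than the paper's architecture. The model zeroes random input entries, encodes the corrupted vector, and is trained with cross-entropy to reconstruct the clean vector; the hidden code then serves as the topic feature.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class DenoisingAutoencoder:
    """Hypothetical one-hidden-layer DAE with untied weights, trained by SGD."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, seed=0):
        self.rng = random.Random(seed)
        self.p = corruption  # probability of zeroing each input entry
        r = self.rng
        self.W = [[r.uniform(-0.1, 0.1) for _ in range(n_visible)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[r.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.c = [0.0] * n_visible

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bk)
                for row, bk in zip(self.W, self.b)]

    def decode(self, h):
        return [sigmoid(sum(v * hk for v, hk in zip(row, h)) + cj)
                for row, cj in zip(self.V, self.c)]

    def corrupt(self, x):
        return [0.0 if self.rng.random() < self.p else xi for xi in x]

    def loss(self, x):
        """Cross-entropy reconstruction loss on the clean (uncorrupted) input."""
        r = self.decode(self.encode(x))
        eps = 1e-12
        return -sum(xi * math.log(ri + eps) + (1 - xi) * math.log(1 - ri + eps)
                    for xi, ri in zip(x, r))

    def train_step(self, x, lr=0.1):
        x_tilde = self.corrupt(x)
        h = self.encode(x_tilde)
        r = self.decode(h)
        # Gradient of cross-entropy w.r.t. the decoder pre-activations.
        d_out = [ri - xi for ri, xi in zip(r, x)]
        # Backpropagate to the hidden pre-activations (before V is updated).
        d_hid = [sum(d_out[j] * self.V[j][k] for j in range(len(d_out))) * h[k] * (1 - h[k])
                 for k in range(len(h))]
        for j, dj in enumerate(d_out):
            for k, hk in enumerate(h):
                self.V[j][k] -= lr * dj * hk
            self.c[j] -= lr * dj
        for k, dk in enumerate(d_hid):
            for i, xi in enumerate(x_tilde):
                self.W[k][i] -= lr * dk * xi
            self.b[k] -= lr * dk


# Toy bag-of-words vectors for four hypothetical enriched topics (two themes).
docs = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1, 0, 1],
]
dae = DenoisingAutoencoder(n_visible=8, n_hidden=3, corruption=0.3, seed=42)
loss_before = sum(dae.loss(x) for x in docs)
for _ in range(500):
    for x in docs:
        dae.train_step(x, lr=0.5)
loss_after = sum(dae.loss(x) for x in docs)
topic_features = [dae.encode(x) for x in docs]  # 3-dim topic feature per topic
```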

The ranking method and the word-embedding method used in the paper are both off-the-shelf and are not significant contributions.
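Since the ranking step is off-the-shelf, here is a generic stand-in, not the paper's ranker: a minimal sketch, assuming user, topic, and tweet feature vectors are simply concatenated and a linear scorer is trained with a logistic pairwise loss (RankNet-style) so that, within one topic, tweets with more retweets score higher. `concat_features`, `train_pairwise_ranker`, and the toy feature values and retweet counts are all hypothetical.

```python
import math
import random


def concat_features(user, topic, tweet):
    """Hypothetical layout: [user features | topic features | tweet features]."""
    return list(user) + list(topic) + list(tweet)


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(items, epochs=200, lr=0.1, seed=0):
    """Linear scorer trained with a logistic pairwise loss.

    items: list of (feature_vector, retweet_count) for tweets under ONE topic.
    """
    rng = random.Random(seed)
    dim = len(items[0][0])
    w = [0.0] * dim
    # Every ordered pair (a, b) in which tweet a was retweeted more than tweet b.
    pairs = [(xa, xb) for xa, ca in items for xb, cb in items if ca > cb]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xa, xb in pairs:
            diff = score(w, xa) - score(w, xb)
            # d/d(diff) of log(1 + exp(-diff)) is -1 / (1 + exp(diff)).
            g = -1.0 / (1.0 + math.exp(diff))
            for i in range(dim):
                w[i] -= lr * g * (xa[i] - xb[i])
    return w


topic_vec = [0.2, 0.1]  # shared by every tweet in the topic-specific setting
items = [
    (concat_features([0.9], topic_vec, [0.5]), 120),
    (concat_features([0.5], topic_vec, [0.4]), 40),
    (concat_features([0.1], topic_vec, [0.6]), 3),
]
w = train_pairwise_ranker(items)
ranked_counts = [count for x, count in sorted(items, key=lambda it: score(w, it[0]), reverse=True)]
```

With a purely linear scorer, features that are identical for every tweet in a topic cancel out in each pair (the topic weights above never move from zero), so topic information can only change the within-topic order if the model lets it interact with user or tweet features.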

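In summary, the paper makes three contributions. First, it tackles topic-specific ranking, which has rarely been studied before. Second, it proposes methods for extracting tweet and topic features; although both are fairly intuitive, they apply to many scenarios. Third, the proposed methods perform reasonably well.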



I found a brief report on PAKDD 2017:

http://data-mining.philippe-fournier-viger.com/pakdd-2017-conference-brief-report/

2) The number of accepted long and short papers at PAKDD for the last six years is presented below.

5) The acceptance rate of long and short papers at PAKDD during the last six years