This paper was published at WWW 2018. It brings knowledge into news recommendation.
Keywords: News recommendation; knowledge graph representation; deep neural networks; attention model

Motivation

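Previous news recommendation methods do not incorporate knowledge, so they have difficulty discovering latent knowledge-level connections between news items.
In addition, news recommendation is highly time-sensitive, and recommendations have to change as the user's interests shift.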

News recommendation faces three challenges:

News is highly time-sensitive and its relevance expires quickly within a short period, so collaborative filtering methods do not work well.
How to dynamically measure a user's interest based on his diversified reading history with respect to the current candidate news.
News language is usually highly condensed and comprises a large number of knowledge entities and common sense. Traditional methods do not introduce knowledge and rely only on word co-occurrence and clustering.

To extract deep logical connections among news, it is necessary to introduce additional knowledge graph information into news recommendation, that is, to bring external knowledge into the recommender.

Methodology

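We represent the click history of user $i$ as $[t_{1}^{i}, t_{2}^{i}, \ldots, t_{N_{i}}^{i}]$, where $t_{j}^{i}$ is the title of the $j$-th news item clicked by user $i$ and $N_{i}$ is the number of clicked items. Each title is a sequence of words, and each word may be associated with an entity in the knowledge graph.
Input: a candidate news item and the user's click history
Output: the probability that the user clicks the candidate news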

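First, each word in the news is linked to its associated entity in the knowledge graph, and the context entities of each entity are retrieved and used as well.
Then knowledge-aware convolutional neural networks (KCNN) fuse the word-level and knowledge-level representations of the news and generate a knowledge-aware embedding vector.
KCNN differs from previous work in two ways: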

multi-channel
word-entity-aligned

To get a dynamic representation of a user with respect to the current candidate news, we use an attention module to automatically match the candidate news to each piece of clicked news, and aggregate the user's history with different weights.

First, some relevant background.

Knowledge Graph Embedding

A typical knowledge graph is a set of (entity 1, relation, entity 2) triples.

The goal of knowledge graph embedding is to learn a low-dimensional representation vector for each entity and relation that preserves the structural information of the original knowledge graph.

Among these approaches, translation-based methods work well; representative examples include TransE, TransH, and TransD.
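As an illustration of the translation-based idea, TransE treats a relation as a translation in embedding space, so a true triple (h, r, t) should satisfy h + r ≈ t, and it is trained with a margin-based ranking loss against corrupted triples. The plain-Python sketch below computes that distance and the loss for one positive/negative pair; the toy 2-d vectors, the choice of the L2 norm, and the margin of 1.0 are illustrative assumptions, not the settings used in DKN.

```python
import math


def transe_score(h, r, t):
    """Distance ||h + r - t||_2; a lower value means the triple (h, r, t) is more plausible."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def margin_loss(pos, neg, margin=1.0):
    """Margin ranking loss max(0, margin + d(pos) - d(neg)) for one positive/negative pair.

    pos and neg are (h, r, t) tuples of vectors; neg is usually built by
    corrupting the head or tail of pos (assumed here, not shown).
    """
    return max(0.0, margin + transe_score(*pos) - transe_score(*neg))


# Toy 2-d embeddings (hypothetical values chosen by hand).
h = [1.0, 0.0]
r = [0.0, 1.0]
t_true = [1.0, 1.0]     # exactly h + r, so the distance is 0
t_corrupt = [3.0, 1.0]  # a corrupted tail

print(transe_score(h, r, t_true))                      # 0.0
print(transe_score(h, r, t_corrupt))                   # 2.0
print(margin_loss((h, r, t_true), (h, r, t_corrupt)))  # 0.0
```

TransH and TransD keep the same translation idea but first project entities onto relation-specific hyperplanes (TransH) or dynamically constructed projection spaces (TransD), which this sketch does not cover.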

CNN for Sentence Representation Learning

The bag-of-words model ignores word order and suffers from sparsity.
Kim CNN is therefore used for sentence representation.
DKN: Deep Knowledge-Aware Network for News Recommendation reading notes
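The representation method is shown in the figure above: convolution filters over different channels and with different window sizes are applied to the sentence, and max-over-time pooling of their outputs gives the final sentence representation vector.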

RNN-based and CNN-based sentence representations could also be combined.
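The core mechanics of Kim CNN can be sketched in plain Python: each filter of window size h slides over the word sequence and produces one activation per position, max-over-time pooling keeps the strongest activation, and concatenating the pooled values of all filters gives the sentence vector. This is a single-channel simplification with one filter per window size, a ReLU activation, and hand-picked toy numbers; real implementations learn many filters per window size.

```python
def conv_feature_map(sentence, filt, bias=0.0):
    """Slide one filter of window size len(filt) over the sentence.

    sentence: list of n word vectors (each of length d)
    filt:     list of h weight vectors (each of length d), with h <= n
    Returns the n - h + 1 activations ReLU(<filt, window> + bias).
    """
    h = len(filt)
    feats = []
    for i in range(len(sentence) - h + 1):
        s = bias
        for k in range(h):
            s += sum(w * x for w, x in zip(filt[k], sentence[i + k]))
        feats.append(max(0.0, s))  # ReLU
    return feats


def sentence_vector(sentence, filters):
    """Max-over-time pooling of each filter's feature map, concatenated into one vector."""
    return [max(conv_feature_map(sentence, f)) for f in filters]


# Toy sentence of 4 words with 2-d embeddings (hypothetical numbers).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
filters = [
    [[1.0, 0.0]],                          # window size 1
    [[1.0, 0.0], [0.0, 1.0]],              # window size 2
    [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],  # window size 3
]
print(sentence_vector(sentence, filters))  # [1.0, 2.0, 3.0]
```

The output length equals the number of filters, regardless of sentence length, which is what lets titles of different lengths map to fixed-size vectors.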

DEEP KNOWLEDGE-AWARE NETWORK (DKN)

From this part on, the method proposed in the paper is introduced.
The overall framework of DKN is shown below:

[Figure: overall DKN framework]
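KCNN turns each news title into a vector. The attention network then aggregates the user's clicked news into a user representation, which is combined with the candidate news vector and fed into a deep neural network to produce the final click prediction.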

Knowledge Distillation

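This part explains how knowledge is incorporated into the features.
Step 1: entities in the news are identified with entity linking.
Step 2: a subgraph is extracted from the original KG. A subgraph built only from the entities found in the news may be too sparse, so it is extended to the entities within one hop of the identified ones.
Step 3: knowledge graph embedding is applied to learn entity embeddings.
Step 4: the learned entity embeddings are used as input to KCNN and DKN.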
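Steps 1 and 2 can be sketched in plain Python as follows. The `link_entities` helper is a hypothetical stand-in: it does an exact dictionary lookup, whereas real entity linking must detect multi-word mentions and disambiguate them. The subgraph step follows the description above: keep the linked entities, add every entity one hop away, and keep all triples among that enlarged entity set. The toy triples are made up for illustration, and steps 3 and 4 (embedding the subgraph and feeding the vectors to KCNN) are not shown.

```python
def link_entities(title_words, surface_to_entity):
    """Step 1 (simplified, hypothetical): map words to KG entities by exact lookup."""
    return {surface_to_entity[w] for w in title_words if w in surface_to_entity}


def extract_subgraph(triples, seed_entities):
    """Step 2: extend the seed entities by one hop and keep the triples among them.

    Returns (entities, sub_triples), where entities = seeds plus their one-hop
    neighbours, and sub_triples are the triples whose head and tail are both in it.
    """
    entities = set(seed_entities)
    for h, _, t in triples:
        if h in seed_entities:
            entities.add(t)
        if t in seed_entities:
            entities.add(h)
    sub_triples = [(h, r, t) for h, r, t in triples if h in entities and t in entities]
    return entities, sub_triples


# Hypothetical toy knowledge graph and surface-form lookup table.
kg = [
    ("Donald Trump", "president_of", "United States"),
    ("Donald Trump", "member_of", "Republican Party"),
    ("United States", "capital", "Washington D.C."),
    ("Boris Johnson", "citizen_of", "United Kingdom"),
]
lookup = {"Trump": "Donald Trump"}

seeds = link_entities(["Trump", "praises", "deal"], lookup)
entities, sub = extract_subgraph(kg, seeds)
print(sorted(entities))  # ['Donald Trump', 'Republican Party', 'United States']
print(len(sub))          # 2
```

"Washington D.C." is two hops from the linked entity, so it is left out, which is how the one-hop rule keeps the subgraph from growing too large while still adding structure around each linked entity.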

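Entity embeddings from the news alone lack the structural information of the knowledge graph. To capture where an entity sits in the graph, the authors add "context entity" information $\bar{e}$, defined as the average of the embeddings of the entities one hop away from the entity: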

$$\bar{e} = \frac{1}{|\mathrm{context}(e)|} \sum_{e_{i} \in \mathrm{context}(e)} e_{i}$$
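The context embedding is a plain average, shown below in Python. Here context(e) is taken as all entities connected to e by any triple in either direction, and returning a zero vector for an entity with no neighbours is an assumption of this sketch; the entity names and vectors are hypothetical.

```python
def context_embedding(entity, triples, emb):
    """Average of the embeddings of the one-hop neighbours of `entity`.

    context(e) is taken as every entity linked to e by some triple,
    in either direction: {e_i | (e, r, e_i) in G or (e_i, r, e) in G}.
    """
    neighbours = set()
    for h, _, t in triples:
        if h == entity:
            neighbours.add(t)
        elif t == entity:
            neighbours.add(h)
    dim = len(emb[entity])
    if not neighbours:
        return [0.0] * dim  # assumption: isolated entities get a zero context vector
    return [sum(emb[n][k] for n in neighbours) / len(neighbours) for k in range(dim)]


# Hypothetical toy graph and 2-d entity embeddings.
triples = [("A", "r1", "B"), ("C", "r2", "A"), ("B", "r3", "C")]
emb = {"A": [0.0, 0.0], "B": [1.0, 2.0], "C": [3.0, 4.0]}
print(context_embedding("A", triples, emb))  # [2.0, 3.0]
```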


Knowledge-aware CNN

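As mentioned above, each word $w_{i}$ that is linked to an entity has an entity embedding $e_{i} \in \mathbb{R}^{k \times 1}$ and a corresponding context embedding $\bar{e}_{i} \in \mathbb{R}^{k \times 1}$, where $k$ is the dimension of the entity embeddings.
An intuitive idea is to concatenate all these features, treating the entity embeddings as "pseudo words":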

$$W = [w_{1}\ w_{2}\ \ldots\ w_{n}\ e_{t_{1}}\ e_{t_{2}}\ \ldots]$$

However, this has the following problems:

The concatenating strategy breaks up the connection between words and associated entities and is unaware of their alignment.

Word embeddings and entity embeddings are learned by different methods, meaning it is not suitable to convolve them together in a single vector space.

The concatenating strategy implicitly forces word embeddings and entity embeddings to have the same dimension, which may not be optimal in practice.

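So the authors adopt the approach shown in the bottom-left of Figure 3: word embeddings, entity embeddings, and context embeddings are treated as separate channels, like the color channels of an image. But what if the entity embedding dimension differs from the word embedding dimension? A transformation layer maps the entity and context embeddings to the same dimension as the word embeddings. From there, the computation is the same as in a convolutional neural network in computer vision.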
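A minimal sketch of the multi-channel input and one convolution filter, in plain Python. The transformation layer is assumed here to be g(e) = tanh(Me + b), which maps a k-dimensional entity vector into the d-dimensional word space; the zero entity vector for a word with no linked entity, the ReLU activation, and all numbers are likewise assumptions for illustration. Because the word, entity, and context vectors sit in separate channels at the same position, each filter sees a word together with its own entity, which is the word-entity alignment that the concatenation approach loses.

```python
import math


def g(e, M, b):
    """Assumed transformation g(e) = tanh(M e + b): maps a k-dim entity vector to d dims.

    M is a d x k matrix (list of d rows), b is a length-d bias vector.
    """
    return [math.tanh(sum(m * x for m, x in zip(row, e)) + bi) for row, bi in zip(M, b)]


def build_multichannel(words, entities, contexts, M, b):
    """Stack the aligned vectors [w_i, g(e_i), g(ebar_i)] as 3 channels per word position."""
    return [[w, g(e, M, b), g(c, M, b)] for w, e, c in zip(words, entities, contexts)]


def kcnn_feature(W, filt, bias=0.0):
    """One filter of window size h over the 3-channel input, ReLU, then max-over-time pooling.

    filt[k][ch] is the d-dim weight vector for window offset k and channel ch,
    so a single filter spans all channels at once, as in a CNN over an RGB image.
    """
    h = len(filt)
    activations = []
    for i in range(len(W) - h + 1):
        s = bias
        for k in range(h):
            for ch in range(len(W[i + k])):
                s += sum(a * x for a, x in zip(filt[k][ch], W[i + k][ch]))
        activations.append(max(0.0, s))  # ReLU (assumed activation)
    return max(activations)  # max-over-time pooling


# Toy title of 2 words, word dim d = 2, entity dim k = 1 (hypothetical numbers).
words = [[1.0, 0.0], [0.0, 1.0]]
entities = [[0.0], [2.0]]  # word 0 has no linked entity -> zero vector (assumption)
contexts = [[0.0], [1.0]]
M = [[1.0], [0.0]]         # 2 x 1 projection
b = [0.0, 0.0]

W = build_multichannel(words, entities, contexts, M, b)
filt = [[[1.0, 1.0], [1.0, 0.0], [1.0, 0.0]]]  # window size 1, one weight vector per channel
print(kcnn_feature(W, filt))  # about 2.726, i.e. 1 + tanh(2) + tanh(1)
```

A full KCNN would use many such filters with several window sizes and concatenate their pooled outputs into the news embedding.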

Attention-based User Interest Extraction

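The Attention Net here is not the attention mechanism from Attention Is All You Need. Instead, a DNN takes the embedding of a clicked news item and the embedding of the candidate news as input and outputs a score; the scores over the click history are normalized with softmax into weights, and the user embedding is the weighted sum of the clicked news embeddings. On reflection, this is essentially the same idea as an attention mechanism. One issue is that the method does not take time into account, although intuitively more recent clicks should have a larger influence on the current prediction. Finally, the click probability is predicted by another DNN.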
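The attention step and the final predictor can be sketched as follows. `H` scores each clicked news item against the candidate from their concatenated embeddings, a softmax turns the scores into weights, and the user embedding is the weighted sum of the clicked news embeddings; `G` then maps the concatenated user and candidate embeddings to a click probability. The one-hidden-layer networks, the sigmoid output, and the hand-set weights are hypothetical choices for the example, not trained DKN parameters.

```python
import math


def mlp_score(x, W1, b1, w2, b2):
    """One-hidden-layer MLP returning a scalar: w2 . ReLU(W1 x + b1) + b2."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias) for row, bias in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def user_embedding(clicked, candidate, H):
    """Weight each clicked-news vector by softmax of H([clicked; candidate]) and sum them."""
    weights = softmax([H(c + candidate) for c in clicked])  # list + list = concatenation
    dim = len(candidate)
    user = [sum(wt * c[k] for wt, c in zip(weights, clicked)) for k in range(dim)]
    return user, weights


def click_probability(user, candidate, G):
    """Final predictor: sigmoid of another network G applied to [user; candidate]."""
    return 1.0 / (1.0 + math.exp(-G(user + candidate)))


# Hypothetical hand-set weights; embeddings are 2-d, so both networks see 4-d inputs.
def H(x):
    return mlp_score(x, W1=[[1.0, 0.0, 1.0, 0.0]], b1=[0.0], w2=[1.0], b2=0.0)


def G(x):
    return mlp_score(x, W1=[[1.0, 1.0, 1.0, 1.0]], b1=[0.0], w2=[0.5], b2=-1.0)


clicked = [[1.0, 0.0], [0.0, 1.0]]  # embeddings of two clicked news titles
candidate = [1.0, 0.0]              # embedding of the candidate news
user, weights = user_embedding(clicked, candidate, H)
print(weights)                                # roughly [0.73, 0.27]
print(click_probability(user, candidate, G))  # close to 0.5
```

Nothing in `user_embedding` depends on click order or timestamps, which is the time-insensitivity noted above; a time-aware variant would need an extra input such as click recency.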
