Section Overview

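1-of-N Encoding is limited: it cannot express relationships between word meanings. Grouping words into word classes does not capture the full picture either. Hence the need for word embedding.
Word embedding represents each word as a vector. It is learned in an unsupervised way (a word goes in, a vector comes out), yet an auto-encoder cannot be used for it.
The basic idea of word embedding is to infer a word's meaning from its context.
There are two basic approaches, Count based and Prediction based. See [Details](#How to exploit the context?)
Prediction based has many variants, such as CBOW and Skip-gram. See [Details](#Prediction-based Various Architectures)
Word embedding yields many interesting properties: for example, the vectors reveal attributes of words and support addition and subtraction.
In addition, images can also be embedded, to achieve an effect similar to "meta-learning".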
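The contrast between 1-of-N encoding and dense word vectors can be made concrete with a small sketch. The four-word vocabulary and the hand-made 2-d vectors below are hypothetical toy values, not learned embeddings; they only illustrate why 1-of-N vectors carry no similarity information and how addition and subtraction on dense vectors might behave.

```python
import math


def one_hot(index, size):
    """Return a 1-of-N vector with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


vocab = ["king", "queen", "man", "woman"]
one_hot_vecs = {w: one_hot(i, len(vocab)) for i, w in enumerate(vocab)}

# Hypothetical hand-made embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
toy_embedding = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
}


def analogy(a, b, c, table):
    """Return the word whose vector is nearest to table[a] - table[b] + table[c]."""
    target = [x - y + z for x, y, z in zip(table[a], table[b], table[c])]
    return min(table, key=lambda w: math.dist(table[w], target))


# Any two different 1-of-N vectors are orthogonal, so all pairs look equally unrelated.
print(cosine(one_hot_vecs["king"], one_hot_vecs["queen"]))
# With dense vectors, "king - man + woman" can land on another word's vector.
print(analogy("king", "man", "woman", toy_embedding))
```

In this toy table the arithmetic works by construction; with learned embeddings the result is only approximately the nearest word.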

Details

How to exploit the context?

[Figure: 【李宏毅2020 ML/DL】P22 Unsupervised Learning - Word Embedding]
As for Prediction-based:

given a word in a sentence;
predict which word comes next.

[Figure: lecture slide]
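The output of the network's first layer is then taken as the embedding vector of the input word.

As a result, for two words that are followed by the same word, the neural network has to make their vectors close to each other.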
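The idea of reading the embedding off the first layer can be sketched as follows. This is a minimal illustration, assuming a single linear hidden layer followed by a softmax output; the vocabulary, the layer sizes, and the random (untrained) weights are all hypothetical.

```python
import math
import random

random.seed(0)

vocab = ["the", "cat", "dog", "runs", "sleeps"]
V, H = len(vocab), 3  # vocabulary size and embedding size (hypothetical)

# First-layer weights W1 (H x V) and output-layer weights W2 (V x H), untrained.
W1 = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(H)]
W2 = [[random.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(V)]


def one_hot(word):
    """1-of-N encoding of `word` over `vocab`."""
    return [1.0 if w == word else 0.0 for w in vocab]


def matvec(M, x):
    """Matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def embed(word):
    """First-layer output z = W1 x; with a 1-of-N input this is one column of W1."""
    return matvec(W1, one_hot(word))


def predict_next(word):
    """Probability of each vocabulary word being the next word."""
    return softmax(matvec(W2, embed(word)))


print(embed("cat"))
print(predict_next("cat"))
```

Because the output depends on the input word only through `embed(word)`, two words that should predict the same next word are pushed toward similar first-layer outputs during training.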

In addition, there are techniques such as Sharing Parameters.
[Figure: lecture slide]
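As in the figure above, two words are input to make the prediction. Note that no matter at which position a word is fed in, its corresponding weights must be the same; only then is the embedding vector taken from the hidden layer unique for that word. In the figure, connections drawn in the same color share the same weight.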
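A minimal sketch of the shared-weight forward pass, assuming the two input positions use one and the same matrix `W`, so that the hidden layer is z = W x_{i-2} + W x_{i-1} = W (x_{i-2} + x_{i-1}). The matrix values and the vocabulary size of 4 are hypothetical.

```python
# Hypothetical shared first-layer weights, shape (2, 4): 4 words, 2-d embeddings.
W = [
    [0.1, 0.4, -0.2, 0.3],
    [0.5, -0.1, 0.2, 0.0],
]
VOCAB_SIZE = 4


def one_hot(index):
    """1-of-N vector for the word with id `index`."""
    return [1.0 if i == index else 0.0 for i in range(VOCAB_SIZE)]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def embedding(index):
    """The word's embedding is its column of the shared W, whatever its position."""
    return [row[index] for row in W]


def hidden(prev2, prev1):
    """Hidden layer for the two preceding words, using the same W for both slots."""
    x_sum = [a + b for a, b in zip(one_hot(prev2), one_hot(prev1))]
    return matvec(W, x_sum)


print(hidden(0, 2))
print([a + b for a, b in zip(embedding(0), embedding(2))])
```

With one shared matrix, the hidden layer is simply the sum of the two words' embeddings, and the number of parameters does not grow with the number of context words.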

So how do we make sure the weights stay the same?

[Figure: lecture slide]
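As shown above, the tied weights start from the same initial value, and in every update the same terms are subtracted from each of them, so their updates stay synchronized.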
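The synchronized update can be sketched with two scalar weights. This assumes the usual tying rule in which both weights subtract the sum of both gradients scaled by the learning rate; the gradient values and the learning rate are hypothetical.

```python
def tied_update(w_i, w_j, grad_i, grad_j, lr):
    """Both weights subtract the same step lr * (grad_i + grad_j)."""
    step = lr * (grad_i + grad_j)
    return w_i - step, w_j - step


def untied_update(w_i, w_j, grad_i, grad_j, lr):
    """Ordinary gradient descent: each weight follows only its own gradient."""
    return w_i - lr * grad_i, w_j - lr * grad_j


# Same initial value for both weights (needed for them to remain equal).
w_i = w_j = 0.5
# Hypothetical gradient pairs (dC/dw_i, dC/dw_j) for three training steps.
grads = [(0.2, -0.1), (0.05, 0.3), (-0.4, 0.1)]

for g_i, g_j in grads:
    w_i, w_j = tied_update(w_i, w_j, g_i, g_j, lr=0.1)

print(w_i, w_j)  # the two values are identical after every step
```

With `untied_update` the two weights would drift apart as soon as their gradients differ, which is exactly what the tied rule prevents.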

Prediction-based Various Architectures

[Figures: two lecture slides]
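The difference between the CBOW and Skip-gram variants can be sketched by the training pairs each one would draw from a sentence: CBOW predicts a word from its surrounding context, while Skip-gram predicts the context from the word. The helper names, the example sentence, and the window size of 1 are assumptions made for illustration.

```python
def context_indices(i, length, window):
    """Indices within `window` positions of i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(length, i + window + 1)) if j != i]


def cbow_pairs(tokens, window=1):
    """CBOW: (list of context words) -> center word."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:
            pairs.append((context, center))
    return pairs


def skipgram_pairs(tokens, window=1):
    """Skip-gram: center word -> each context word, one pair per context word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in context_indices(i, len(tokens), window):
            pairs.append((center, tokens[j]))
    return pairs


tokens = ["the", "cat", "sat", "down"]  # hypothetical example sentence
for pair in cbow_pairs(tokens):
    print("CBOW     ", pair)
for pair in skipgram_pairs(tokens):
    print("Skip-gram", pair)
```

Both variants can feed the same kind of network as above; only the direction of prediction differs.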
