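I already have two years of ML experience, so I mainly use this course series to fill gaps. These notes record the finer details I did not know before.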

Overview of this lecture

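This lecture is forty minutes long; teaching assistant Chi-Liang Liu covers Self-Supervised Learning.
It starts with a review of supervised learning: it works well, but the labeled data it needs is a "scarce resource", whereas unlabeled data is plentiful.
Self-Supervised Learning is really a kind of Unsupervised Learning, but it puts more emphasis on the information contained in the data itself.
At present, Self-Supervised Learning can be divided into 3 kinds: Reconstruct from a corrupted (or partial) data, Visual common sense tasks, and Contrastive Learning.
The lecture explains these three ideas and their sub-methods in turn.
The BERT-family (Text) part mentions quite a few techniques and takes up more than half of the lecture.
At the end, some image encoding techniques are introduced.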

Details

Methods of Self-Supervised Learning

Reconstruct from a corrupted (or partial) data

Denoising Autoencoder
BERT-family (Text)
In-painting (Image)

Visual common sense tasks

Jigsaw puzzles
Rotation

Contrastive Learning

word2vec
Contrastive Predictive Coding (CPC)
SimCLR

Reconstruct from a corrupted (or partial) data

Denoising Autoencoder

[Hung-yi Lee 2020 ML/DL] P66 Self-supervised Learning
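As the slide shows, in a Denoising Autoencoder we care not only about the encoder but also about the decoder; in other words, we value the model as a whole and train it as a whole.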
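To make the training signal concrete for myself, here is a minimal sketch of the denoising setup. The helper names `corrupt` and `mse`, the zero-out noise, and the identity "model" are my own assumptions for illustration, not something from the slide.

```python
import random

def corrupt(x, drop_prob, rng):
    """Return a noisy copy of x where each entry is zeroed with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(0)
x = [0.5, 1.0, -0.3, 0.8]        # clean input
x_noisy = corrupt(x, 0.5, rng)   # what the encoder actually sees

# A real model would compute x_hat = decoder(encoder(x_noisy)).
# The identity stands in for the model here, only to show the loss target.
x_hat = x_noisy
loss = mse(x_hat, x)             # the loss is measured against the CLEAN input
```

The point of the sketch is only that the input is corrupted while the target stays clean, so encoder and decoder are trained together.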

BERT-family (Text)

Language Model

As the slide shows, a basic language model is used to estimate the probability that a sentence occurs.
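For reference, this sentence probability is usually factored with the chain rule. The symbols $w_1, \dots, w_T$ (the tokens of a sentence of length $T$) are notation I introduce here, not taken from the slide:

$$P(w_1, \dots, w_T) = \prod_{t=1}^{T} P(w_t \mid w_1, \dots, w_{t-1})$$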

ELMo & GPT & BERT

As the slide shows, ELMo is trained in both the forward and the backward direction during pre-training.

GPT uses a 12-layer Transformer. How is it used downstream? Simply remove the Task Prediction head and attach a Task Classifier in its place.

BERT, by contrast, is characterized by its use of a Masked LM.
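A minimal sketch of how Masked LM training examples could be built. The function name `mask_tokens` and the 15% rate in the usage line are my assumptions for illustration; this simplified version always substitutes `[MASK]`, whereas the actual BERT recipe sometimes keeps the token or swaps in a random one.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Return (masked_tokens, labels).

    labels[i] holds the original token at masked positions and None elsewhere,
    so the loss is only computed where something was hidden.
    """
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels

rng = random.Random(0)
tokens = "the cat sat on the mat".split()
masked, labels = mask_tokens(tokens, 0.15, rng)
```

Because the model sees context on both sides of every `[MASK]`, this objective is bidirectional by construction.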

BERT - Pipeline

As the slide shows, BERT is also split into two steps: Pre-training and Fine-Tuning.

ARLM vs AELM

As the slide shows, BERT can be classed as an Autoencoding Language Model (AELM), while GPT can be classed as an Autoregressive Language Model (ARLM).

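The advantage of ARLM is that it usually does not suffer from conflicts in the data (for example, there are no [MASK] tokens that appear in pre-training but never in fine-tuning). Its drawback is that it can only be unidirectional (either forward only or backward only).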
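One way I picture the difference is through which positions each token is allowed to look at. The sketch below is only an illustration; the helper names `causal_mask` and `full_mask` are hypothetical.

```python
def causal_mask(n):
    """mask[i][j] is True if position i may look at position j (left-to-right ARLM)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def full_mask(n):
    """AELM-style visibility: every position may look at every position."""
    return [[True] * n for _ in range(n)]

for row in causal_mask(3):
    print(row)
# [True, False, False]
# [True, True, False]
# [True, True, True]
```

The lower-triangular pattern is what makes an ARLM unidirectional, while the all-True pattern is what lets an AELM use context from both sides.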

XLNet - Permutation LM

As the slide shows, XLNet was proposed to address this limitation of ARLM.

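XLNet can be broken into 2 steps:

first shuffle the order of the positions;
then process the tokens one by one in that shuffled order.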

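In the slide's example, if the shuffled order is 3,2,4,1, then encoding token 3 can only draw on the memory; if the shuffled order is 2,4,3,1, then encoding token 3 can draw on the memory as well as tokens 2 and 4. Continuing in this way, the network gets to "see both sides".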
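The example above can be reproduced with a few lines. The function name `visible_context` is my own; it only computes which positions come earlier in a given order and leaves out the memory and the model itself.

```python
def visible_context(order):
    """Map each position to the set of positions that precede it in the given order."""
    seen = []
    context = {}
    for pos in order:
        context[pos] = set(seen)   # everything processed before `pos`
        seen.append(pos)
    return context

print(visible_context([3, 2, 4, 1])[3])  # set()  -> only the memory is available
print(visible_context([2, 4, 3, 1])[3])  # {2, 4} -> memory plus tokens 2 and 4
```

Across many sampled orders, token 3 ends up conditioned on tokens to its left and to its right, even though each single order is still processed autoregressively.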

BART - Encoder & Decoder

As the slide shows, BART builds an Encoder and a Decoder. The Encoder is much like BERT, while the Decoder is autoregressive and can do more.

ELECTRA - Discriminator

ELECTRA uses a GAN-like architecture; in the end, the trained Discriminator is what gets used as the encoder.
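As I understand it, the Discriminator is trained to tell, for every position, whether the token is the original or was replaced by the generator. A minimal sketch of how those per-token labels could be derived; the function name and the example sentence pair are hypothetical.

```python
def replaced_token_labels(original, corrupted):
    """Return 1 where the token was changed and 0 where it is still the original."""
    if len(original) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(o != c) for o, c in zip(original, corrupted)]

original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]    # hypothetical generator output
labels = replaced_token_labels(original, corrupted)
print(labels)  # [0, 0, 1, 0, 0]
```

Every position carries a label, so the Discriminator gets a training signal from the whole sentence rather than only from the masked positions.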

In-painting (Image)

As the slide shows, for images we cut out part of the picture and train the model on it.
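A minimal sketch of building such a training pair, with the image as a plain list of rows. The function name `mask_region` and the fill value 0 are assumptions for illustration.

```python
def mask_region(image, top, left, height, width, fill=0):
    """Return a copy of a 2D image (list of rows) with a rectangle replaced by `fill`."""
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = fill
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
holed = mask_region(image, top=1, left=1, height=2, width=2)
# The network would receive `holed` as input and be trained to reproduce `image`.
```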

Another training method is to predict the colors of an image; the goal is for the colorized image to match the real image.
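A sketch of how a colorization pair could be prepared: the grayscale version is the input and the original color image is the target. The helper names are hypothetical, and the luma weights (ITU-R BT.601) are my choice of grayscale conversion, not something stated in the lecture.

```python
def to_gray(pixel):
    """Luma of an (R, G, B) pixel using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def make_colorization_example(image):
    """Return (gray_input, color_target) for a 2D image of (R, G, B) tuples."""
    gray = [[to_gray(p) for p in row] for row in image]
    return gray, image

color_image = [[(255, 0, 0), (0, 0, 0)],
               [(0, 255, 0), (0, 0, 255)]]
gray_input, color_target = make_colorization_example(color_image)
```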

Visual common sense tasks

Jigsaw puzzles

As the slide shows, we can also pose a "jigsaw puzzle" task.
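A minimal sketch of generating a jigsaw example, assuming a square image whose side is divisible by the grid size. The helper names are hypothetical; the permutation serves as the label the network has to recover.

```python
import random

def split_tiles(image, grid):
    """Cut a square 2D image into grid x grid tiles, returned in row-major order."""
    size = len(image) // grid
    tiles = []
    for gr in range(grid):
        for gc in range(grid):
            tile = [row[gc * size:(gc + 1) * size]
                    for row in image[gr * size:(gr + 1) * size]]
            tiles.append(tile)
    return tiles

def make_jigsaw_example(image, grid, rng):
    """Return (shuffled_tiles, permutation); shuffled_tiles[k] is original tile permutation[k]."""
    tiles = split_tiles(image, grid)
    permutation = list(range(len(tiles)))
    rng.shuffle(permutation)
    return [tiles[p] for p in permutation], permutation

picture = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12],
           [13, 14, 15, 16]]
shuffled, permutation = make_jigsaw_example(picture, 2, random.Random(0))
```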

Rotation

As the slide shows, we can also rotate the image and have the model predict by how many degrees it was rotated.
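A minimal sketch of producing rotation examples, assuming the rotations are restricted to multiples of 90 degrees so that the task becomes a 4-way classification. The helper names are hypothetical.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def make_rotation_example(image, k):
    """Rotate by k * 90 degrees clockwise; the label to predict is k mod 4."""
    rotated = [row[:] for row in image]
    for _ in range(k % 4):
        rotated = rotate90(rotated)
    return rotated, k % 4

rotated, label = make_rotation_example([[1, 2], [3, 4]], 1)
print(rotated, label)  # [[3, 1], [4, 2]] 1
```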

Contrastive Learning

Contrastive Predictive Coding (CPC)

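This is a relatively recent paper. Given a sequence, the model predicts whether a candidate is the part that really comes next. This setup provides many negative samples, so negative sampling can be done on a large scale.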

This is essentially the same idea as Word2Vec.
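The "one true continuation against many negatives" objective can be sketched as a softmax over scores, often called InfoNCE. This is a simplified illustration: the plain dot-product score and the function names are my assumptions, and the actual CPC model uses learned encoders and a learned scoring function.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def info_nce(context, positive, negatives):
    """Negative log-probability of the positive among positive + negatives."""
    scores = [dot(context, positive)] + [dot(context, n) for n in negatives]
    m = max(scores)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[0]

context = [1.0, 0.0]
good = info_nce(context, [1.0, 0.0], [[-1.0, 0.0]])   # positive matches the context
bad = info_nce(context, [-1.0, 0.0], [[1.0, 0.0]])    # a negative matches better
```

The loss is small when the true continuation scores higher than every negative, which is why having many negatives gives a stronger training signal.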

SimCLR

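As the slide shows, for a data point $x$ we apply random transformations to obtain $\tilde{x}_i$ and $\tilde{x}_j$. We want the encodings of $\tilde{x}_i$ and $\tilde{x}_j$ to be as similar as possible, and as dissimilar as possible from the encodings obtained from other data.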
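For reference, this "similar to its pair, dissimilar to the rest" goal is usually written as the following loss for a positive pair $(i, j)$, as far as I recall from the SimCLR paper. The symbols are notation added here, not from the slide: $z_i$ is the encoding of $\tilde{x}_i$, $\mathrm{sim}$ is cosine similarity, $\tau$ is a temperature, and $N$ is the number of original data points in the batch (so there are $2N$ transformed samples).

$$\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}$$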

As the slide shows, this random transformation can be rotation, color removal, a Fourier transform, and so on.

Reference

CS294-158 Deep Unsupervised Learning Lecture 7
AAAI 2020 Keynotes Turing Award Winners Event
Learning From Text - OpenAI
Learning from Unlabeled Data - Thang Luong