
deeplearning.ai - Face Recognition and Neural Style Transfer

Category: Articles • 2025-03-10 16:09:40

Convolutional Neural Networks
Andrew Ng

Face recognition & Neural style transfer

Face Recognition

What is face recognition?

Verification

Input image, name/ID
Output whether the input image is that of the claimed person
“is this the claimed person?”

Recognition

Has a database of K persons
Get an input image
Output ID if the image is any of the K persons (or “not recognized”)
“who is this person?”

One Shot Learning

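Learn from just one example to recognize the person again.
Each class (person) has only a single training example, too few to train an ordinary classifier with one output per person.
Learn a similarity function instead: take two images as input and output d(img1, img2), the degree of difference between them. If d(img1, img2) ≤ τ, predict "same person"; otherwise predict "different".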
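The one-shot idea can be sketched as follows, assuming each image is already a short numeric vector and d is squared Euclidean distance (both assumptions; in the course d is computed from ConvNet encodings). The database holds exactly one example per person, and the threshold `tau` and all values are hypothetical.

```python
import math


def d(enc1, enc2):
    """Degree of difference between two vectors (squared Euclidean distance)."""
    return sum((a - b) ** 2 for a, b in zip(enc1, enc2))


def recognize(query, database, tau):
    """One-shot recognition: `database` maps each person's ID to ONE example.

    Returns the closest ID if its difference is <= tau, else "not recognized".
    """
    best_id, best_dist = None, math.inf
    for person_id, example in database.items():
        dist = d(query, example)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist <= tau else "not recognized"


# Hypothetical database: a single 3-number "image" per person.
database = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}
print(recognize([0.12, 0.88, 0.31], database, tau=0.05))  # alice
print(recognize([5.0, 5.0, 5.0], database, tau=0.05))     # not recognized
```

Adding a new person only means adding one entry to `database`. Nothing is retrained, which is why a similarity function suits the one-shot setting.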

Siamese Network

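Feed an image into a ConvNet. The feature vector it outputs serves as the encoding of that image.
The parameters of the neural network define an encoding $f(x^{(i)})$. The same network (shared weights) encodes both images, and $d(x^{(1)}, x^{(2)}) = \lVert f(x^{(1)}) - f(x^{(2)}) \rVert^{2}$.
Learn the parameters so that this distance is small when the two images show the same person and large when they show different people.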
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
FaceNet: A Unified Embedding for Face Recognition and Clustering

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers.
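A Siamese network runs both images through one encoder f with shared parameters and compares the two encodings. The sketch below swaps the ConvNet for a hypothetical linear map `f(x) = W x` on 3-pixel "images" producing 2-number encodings (FaceNet's would have 128). Only the weight sharing and the squared-distance comparison are the point here.

```python
def make_encoder(weights):
    """Return f(x) = W x, a hypothetical stand-in for the ConvNet encoder."""
    def f(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return f


def squared_distance(u, v):
    """||u - v||^2"""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical parameters: 3-pixel "image" -> 2-number encoding.
W = [[1.0, 0.0, 0.5],
     [0.0, 2.0, -1.0]]
f = make_encoder(W)  # ONE f, used for both inputs: this is the "Siamese" part

x1 = [1.0, 2.0, 2.0]
x2 = [1.0, 2.0, 0.0]
print(f(x1), f(x2))                    # [2.0, 2.0] [1.0, 4.0]
print(squared_distance(f(x1), f(x2)))  # 5.0
```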

Triplet Loss

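Anchor (the reference image), positive (another image of the same person), negative (an image of a different person)
Margin α:
- prevents the trivial solution in which the network outputs identical (e.g., all-zero) encodings for every image, which would make both distances zero and satisfy $d(A, P) \leq d(A, N)$ without learning anything
- pushes d(A, P) and d(A, N) further apart: the negative must be at least α farther from the anchor than the positive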
The triplet loss is defined on triplets of images:
$L(A, P, N) = \max\left(\lVert f(A) - f(P) \rVert^{2} - \lVert f(A) - f(N) \rVert^{2} + \alpha,\ 0\right)$

$J = \sum_{i=1}^{m} L(A^{(i)}, P^{(i)}, N^{(i)})$, summed over the m training triplets
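The triplet loss and cost above translate directly into code. This sketch assumes the encodings f(A), f(P), f(N) are already computed. The 2-number encodings and α = 0.2 are hypothetical values chosen to show one triplet that already meets the margin and one that does not.

```python
def squared_distance(u, v):
    """||u - v||^2"""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha):
    """L(A, P, N) = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0)."""
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


def triplet_cost(triplets, alpha):
    """J = sum of L over all training triplets (given as encoding tuples)."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)


alpha = 0.2
easy = ([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])  # d(A,P) = 0.01, d(A,N) = 1.0
hard = ([0.0, 0.0], [0.3, 0.0], [0.4, 0.0])  # d(A,P) = 0.09, d(A,N) = 0.16
print(triplet_loss(*easy, alpha))  # 0.0: the margin is already satisfied
print(triplet_loss(*hard, alpha))  # about 0.13 = 0.09 - 0.16 + 0.2
```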
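If A, P, N are chosen randomly, $d(A, P) + \alpha \leq d(A, N)$ is easily satisfied.
In general, photos of different people already differ more than photos of the same person, so most random triplets give zero loss and the network learns little from them. Choose "hard" triplets instead, where $d(A, P) \approx d(A, N)$.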
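One simple way to find "hard" triplets is to keep only the (A, P, N) combinations that still violate $d(A, P) + \alpha \leq d(A, N)$, since only those give a nonzero loss. This is an assumed criterion for illustration; the FaceNet paper describes a more refined "semi-hard" negative selection. The 1-D encodings and labels below are hypothetical.

```python
import itertools


def squared_distance(u, v):
    """||u - v||^2"""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def hard_triplets(encodings, labels, alpha):
    """Return (a, p, n) index triplets with d(A,P) + alpha > d(A,N), i.e. nonzero loss."""
    result = []
    for a, p in itertools.permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # anchor and positive must be the same person
        d_ap = squared_distance(encodings[a], encodings[p])
        for n in range(len(labels)):
            if labels[n] == labels[a]:
                continue  # negative must be a different person
            if d_ap + alpha > squared_distance(encodings[a], encodings[n]):
                result.append((a, p, n))
    return result


# Hypothetical 1-D encodings: person "x" at 0.0 and 0.5, person "y" at 0.6 and 3.0.
encodings = [[0.0], [0.5], [0.6], [3.0]]
labels = ["x", "x", "y", "y"]
print(hard_triplets(encodings, labels, alpha=0.2))
# [(0, 1, 2), (1, 0, 2), (2, 3, 0), (2, 3, 1)]
```

The combinations with anchor 3 are skipped because the negatives there are already far enough away. Training on them would add nothing to J.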

Face Verification and Binary Classification

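Take a pair of images as input and output 1 if they show the same person, 0 if not. For example, feed the two encodings into a logistic regression unit: $\hat{y} = \sigma\left(\sum_{k=1}^{128} w_k \left| f(x^{(i)})_k - f(x^{(j)})_k \right| + b\right)$.
Precompute the encodings of the database images, so that later comparisons do not need to run those images through the ConvNet again. Only the new image is encoded.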
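Here is a sketch of verification as binary classification, using logistic regression on element-wise absolute differences of two encodings. The database encoding is stored once, so only the new image would need the ConvNet. The 3-number encodings, weights `w` and bias `b` are hypothetical stand-ins for learned values (real FaceNet-style encodings would have 128 entries).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def same_person_prob(enc_i, enc_j, w, b):
    """y_hat = sigmoid(sum_k w_k * |f(x_i)_k - f(x_j)_k| + b)."""
    z = sum(wk * abs(a - c) for wk, a, c in zip(w, enc_i, enc_j)) + b
    return sigmoid(z)


# Precomputed once, e.g. when a person is enrolled (hypothetical values).
database_encodings = {"alice": [0.2, 0.7, 0.1]}

# Hypothetical learned parameters: large differences push y_hat towards 0.
w = [-10.0, -10.0, -10.0]
b = 3.0

new_encoding = [0.25, 0.68, 0.12]  # only the NEW image goes through the ConvNet
p = same_person_prob(new_encoding, database_encodings["alice"], w, b)
print(1 if p > 0.5 else 0)  # 1 -> "same person"
```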

Neural Style Transfer

What is neural style transfer?

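Neural style transfer generates a new image that keeps the content of one image while adopting the style of another.
**C**ontent image, **S**tyle image, **G**enerated image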

What are deep ConvNets learning?

Visualizing deep layers


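Pick a hidden unit, find the image patches (e.g., nine of them) that maximize its activation, and repeat for other units and layers:
Hidden units in layer 1 usually look for simple features such as edges, a particular color, or shading.
Layer 2 seems to detect more complex shapes and patterns.
Layer 3 clearly detects more complex patterns.
Layer 4 detects still more complex patterns and features.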
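For a visualization like this, each hidden unit's activation is recorded on many image patches and the strongest patches are kept for display. This sketch covers only that final selection step. It assumes the activations are already computed, and the 12 values are made up.

```python
import heapq


def top_patches(activations, k=9):
    """Indices of the k patches that most strongly activate one hidden unit."""
    return heapq.nlargest(k, range(len(activations)), key=activations.__getitem__)


# Hypothetical activations of one unit on 12 image patches.
activations = [0.1, 0.9, 0.3, 0.7, 0.0, 0.5, 0.8, 0.2, 0.6, 0.4, 0.05, 0.95]
print(top_patches(activations))       # the nine patches to display for this unit
print(top_patches(activations, k=3))  # [11, 1, 6]
```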

Cost Function

$J(G) = \alpha J_{content}(C, G) + \beta J_{style}(S, G)$
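Initialize the pixels of G randomly.
Use gradient descent to minimize J(G), updating the pixel values of G while the ConvNet's weights stay fixed: $G := G - \frac{\partial}{\partial G} J(G)$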
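The optimization loop can be illustrated with toy costs. Here gradient descent updates the pixels of G itself, not any network weights. To keep the sketch self-contained, the real content and style costs are replaced by the stand-in quadratics $\lVert G - C \rVert^{2}$ and $\lVert G - S \rVert^{2}$ (an assumption purely for illustration). The 3-pixel images, learning rate `lr`, and step count are also hypothetical.

```python
import random


def grad_J(G, C, S, alpha, beta):
    """Gradient of the TOY cost alpha*||G - C||^2 + beta*||G - S||^2 w.r.t. each pixel of G."""
    return [2 * alpha * (g - c) + 2 * beta * (g - s) for g, c, s in zip(G, C, S)]


def generate(C, S, alpha, beta, lr=0.05, steps=500, seed=0):
    rng = random.Random(seed)
    G = [rng.random() for _ in C]  # 1. initialize G randomly
    for _ in range(steps):         # 2. gradient descent on the pixels of G
        grad = grad_J(G, C, S, alpha, beta)
        G = [g - lr * dg for g, dg in zip(G, grad)]
    return G


C = [0.0, 1.0, 0.0]  # hypothetical 3-pixel "content image"
S = [1.0, 1.0, 1.0]  # hypothetical 3-pixel "style image"
print([round(g, 3) for g in generate(C, S, alpha=1.0, beta=3.0)])  # [0.75, 1.0, 0.75]
```

With these toy costs G converges to (αC + βS) / (α + β), so the ratio α : β directly sets how far the result leans toward content or toward style.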

Content Cost Function

Use a hidden layer l of a pre-trained ConvNet (e.g., VGG) to compute the content cost. Layer l is usually chosen neither too shallow nor too deep.
Let $a^{[l](C)}$ and $a^{[l](G)}$ be the activations of layer l on the two images.
If $a^{[l](C)}$ and $a^{[l](G)}$ are similar, both images have similar content.
$J_{content}(C, G) = \lVert a^{[l](C)} - a^{[l](G)} \rVert^{2}$
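The content cost sums squared differences over every entry of the layer-l activation volume. This sketch assumes the activations are given as nested lists indexed `[i][j][k]` (height, width, channel), with hypothetical values. Constant normalization factors (some versions use 1/2) can be absorbed into α, so they are omitted.

```python
def content_cost(a_C, a_G):
    """||a[l](C) - a[l](G)||^2 over all n_H x n_W x n_C entries; inputs are [n_H][n_W][n_C]."""
    return sum(
        (c - g) ** 2
        for row_c, row_g in zip(a_C, a_G)
        for px_c, px_g in zip(row_c, row_g)
        for c, g in zip(px_c, px_g)
    )


# Hypothetical layer-l activations with n_H = 1, n_W = 2, n_C = 2.
a_C = [[[1.0, 0.0], [2.0, 1.0]]]
a_G = [[[1.0, 1.0], [0.0, 1.0]]]
print(content_cost(a_C, a_G))  # 5.0
```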

Style Cost Function

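Style is measured by how correlated the activations of different channels in layer l are. The style (Gram) matrix $G^{[l]}$ is $n_C^{[l]} \times n_C^{[l]}$ and is computed separately for S and for G:
$G_{kk'}^{[l]} = \sum_{i=1}^{n_H^{[l]}} \sum_{j=1}^{n_W^{[l]}} a_{i,j,k}^{[l]} a_{i,j,k'}^{[l]}$
$J_{style}^{[l]}(S, G) = \frac{1}{\left(2 n_H^{[l]} n_W^{[l]} n_C^{[l]}\right)^{2}} \sum_{k=1}^{n_C^{[l]}} \sum_{k'=1}^{n_C^{[l]}} \left(G_{kk'}^{[l](S)} - G_{kk'}^{[l](G)}\right)^{2}$
$J_{style}(S, G) = \sum_{l} \lambda^{[l]} J_{style}^{[l]}(S, G)$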
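The following sketch computes the Gram matrix and the style cost. It computes the channel-by-channel Gram matrix of one activation volume and the per-layer cost as the squared difference of the S and G Gram matrices, normalized by $1/(2 n_H n_W n_C)^2$ as in the lecture. It then combines layers with weights λ. Activations are assumed to be nested lists `[i][j][k]`, and all numbers are hypothetical.

```python
def gram_matrix(a):
    """G[k][k2] = sum over (i, j) of a[i][j][k] * a[i][j][k2]; `a` is [n_H][n_W][n_C]."""
    n_C = len(a[0][0])
    G = [[0.0] * n_C for _ in range(n_C)]
    for row in a:
        for px in row:  # px holds the n_C channel values at position (i, j)
            for k in range(n_C):
                for k2 in range(n_C):
                    G[k][k2] += px[k] * px[k2]
    return G


def style_cost_layer(a_S, a_G):
    """Squared difference of the two Gram matrices, normalized by (2 n_H n_W n_C)^2."""
    n_H, n_W, n_C = len(a_S), len(a_S[0]), len(a_S[0][0])
    GS, GG = gram_matrix(a_S), gram_matrix(a_G)
    diff = sum((GS[k][k2] - GG[k][k2]) ** 2 for k in range(n_C) for k2 in range(n_C))
    return diff / (2 * n_H * n_W * n_C) ** 2


def style_cost(layers, lambdas):
    """Sum over layers of lambda[l] * J_style^[l]; `layers` is a list of (a_S, a_G) pairs."""
    return sum(lam * style_cost_layer(a_S, a_G) for lam, (a_S, a_G) in zip(lambdas, layers))


# Hypothetical activations with n_H = 1, n_W = 2, n_C = 2.
a_S = [[[1.0, 2.0], [0.0, 1.0]]]
a_G = [[[1.0, 0.0], [0.0, 1.0]]]
print(gram_matrix(a_S))            # [[1.0, 2.0], [2.0, 5.0]]
print(style_cost_layer(a_S, a_G))  # 0.375
```

The Gram matrix is symmetric ($G_{kk'} = G_{k'k}$) and discards where in the image a feature occurs. It keeps only which channels tend to fire together, which is what lets it stand in for "style".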

1D and 3D Generalizations

3D Convolution
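1D and 3D convolutions follow the same shape rule as 2D: with stride 1 and no padding, each spatial dimension shrinks from n to n − f + 1, and the output has one channel per filter. The sketch below implements a "valid" 1D convolution (cross-correlation, as ConvNets compute it) and a shape helper for any number of dimensions. The sizes (14, filter 5, 16 filters) are illustrative assumptions, e.g. a 1D signal such as an EKG or a 3D volume such as a CT scan.

```python
def conv1d_valid(x, w):
    """'Valid' 1-D convolution (cross-correlation): output length len(x) - len(w) + 1."""
    f = len(w)
    return [sum(x[i + t] * w[t] for t in range(f)) for i in range(len(x) - f + 1)]


def valid_output_shape(input_dims, filter_dims, n_filters):
    """Output shape of a stride-1, no-padding convolution in any number of dimensions.

    Each spatial dimension goes from n to n - f + 1; the last axis becomes n_filters.
    """
    return tuple(n - f + 1 for n, f in zip(input_dims, filter_dims)) + (n_filters,)


print(conv1d_valid([1, 2, 3, 4], [1, 0, -1]))               # [-2, -2]
print(valid_output_shape((14,), (5,), 16))                  # (10, 16)
print(valid_output_shape((14, 14, 14), (5, 5, 5), 16))      # (10, 10, 10, 16)
```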