Pose Estimation Outline

Top-down

Cascaded Pyramid Network
Stacked Hourglass Networks
AlphaPose
Simple Baseline
HRNet

Bottom-up

OpenPose

Simple Baseline

Author's view:
The key to feature extraction in pose estimation is how to turn small feature maps into larger ones. Obtaining high resolution feature maps is crucial, but no matter how
Overview:
Our method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip layer connections.

Our method simply adds a few deconvolutional layers over the last convolution stage in the ResNet, called C5.
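
The effect of the deconvolutional layers on resolution can be checked with simple arithmetic. The sketch below is only an illustration: it assumes three layers with 4x4 kernels, stride 2 and padding 1, and a C5 stage with overall stride 32. None of these values are stated in these notes, and the function names are hypothetical.

```python
def deconv_out_size(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def head_resolution(input_hw, backbone_stride=32, num_deconv=3):
    """Heatmap size after C5 followed by `num_deconv` deconvolutional layers."""
    # Assumed: C5 is downsampled by `backbone_stride` relative to the input.
    h, w = (s // backbone_stride for s in input_hw)
    for _ in range(num_deconv):
        # With kernel 4, stride 2, padding 1 each layer doubles the resolution.
        h, w = deconv_out_size(h), deconv_out_size(w)
    return h, w


print(head_resolution((256, 192)))  # (64, 48)
print(head_resolution((256, 256)))  # (64, 64)
```

Under these assumptions a 256x256 input gives a 64x64 heatmap, which matches the heatmap size quoted in the key points.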

Loss

Mean Squared Error (MSE) is used as the loss between the predicted heatmaps and targeted heatmaps.
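
As a minimal sketch of this loss, the function below averages the squared difference over every joint and pixel. The nested-list layout [joint][row][column] and the name `mse_loss` are assumptions made for illustration; a real training loop would use a tensor library instead.

```python
def mse_loss(pred, target):
    """Mean squared error between two stacks of heatmaps.

    pred, target: nested lists shaped [num_joints][height][width].
    """
    total = 0.0
    count = 0
    for pred_map, target_map in zip(pred, target):
        for pred_row, target_row in zip(pred_map, target_map):
            for p, t in zip(pred_row, target_row):
                total += (p - t) ** 2
                count += 1
    return total / count


# One joint, 2x2 heatmap: only one pixel differs, by 0.5.
pred = [[[0.0, 0.5], [0.0, 0.0]]]
target = [[[0.0, 1.0], [0.0, 0.0]]]
print(mse_loss(pred, target))  # 0.0625
```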

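Key points

Difference between transposed convolution and upsampling

Transposed convolution
A transposed convolution reverses the mapping of a convolution: a convolution is many-to-one, while a transposed convolution is one-to-many. Its kernel weights are learned.
Upsampling
Interpolation with no learned parameters, for example:
Nearest-neighbor interpolation
Bilinear interpolation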
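
The difference can be seen in a 1D toy example. This is a sketch with hypothetical helper names, not code from any of the papers: nearest-neighbor upsampling only repeats samples, while a transposed convolution lets each input sample scatter a scaled copy of a kernel into the output, adding where copies overlap.

```python
def upsample_nearest_1d(x, scale=2):
    """Repeat every sample `scale` times; nothing is learned."""
    return [v for v in x for _ in range(scale)]


def conv_transpose_1d(x, kernel, stride=2):
    """One-to-many mapping: each input writes a scaled kernel into the output."""
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        for k, w in enumerate(kernel):
            out[i * stride + k] += v * w  # overlapping contributions add up
    return out


x = [1.0, 2.0, 3.0]
print(upsample_nearest_1d(x))                 # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(conv_transpose_1d(x, [1.0, 1.0]))       # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(conv_transpose_1d(x, [0.5, 1.0, 0.5]))  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]
```

With the kernel [1, 1] and stride 2 the transposed convolution reproduces nearest-neighbor upsampling. Other kernel values give other interpolation behavior, and in a network those values are learned.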

Heatmap: Gaussian distribution
The target heatmap $H^k$ for joint $k$ is generated by applying a 2D Gaussian centered on the $k$-th joint's ground-truth location. Heatmap size: 64×64.
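
A minimal sketch of generating such a target heatmap follows. The standard deviation `sigma=2.0`, the unnormalized peak value of 1, and the function name are assumptions for illustration; the notes only fix the 64×64 size.

```python
import math


def gaussian_heatmap(cx, cy, size=64, sigma=2.0):
    """size x size heatmap with an unnormalized 2D Gaussian peaked at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]


# Joint k annotated at x=20, y=30 in heatmap coordinates (made-up location).
heatmap = gaussian_heatmap(cx=20, cy=30)
print(len(heatmap), len(heatmap[0]))  # 64 64
print(heatmap[30][20])                # 1.0 at the joint (row = y, column = x)
```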

HRNet

High-Resolution Network
Explanation:
Maintain high-resolution representations through the whole process

Author's view:
Repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations.

We perform repeated multi-scale fusions to boost the high-resolution representations with the help of the low-resolution representations of the same depth and similar level.
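
One fusion step between two parallel branches can be sketched as below. This is a simplified illustration, not the network itself: it uses single-channel maps, plain nearest-neighbor upsampling, and 2x2 average pooling as a stand-in for learned convolutions, and all function names are hypothetical.

```python
def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a 2D map."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(x):
    """2x2 average pooling (stand-in for a learned strided convolution)."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def add(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high, low):
    """One fusion step: each branch receives the other branch's features."""
    new_high = add(high, upsample2x(low))
    new_low = add(low, downsample2x(high))
    return new_high, new_low


high = [[1.0] * 4 for _ in range(4)]  # high-resolution branch, 4x4
low = [[10.0, 20.0], [30.0, 40.0]]    # low-resolution branch, 2x2
high, low = exchange(high, low)
print(high[0])  # [11.0, 11.0, 21.0, 21.0]
print(low)      # [[11.0, 21.0], [31.0, 41.0]]
```

Repeating `exchange` after every block of per-branch processing gives the "over and over" exchange of information described in the author's view.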

One small net and one big net (HRNet-W32 and HRNet-W48), where 32 and 48 represent the widths (C) of the high-resolution subnetworks in the last three stages, respectively.

Multi-scale fusion

Need a separate low-to-high upsampling process and aggregate low-level and high-level representations.

Loss

We regress the heatmaps simply from the high-resolution representations output by the last exchange unit. The loss is the mean squared error between the predicted heatmaps and the ground-truth heatmaps.

OpenPose

Overview
We present an approach to efficiently detect the 2D pose of multiple people in an image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image.
We present the first bottom-up representation of association scores via Part Affinity Fields (PAFs), a set of 2D vector fields that encode the location and orientation of limbs over the image domain.

Method

input
RGB image
output
- a set of 2D confidence maps S of body part locations
- a set of 2D vector fields L of part affinities (L: the x and y components of the direction vector at each point)

Finally, the confidence maps and the affinity fields are parsed by greedy inference to output the 2D keypoints for all people in the image.
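
The notes do not spell out the parsing step, so the following is only a generic sketch of greedy pairing for a single limb type, not necessarily the exact procedure used by OpenPose. It assumes an association score is already available for every pair of candidate parts; the function name and the rule of skipping non-positive scores are assumptions.

```python
def greedy_match(scores):
    """Greedily pair candidates of part A with candidates of part B.

    scores[i][j]: association score between candidate i of part A and
    candidate j of part B. Each candidate is used at most once.
    """
    candidates = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        reverse=True,  # best-scoring pairs first
    )
    used_a, used_b, pairs = set(), set(), []
    for s, i, j in candidates:
        # Assumed rule: ignore non-positive scores and already matched parts.
        if s <= 0 or i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        pairs.append((i, j, s))
    return pairs


# Two candidates for each part (toy scores).
scores = [[0.9, 0.2],
          [0.3, 0.8]]
print(greedy_match(scores))  # [(0, 0, 0.9), (1, 1, 0.8)]
```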

Key points

Part Affinity Fields (PAFs)
Purpose: preserves both location and orientation information across the region of support of the limb

Essence: The part affinity is a 2D vector field for each limb.
For each pixel in the area belonging to a limb, a 2D vector encodes the direction that points from one part of the limb to the other.
Each type of limb has a corresponding affinity field joining its two associated body parts.
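
One way to turn such a field into an association score between two candidate parts is to average, along the segment joining them, the dot product between the field and the segment's unit direction. The sketch below assumes the field is stored as two 2D lists indexed [y][x], samples 10 points with nearest-pixel rounding, and uses a hypothetical function name.

```python
import math


def paf_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Average alignment between the field and the direction p1 -> p2.

    paf_x, paf_y: 2D lists indexed [y][x] holding the x and y components.
    p1, p2: (x, y) candidate locations. num_samples must be at least 2.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm  # unit vector from p1 to p2
    total = 0.0
    for s in range(num_samples):
        t = s / (num_samples - 1)
        x = int(round(p1[0] + t * dx))  # nearest pixel on the segment
        y = int(round(p1[1] + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / num_samples


# Toy field for a horizontal limb: every vector points in +x.
paf_x = [[1.0] * 8 for _ in range(8)]
paf_y = [[0.0] * 8 for _ in range(8)]
print(paf_score(paf_x, paf_y, (1, 4), (6, 4)))  # 1.0 (aligned with the limb)
print(paf_score(paf_x, paf_y, (4, 1), (4, 6)))  # 0.0 (perpendicular to it)
```

A pair of candidates that lies along the limb direction scores high, while a pair that crosses the field scores near zero. This is what lets the field carry orientation as well as location.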
