Pose Estimation Outline
Top-down
- Cascaded Pyramid Network
- Stacked Hourglass Networks
- AlphaPose
- Simple Baseline
- HRNet
Bottom-up
- OpenPose
Simple Baseline
Authors' view:
The key to feature extraction for pose estimation is how to enlarge small feature maps. Obtaining high resolution feature maps is crucial, but no matter how
Overview:
Our method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip layer connections.
Our method simply adds a few deconvolutional layers over the last convolution stage in the ResNet, called C5.
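To see how a few deconvolutional layers over C5 recover resolution, the spatial sizes can be worked out by hand. The sketch below assumes three deconvolutional layers with 4x4 kernels, stride 2 and padding 1, and a C5 stride of 32; none of these values are stated in these notes, so treat them as illustrative hyperparameters. The function names are hypothetical.

```python
def deconv_out_size(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output length of a transposed convolution along one spatial axis."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def head_output_shape(input_hw, backbone_stride=32, num_deconv=3):
    """Heatmap size after stacking num_deconv deconv layers on C5 (assumed settings)."""
    h, w = (s // backbone_stride for s in input_hw)  # resolution of C5
    for _ in range(num_deconv):
        h, w = deconv_out_size(h), deconv_out_size(w)
    return h, w


print(head_output_shape((256, 256)))  # (64, 64)
print(head_output_shape((256, 192)))  # (64, 48)
```

With these assumed settings each layer exactly doubles the resolution, so three layers turn an 8x8 C5 map into a 64x64 heatmap.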
Loss
Mean Squared Error (MSE) is used as the loss between the predicted heatmaps and targeted heatmaps.
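A minimal NumPy sketch of this loss, averaging the squared difference over all joints and pixels. The array shape (17 joints, 64x48 heatmaps) and the function name `heatmap_mse` are assumptions chosen for illustration.

```python
import numpy as np


def heatmap_mse(pred, target):
    """Mean squared error averaged over all joints and pixels."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
target = rng.random((17, 64, 48))  # assumed shape: (joints, height, width)
pred = target + 0.1                # a constant error of 0.1 at every pixel
loss = heatmap_mse(pred, target)
print(loss)                        # close to 0.01
```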
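Key points
Difference between transposed convolution and upsampling
- transpose conv
A transposed convolution reverses the mapping of a convolution: a convolution is many-to-one, while a transpose conv is one-to-many.
- upsample
Nearest neighbor interpolation
Bilinear interpolation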
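The difference is easiest to see in one dimension. In the sketch below, each input value of the transposed convolution scatters a scaled copy of the kernel into the output (one-to-many), and the kernel would normally be learned; the two interpolation functions have no parameters at all. The function names are hypothetical, and the linear variant uses an align-corners style sampling grid as a simplifying assumption.

```python
import numpy as np


def transposed_conv1d(x, kernel, stride=2):
    """Each input value adds a scaled copy of the kernel to the output."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        out[i * stride:i * stride + len(kernel)] += v * kernel
    return out


def nearest_upsample1d(x, scale=2):
    """Repeat every sample scale times (nearest neighbor)."""
    return np.repeat(np.asarray(x, dtype=float), scale)


def linear_upsample1d(x, scale=2):
    """Linear interpolation on an evenly spaced grid that keeps both endpoints."""
    x = np.asarray(x, dtype=float)
    pos = np.linspace(0, len(x) - 1, len(x) * scale)
    return np.interp(pos, np.arange(len(x)), x)


x = [1, 2, 3]
print(transposed_conv1d(x, [1, 1]))         # same result as nearest neighbor
print(transposed_conv1d(x, [0.5, 1, 0.5]))  # a different kernel gives a different upsampling
print(nearest_upsample1d(x))
print(linear_upsample1d(x))
```

With the fixed kernel `[1, 1]` and stride 2, the transposed convolution reproduces nearest neighbor upsampling, which shows that interpolation is a special case of what a learned transposed convolution can express.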
Heatmap: Gaussian distribution
The targeted heatmap for joint k is generated by applying a 2D Gaussian centered on the k-th joint's groundtruth location. Heatmap size: 64×64.
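A target heatmap of this kind can be generated in a few lines of NumPy. This is a sketch: the standard deviation `sigma=2.0` and the unnormalized peak value of 1 are assumptions, and `gaussian_heatmap` is a hypothetical helper name.

```python
import numpy as np


def gaussian_heatmap(center_xy, size=(64, 64), sigma=2.0):
    """Return an (H, W) heatmap with a 2D Gaussian peaked at center_xy = (x, y)."""
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


hm = gaussian_heatmap((20, 40))  # joint at x=20, y=40
peak_y, peak_x = np.unravel_index(np.argmax(hm), hm.shape)
print(hm.shape, peak_x, peak_y)
```

One such map is built per joint, and the stack of maps is what the network's predicted heatmaps are compared against.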
HRNet
High-Resolution Network
Explanation:
Maintain high-resolution representations through the whole process
Authors' view:
Repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations
We perform repeated multi-scale fusions to boost the high-resolution representations with the help of the low-resolution representations of the same depth and similar level.
One small net and one big net: HRNet-W32 and HRNet-W48, where 32 and 48 represent the widths (C) of the high-resolution subnetworks in the last three stages.
Multi-scale fusion
Need a separate low-to-high upsampling process and aggregate low-level and high-level representations.
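The repeated fusion between parallel branches noted under "Authors' view" can be pictured with a toy exchange step between two resolutions. This is only a sketch under simplifying assumptions: both branches have the same number of channels, average pooling stands in for a learned strided convolution, and plain nearest neighbor upsampling stands in for any learned upsampling path. All function names are hypothetical.

```python
import numpy as np


def upsample_nearest(x, scale):
    """(C, H, W) -> (C, H*scale, W*scale) by repeating pixels."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)


def downsample_avg(x, scale):
    """(C, H, W) -> (C, H/scale, W/scale) by average pooling (stand-in for a strided conv)."""
    c, h, w = x.shape
    return x.reshape(c, h // scale, scale, w // scale, scale).mean(axis=(2, 4))


def fuse(high, low):
    """One exchange step: every branch receives information from the other branch."""
    scale = high.shape[1] // low.shape[1]
    new_high = high + upsample_nearest(low, scale)
    new_low = low + downsample_avg(high, scale)
    return new_high, new_low


high = np.ones((4, 8, 8))       # high-resolution branch
low = 2.0 * np.ones((4, 4, 4))  # low-resolution branch
for _ in range(2):              # fusion is repeated, not done once at the end
    high, low = fuse(high, low)
print(high.shape, low.shape)
```

Both branches keep their own resolution after every exchange, so the high-resolution representation is maintained through the whole process rather than recovered at the end.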
Loss
We regress the heatmaps simply from the high-resolution representations output by the last exchange unit,
OpenPose
Overview
We present an approach to efficiently detect the 2D pose of multiple people in an image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image.
We present the first bottom-up representation of association scores via Part Affinity Fields (PAFs), a set of 2D vector fields that encode the location and orientation of limbs over the image domain.
Method
- input
RGB image
- output
  - a set of 2D confidence maps S of body part locations
  - a set of 2D vector fields L of part affinities (L: at each point, the x and y components of the direction vector)
- finally
The confidence maps and the affinity fields are parsed by greedy inference to output the 2D keypoints for all people in the image.
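The greedy parsing step can be illustrated with a simplified pairing routine for one limb type: given association scores between candidates of two body parts, repeatedly accept the highest-scoring pair whose endpoints are both still unused. This is an assumption-laden sketch of the general idea, not the authors' exact algorithm; `greedy_match` and the score matrix are hypothetical.

```python
def greedy_match(scores, min_score=0.0):
    """scores[i][j]: association score between candidate i of part A and candidate j of part B."""
    candidates = sorted(
        ((s, i, j)
         for i, row in enumerate(scores)
         for j, s in enumerate(row)
         if s > min_score),
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for s, i, j in candidates:
        if i not in used_a and j not in used_b:  # each candidate joins at most one limb
            pairs.append((i, j, s))
            used_a.add(i)
            used_b.add(j)
    return pairs


scores = [[0.9, 0.2],
          [0.8, 0.7]]
pairs = greedy_match(scores)
print(pairs)  # [(0, 0, 0.9), (1, 1, 0.7)]
```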
Key points
Part Affinity Fields (PAFs)
Function: preserves both location and orientation information across the region of support of the limb
Essence: The part affinity is a 2D vector field for each limb
For each pixel in the area belonging to a particular limb, a 2D vector encodes the direction that points from one part of the limb to the other.
Each type of limb has a corresponding affinity field joining its two associated body parts.
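The sketch below builds such a field for a single limb and then scores a candidate connection by sampling the field along the segment between two points. The field size, the limb width, the number of samples and both function names are assumptions made for illustration.

```python
import numpy as np


def limb_paf(p1, p2, size=(46, 46), limb_width=1.0):
    """Return a (2, H, W) field holding the unit vector p1->p2 on pixels near the limb, else 0."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    h, w = size
    length = np.linalg.norm(p2 - p1)
    v = (p2 - p1) / length                  # unit direction as (x, y)
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - p1[0], ys - p1[1]
    along = dx * v[0] + dy * v[1]           # projection onto the limb axis
    across = np.abs(dx * v[1] - dy * v[0])  # perpendicular distance to the axis
    mask = (along >= 0) & (along <= length) & (across <= limb_width)
    paf = np.zeros((2, h, w))
    paf[0][mask] = v[0]
    paf[1][mask] = v[1]
    return paf


def association_score(paf, a, b, num_samples=10):
    """Average alignment between the field and the direction a->b, sampled along the segment."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = (b - a) / np.linalg.norm(b - a)
    total = 0.0
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.rint(a + t * (b - a)).astype(int)
        total += paf[0, y, x] * d[0] + paf[1, y, x] * d[1]
    return total / num_samples


paf = limb_paf((10, 10), (30, 10))
print(association_score(paf, (10, 10), (30, 10)))  # aligned with the limb
print(association_score(paf, (30, 10), (10, 10)))  # opposite direction
print(association_score(paf, (20, 0), (20, 20)))   # perpendicular to the limb
```

Because the field stores orientation as well as location, a candidate pair that crosses the limb at the right place but in the wrong direction still receives a low score.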