Pose Estimation Outline

Top-down

  • Cascaded Pyramid Network
  • Stacked Hourglass Networks
  • AlphaPose
  • Simple Baseline
  • HRNet

Bottom-up

  • OpenPose

Simple Baseline

Author's view
          The key to feature extraction for pose estimation is how to turn the small feature maps back into large ones: obtaining high-resolution feature maps is crucial, but the paper argues that how they are obtained matters less.
Overview
         our method combines the upsampling and convolutional parameters into deconvolutional layers in a much simpler way, without using skip layer connections.


    Our method simply adds a few deconvolutional layers over the last convolution stage in the ResNet, called C5.
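
A minimal PyTorch sketch of that head (not the authors' released code; the settings follow the paper's defaults of three 256-channel, 4x4-kernel, stride-2 deconvolutions plus a final 1x1 convolution producing one heatmap per joint):

    import torch
    import torch.nn as nn
    import torchvision

    class SimpleBaselineHead(nn.Module):
        def __init__(self, num_joints=17, in_channels=2048):
            super().__init__()
            layers = []
            for _ in range(3):  # three deconv layers on top of C5
                layers += [
                    nn.ConvTranspose2d(in_channels, 256, kernel_size=4,
                                       stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(256),
                    nn.ReLU(inplace=True),
                ]
                in_channels = 256
            self.deconv = nn.Sequential(*layers)
            self.final = nn.Conv2d(256, num_joints, kernel_size=1)  # heatmaps

        def forward(self, c5):
            return self.final(self.deconv(c5))

    # C5 = output of the last conv stage of ResNet-50 (drop avgpool and fc)
    backbone = nn.Sequential(*list(torchvision.models.resnet50().children())[:-2])
    heatmaps = SimpleBaselineHead()(backbone(torch.randn(1, 3, 256, 256)))
    print(heatmaps.shape)  # (1, 17, 64, 64): each deconv doubles 8x8 -> 64x64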

Loss

Mean Squared Error (MSE) is used as the loss between the predicted heatmaps and the targeted heatmaps.
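
A one-function sketch of this loss (implementations commonly also weight each joint by its visibility flag; that detail is omitted here):

    import torch
    import torch.nn.functional as F

    def heatmap_mse(pred, target):
        # pred, target: (batch, num_joints, H, W) heatmap tensors
        return F.mse_loss(pred, target)

    loss = heatmap_mse(torch.rand(2, 17, 64, 64), torch.rand(2, 17, 64, 64))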

Key concepts

Difference between transposed convolution and upsampling

  • transposed convolution
            A transposed convolution runs the convolution mapping in reverse: convolution is many-to-one, while transposed convolution is one-to-many (the two are contrasted in the sketch after this list).
  • upsampling
            Nearest-neighbor interpolation
            Bilinear interpolation
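
A small sketch contrasting the two on the same input: the transposed convolution has learnable weights, while nn.Upsample applies a fixed interpolation rule (the channel count of 8 and scale factor of 2 are arbitrary choices here):

    import torch
    import torch.nn as nn

    x = torch.randn(1, 8, 16, 16)

    deconv = nn.ConvTranspose2d(8, 8, kernel_size=4, stride=2, padding=1)  # learned
    nearest = nn.Upsample(scale_factor=2, mode='nearest')                  # fixed rule
    bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    print(deconv(x).shape, nearest(x).shape, bilinear(x).shape)  # all (1, 8, 32, 32)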

Heatmap as a 2D Gaussian
          The targeted heatmap H_k for joint k is generated by applying a 2D Gaussian centered on the k-th joint's ground-truth location; the heatmap size is 64x64.
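
A NumPy sketch of generating such a target heatmap (the Gaussian sigma is a free hyperparameter, assumed to be 2 here):

    import numpy as np

    def gaussian_heatmap(cx, cy, size=64, sigma=2.0):
        # 2D Gaussian centered on the joint's ground-truth location (cx, cy)
        xs = np.arange(size)
        ys = np.arange(size)[:, None]
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    H_k = gaussian_heatmap(cx=20, cy=33)  # 64x64 target heatmap for joint k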



HRNet

High-Resolution Network
Explanation:
          Maintain high-resolution representations through the whole process

Another view:
          Repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations


          we perform repeated multiscale fusions to boost the high-resolution representations with the help of the low-resolution representations of the same depth and similar level

Small net and big net

          HRNet comes in one small net (HRNet-W32) and one big net (HRNet-W48), where 32 and 48 represent the widths (C) of the high-resolution subnetworks in the last three stages.

Multi-scale fusion

          Other fusion schemes need a separate low-to-high upsampling process to aggregate low-level and high-level representations; HRNet instead repeats the fusion across its parallel multi-resolution branches (a minimal two-branch sketch follows).
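
A minimal two-branch sketch of such an exchange (not the full HRNet exchange unit): the low-resolution branch is sent up via a 1x1 conv plus nearest upsampling, the high-resolution branch is sent down via a stride-2 3x3 conv, and each branch sums what it receives:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TwoBranchFusion(nn.Module):
        def __init__(self, c_high=32, c_low=64):
            super().__init__()
            self.low_to_high = nn.Conv2d(c_low, c_high, kernel_size=1)
            self.high_to_low = nn.Conv2d(c_high, c_low, kernel_size=3,
                                         stride=2, padding=1)

        def forward(self, x_high, x_low):
            up = F.interpolate(self.low_to_high(x_low),
                               size=x_high.shape[-2:], mode='nearest')
            down = self.high_to_low(x_high)
            return x_high + up, x_low + down  # each branch absorbs the other

    fuse = TwoBranchFusion()
    h, l = fuse(torch.randn(1, 32, 64, 64), torch.randn(1, 64, 32, 32))
    print(h.shape, l.shape)  # (1, 32, 64, 64) and (1, 64, 32, 32)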

Loss

          We regress the heatmaps simply from the high-resolution representations output by the last exchange unit; the loss is the mean squared error between the predicted and ground-truth heatmaps.
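
A sketch of that final step plus keypoint decoding (the 1x1-conv head matches the paper's design; the argmax decode is a simplification, as implementations commonly refine it with a quarter-pixel offset):

    import torch
    import torch.nn as nn

    head = nn.Conv2d(32, 17, kernel_size=1)     # 32 = HRNet-W32 width
    feat = torch.randn(1, 32, 64, 48)           # high-resolution output
    heatmaps = head(feat)                       # (1, 17, 64, 48)

    flat = heatmaps.flatten(2).argmax(dim=2)    # (1, 17) flat indices
    ys, xs = flat // 48, flat % 48              # per-joint (y, x) locations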



OpenPose

Overview
       We present an approach to efficiently detect the 2D pose of multiple people in an image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance, irrespective of the number of people in the image.
       We present the first bottom-up representation of association scores via Part Affinity Fields (PAFs), a set of 2D vector fields that encode the location and orientation of limbs over the image domain.

Method

  • input
    RGB image

  • output

    • a set of 2D confidence maps S of body part locations
    • a set of 2D vector fields L of part affinities (L stores, at each pixel, a direction vector as its x and y components)


  • finally
    the confidence maps and the affinity fields are parsed by greedy inference to output the 2D keypoints for all people in the image (the per-limb association score used in this step is sketched below).
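
A sketch of the association score for one candidate limb, assuming arrays paf_x and paf_y hold that limb's field: sample points along the segment between two detected part candidates and average the dot product between the field and the segment's unit vector (the paper computes this as a line integral over the PAF; points are assumed to lie inside the field):

    import numpy as np

    def paf_score(paf_x, paf_y, p1, p2, n_samples=10):
        # paf_x, paf_y: (H, W) x/y components of one limb's affinity field
        p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
        v = p2 - p1
        u = v / (np.linalg.norm(v) + 1e-8)         # unit vector along the limb
        ts = np.linspace(0.0, 1.0, n_samples)
        pts = p1[None] + ts[:, None] * v[None]     # sampled (x, y) positions
        xs, ys = pts[:, 0].astype(int), pts[:, 1].astype(int)
        field = np.stack([paf_x[ys, xs], paf_y[ys, xs]], axis=1)
        return float((field @ u).mean())           # high = parts likely connected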

Key concepts

Part Affinity Fields (PAFs)
Function: preserves both location and orientation information across the region of support of the limb.

Essence: the part affinity is a 2D vector field for each limb.
        2D vector encodes the direction that points from one part of the limb to the other.
        Each type of limb has a corresponding affinity field joining its two associated body parts.
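
A sketch of the ground-truth field for one limb of one person: every pixel within a set distance of the segment from joint x1 to joint x2 stores the unit vector pointing from x1 to x2, and all other pixels stay zero (the limb width is a hyperparameter, assumed to be 1 pixel here):

    import numpy as np

    def limb_paf(x1, x2, shape=(64, 64), width=1.0):
        H, W = shape
        x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
        v = x2 - x1
        length = np.linalg.norm(v) + 1e-8
        u = v / length                               # unit vector x1 -> x2
        xs, ys = np.meshgrid(np.arange(W), np.arange(H))
        along = (xs - x1[0]) * u[0] + (ys - x1[1]) * u[1]         # projection on limb
        perp = np.abs((xs - x1[0]) * u[1] - (ys - x1[1]) * u[0])  # distance to axis
        mask = (along >= 0) & (along <= length) & (perp <= width)
        paf = np.zeros((2, H, W))
        paf[0][mask], paf[1][mask] = u[0], u[1]      # store the unit vector
        return paf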