2020.10.13 Re-reading PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

1. Background

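1. The input is an RGBD image.
2. Two-stage pose estimation: first find the correspondences between points in the model and points in the scene, then solve for the pose with least squares (LS) or PnP.
The authors argue that end-to-end methods, which directly regress rotation-related quantities, force the network to cope with the non-linearity of the rotation space, and that this may become a performance bottleneck. The non-linearity of the rotation space is explained in the PVNet paper, so that paper is worth reading.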
3. Like YOLOff and PVNet, PVN3D predicts offsets rather than predicting the points directly. This has been shown to help optimization, because the offsets are confined to a sphere.
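To make the offset idea concrete, here is a minimal sketch of my own (it is not taken from the paper's code). It assumes the regression target for a point is simply the vector from that point to the keypoint, and it uses a plain mean in place of the clustering step. The helper names `offset_targets`, `vote` and `mean_point` are hypothetical.

```python
def offset_targets(points, keypoint):
    """Per-point regression targets: the vector from each point to the keypoint."""
    return [tuple(k - p for p, k in zip(pt, keypoint)) for pt in points]


def vote(points, offsets):
    """Each point casts one vote for the keypoint: its own position plus its offset."""
    return [tuple(p + o for p, o in zip(pt, off)) for pt, off in zip(points, offsets)]


def mean_point(votes):
    """Aggregate the votes with a plain mean (an assumed stand-in for clustering)."""
    n = len(votes)
    return tuple(sum(v[i] for v in votes) / n for i in range(3))


# Three points on an object and one keypoint (toy values).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
keypoint = (0.5, 0.5, 0.5)

targets = offset_targets(points, keypoint)      # what a network would be trained to output
estimate = mean_point(vote(points, targets))    # with perfect offsets, the votes agree
```

With perfect offsets every vote lands on the keypoint. With noisy predictions the votes scatter around it, which is why an aggregation step is needed.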

2. Method

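The idea and the method are both straightforward:
1. Feature extraction stage: extract per-point features from the RGB image and from the depth data, then fuse them.
2. 3D keypoint detection and instance semantic segmentation stage:
2.1 3D keypoint detection: the network Mk predicts, for each seed point, the offset from that point to each selected keypoint (the keypoints are selected with farthest point sampling (FPS)).
2.2 Instance semantic segmentation: the network Ms predicts, for each seed point, the class it belongs to, and the network Mc predicts the offset from each seed point to the center of the object it lies on. The center offsets and the semantic segmentation results are then voted on and clustered to produce a reliable instance segmentation.
My own understanding: suppose the scene contains two identical objects (say, two power drills). The semantic segmentation network Ms labels the points of both drills with the same class. The offsets predicted by the center offset network Mc then separate the two drills, because their points clearly vote for two different centers, and clustering recovers the accurate position of each drill's center.
2.3 Voting and clustering over the 3D keypoint detection and instance semantic segmentation results gives the positions of the predicted 3D keypoints on the object in the scene.
2.4 A least-squares (LS) fit between the predicted 3D keypoints and the keypoints originally selected on the model then yields the pose.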
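The least-squares fit in step 2.4 can be sketched with the standard SVD-based (Kabsch) solution for aligning two corresponding point sets. This is a sketch under my own assumptions, not the authors' implementation; the function name `fit_rigid_transform` and the toy keypoints are hypothetical.

```python
import numpy as np


def fit_rigid_transform(model_kps, scene_kps):
    """Least-squares R, t with scene ≈ R @ model + t, via the SVD (Kabsch) solution."""
    model_kps = np.asarray(model_kps, dtype=float)
    scene_kps = np.asarray(scene_kps, dtype=float)
    mu_m = model_kps.mean(axis=0)
    mu_s = scene_kps.mean(axis=0)
    # 3x3 cross-covariance between the centered point sets.
    H = (model_kps - mu_m).T @ (scene_kps - mu_s)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_s - R @ mu_m
    return R, t


# Toy check: rotate and translate four model keypoints, then recover the pose.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.1, -0.2, 0.5])
model_kps = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
scene_kps = model_kps @ R_true.T + t_true

R_est, t_est = fit_rigid_transform(model_kps, scene_kps)
```

Because the fit uses all keypoints jointly, moderate noise in individual keypoint estimates is averaged out rather than propagated directly into the pose.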

3. Loss

The paper defines four losses: $L_{keypoints}$, $L_{semantic}$ (which uses Focal Loss), $L_{center}$, and the combined $L_{multi-task}$.
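For reference, this is the form of the four losses as I recall them from the paper. The notation is my own reconstruction and should be checked against the paper: $N$ is the number of seed points, $M$ the number of keypoints, $of_i^{j}$ and $of_i^{j*}$ the predicted and ground-truth offsets from point $i$ to keypoint $j$, $\Delta x_i$ and $\Delta x_i^{*}$ the predicted and ground-truth center offsets, $\mathbb{I}(p_i \in I)$ an indicator that point $p_i$ belongs to instance $I$, $c_i$ and $l_i$ the predicted class confidence and one-hot label of point $i$, and $\alpha$, $\gamma$, $\lambda_1$, $\lambda_2$, $\lambda_3$ hyperparameters.

```latex
\begin{aligned}
L_{keypoints} &= \frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M}
    \left\| of_i^{j} - of_i^{j*} \right\| \, \mathbb{I}(p_i \in I) \\
L_{semantic} &= -\alpha\,(1 - q_i)^{\gamma}\,\log(q_i),
    \qquad q_i = c_i \cdot l_i \\
L_{center} &= \frac{1}{N}\sum_{i=1}^{N}
    \left\| \Delta x_i - \Delta x_i^{*} \right\| \, \mathbb{I}(p_i \in I) \\
L_{multi-task} &= \lambda_1 L_{keypoints} + \lambda_2 L_{semantic} + \lambda_3 L_{center}
\end{aligned}
```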

4. Experimental results

The evaluation metrics are still ADD and ADD-S.
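As a reminder of what the two metrics measure, here is a small sketch following their usual definitions. ADD averages the distance between each model point under the ground-truth pose and the same point under the predicted pose. ADD-S averages the distance to the closest predicted point instead, so it does not penalize poses that differ only by an object symmetry. The function names and the toy four-point "object" are hypothetical.

```python
import math


def transform(points, R, t):
    """Apply x -> R x + t to every 3D point (R is a 3x3 nested list)."""
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in points]


def add_metric(model, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = transform(model, R_gt, t_gt)
    pred = transform(model, R_pred, t_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(model)


def add_s_metric(model, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its closest predicted point."""
    gt = transform(model, R_gt, t_gt)
    pred = transform(model, R_pred, t_pred)
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(model)


# A toy object that is symmetric under a 180 degree rotation about z.
model = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
flipped = [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 1.0]]
zero = (0.0, 0.0, 0.0)

add_value = add_metric(model, identity, zero, flipped, zero)
add_s_value = add_s_metric(model, identity, zero, flipped, zero)
```

In this toy case the predicted pose is visually indistinguishable from the ground truth, yet ADD reports a large error while ADD-S reports zero, which is why ADD-S is the metric used for symmetric objects.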

4.1 LineMOD dataset

A clear improvement over DenseFusion.

4.2 YCB dataset

A large improvement over DenseFusion.
Results on the large clamp and the extra-large clamp. As the authors say, the magnitude of the offsets predicted by Mk provides information for telling the large clamp from the extra-large clamp, and this appears to work. It may also be that the semantic segmentation network the authors use is somewhat better than the one DenseFusion uses.

This shows that the two-stage 3D keypoint approach has an advantage over directly regressing the pose (RT).
The rationale for choosing the keypoints with the FPS algorithm.
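For reference, farthest point sampling itself is a simple greedy procedure: start from some point, then repeatedly add the point that is farthest from everything chosen so far. The sketch below is my own minimal version. The starting index and the toy point set are arbitrary assumptions, and the paper's implementation may initialize differently.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return indices of k points that are spread far apart."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # The next keypoint is the point farthest from the chosen set.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return chosen


# Toy "mesh": six points in a plane.
mesh_points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.2, 0.0, 0.0),
               (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.5, 0.5, 0.0)]
keypoint_indices = farthest_point_sampling(mesh_points, k=3)
```

The selected points tend to sit at the extremities of the object and far from each other, which is the property that makes them well-conditioned for the subsequent least-squares fit.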

Robustness to occlusion: PVN3D performs well.
Semantic segmentation evaluation: PVN3D performs very well.
Ablation study over the three modules.
Instance segmentation results.

5. Takeaways

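1. The method proposed in this paper is a 3D extension of PVNet, and it supports several ideas:
(1) A two-stage pipeline (first predict the keypoint correspondences, then solve for the pose) works better than end-to-end methods.
(2) Predicting offsets is easier to learn and gives better results than predicting points directly.
(3) When several target objects have to be handled at once, instance semantic segmentation plays a major role.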

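2. Some questions also remain:
(1) When training Mk, how is the ground-truth offset computed from the dataset? And when training Mc, how is its ground-truth offset computed?
(2) How do the offsets predicted by Mk affect the semantic segmentation results?
Both questions will have to wait until I have read the code.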

6. Presentation slides for this paper
