Momen^et

Preface

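I want to start writing notes from ****. Reading is one part of learning, but what I read should not just stay on paper; only by explaining it can I understand it more deeply. I also hope to discuss it with everyone.

I came across this paper in the Point-based Methods section of "Deep Learning for 3D Point Clouds: A Survey". I am new to point clouds, so I want to work through the papers by following the survey. A tag such as ieee-18-5 means venue, publication year, and citation count (citation data from SemanticScholar).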

Keywords: 3D object classification, PointNet, hand-crafted input

Paper content

Network architecture

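The network architecture is shown in the figure.
[Figure: Momen^et network architecture]
The overall architecture still builds on PointNet. The input is not limited to the raw xyz coordinates: polynomial functions of the coordinates, which relate to physical quantities such as moments, are added to it.
I also think that connecting physics with 3D point clouds is a good approach, and I hope it leads to some ideas of my own later.
Other papers have also discussed hand-crafted, explicit, and implicit features; I think this is one of the essential differences between networks.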

To explain the network in the paper's own words:
(1) we add polynomial functions as part of the input point cloud,
(2) we reduce the number of MLP layers to only one layer with 512 features, and
(3) we concatenate average pooling to the existing max pooling operation, which is also a symmetric operation.
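
The three changes quoted above can be sketched roughly as follows. This is only a minimal NumPy illustration, not the paper's implementation: the choice of second-order monomials (x², y², z², xy, xz, yz), the function names `lift_second_order` and `pooled_global_feature`, and the random weights standing in for the learned 512-feature layer are all assumptions made for the example.

```python
import numpy as np

def lift_second_order(points):
    """Append second-order monomials to xyz: (N, 3) -> (N, 9). Assumed choice of terms."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    second = np.stack([x * x, y * y, z * z, x * y, x * z, y * z], axis=1)
    return np.concatenate([points, second], axis=1)

def pooled_global_feature(per_point_features):
    """Concatenate max pooling and average pooling over the point axis."""
    return np.concatenate(
        [per_point_features.max(axis=0), per_point_features.mean(axis=0)]
    )

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))      # synthetic point cloud, N = 1024

lifted = lift_second_order(cloud)       # (1024, 9): polynomial functions as input
weights = rng.normal(size=(9, 512))     # random stand-in for one learned 512-feature layer
per_point = np.maximum(lifted @ weights, 0.0)  # same layer applied to every point, with ReLU

global_feature = pooled_global_feature(per_point)  # (1024,) = 512 max + 512 average
```

Both pooling operations are symmetric, so shuffling the rows of `cloud` leaves `global_feature` unchanged up to floating-point rounding.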

Other descriptions (quoted)

Part of the pointNet architecture is a transformer network based on the spatial transformer networks (STN). It is supposed to map the input point cloud to a canonical form. However, the part handling spatial context, the STN in pointNet, is sensitive to different orientations of the pointcloud.

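PointNet is not rotation invariant, as several other papers have also pointed out. I do not yet understand what the transformation (rotation) matrix produced by its STN module really represents, and I hope to keep studying and discussing this.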
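
To make the orientation sensitivity concrete, here is a small sketch, assuming a synthetic point cloud and the simplest possible global feature (max pooling directly over the raw coordinates, with no learned layers). The helper names `rotation_z` and `global_max_feature` are hypothetical. It only shows that a feature built from raw xyz changes when the cloud is rotated; it is not a model of the STN itself.

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def global_max_feature(points):
    """Max pooling over the point axis: (N, 3) -> (3,)."""
    return points.max(axis=0)

rng = np.random.default_rng(0)
# Anisotropic synthetic cloud: elongated along x, flat along z.
cloud = rng.normal(size=(128, 3)) * np.array([3.0, 1.0, 0.5])

# Rotate every point by 90 degrees about z (row vectors, hence the transpose).
rotated = cloud @ rotation_z(np.pi / 2).T

feat_before = global_max_feature(cloud)
feat_after = global_max_feature(rotated)
print("before:", feat_before)
print("after: ", feat_after)
```

The shape is the same object in both cases, yet the x and y components of the pooled feature differ; only the z component, which lies along the rotation axis, stays the same.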

The main contribution of this paper is leveraging the network’s ability to operate on point clouds by adding polynomial functions to their coordinates. Such a design can allow the network to account for higher order moments and therefore achieve higher classification accuracy with lower time and memory consumption. Next, we show that it is indeed essential to add polynomial functions to the input, as learning to multiply inputs items is a challenge for neural networks.

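The paper stresses again that it is the added polynomial inputs that improve classification accuracy. Relying on the network alone to learn such features may work worse and offers little interpretability. So hand-crafted input, as a form of human "guidance" to the network, is a good thing.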
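
One way to see why polynomial inputs let a network "account for higher order moments": once the products of coordinates are available per point, plain average pooling already yields the second-order moments, with no need for the network to learn multiplication. The sketch below checks this on a synthetic cloud; the data and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
cloud = rng.normal(size=(2048, 3)) * np.array([2.0, 1.0, 0.5])

# First-order moments: average pooling over the raw coordinates.
mean = cloud.mean(axis=0)                        # (3,)

# Second-order raw moments: average pooling over all products x_i * x_j.
outer = cloud[:, :, None] * cloud[:, None, :]    # (N, 3, 3)
second_moment = outer.mean(axis=0)               # (3, 3)

# The covariance follows from the pooled polynomial features alone.
covariance = second_moment - np.outer(mean, mean)

# Cross-check against NumPy's population covariance (normalised by N).
reference = np.cov(cloud, rowvar=False, bias=True)
print(np.allclose(covariance, reference))
```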

Questions

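  1. What makes a good hand-crafted input?
  2. Can other physical models be applied to point clouds? (The papers I have read are all, in essence, MLP-based or VoxelNet-based methods that still treat the data as a 1D or 2D structure, which seems somewhat at odds with 3D perception. I am not sure whether I do not understand the data well enough, or whether human 3D perception is limiting my view.)
  3. Many feature aggregation schemes use MaxPooling. Does that mean the feature with the largest value is the good feature?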