Learning 2D–3D Correspondences To Solve The Blind Perspective-n-Point Problem


When correspondences are known, the problem reduces to the standard PnP problem [10,17,33,19]. When correspondences are unknown, the problem is blind PnP, for which several traditional geometry-based methods have been proposed, including SoftPoSIT [7], BlindPnP [24], GOPAC [2] and GOSMA [3].


All of the methods above require either a pose prior or an exhaustive search. This paper instead proposes to estimate the correspondence matrix directly.




The contributions of the paper are:

  1. a new deep method to solve the blind PnP problem with unknown 2D–3D correspondences. To the best of our knowledge, there is no existing deep method that takes unordered 2D and 3D point-sets (with unknown correspondences) as inputs, and outputs a 6-DoF camera pose;

  2. a two-stream network to extract discriminative features from the point sets, which encodes both local geometric structure and global context; and

  3. an original global feature matching network based on a recurrent Sinkhorn layer to find 2D–3D correspondences, with a loss function that maximizes the matching probabilities of inlier matches.


When 3D points are not utilized, the PoseNet algorithms [15,14] can directly regress a camera pose. However, the accuracy of the regressed 6-DoF poses is inferior to geometry-based methods that use 3D points. (When no 3D points are available, a network such as PoseNet is the best option, but because it has no 3D information, accuracy remains a major problem.)








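Next, the neighbourhood of each 2D point and each 3D point is computed using the L2 distance, and a graph is built by hand from these neighbourhoods. The graphs are then convolved with an operation similar to EdgeConv, which yields a geometric feature for every 3D and 2D point; the paper uses 128 dimensions. Finally, the whole set is passed through context normalization to fuse in global geometric information. (After the previous step, both the 2D and the 3D point sets have become sets of 3D points.)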
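The neighbourhood graph and the context normalization step can be sketched in a few lines. This is only an illustration under assumptions, not the paper's network: the function names `knn_graph`, `edge_features` and `context_norm` are made up for this note, the edge input (centre point plus offset to the neighbour) follows the usual EdgeConv formulation, and the learned layers that turn these inputs into 128-dimensional features are omitted.

```python
import math

def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (L2 distance)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours

def edge_features(points, neighbours):
    """EdgeConv-style edge inputs: the centre point concatenated with the
    offset from the centre to each neighbour (assumed formulation)."""
    feats = []
    for i, nbrs in enumerate(neighbours):
        p = points[i]
        feats.append([list(p) + [qc - pc for pc, qc in zip(p, points[j])]
                      for j in nbrs])
    return feats

def context_norm(features, eps=1e-8):
    """Normalise every feature channel to zero mean and unit variance
    across the whole point set, so each point sees global statistics."""
    n, dims = len(features), len(features[0])
    out = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        col = [f[d] for f in features]
        mean = sum(col) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in col) / n + eps)
        for i in range(n):
            out[i][d] = (col[i] - mean) / std
    return out

# Toy usage on four 3D points with k = 2 neighbours each.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
graph = knn_graph(pts, k=2)
edges = edge_features(pts, graph)
normed = context_norm([list(p) for p in pts])
```

The brute-force neighbour search is quadratic in the number of points, which is fine for a sketch; a real pipeline would typically batch this on the GPU.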


3.3 Global Feature Matching


The matrix H is defined in terms of f, the features learned above.






The weight matrix W is not obtained from H and Sinkhorn alone; it also takes into account the priors r and s (probabilities). Prior Matchability: For each point we define a prior unary matchability measuring how likely it is to have a match.





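Each row of W sums to the corresponding entry of s, and each column sums to the corresponding entry of r, because s and r give the probability that a given 2D or 3D point has a correct match.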
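The marginal constraints on W can be illustrated with a plain Sinkhorn iteration. This is a minimal sketch under stated assumptions, not the paper's recurrent layer: the function name `sinkhorn`, the parameters `reg` and `n_iters`, and the kernel exp(-cost / reg) are choices made for this note, and `cost` is only a stand-in for H, whose exact definition is not reproduced here. Following the statement above, rows are scaled to the entries of s and columns to the entries of r; the two vectors must have the same total.

```python
import math

def sinkhorn(cost, row_marginal, col_marginal, reg=0.5, n_iters=200):
    """Scale exp(-cost / reg) so that row sums match row_marginal and
    column sums match col_marginal (alternating row/column rescaling)."""
    rows, cols = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(cols)]
              for i in range(rows)]
    u = [1.0] * rows  # row scaling factors
    v = [1.0] * cols  # column scaling factors
    for _ in range(n_iters):
        for i in range(rows):
            u[i] = row_marginal[i] / sum(kernel[i][j] * v[j] for j in range(cols))
        for j in range(cols):
            v[j] = col_marginal[j] / sum(kernel[i][j] * u[i] for i in range(rows))
    return [[u[i] * kernel[i][j] * v[j] for j in range(cols)]
            for i in range(rows)]

# Toy usage: 2 points on one side, 3 on the other; values are made up.
H = [[0.1, 0.9, 0.8],
     [0.7, 0.2, 0.9]]
s = [0.5, 0.5]       # prior matchability attached to the rows
r = [0.4, 0.4, 0.2]  # prior matchability attached to the columns
W = sinkhorn(H, s, r)
```

Because every step is a differentiable rescaling, the same iteration can be unrolled inside a network, which is what makes a Sinkhorn layer trainable end to end.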

