Estimating the camera pose requires point correspondences. In practice, however, correspondences are inevitably corrupted by outliers, which degrade the accuracy of pose estimation. We propose a general and accurate outlier removal strategy for robust camera pose estimation. The proposed strategy detects outliers by leveraging the fact that only inliers comply with two effective consensuses, i.e., the 3D ray bundle consensus and the 2D vector field consensus. Our strategy has a nested structure. First, the outer module utilizes the 3D ray bundle consensus. We define the likelihood based on a probabilistic mixture model and maximize it by the expectation-maximization (EM) algorithm, alternately determining the inlier probability of each correspondence and the camera pose. Second, the inner module exploits the 2D vector field consensus to refine the probabilities obtained by the outer module. This refinement, based on the Bayesian rule, facilitates the convergence of the outer module and improves the accuracy of the entire framework. Our strategy can be integrated into various existing camera pose estimation methods that are originally vulnerable to outliers. Experiments on both synthesized data and real images show that our approach outperforms state-of-the-art outlier rejection methods in terms of accuracy and robustness.

The proposed outlier removal strategy for camera pose estimation has a nested structure composed of the *outer* and *inner* modules.

(1) The outer module is based on the 3D ray bundle consensus (RBC), which is shown in Fig. 1(a). The rays passing through the inliers of the 3D-to-2D point correspondences intersect at a common point, i.e., the optical center of the camera, while the rays formed by outliers have arbitrary directions;
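As an illustrative sketch (not the paper's implementation), the RBC can be scored per correspondence by the angle between the observed viewing ray and the ray from the optical center to the 3D point: under the true pose, inliers yield near-zero angles while outliers do not. Here `x_norm` is assumed to be an observation in normalized image coordinates (intrinsics removed), an assumption of this sketch:

```python
import numpy as np

def ray_angle_residual(R, t, X, x_norm):
    """Angle (radians) between the ray through the observed image
    point and the ray from the optical center to the 3D point X."""
    v_obs = np.append(x_norm, 1.0)            # bearing of the observation
    v_obs /= np.linalg.norm(v_obs)
    v_3d = R @ X + t                          # 3D point in the camera frame
    v_3d /= np.linalg.norm(v_3d)
    return np.arccos(np.clip(v_obs @ v_3d, -1.0, 1.0))
```

A correspondence whose residual stays small across pose hypotheses that the inliers agree on is consistent with the ray bundle; outliers produce arbitrary, typically large, angles.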

(2) The inner module is based on the 2D vector field consensus (VFC). We define a virtual camera and project the 3D points onto its image, so that the original 3D-to-2D correspondences are mapped to 2D-to-2D correspondences. These 2D-to-2D correspondences form a set of 2D vectors, as shown in Fig. 1(b). The inlier vectors share a regular orientation trend, while the outlier vectors are disordered.
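A minimal sketch of building this vector field, assuming for simplicity a virtual camera at the origin with intrinsics `K_virtual` (the paper's actual choice of virtual camera may differ):

```python
import numpy as np

def correspondence_vectors(points_3d, points_2d, K_virtual):
    """Project 3D points through a virtual camera at the origin and
    pair each projection with its observed 2D point, yielding one
    2D vector per correspondence."""
    proj = (K_virtual @ points_3d.T).T
    proj = proj[:, :2] / proj[:, 2:3]        # perspective division
    return proj - points_2d                  # the 2D vector field
```

Vectors of inlier correspondences vary smoothly over the image, whereas outlier vectors break this trend, which is what the inner module exploits.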

The details of the algorithm are described in the paper.

(1) The 3D RBC is utilized by the outer module. We define the likelihood based on the probabilistic mixture model and maximize it by the expectation-maximization (EM) algorithm. The inlier probability of each correspondence and the camera pose are determined alternately;
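The E-step/M-step alternation can be sketched with a generic Gaussian-inlier / uniform-outlier mixture over per-correspondence residuals. This is a simplified stand-in for the paper's mixture model: the outlier density width `a` is an assumed parameter, and the pose re-estimation that the paper interleaves with the M-step is omitted here:

```python
import numpy as np

def em_inlier_probs(res, a=10.0, n_iter=20):
    """Posterior inlier probability of each residual under a
    Gaussian-inlier / uniform-outlier mixture, fitted by EM."""
    sigma2, gamma = np.var(res) + 1e-12, 0.5
    for _ in range(n_iter):
        # E-step: responsibility of the inlier (Gaussian) component
        g = gamma * np.exp(-res**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        p = g / (g + (1 - gamma) / a)
        # M-step: re-estimate the variance and the mixing weight
        sigma2 = (p @ res**2) / max(p.sum(), 1e-12) + 1e-12
        gamma = np.clip(p.mean(), 1e-3, 1 - 1e-3)
    return p
```

In the full algorithm, the pose would be re-estimated from the probability-weighted correspondences inside each iteration, so the probabilities and the pose are determined alternately as described above.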

(2) The 2D VFC is exploited by the inner module to refine the probabilities obtained by the outer module. The refinement based on the Bayesian rule facilitates the convergence of the outer module and improves the accuracy of the entire framework;
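The Bayesian refinement can be sketched as follows, treating the outer module's output as a prior and the vector-field fit as a likelihood. The likelihood terms here are hypothetical placeholders for the VFC model, included only to show the structure of the update:

```python
def bayes_refine(prior, lik_inlier, lik_outlier):
    """Refine an inlier probability by Bayes' rule: `prior` comes from
    the outer (RBC) module, `lik_inlier`/`lik_outlier` are the
    likelihoods of the correspondence's 2D vector under the smooth
    vector field and under the outlier model, respectively."""
    num = prior * lik_inlier
    return num / (num + (1 - prior) * lik_outlier)
```

A correspondence whose vector fits the smooth field (`lik_inlier` high) has its probability pushed up, and vice versa, which is how the inner module sharpens the outer module's estimates.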

(3) The proposed outlier removal strategy is general. It can be easily integrated into various existing pose estimation methods which are originally vulnerable to outliers.

To evaluate the proposed outlier removal strategy for camera pose estimation, we have conducted experiments on both *synthesized data* and *real images*.

We compare our method with existing state-of-the-art approaches in terms of *accuracy* and *efficiency*.

We denote our strategy based on the ray bundle consensus and the vector field consensus by **RBC-VFC**. Besides, the outer module of our strategy, which leverages only the ray bundle consensus, is denoted as **RBC** and tested independently. We integrate our **RBC** and **RBC-VFC** strategies into two popular pose estimation methods: the classical DLT [15] and the widely-used EPnP [4], respectively. The integration forms two algorithm sets as follows:

(1) DLT, EPnP with RBC: S1={**DLT+**; **EPnP+**};

(2) DLT, EPnP with RBC-VFC: S2={**DLT++**; **EPnP++**}.

We compare our methods from S1 and S2 with state-of-the-art ones. We use the following three methods, which are relatively efficient and do not require any pose prior:

(1) Fast outlier removal strategy for EPnP [7], which is denoted as **FOR-EPnP**;

(2) Two-point localization method based on the toroidal constraint [6], which is denoted as **2P-TC**;

(3) Classical RANSAC [11] integrated into EPnP [4], which is denoted as **RSC-EPnP**. This integration can be regarded as the representative of RANSAC-alike methods.
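For reference, RANSAC-alike baselines such as RSC-EPnP share the generic hypothesize-and-verify skeleton below, where a minimal solver (EPnP on a minimal point subset, in the RSC-EPnP case) and a residual function are plugged in as callbacks. The line-fitting usage in the test is only a toy stand-in for such a solver:

```python
import numpy as np

def ransac(data, fit, residual, sample_size, thresh, n_iter=200, seed=0):
    """Generic RANSAC skeleton: `fit` solves a model from a minimal
    sample, `residual` scores all data against a model, and the model
    with the largest consensus set wins."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.zeros(len(data), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(data), sample_size, replace=False)
        model = fit(data[idx])
        inliers = residual(model, data) < thresh
        if inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Unlike our EM-based strategy, which outputs soft inlier probabilities, this scheme makes a hard inlier/outlier decision per hypothesis via the threshold `thresh`.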

All the above methods are implemented in MATLAB and tested on an Intel Core i7 CPU at 2.40 GHz. In the following, we present comparisons on accuracy and efficiency.

3.1.1 Evaluation on Accuracy

We design two groups of experiments with respect to the outlier ratio and the number of 3D-to-2D correspondences. Specifically, for the first group, we set the number of inliers to 50 and vary the outlier ratio from 10% to 70%; for the second group, we fix the outlier ratio to 50% and vary the total number of correspondences from 10 to 500. We follow the "rotation error" and "translation error" criteria defined in OPnP [17] to quantitatively evaluate the accuracy of the estimated pose.
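The exact error definitions are given in OPnP [17]; a common variant, assumed here for illustration and not necessarily identical to the formulas in [17], measures the geodesic angle of the relative rotation and the relative translation norm:

```python
import numpy as np

def rotation_error_deg(R_true, R_est):
    """Geodesic angle (degrees) of the relative rotation R_true^T R_est."""
    c = (np.trace(R_true.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def translation_error_pct(t_true, t_est):
    """Relative translation error in percent."""
    return 100.0 * np.linalg.norm(t_true - t_est) / np.linalg.norm(t_true)
```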

a) Test on the outlier ratio: The first row of Fig. 2 shows the accuracy for an increasing outlier ratio;

b) Test on the number of correspondences: The second row of Fig. 2 reports the accuracy for an increasing number of correspondences.

3.1.2 Evaluation on Efficiency

The number of correspondences increases from 100 to 1000 with an outlier ratio of 50%. Fig. 3 presents the computational time of different approaches.

We conduct two types of experiments for different purposes: (i) the tests on the EPFL dataset [21] aim at assessing methods using the images with large angular disparities; (ii) the tests on the TUM dataset [22] focus on evaluating approaches using long sequences whose adjacent frames are similar. Specifically, we compare the performances of original EPnP [4] and its robust versions including FOR-EPnP [7], RSC-EPnP [11] and our EPnP++.

*a) Tests on the EPFL dataset:* We evaluate the various methods on the Castle-P19 set of the EPFL dataset [21]. This image set is composed of 19 images of 3072×2048 pixels, and the ground-truth poses are given. The repetitive patterns and large angular disparities of these images tend to cause mismatches. We randomly select an image as the query image (shown in Fig. 4) and estimate its pose by the various methods. Besides the quantitative criteria "rotation error" *E*rot and "translation error" *E*trans [17] for accuracy evaluation, we also provide an evaluation of visual alignment (readers are invited to refer to the paper for details).

[1] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, “ORB-SLAM: A versatile and accurate monocular SLAM system,” IEEE Transactions on Robotics, 2015.

[4] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, 2008.

[6] F. Camposeco, T. Sattler, A. Cohen, A. Geiger, and M. Pollefeys, “Toroidal constraints for two point localization under high outlier ratios,” in IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[7] L. Ferraz, X. Binefa, and F. Moreno-Noguer, “Very fast solution to the PnP problem with algebraic outlier rejection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[11] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, 1981.

[15] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, second edition, 2003.

[17] Y. Zheng, Y. Kuang, S. Sugimoto, K. Astrom, and M. Okutomi, “Revisiting the PnP problem: A fast, general and optimal solution,” in IEEE International Conference on Computer Vision, 2013.

[21] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, “On benchmarking camera calibration and multi-view stereo for high resolution imagery,” in IEEE Conference on Computer Vision and Pattern Recognition, 2008.

[22] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, “A benchmark for the evaluation of RGB-D SLAM systems,” in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.

1. EPFL Dataset: http://icwww.epfl.ch/~marquez/multiview/denseMVS.html

2. TUM Dataset: http://vision.in.tum.de/data/datasets/rgbd-dataset

3. Source Code of the Proposed Methods: Source Code

1. Haoang Li, Ji Zhao, Jean-Charles Bazin, Lei Luo, and Jian Yao, *“Robust Camera Pose Estimation via Consensus on Ray Bundle and Vector Field”*, submitted to the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018), March 2018.