Structural features in a Manhattan world encode useful geometric information about parallelism, orthogonality and/or coplanarity in the scene. By fully exploiting these structural features, we propose a monocular SLAM system that obtains accurate estimates of the camera poses and the 3D map. The foremost contribution of the proposed system is a structural-feature-based optimization module containing three novel optimization strategies. First, a rotation optimization strategy using the parallelism and orthogonality of 3D lines is presented. Based on these two geometric cues, we propose a global binding method and an approach for calculating relative rotations to obtain accurate absolute rotations. Second, a translation optimization strategy leveraging coplanarity is proposed. Coplanar features are effectively identified and then exploited by a unified model, which handles points and lines equivalently, to calculate the relative translations, from which optimal absolute translations are obtained. Third, a 3D line optimization strategy utilizing parallelism, orthogonality and coplanarity simultaneously is proposed to obtain, at low computational cost, an accurate 3D map consisting of structural line segments. Experiments in man-made environments demonstrate that the proposed system outperforms existing state-of-the-art monocular SLAM systems in terms of accuracy and robustness.
We first exploit non-structural features to obtain a rough estimate of the camera poses and the 3D map following existing methods, and then use the structural features to develop an optimization module containing three novel optimization strategies as our main contributions:
We first proposed a unified model exploiting point and line features simultaneously, which has three main advantages:
(1) Accurate rotation optimization strategy leveraging parallelism and orthogonality: a global binding method and an approach for calculating precise relative rotations are proposed to significantly reduce the accumulated error of the absolute rotations;
(2) Accurate translation optimization strategy exploiting coplanarity: coplanar features are identified effectively and then used by a unified model, which handles coplanar points and lines equivalently, to obtain the relative translations, followed by optimization of the absolute translations;
(3) Accurate and efficient 3D map optimization strategy based on parallelism, orthogonality and coplanarity: a novel 3D line parameterization method is designed, along with a reliable cost function based on minimizing the re-projection error of lines.
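To make the rotation cue in (1) concrete, the following is a minimal, dependency-free sketch of how parallelism and orthogonality can constrain rotation. It is not the paper's global binding method. It assumes that two dominant line directions have already been estimated in each camera frame (for example from clusters of parallel lines) and that their signs are consistent between frames. The names manhattan_frame and relative_rotation are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def manhattan_frame(d1, d2):
    """Build three orthonormal axes from two noisy dominant directions.

    Gram-Schmidt enforces the orthogonality constraint; the third axis
    follows from the cross product. Returns a list of three axes.
    """
    a = normalize(d1)
    p = dot(d2, a)
    # Remove the component of d2 along a, then renormalize.
    b = normalize([y - p * x for x, y in zip(a, d2)])
    return [a, b, cross(a, b)]

def relative_rotation(frame1, frame2):
    """Rotation R with axis_k(camera 2) = R * axis_k(camera 1).

    With the axes stacked as columns of M1 and M2, R = M2 * M1^T.
    """
    return [[sum(frame2[k][i] * frame1[k][j] for k in range(3))
             for j in range(3)] for i in range(3)]
```

Because each frame is tied to the same scene-wide directions rather than to the previous frame, a rotation computed this way does not accumulate error from frame to frame, which is the intuition behind using these cues.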
The basic geometric model used in this paper is shown in Fig. 1. Details of the algorithm are described in the paper.
To demonstrate the performance of the proposed structural-feature-based SLAM system, we conduct experiments on both simulated data and real image sequences. We compare our methods with existing state-of-the-art approaches in terms of accuracy and efficiency.
b) Relative translation estimation
We compare our method UM-RT (cf. Section IV-B), based on the unified model handling coplanar points and lines, with the non-structural point-based method [20], denoted NP-RT, and the structural line-based approach [23], denoted SL-RT (shown in Fig. 2(b)).
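One way to see how coplanar points and lines can enter a single model is the standard plane-induced homography H = R + u n^T with u = t/d, where n is the plane normal and d the plane distance in the first camera frame. The sketch below is an illustration under that assumption and is not necessarily the UM-RT formulation. It assumes that the relative rotation R and the normal n are already known and that image features are given in normalized homogeneous coordinates. A point pair satisfies x2 x (H x1) = 0 and a line pair satisfies l1 x (H^T l2) = 0, and both are linear in u. The function name scaled_translation is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matvec(M, v):
    return [dot(row, v) for row in M]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(M, r):
    """Solve the 3x3 system M u = r with Cramer's rule."""
    d = det3(M)
    return [det3([[r[i] if j == k else M[i][j] for j in range(3)]
                  for i in range(3)]) / d for k in range(3)]

def scaled_translation(R, n, points, lines):
    """Least-squares estimate of u = t/d from coplanar points and lines.

    points: list of (x1, x2) homogeneous point pairs.
    lines:  list of (l1, l2) homogeneous line pairs.
    """
    rows = []  # each entry is (coefficients of u, right-hand side)
    for x1, x2 in points:
        s = dot(n, x1)
        rhs = cross(x2, matvec(R, x1))
        skew = [[0.0, -x2[2], x2[1]], [x2[2], 0.0, -x2[0]], [-x2[1], x2[0], 0.0]]
        for i in range(3):
            rows.append(([s * skew[i][j] for j in range(3)], -rhs[i]))
    Rt = [list(col) for col in zip(*R)]
    for l1, l2 in lines:
        c = cross(l1, n)
        rhs = cross(l1, matvec(Rt, l2))
        for i in range(3):
            rows.append(([c[i] * l2[j] for j in range(3)], -rhs[i]))
    # Normal equations (A^T A) u = A^T b over the stacked constraints.
    AtA = [[sum(a[i] * a[j] for a, _ in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(a[i] * b for a, b in rows) for i in range(3)]
    return solve3(AtA, Atb)
```

Points and lines contribute rows to the same linear system, so either feature type (or a mix) can be used as long as the stacked constraints have rank three. The result is the translation divided by the plane distance, so its scale stays tied to the plane.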
c) 3D line optimization
We compare our 3D line optimization strategy S-LO, based on structural constraints (cf. Section V), with the traditional non-structural-constraint-based approach [6], denoted NS-LO. Both methods aim to minimize the re-projection error, and we solve them with the Levenberg-Marquardt method available in the Ceres Solver [25] (shown in Tab. 1).
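The kind of residual such a solver minimizes can be sketched as follows. This is an assumption-laden illustration and not the paper's parameterization, which is not given here. It assumes a 3D line whose direction is fixed to a known Manhattan axis, so that only two parameters (a, b) locate it in the plane spanned by two vectors orthogonal to that axis. Cameras map world points as X_cam = R X + t, and observations are the two endpoints of the detected 2D segment in normalized image coordinates. The names project_line and line_residuals are hypothetical.

```python
import math

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def project_line(R, t, point, direction):
    """Homogeneous image line of the 3D line through `point` along `direction`.

    The line is scaled so that l . (x, y, 1) is a signed distance.
    """
    P = add(matvec(R, point), t)
    Q = add(matvec(R, add(point, direction)), t)
    l = cross(P, Q)  # normal of the plane through the camera centre and the line
    norm = math.hypot(l[0], l[1])
    return [c / norm for c in l]

def line_residuals(params, basis, direction, observations):
    """Endpoint-to-line distances for a direction-constrained 3D line.

    params:       (a, b), position in the plane orthogonal to `direction`.
    basis:        (u, v), two vectors spanning that plane.
    observations: list of (R, t, (endpoint1, endpoint2)) per image.
    """
    a, b = params
    u, v = basis
    point = [a * u[i] + b * v[i] for i in range(3)]
    residuals = []
    for R, t, endpoints in observations:
        l = project_line(R, t, point, direction)
        for e in endpoints:
            residuals.append(l[0] * e[0] + l[1] * e[1] + l[2])
    return residuals
```

A residual vector of this form could be handed to a Levenberg-Marquardt solver. With the direction fixed by the structural constraint, the solver searches over two parameters per line instead of the four needed for an unconstrained 3D line.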
d) Test on long image sequence
To evaluate the proposed system Struct-PL-SLAM (cf. Section II-A), based on structural points and lines, in a large-scale scene, we conduct an experiment on a long synthetic image sequence. We compare Struct-PL-SLAM with the non-structural line-based system [4], denoted Line-SLAM. Fig. 3 compares the recovered absolute poses. Fig. 4 compares the reconstructed 3D line segments.
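The text does not state which error metrics underlie the pose comparison. As an assumption, two common choices for synthetic sequences with ground truth are the geodesic rotation error and the root-mean-square position error, sketched below with hypothetical function names.

```python
import math

def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the residual rotation R_est^T * R_gt."""
    # trace(A^T B) equals the sum of element-wise products of A and B.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_angle))

def translation_rmse(est, gt):
    """Root-mean-square distance between estimated and true camera positions."""
    squared = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(est, gt)]
    return math.sqrt(sum(squared) / len(squared))
```

For a monocular system the estimated trajectory would first need to be aligned to the ground truth in scale and pose. That alignment step is omitted here.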
Fig. 6 shows the camera trajectories estimated by the various systems. Overall, the non-structural-feature-based systems perform unsatisfactorily. Among the structural-feature-based systems, Struct-Line-SLAM does not perform as expected. In contrast, the proposed Struct-PL-SLAM achieves high accuracy and robustness.
Next, we evaluate the accuracy of the 3D maps reconstructed by the various systems. Fig. 7 compares the 3D line-segment map of PL-SLAM with the 3D structural map of Struct-PL-SLAM. The map of PL-SLAM is more disordered, owing to the limited accuracy of its rotation and translation estimates as well as noise in the image line matches. In contrast, the map produced by our Struct-PL-SLAM is more accurate.
1. HRBB4 Dataset: http://telerobot.cs.tamu.edu/MFG/data/hrbb4/index.html
2. Algorithm Codes of Proposed Methods: Source Code
1. Haoang Li, Jian Yao*, Jean-Charles Bazin, Xiaohu Lu and Yazhou Xing, Submitted to The 2018 IEEE International Conference on Robotics and Automation (ICRA 2018), September 2017.