
RGB-D visual odometry by constructing and matching features at superpixel level

Published online by Cambridge University Press:  18 September 2024

Meiyi Yang
Affiliation:
Department of Automation, University of Science and Technology of China, Hefei, China; Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong
Junlin Xiong*
Affiliation:
Department of Automation, University of Science and Technology of China, Hefei, China
Youfu Li
Affiliation:
Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong
* Corresponding author: Junlin Xiong; Email: [email protected]

Abstract

Visual odometry (VO) is a key technology for estimating camera motion from captured images. In this paper, we propose a novel RGB-D visual odometry that constructs and matches features at the superpixel level and adapts to different environments better than state-of-the-art solutions. Superpixels are content-sensitive and aggregate local information well, so they can characterize the complexity of the environment. First, we design the superpixel-based feature SegPatch and its 3D counterpart MapPatch. By exploiting neighboring information, SegPatch remains distinctive in environments with different texture densities, while the depth measurements included in MapPatch represent the scene structurally. We then define a distance between SegPatches to characterize regional similarity and use a graph-based search in scale space for searching and matching, which improves both the accuracy and the efficiency of the matching process. Finally, we minimize the reprojection error between matched SegPatches and estimate the camera pose from these correspondences. The proposed VO is evaluated on the TUM dataset both quantitatively and qualitatively, showing a good balance of accuracy and adaptability under different realistic conditions.
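To make the pipeline described above concrete, the following is a minimal illustrative sketch, not the authors' implementation, of the first two steps: extracting superpixels from an RGB frame and back-projecting each superpixel's centroid with the depth map to obtain a 3D patch anchor, which could then be matched and fed to a reprojection-error minimization. The segmentation routine (scikit-image SLIC), the mean-colour descriptor, the pinhole intrinsics fx, fy, cx, cy, and the function name patch_features are all assumptions made here for illustration only.

# Minimal sketch (illustrative only): superpixel "patch" features from one RGB-D frame.
# Assumptions: scikit-image SLIC as the superpixel method, mean colour as the
# per-patch descriptor, and a pinhole camera model; none of these are taken
# verbatim from the paper.
import numpy as np
from skimage.segmentation import slic

def patch_features(rgb, depth, fx, fy, cx, cy, n_segments=300):
    """Return per-superpixel (descriptor, 3D centroid) pairs for one RGB-D frame."""
    labels = slic(rgb, n_segments=n_segments, compactness=10, start_label=0)
    features = []
    for label in np.unique(labels):
        mask = labels == label
        v, u = np.nonzero(mask)                # pixel rows/columns of this superpixel
        z = depth[mask].astype(np.float64)
        valid = z > 0                          # ignore pixels with missing depth
        if valid.sum() < 10:
            continue
        desc = rgb[mask].mean(axis=0)          # toy descriptor: mean colour of the patch
        zc = z[valid].mean()
        uc, vc = u[valid].mean(), v[valid].mean()
        # Back-project the patch centroid with the pinhole model (a "MapPatch"-style anchor).
        point = np.array([(uc - cx) * zc / fx, (vc - cy) * zc / fy, zc])
        features.append((desc, point))
    return features

Matched patch centroids from two frames could then be used in the standard way the abstract describes at the SegPatch level: find the camera pose T that minimizes the summed reprojection error ||project(T * P_i) - p_i||^2 over all matched pairs (P_i, p_i).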

Type: Research Article
Copyright: © The Author(s), 2024. Published by Cambridge University Press

