
Long-term object search using incremental scene graph updating

Published online by Cambridge University Press: 22 August 2022

Fangbo Zhou
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
Huaping Liu*
Affiliation:
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Huailin Zhao
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
Lanjun Liang
Affiliation:
School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
*Corresponding author. E-mail: [email protected].

Abstract

Searching effectively for target objects in indoor scenes is essential for household robots performing daily tasks. Given a precise map, a robot can navigate to a fixed, static target. However, it is difficult for a mobile robot to find movable objects such as cups. To address this problem, we establish an object search framework that combines a navigation map, a semantic map, and a scene graph. The robot updates the scene graph incrementally to support long-term target search. Because the robot may start from different positions, global path planning weighs the distance the robot must travel against the probability of finding the object. In a dynamic environment, the robot continuously updates the scene graph to memorize the positional relations among objects in the scene. The method has been implemented in both simulation and real-world environments, and the experimental results show its feasibility and effectiveness.
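The trade-off described in the abstract, between travel distance and the probability of finding the object, can be illustrated with a minimal sketch. This is not the authors' implementation: the `SceneGraph` class, the `plan_search` function, the linear score `probability - weight * distance`, the greedy ordering, and the `weight` parameter are all assumptions introduced here for illustration, and the scene graph is reduced to simple observation counts.

```python
from collections import defaultdict
from math import hypot


class SceneGraph:
    """Toy scene graph (hypothetical): counts sightings of an object at each anchor."""

    def __init__(self):
        # counts[obj][anchor] = number of times obj was observed near anchor
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, obj, anchor):
        """Incrementally record one observation of obj near anchor."""
        self.counts[obj][anchor] += 1

    def probability(self, obj, anchor):
        """Relative frequency of obj at anchor; 0.0 if obj was never observed."""
        total = sum(self.counts[obj].values())
        return self.counts[obj][anchor] / total if total else 0.0


def plan_search(graph, obj, start, anchors, weight=0.1):
    """Greedily order anchors by (probability - weight * distance).

    anchors maps an anchor name to its (x, y) position on the map.
    The score and the greedy strategy are assumptions, not the paper's planner.
    """
    remaining = dict(anchors)  # copy so the caller's mapping is not modified
    position = start
    order = []
    while remaining:
        def score(name):
            x, y = remaining[name]
            distance = hypot(x - position[0], y - position[1])
            return graph.probability(obj, name) - weight * distance

        best = max(remaining, key=score)
        order.append(best)
        position = remaining.pop(best)  # the robot is now at the visited anchor
    return order
```

With a small `weight`, the plan favours anchors where the object was seen most often; with a larger `weight`, nearby anchors are visited first. Calling `update` after each new observation changes later plans, which is the sense in which the search is long-term.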

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Footnotes

This work was completed while Fangbo Zhou was visiting Tsinghua University, Beijing, China.
