
Research on obstacle avoidance of underactuated autonomous underwater vehicle based on offline reinforcement learning

Published online by Cambridge University Press: 29 October 2024

Tao Liu*
Affiliation:
School of Ocean Engineering and Technology, Sun Yat-sen University & Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China; Guangdong Provincial Key Laboratory of Information Technology for Deep Water Acoustics, Zhuhai, China
Junhao Huang
Affiliation:
School of Ocean Engineering and Technology, Sun Yat-sen University & Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
Jintao Zhao
Affiliation:
School of Ocean Engineering and Technology, Sun Yat-sen University & Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai, China
* Corresponding author: Tao Liu; Email: [email protected]

Abstract

The autonomous navigation and obstacle avoidance capabilities of autonomous underwater vehicles (AUVs) are essential for safe navigation and long-term, efficient operation. However, the complexity of the marine environment poses significant challenges to safe and effective obstacle avoidance. To address this issue, this study proposes an AUV obstacle avoidance control algorithm based on offline reinforcement learning. The method adopts the Conservative Q-Learning (CQL) algorithm, built on the Soft Actor-Critic (SAC) framework, which learns an effective obstacle avoidance control policy from previously collected historical data. PID and SAC controllers are used to generate expert obstacle avoidance data and construct a diversified offline database. In addition, based on the line-of-sight (LOS) guidance method and the artificial potential field (APF) method, the distance and bearing of targets and obstacles are incorporated into the state space, and heading and obstacle avoidance terms are integrated into the reward function. The algorithm successfully guides the AUV through autonomous navigation and dynamic obstacle avoidance in three-dimensional space. Furthermore, it exhibits a degree of robustness to uncertain disturbances and ocean currents, enhancing the safety and reliability of the AUV system. Simulation results demonstrate the feasibility and effectiveness of the proposed offline reinforcement learning obstacle avoidance method. This study highlights the significance of offline reinforcement learning for robust and reliable AUV control systems, paving the way for enhanced operational capabilities in challenging marine environments.
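As a rough illustration of the core mechanism, the sketch below shows a CQL-style conservative penalty layered on a SAC critic update. The interfaces `q_net(states, actions)` and `policy.sample(states)` and the penalty weight are assumptions for illustration, not the authors' implementation; the full CQL(H) objective also involves uniform-action sampling and importance weights, omitted here for brevity.

```python
import torch

# Minimal sketch of the CQL conservative penalty (hypothetical interfaces).
def cql_penalty(q_net, policy, states, dataset_actions, n_samples=10):
    # Q-values of in-distribution actions taken from the offline dataset.
    q_data = q_net(states, dataset_actions)
    # Q-values of actions drawn from the current policy; these may be
    # out-of-distribution with respect to the offline data.
    q_pi = torch.stack(
        [q_net(states, policy.sample(states)) for _ in range(n_samples)]
    )
    # Soft maximum over the sampled actions: pushing this down while the
    # Bellman loss fits q_data keeps the critic conservative on unseen actions.
    return (q_pi.logsumexp(dim=0) - q_data).mean()

# Critic loss = SAC Bellman error + alpha * cql_penalty(...), where alpha
# trades off conservatism against fitting the offline data.
```

The reward design described above can likewise be sketched as the sum of a goal-progress term, an LOS-style heading term, and an APF-style repulsive obstacle term. The weights `w_goal`, `w_head`, `w_obs` and safety radius `d_safe` below are placeholders; the paper's exact coefficients and functional forms are not reproduced here.

```python
import numpy as np

# Hedged sketch of a shaped reward for 3-D navigation with obstacle avoidance.
def shaped_reward(pos, goal, psi, obstacles, d_safe=5.0,
                  w_goal=1.0, w_head=0.5, w_obs=2.0):
    """pos, goal: 3-D position arrays; psi: yaw angle [rad];
    obstacles: iterable of 3-D obstacle positions."""
    # Goal-progress term: closer to the target is better.
    r = -w_goal * np.linalg.norm(goal - pos)

    # Heading term (LOS-inspired): penalize the wrapped error between the
    # current yaw and the line-of-sight bearing to the target.
    psi_los = np.arctan2(goal[1] - pos[1], goal[0] - pos[0])
    e_psi = np.arctan2(np.sin(psi_los - psi), np.cos(psi_los - psi))
    r -= w_head * abs(e_psi)

    # Obstacle term (APF-inspired): repulsive penalty that grows as the
    # vehicle enters an obstacle's safety radius.
    for obs in obstacles:
        d = np.linalg.norm(obs - pos)
        if d < d_safe:
            r -= w_obs * (1.0 / max(d, 1e-6) - 1.0 / d_safe)
    return r
```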

Type
Research Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press

