
A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles

Published online by Cambridge University Press:  08 February 2023

C.Y. Bao
Affiliation:
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
X. Zhou
Affiliation:
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
P. Wang*
Affiliation:
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
R.Z. He
Affiliation:
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
G.J. Tang
Affiliation:
College of Aerospace Science and Engineering, National University of Defense Technology, Changsha, 410073, China
*Corresponding author. Email: [email protected]

Abstract

An onboard three-dimensional (3D) trajectory generation approach based on reinforcement learning (RL) and a deep neural network (DNN) is proposed for hypersonic vehicles in the glide phase. Multiple trajectory samples are generated offline by convex optimisation. Deep learning (DL) is employed to pre-train the DNN, which initialises the actor network and accelerates the RL process. Based on the offline deep policy deterministic actor-critic algorithm, a flight target-oriented reward function with path constraints is designed. The actor network is optimised by end-to-end RL using policy gradients from the critic network until the reward function converges to its maximum. The actor network then serves as the onboard trajectory generator, computing optimal control values online from the real-time motion states. The simulation results show that the single-step online planning time meets the real-time requirements of onboard trajectory generation. A significant improvement in the terminal accuracy of the online trajectory and better generalisation under biased initial states are observed for hypersonic vehicles in the glide phase.
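
The pre-training step described in the abstract can be illustrated with a minimal sketch: a small actor network is fitted by supervised regression to offline (state, control) samples before any RL fine-tuning. Everything specific here is an assumption made for illustration and is not taken from the paper: the three-dimensional normalised state, the single normalised control, the 16-unit hidden layer, the learning rate, and the synthetic data, which only stands in for trajectory samples that would come from convex optimisation. The critic network and the RL stage are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for offline trajectory samples. In the approach
# described above these would come from convex optimisation; here a made-up
# smooth map from states to controls plays that role.
true_gain = np.array([[0.5], [-0.3], [0.8]])      # hypothetical
states = rng.uniform(-1.0, 1.0, size=(512, 3))    # hypothetical normalised motion states
controls = np.tanh(states @ true_gain)            # hypothetical normalised control targets

# Actor network with one hidden layer (sizes are illustrative only).
params = {
    "W1": rng.normal(0.0, 0.5, size=(3, 16)),
    "b1": np.zeros(16),
    "W2": rng.normal(0.0, 0.5, size=(16, 1)),
    "b2": np.zeros(1),
}

def actor_forward(x, p):
    """Map a batch of states to controls; also return the hidden activations."""
    hidden = np.tanh(x @ p["W1"] + p["b1"])
    control = np.tanh(hidden @ p["W2"] + p["b2"])
    return control, hidden

def mse_loss(p):
    """Mean squared error between actor outputs and the offline control targets."""
    pred, _ = actor_forward(states, p)
    return float(np.mean((pred - controls) ** 2))

loss_before = mse_loss(params)

# Supervised pre-training by full-batch gradient descent with manual backpropagation.
learning_rate = 0.05
for _ in range(1000):
    pred, hidden = actor_forward(states, params)
    d_pred = 2.0 * (pred - controls) / len(states)   # gradient of the MSE w.r.t. outputs
    d_z2 = d_pred * (1.0 - pred ** 2)                # back through the output tanh
    d_hidden = d_z2 @ params["W2"].T
    d_z1 = d_hidden * (1.0 - hidden ** 2)            # back through the hidden tanh
    grads = {
        "W1": states.T @ d_z1,
        "b1": d_z1.sum(axis=0),
        "W2": hidden.T @ d_z2,
        "b2": d_z2.sum(axis=0),
    }
    for name in params:
        params[name] -= learning_rate * grads[name]

loss_after = mse_loss(params)

# The pre-trained actor maps a (hypothetical) current state directly to a control value.
control_now, _ = actor_forward(np.array([[0.2, -0.1, 0.4]]), params)
print(f"pre-training loss: {loss_before:.4f} -> {loss_after:.4f}")
```

In a full pipeline of the kind the abstract outlines, the pre-trained weights would serve only as the starting point for the actor, which would then be refined with policy gradients supplied by a critic network.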

Type
Research Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Royal Aeronautical Society

