
Autonomous and cooperative control of UAV cluster with multi-agent reinforcement learning

Published online by Cambridge University Press: 13 January 2022

D. Xu
Affiliation:
State Key Laboratory for Strength and Vibration of Mechanical Structures, Xi’an Jiaotong University, Xi’an, 710049, China
G. Chen*
Affiliation:
Shaanxi Province Key Laboratory for Service Environment and Control of Advanced Aircraft, Xi’an Jiaotong University, Xi’an, 710049, China

Abstract

In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Current UAV clusters remain at the program-control stage, and fully autonomous, intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and to cooperate in completing the combat goal, we propose a new MARL framework. It adopts centralised training with decentralised execution, using an Actor-Critic network to select the executed action and then evaluate it. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm: first, an improved learning framework that makes the calculated Q value more accurate; second, a collision-avoidance setting that raises the operational safety factor; and third, an adjusted reward mechanism that effectively improves the cluster's cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. Simulation results show that learning efficiency is markedly improved and the operational safety factor is further increased compared with the original algorithm.
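
To make the centralised-training, decentralised-execution pattern described in the abstract concrete, the sketch below shows a minimal MADDPG-style setup: each agent keeps its own actor for execution, while a single critic conditions on every agent's observation and action during training, which is the sense in which the joint Q value becomes better informed; a pairwise-distance penalty illustrates one way a collision-avoidance term could enter the reward. This is only a sketch under our own assumptions; the network sizes, names and penalty form are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # illustrative sizes, not from the paper

class Actor(nn.Module):
    """Decentralised policy: each UAV maps only its own observation to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralised critic: scores the joint observations and actions of all
    agents, so the Q estimate accounts for the other UAVs' behaviour."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        # all_obs: (batch, N_AGENTS * OBS_DIM); all_act: (batch, N_AGENTS * ACT_DIM)
        return self.net(torch.cat([all_obs, all_act], dim=-1))

def shaped_reward(task_reward, positions, safe_dist=1.0, penalty=10.0):
    """Hypothetical collision-avoidance shaping: subtract a fixed penalty for
    every pair of UAVs closer than safe_dist."""
    dists = torch.cdist(positions, positions)                  # (N, N) pairwise distances
    too_close = (dists < safe_dist).float().triu(diagonal=1)   # count each pair once
    return task_reward - penalty * too_close.sum()

# At flight time each actor acts on local observations only; during training
# the critic is fed the concatenated transitions of the whole cluster.
actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
obs = torch.randn(N_AGENTS, OBS_DIM)
acts = torch.stack([a(o) for a, o in zip(actors, obs)])
q = critic(obs.reshape(1, -1), acts.reshape(1, -1))
```

Under this scheme each UAV needs only its own sensing at execution time, while the richer training-time signal is what the abstract credits for the more accurate Q value and the improved learning efficiency.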

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the Royal Aeronautical Society


References

Xing, D.J., Zhen, Z.Y. and Gong, H.J. Offense-defense confrontation decision making for dynamic UAV swarm versus UAV swarm, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 233, (15), pp 5689–5702. https://doi.org/10.1177/0954410019853982
Zhang, J. and Xing, J.H. Cooperative task assignment of multi-UAV system, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.02.009
Wang, C., Wu, L.Z., Yan, C., et al. Coactive design of explainable agent-based task planning and deep reinforcement learning for human-UAVs teamwork, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.05.001
Imanberdiyev, N., Fu, C., Kayacan, E., et al. Autonomous navigation of UAV by using real-time model-based reinforcement learning, 14th International Conference on Control, Automation, Robotics and Vision (ICARCV 2016), 2016. https://doi.org/10.1109/ICARCV.2016.7838739
Wu, Y.H., Yu, Z.C., Li, C.Y., et al. Reinforcement learning in dual-arm trajectory planning for a free-floating space robot, Aerosp. Sci. Technol., 2020, 98. https://doi.org/10.1016/j.ast.2019.105657
Dong, Y.Q., Ai, J.L. and Liu, J.Q. Guidance and control for own aircraft in the autonomous air combat: A historical review and future prospects, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 233, (16), pp 5943–5991. https://doi.org/10.1177/0954410019889447
Sun, Z., Chao, T., Wang, S., et al. Ascent trajectory tracking method using time-varying quadratic adaptive dynamic programming, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2018, 233, (11), pp 4154–4165. https://doi.org/10.1177/0954410018817613
Xu, G.T., Long, T., Wang, Z., et al. Target-bundled genetic algorithm for multi-unmanned aerial vehicle cooperative task assignment considering precedence constraints, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2019, 234, (3), pp 760–773. https://doi.org/10.1177/0954410019883106
Zhao, E.J., Chao, T., Wang, S.Y., et al. Multiple flight vehicles cooperative guidance law based on extended state observer and finite time consensus theory, Proc. Inst. Mech. Eng. G J. Aerosp. Eng., 2016, 232, (2), pp 270–279. https://doi.org/10.1177/0954410016683734
Lowe, R., Wu, Y., Tamar, A., et al. Multi-agent Actor-Critic for mixed cooperative-competitive environments, 2018. arXiv:1706.02275v3.
Liu, Y.X., Liu, H., Tian, Y.L., et al. Reinforcement learning based two-level control framework of UAV swarm for cooperative persistent surveillance in an unknown urban area, Aerosp. Sci. Technol., 2020, 98, p 105671. https://doi.org/10.1016/j.ast.2019.105671
Zhen, Z.Y., Xing, D.J. and Gao, C. Cooperative search-attack mission planning for multi-UAV based on intelligent self-organized algorithm, Aerosp. Sci. Technol., 2018, 76, pp 402–411. https://doi.org/10.1016/j.ast.2018.01.035
Yao, P., Wang, H.L. and Su, Z.K. Cooperative path planning with applications to target tracking and obstacle avoidance for multi-UAVs, Aerosp. Sci. Technol., 2016, 54, pp 10–22. https://doi.org/10.1016/j.ast.2016.04.002
Wang, C., Li, J., Jing, N., et al. A distributed cooperative dynamic task planning algorithm for multiple satellites based on multi-agent hybrid learning, Chin. J. Aeronaut., 2011, 24, (4), pp 493–505. https://doi.org/10.1016/S1000-9361(11)60057-5
Sun, G.B., Zhou, R., Xu, K., et al. Cooperative formation control of multiple aerial vehicles based on guidance route in a complex task environment, Chin. J. Aeronaut., 2020, 33, (2), pp 701–720. https://doi.org/10.1016/j.cja.2019.08.009
Fu, X.W., Pan, J., Wang, H.X., et al. A formation maintenance and reconstruction method of UAV swarm based on distributed control, Aerosp. Sci. Technol., 2020, 104, p 105981. https://doi.org/10.1016/j.ast.2020.105981
Fu, X.W., Pan, J., Wang, H.X., et al. A formation maintenance and reconstruction method of UAV swarm based on distributed control with obstacle avoidance, Australian and New Zealand Control Conference (ANZCC), 2019. https://doi.org/10.1109/ANZCC47194.2019.8945601
La, H.M., Nguyen, T., Le, T.D., et al. Formation control and obstacle avoidance of multiple rectangular agents with limited communication ranges, IEEE Trans. Control Network Syst., 2017, 4, (4), pp 680–691. https://doi.org/10.1109/TCNS.2016.2542978
La, H.M. and Sheng, W. Dynamic target tracking and observing in a mobile sensor network, Robot. Autonom. Syst., 2012, 60, (7), pp 996–1009. https://doi.org/10.1016/j.robot.2012.03.006
Degas, A., Rantrua, A., Kaddoum, E., et al. Dynamic collision avoidance using local cooperative airplanes decisions, CEAS Aeronaut. J., 2020, 11, pp 309–320. https://doi.org/10.1007/s13272-019-00400-6
Busoniu, L., Babuska, R. and Schutter, B.D. Multi-agent reinforcement learning: An overview, in Srinivasan, D. and Jain, L.C. (eds), Innovations in Multi-Agent Systems and Applications – 1, Studies in Computational Intelligence, vol. 310, Springer, Berlin, Heidelberg, 2010, pp 183–221. https://doi.org/10.1007/978-3-642-14435-6_7
Musavi, N., Onural, D., Gunes, K., et al. Unmanned aircraft systems airspace integration: A game theoretical framework for concept evaluations, J. Guid. Control Dyn., 2017, 40, (1), pp 96–109. https://doi.org/10.2514/1.G000426
Petar, K., Sylvain, C. and Darwin, C. Reinforcement learning in robotics: Applications and real-world challenges, Robotics, 2013, 2, (3), pp 122–148. https://doi.org/10.3390/robotics2030122
Das-Stuart, A., Howell, K.C. and Folta, D. Rapid trajectory design in complex environments enabled by reinforcement learning and graph search strategies, Acta Astronaut., 2019, 171, pp 172–195. https://doi.org/10.1016/j.actaastro.2019.04.037
Jiang, J.X., Zeng, X.Y., Guzzetti, D., et al. Path planning for asteroid hopping rovers with pre-trained deep reinforcement learning architectures, Acta Astronaut., 2020, 171, pp 265–279. https://doi.org/10.1016/j.actaastro.2020.03.007
Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning, Nature, 2015, 518, pp 529–533. https://doi.org/10.1038/nature14236
Wang, Z.Y., Freitas, N.D. and Lanctot, M. Dueling network architectures for deep reinforcement learning, Proceedings of the International Conference on Machine Learning, New York, USA, April 2016, pp 1995–2003. arXiv:1511.06581v3.
Hausknecht, M. and Stone, P. Deep recurrent Q-learning for partially observable MDPs, Association for the Advancement of Artificial Intelligence (AAAI 2015), 2017. arXiv:1507.06527v4.
Yang, X.X. and Wei, P. UAV navigation in high dynamic environments: A deep reinforcement learning approach, Chin. J. Aeronaut., 2020. https://doi.org/10.1016/j.cja.2020.05.011
Silver, D., Lever, G., Heess, N., et al. Deterministic policy gradient algorithms, Proceedings of the International Conference on Machine Learning, vol. 32, 2014, pp 387–395.
Duryea, E., Ganger, M. and Hu, W. Exploring deep reinforcement learning with multi Q-learning, Intell. Cont. Automat., 2016, 7, (4), pp 129–144. https://doi.org/10.4236/ica.2016.74012
Littman, M.L. Markov games as a framework for multi-agent reinforcement learning, Proceedings of the 11th International Conference on Machine Learning (ICML 1994), Rutgers University, New Brunswick, NJ, July 1994, pp 157–163. https://doi.org/10.1016/B978-1-55860-335-6.50027-1
Gong, L.G., Wang, Q., Hu, C.H., et al. Switching control of morphing aircraft based on Q-learning, Chin. J. Aeronaut., 2020, 33, (2), pp 672–687. https://doi.org/10.1016/j.cja.2019.10.005
Peters, J. and Schaal, S. Policy gradient methods for robotics, International Conference on Intelligent Robots and Systems, 2007. https://doi.org/10.1109/IROS.2006.282564
Babuska, R., Busoniu, L. and Schutter, B.D. Reinforcement learning for multi-agent systems, Proceedings of the 11th International Conference on Emerging Technologies and Factory Automation (ETFA 2006), IEEE, Prague, Czech Republic, 2006. http://www.dcsc.tudelft.nl
Nguyen, T.T., Nguyen, N.D. and Nahavandi, S. Deep reinforcement learning for multi-agent systems: A review of challenges, solutions and applications, 2019. arXiv:1812.11794v2.
Li, C.G., Wang, M. and Yuan, Q.N. A multi-agent reinforcement learning using Actor-Critic methods, Proceedings of the 7th International Conference on Machine Learning and Cybernetics, 2008. https://doi.org/10.1109/ICMLC.2008.4620528
Gupta, J.K., Egorov, M. and Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning, in Sukthankar, G. and Rodriguez-Aguilar, J. (eds), International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017), Lecture Notes in Computer Science, vol. 10642, Springer, Cham, 2017, pp 66–83. https://doi.org/10.1007/978-3-319-71682-4_5
Guo, H.L. and Meng, Y. Distributed reinforcement learning for coordinate multi-robot foraging, J. Intell. Robot Syst., 2010, 60, pp 531–551. https://doi.org/10.1007/s10846-010-9429-4
Lowe, R., Wu, Y., Tamar, A., et al. Multi-agent Actor-Critic for mixed cooperative-competitive environments, Proceedings of the Neural Information Processing Systems (NIPS 2017), 2017. arXiv:1706.02275v3.
Lillicrap, T.P., Hunt, J.J., Pritzel, A., et al. Continuous control with deep reinforcement learning, International Conference on Learning Representations, 2015, pp 1–14.
Nagabandi, A., Kahn, G., Fearing, R.S., et al. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning, 2017. arXiv:1708.02596v2.
Yang, Z., Merrick, K., Abbass, H., et al. Multi-task deep reinforcement learning for continuous action control, Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, pp 3301–3307. https://doi.org/10.24963/ijcai.2017/461
Baker, B., Gupta, O., Naik, N., et al. Designing neural network architectures using reinforcement learning, International Conference on Learning Representations, 2017. arXiv:1611.02167v2.
Liu, Q.H., Liu, X.F. and Cai, G.P. Control with distributed deep reinforcement learning: Learn a better policy, 2018. arXiv:1811.10264v2.
Goecks, V.G., Leal, P.B., White, T., et al. Control of morphing wing shapes with deep reinforcement learning, 2018 AIAA Information Systems-AIAA Infotech @ Aerospace, Kissimmee, Florida, January 2018. https://doi.org/10.2514/6.2018-2139
Wen, N., Liu, Z.H., Zhu, L.P., et al. Deep reinforcement learning and its application on autonomous shape optimization for morphing aircrafts, J. Astronaut., 2017, 38, pp 1153–1159. https://doi.org/10.3873/j.issn.1000-1328.2017.11.003
Xu, D., Hui, Z., Liu, Y.Q., et al. Morphing control of a new bionic morphing UAV with deep reinforcement learning, Aerosp. Sci. Technol., 2019, 92, pp 232–243. https://doi.org/10.1016/j.ast.2019.05.058
La, H.M. Multi-robot swarm for cooperative scalar field mapping, Handbook of Research on Design, Control, and Modeling of Swarm Robotics, 2015. https://doi.org/10.4018/978-1-4666-9572-6.ch014
La, H.M., Sheng, W. and Chen, J. Cooperative and active sensing in mobile sensor networks for scalar field mapping, IEEE Trans. Syst. Man Cybern. Syst., 2015, 45, (1), pp 1–12. https://doi.org/10.1109/TSMC.2014.2318282
Adepegba, A.A., Miah, S. and Spinello, D. Multi-agent area coverage control using reinforcement learning, Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, 2016, pp 368–373. http://dx.doi.org/10.20381/ruor-5715
Pham, H.X., La, H.M., Feil-Seifer, D., et al. Cooperative and distributed reinforcement learning of drones for field coverage, 2018. arXiv:1803.07250v1.
Supplementary material

Xu and Chen supplementary material (file, 3.7 MB)