
Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer

Published online by Cambridge University Press: 16 March 2020

Xiongqing Liu
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
Yan Jin*
Affiliation:
Department of Aerospace and Mechanical Engineering, University of Southern California, 3650 McClintock Avenue, OHE-430, Los Angeles, CA 90089-1453, USA
*Author for correspondence: Yan Jin, E-mail: [email protected]

Abstract

Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for agents (i.e., robots or vehicles) to sense the environment, assess the situation, and select optimal actions to avoid collision and accomplish their missions. In our research on autonomous ships, we take a machine learning approach to collision avoidance. The lack of available ship-steering data from human ship masters has made it necessary to acquire collision avoidance knowledge through reinforcement learning (RL). Given that the learned neural network tends to be a black box, a method is needed for designing an agent's behavior so that the desired knowledge can be captured. Furthermore, RL on complex tasks can be time-consuming or even infeasible. A multi-stage learning method is therefore needed in which agents learn from simple tasks and then transfer the learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behaviors by tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. Computer simulation-based agent training results show that it is important to understand the role of each component in a reward function and of the various design parameters in transfer RL. The settings of these parameters all depend on the complexity of the tasks and the similarity between them.
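The abstract does not specify the paper's actual reward terms or transfer mechanism, so the following is only a minimal Python sketch of the two ideas it summarizes: a composite reward function whose components (collision penalty, goal bonus, per-step cost, and progress shaping) are the tunable levers for designing agent behavior, and a warm-start form of transfer RL in which layers learned on a simple task initialize the network for a more complex task. All function names, weights, and thresholds below are hypothetical.

```python
import numpy as np

# Hypothetical reward weights; tuning these is what shapes the agent's
# learned behavior (the paper's actual coefficients are not given here).
W_GOAL, W_COLLISION, W_STEP = 1.0, -1.0, -0.01

def reward(d_goal, d_goal_prev, d_obstacle,
           collision_radius=50.0, goal_radius=10.0):
    """Composite reward: terminal success/failure terms plus dense shaping."""
    if d_obstacle < collision_radius:          # too close to another ship
        return W_COLLISION
    if d_goal < goal_radius:                   # destination reached
        return W_GOAL
    # Small per-step cost plus a reward proportional to progress to the goal.
    return W_STEP + 0.01 * (d_goal_prev - d_goal)

def transfer_weights(source_params, target_params, layers_to_copy):
    """Warm-start transfer: copy selected layers from the simple-task network
    into the complex-task network; other layers keep fresh initialization."""
    for name in layers_to_copy:
        target_params[name] = source_params[name].copy()
    return target_params

# Usage: reuse the early layer learned on the simple task, relearn the rest.
rng = np.random.default_rng(0)
simple_net = {"fc1": rng.normal(size=(64, 8)), "out": rng.normal(size=(4, 64))}
complex_net = {"fc1": np.zeros((64, 8)), "out": np.zeros((4, 64))}
complex_net = transfer_weights(simple_net, complex_net, ["fc1"])
```

Which layers to copy, and how much to fine-tune them afterward, are examples of the transfer design parameters that, per the abstract, depend on task complexity and inter-task similarity.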

Type: Research Article
Copyright © Cambridge University Press 2020

