Human–agent transfer from observations

Bikramjit Banerjee; Sneha Racharla

doi:10.1017/S0269888920000387


Published online by Cambridge University Press: 27 November 2020


Bikramjit Banerjee: Affiliation:
The University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA
Sneha Racharla: Affiliation:
The University of Southern Mississippi, 118 College Drive #5106, Hattiesburg, MS 39406, USA


Abstract

Learning from human demonstration (LfD), among many speedup techniques for reinforcement learning (RL), has seen many successful applications. We consider one LfD technique called human–agent transfer (HAT), where a model of the human demonstrator’s decision function is induced via supervised learning and used as an initial bias for RL. Some recent work in LfD has investigated learning from observations only, that is, when only the demonstrator’s states (and not its actions) are available to the learner. Since the demonstrator’s actions are treated as labels for HAT, supervised learning becomes untenable in their absence. We adapt the idea of learning an inverse dynamics model from the data acquired by the learner’s interactions with the environment and deploy it to fill in the missing actions of the demonstrator. The resulting version of HAT—called state-only HAT (SoHAT)—is experimentally shown to preserve some advantages of HAT in benchmark domains with both discrete and continuous actions. This paper also establishes principled modifications of an existing baseline algorithm—called A3C—to create its HAT and SoHAT variants that are used in our experiments.
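As an illustration of the inverse dynamics idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical one-dimensional chain environment with two discrete actions and a tabular inverse dynamics model; the paper's experiments use A3C-based learners, which are not reproduced here. All names (`step`, `collect_transitions`, `fit_inverse_dynamics`, `label_demonstration`) are invented for this example.

```python
import random
from collections import Counter, defaultdict

N_STATES = 6      # hypothetical 1-D chain with states 0..5
ACTIONS = (0, 1)  # 0 = step left, 1 = step right


def step(state, action):
    """Deterministic toy dynamics, clipped at both ends of the chain."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), N_STATES - 1)


def collect_transitions(n_steps, seed=0):
    """The learner's own random interaction: a list of (s, a, s') tuples."""
    rng = random.Random(seed)
    state, data = 0, []
    for _ in range(n_steps):
        action = rng.choice(ACTIONS)
        next_state = step(state, action)
        data.append((state, action, next_state))
        state = next_state
    return data


def fit_inverse_dynamics(transitions):
    """Tabular inverse dynamics: the most frequent action for each (s, s')."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[(s, s_next)][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def label_demonstration(states, inverse_model):
    """Fill in the demonstrator's missing actions from consecutive states.

    Returns (state, inferred_action) pairs; the action is None when the
    learner never observed that state transition.
    """
    return [(s, inverse_model.get((s, s_next)))
            for s, s_next in zip(states, states[1:])]


inverse_model = fit_inverse_dynamics(collect_transitions(2000))
demo_states = [0, 1, 2, 3, 2, 3, 4, 5]  # state-only demonstration
labeled = label_demonstration(demo_states, inverse_model)
```

The resulting `labeled` pairs play the role of the state-action examples that HAT would otherwise obtain directly from the demonstrator; a supervised learner could then be fit to them to provide the initial bias for RL.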

Type: Research Article
Information: The Knowledge Engineering Review, Volume 36, 2021, e2

DOI: https://doi.org/10.1017/S0269888920000387
Copyright: © The Author(s), 2020. Published by Cambridge University Press


