
Learn multi-step object sorting tasks through deep reinforcement learning

Published online by Cambridge University Press:  06 May 2022

Jiatong Bao*
Affiliation:
School of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China; School of Instrument Science and Engineering, Southeast University, Nanjing 210000, China
Guoqing Zhang
Affiliation:
School of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China
Yi Peng
Affiliation:
School of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China
Zhiyu Shao
Affiliation:
School of Electrical, Energy and Power Engineering, Yangzhou University, Yangzhou 225000, China
Aiguo Song
Affiliation:
School of Instrument Science and Engineering, Southeast University, Nanjing 210000, China
*Corresponding author. E-mail: [email protected]

Abstract

Robotic systems are usually controlled to perform specific actions repetitively in manufacturing tasks. Traditional control methods are domain-dependent and model-dependent, and they demand considerable human effort. They cannot meet the new requirements for generality and flexibility in areas such as intelligent manufacturing and customized production. This paper develops a general model-free approach that enables robots to perform multi-step object sorting tasks through deep reinforcement learning. Taking projected heightmap images from different time steps as input, without extra high-level image analysis and understanding, critic models are designed to produce a pixel-wise Q value map for each type of action. Applying pixel-wise Q value-based critic networks to multi-step sorting tasks that involve many types of actions and complex action constraints is a new attempt. Experiments on simulated and realistic object sorting tasks demonstrate the effectiveness of the proposed approach. Qualitative results (videos), code for simulated and realistic experiments, and pre-trained models are available at https://github.com/JiatongBao/DRLSorting
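To make the idea of one pixel-wise Q value map per action type concrete, the following minimal sketch shows how an action could be chosen greedily from such maps. It is an illustration under stated assumptions, not the authors' implementation: the action names (`push`, `grasp`, `place`), the function `select_action`, the boolean masks used to stand in for action constraints, and the toy 2x2 maps are all hypothetical. In the paper the maps are produced by critic networks from heightmap images; here they are hard-coded numbers.

```python
from typing import Dict, List, Optional, Tuple

QMap = List[List[float]]   # one Q value per heightmap pixel
Mask = List[List[bool]]    # True where the action is allowed (assumed mechanism)


def select_action(
    q_maps: Dict[str, QMap],
    masks: Optional[Dict[str, Mask]] = None,
) -> Tuple[str, int, int, float]:
    """Return (action, row, col, q) with the highest Q value over all maps.

    Pixels whose mask entry is False are skipped. The masks are a
    hypothetical stand-in for action constraints, not the paper's method.
    """
    best: Optional[Tuple[str, int, int, float]] = None
    for action, q_map in q_maps.items():
        mask = masks.get(action) if masks else None
        for r, row in enumerate(q_map):
            for c, q in enumerate(row):
                if mask is not None and not mask[r][c]:
                    continue  # this action is not allowed at this pixel
                if best is None or q > best[3]:
                    best = (action, r, c, q)
    if best is None:
        raise ValueError("no admissible action")
    return best


# Toy 2x2 Q maps for three hypothetical action types.
q_maps = {
    "push": [[0.1, 0.2], [0.3, 0.0]],
    "grasp": [[0.0, 0.9], [0.4, 0.1]],
    "place": [[0.5, 0.2], [0.1, 0.6]],
}
# Forbid grasping at pixel (0, 1) to imitate a constraint.
masks = {"grasp": [[True, False], [True, True]]}

print(select_action(q_maps))         # ('grasp', 0, 1, 0.9)
print(select_action(q_maps, masks))  # ('place', 1, 1, 0.6)
```

The selected pixel doubles as the location at which the action would be executed, which is why a single argmax over all maps yields both the action type and its target position.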

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

