Speed adaptation for self-improvement of skills learned from user demonstrations

Rok Vuga; Bojan Nemec; Aleš Ude

doi:10.1017/S0263574715000405

Published online by Cambridge University Press: 15 June 2015

Rok Vuga*: Affiliation:
Humanoid and Cognitive Robotics Lab, Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. E-mails: [email protected], [email protected]
Bojan Nemec: Affiliation:
Humanoid and Cognitive Robotics Lab, Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. E-mails: [email protected], [email protected]
Aleš Ude: Affiliation:
Humanoid and Cognitive Robotics Lab, Department of Automatics, Biocybernetics and Robotics, Jožef Stefan Institute, Jamova cesta 39, 1000 Ljubljana, Slovenia. E-mails: [email protected], [email protected]
*Corresponding author. E-mail: [email protected]

Summary

The paper addresses the problem of speed adaptation of movements subject to environmental constraints. Our approach relies on a novel formulation of velocity profiles as an extension of dynamic movement primitives (DMP). The framework allows for compact representation of non-uniformly accelerated motion as well as simple modulation of the movement parameters. In the paper, we evaluate two model-free methods by which optimal parameters can be obtained: iterative learning control (ILC) and policy-search-based reinforcement learning (RL). The applicability of each method is discussed and evaluated on two distinct cases, which are hard to model using standard techniques. The first involves hard contacts with the environment, while the second involves liquid dynamics. We find ILC to be very efficient in cases where task parameters can be easily described with an error function. On the other hand, RL has stronger convergence properties and can therefore provide a solution in the general case.
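
To illustrate what it means for a task to be "easily described with an error function", the following is a minimal sketch of a generic P-type ILC update applied to a sampled speed profile. It is not the formulation used in the paper: the toy plant `run_trial`, its linear speed-to-force gain, the desired force values, and the learning gain are all hypothetical choices made only so the example runs.

```python
def run_trial(speed_profile):
    """Hypothetical toy plant: measured contact force grows linearly with speed."""
    gain = 2.0  # assumed force per unit speed, purely illustrative
    return [gain * v for v in speed_profile]


def ilc_update(speed_profile, error, learning_gain):
    """Generic P-type ILC step: u_{k+1}(t) = u_k(t) + L * e_k(t), per sample."""
    return [v + learning_gain * e for v, e in zip(speed_profile, error)]


def learn_speed_profile(desired_force, n_trials=30, learning_gain=0.2):
    """Repeat the task, correcting the speed profile from each trial's error."""
    speed = [0.0] * len(desired_force)
    max_errors = []
    for _ in range(n_trials):
        measured = run_trial(speed)
        # Error function: difference between desired and measured force.
        error = [d - m for d, m in zip(desired_force, measured)]
        max_errors.append(max(abs(e) for e in error))
        speed = ilc_update(speed, error, learning_gain)
    return speed, max_errors


desired = [1.0, 2.0, 3.0, 2.0, 1.0]  # assumed target force at each sample
speed, max_errors = learn_speed_profile(desired)
print(speed)
print(max_errors[0], max_errors[-1])
```

In this toy setting the per-trial error shrinks geometrically because the learning gain times the plant gain is below one. With a plant that does not yield such a per-sample error signal, an update of this kind has nothing to work with, which is the situation where the summary points to RL instead.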

Type: Articles
Information: Robotica, Volume 34, Issue 12, December 2016, pp. 2806–2822

DOI: https://doi.org/10.1017/S0263574715000405
Copyright: © Cambridge University Press 2015

Footnotes

† Initial results on the topic were presented at the IEEE-RAS International Conference on Humanoid Robots (Humanoids 2013), Atlanta, Georgia [1].

References

1. Nemec, B., Gams, A. and Ude, A., “Velocity Adaptation for Self-Improvement of Skills Learned from User Demonstrations,” Proceedings of IEEE-RAS International Conference on Humanoid Robots, Humanoids 2013, Atlanta, Georgia, USA (2013).

2. Wolpert, D. M., Diedrichsen, J. and Flanagan, J. R., “Principles of sensorimotor learning,” Nature Rev. Neurosci. 12 (12), 739–751 (2011).

3. Bentivegna, D. C., Atkeson, C. G. and Cheng, G., “Learning tasks from observation and practice,” Robot. Auton. Syst. 47 (2–3), 163–169 (2004).

4. Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y. and Kawato, M., “A kendama learning robot based on bi-directional theory,” Neural Netw. 9 (8), 1281–1302 (1996).

5. Peters, J. and Schaal, S., “Reinforcement learning of motor skills with policy gradients,” Neural Netw. 21 (4), 682–697 (2008).

6. Stulp, F., Theodorou, E. and Schaal, S., “Reinforcement learning with sequences of motion primitives for robust manipulation,” IEEE Trans. Robot. 28 (6), 1360–1370 (2012).

7. Calinon, S., Guenter, F. and Billard, A., “On learning, representing, and generalizing a task in a humanoid robot,” Trans. Syst. Man Cybern. Part B 32 (2), 286–298 (2007).

8. Kormushev, P., Calinon, S. and Caldwell, D. G., “Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input,” Adv. Robot. 25 (5), 581–603 (2011).

9. Buchli, J., Stulp, F., Theodorou, E. and Schaal, S., “Learning variable impedance control,” Int. J. Robot. Res. 30 (7), 820–833 (2011).

10. Hollerbach, J. M., “Dynamic Scaling of Manipulator Trajectories,” American Control Conference (Jun. 1983) pp. 752–756.

11. Bobrow, J. E., Dubowsky, S. and Gibson, J. S., “Time-optimal control of robotic manipulators along specified paths,” Int. J. Robot. Res. 4 (3), 3–17 (1985).

12. Shin, K. and McKay, N. D., “Minimum-time control of robotic manipulators with geometric path constraints,” IEEE Trans. Autom. Control 30 (6), 531–541 (Jun. 1985).

13. McCarthy, J. and Bobrow, J., “The Number of Saturated Actuators and Constraint Forces During Time-Optimal Movement of a General Robotic System,” Proceedings of 1992 IEEE International Conference on Robotics and Automation, vol. 1 (May 1992) pp. 542–546.

14. Žlajpah, L., “On Time Optimal Path Control of Manipulators with Bounded Joint Velocities and Torques,” Proceedings of 1996 IEEE International Conference on Robotics and Automation, Minneapolis, Minnesota (1996) pp. 1572–1577.

15. Dahl, O. and Nielsen, L., “Torque Limited Path Following by On-Line Trajectory Time Scaling,” Proceedings of 1989 IEEE International Conference on Robotics and Automation, vol. 2 (May 1989) pp. 1122–1128.

16. Dahl, O., “Path Constrained Robot Control with Limited Torques-Experimental Evaluation,” Proceedings of 1993 IEEE International Conference on Robotics and Automation, vol. 2 (May 1993) pp. 493–498.

17. Kieffer, J., Cahill, A. and James, M., “Robust and accurate time-optimal path-tracking control for robot manipulators,” IEEE Trans. Robot. Autom. 13 (6), 880–890 (Dec. 1997).

18. Akella, S. and Peng, J., “Time-scaled Coordination of Multiple Manipulators,” Proceedings of 2004 IEEE International Conference on Robotics and Automation, ICRA '04, vol. 4 (Apr. 2004) pp. 3337–3344.

19. Michna, V., Wagner, P. and Cernohorsky, J., “Constrained Optimization of Robot Trajectory and Obstacle Avoidance,” Proceedings of 2010 IEEE Conference on Emerging Technologies and Factory Automation (ETFA) (Sep. 2010) pp. 1–4.

20. Zhao, Y. and Tsiotras, P., “Speed Profile Optimization for Optimal Path Tracking,” American Control Conference (ACC) (Jun. 2013) pp. 1171–1176.

21. Ijspeert, A. J., Nakanishi, J., Hoffmann, H., Pastor, P. and Schaal, S., “Dynamical movement primitives: Learning attractor models for motor behaviors,” Neural Comput. 25 (2), 328–373 (2013).

22. Schaal, S., Mohajerian, P. and Ijspeert, A., “Dynamics systems versus optimal control – a unifying view,” Prog. Brain Res. 165 (6), 425–445 (2007).

23. Moore, K. L., Chen, Y. and Ahn, H.-S., “Iterative Learning Control: A Tutorial and Big Picture View,” Proceedings of 2006 45th IEEE Conference on Decision and Control (Dec. 2006) pp. 2352–2357.

24. Bristow, D., Tharayil, M. and Alleyne, A., “A survey of iterative learning control,” IEEE Trans. Control Syst. 26 (3), 96–114 (Jun. 2006).

25. Kober, J., Bagnell, D. and Peters, J., “Reinforcement learning in robotics: A survey,” Int. J. Robot. Res. 32 (11), 1238–1274 (2013).

26. Theodorou, E. A., Buchli, J. and Schaal, S., “A generalized path integral control approach to reinforcement learning,” J. Mach. Learn. Res. 11 (11), 3137–3181 (2010).

27. Schreiber, G., Stemmer, A. and Bischoff, R., “The Fast Research Interface for the Kuka Lightweight Robot,” IEEE Workshop on Innovative Robot Control Architectures for Demanding (Research) Applications - How to Modify and Enhance Commercial Controllers (ICRA 2010) (May 2010) pp. 73–77.

28. Ude, A., Nemec, B., Petrič, T. and Morimoto, J., “Orientation in Cartesian Space Dynamic Movement Primitives,” International Conference on Robotics and Automation (ICRA), Hong Kong, China (2014) pp. 2997–3004.

29. Abend, W., Bizzi, E. and Morasso, P., “Human arm trajectory formation,” Brain 105 (2), 331–348 (1982).

30. Uno, Y., Kawato, M. and Suzuki, R., “Formation and control of optimal trajectory in human multijoint arm movement. Minimum torque-change model,” Biol. Cybern. 61 (2), 89–101 (1989).

31. Flash, T. and Hogan, N., “The coordination of arm movements: an experimentally confirmed mathematical model,” J. Neurosci. 5 (7), 1688–1703 (1985).

32. Harris, C. M. and Wolpert, D. M., “Signal-dependent noise determines motor planning,” Nature 394 (6695), 780–784 (1998).

33. Todorov, E. and Jordan, M. I., “Optimal feedback control as a theory of motor coordination,” Nature Neurosci. 5 (11), 1226–1235 (2002).

34. Hogan, N., “Adaptive control of mechanical impedance by coactivation of antagonist muscles,” IEEE Trans. Autom. Control 29 (8), 681–690 (Aug. 1984).

35. Vuga, R., Nemec, B. and Ude, A., “Speed Profile Optimization through Directed Explorative Learning,” IEEE-RAS 14th International Conference on Humanoid Robots, HUMANOIDS (Nov. 2014) pp. 547–553.
