
Experimental Study of Reinforcement Learning in Mobile Robots Through Spiking Architecture of Thalamo-Cortico-Thalamic Circuitry of Mammalian Brain

Published online by Cambridge University Press:  18 November 2019

Vahid Azimirad*
Affiliation:
Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran E-mail: [email protected]
Mohammad Fattahi Sani
Affiliation:
Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran E-mail: [email protected]
*Corresponding author. E-mail: [email protected]

Summary

This paper studies behavioral learning in robots using a spiking neural network whose architecture is based on the thalamo-cortico-thalamic circuitry of the mammalian brain. Because the circuit contains several types of neurons, the Izhikevich single-neuron model is used to represent neuronal behavior. The network contains 1090 spiking neurons. The spiking model of the proposed architecture is derived and adapted to the robot learning problem. The reinforcement learning algorithm is based on spike-timing-dependent plasticity (STDP), with dopamine release serving as the reward; it strengthens the synaptic weights of the neurons involved in the robot’s proper performance. Sensory neurons are placed in the thalamus and motor neurons in the cortical module. The inputs to the thalamo-cortico-thalamic circuitry are signals related to the distance of the target from the robot, and the outputs are the actuator velocities. A target attraction task, in which dopamine is released when the robot catches the target, is used to validate the proposed method. Simulation studies and an experimental implementation are carried out on a mobile robot named Tabrizbot. The experiments show that after successful learning, the mean time to catch the target decreases by about 36%. These results show that, with the proposed method, the thalamo-cortical structure can be trained to perform various robotic tasks.
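As an illustration of the two ingredients named in the summary, the following is a minimal sketch, not the authors' implementation. It assumes the standard Izhikevich simple model with the widely published regular-spiking parameters (a = 0.02, b = 0.2, c = -65, d = 8) and a dopamine-gated STDP rule in which each spike pairing writes to a slowly decaying eligibility trace, and the weight changes only while dopamine is present. All function names, time constants, the learning rate `lr`, and the dopamine pulse size are hypothetical values chosen for the example; the summary does not state them.

```python
import math


def izhikevich_step(v, u, current, a=0.02, b=0.2, c=-65.0, d=8.0, dt=0.5):
    """Advance one Izhikevich neuron by dt milliseconds (forward Euler).

    Returns the new membrane potential v, recovery variable u, and a
    flag telling whether the neuron spiked during this step.
    """
    v_new = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
    u_new = u + dt * a * (b * v - u)
    if v_new >= 30.0:  # spike: reset membrane, increment recovery variable
        return c, u_new + d, True
    return v_new, u_new, False


def count_spikes(current, t_ms=1000.0, dt=0.5):
    """Count spikes of a single neuron driven by a constant input current."""
    v, u = -65.0, 0.2 * -65.0
    spikes = 0
    for _ in range(int(t_ms / dt)):
        v, u, fired = izhikevich_step(v, u, current, dt=dt)
        spikes += fired
    return spikes


def stdp(delta_t_ms, a_plus=1.0, a_minus=1.5, tau_ms=20.0):
    """Exponential STDP window, with delta_t_ms = t_post - t_pre.

    Pre-before-post pairings give a positive value, the reverse order a
    negative one. Amplitudes and tau_ms are assumed for illustration.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_ms)
    return -a_minus * math.exp(delta_t_ms / tau_ms)


def run_trial(pre_ms, post_ms, reward_ms, w0=0.5, t_end_ms=2000,
              tau_c_ms=1000.0, tau_d_ms=200.0, da_pulse=0.5, lr=1e-3):
    """Simulate one synapse in 1 ms steps and return its final weight.

    One pre/post spike pair sets the eligibility trace; a dopamine pulse
    at reward_ms (or none if reward_ms is None) gates the weight change.
    """
    w, elig, dopamine = w0, 0.0, 0.0
    for t in range(t_end_ms):
        if t == max(pre_ms, post_ms):      # pairing completes here
            elig += stdp(post_ms - pre_ms)
        if reward_ms is not None and t == reward_ms:
            dopamine += da_pulse           # reward, e.g. target caught
        w += lr * elig * dopamine          # change only if both are nonzero
        elig -= elig / tau_c_ms            # slow decay of eligibility
        dopamine -= dopamine / tau_d_ms    # faster decay of dopamine
    return w


if __name__ == "__main__":
    print("spikes in 1 s at I = 10:", count_spikes(10.0))
    print("rewarded causal pairing:", run_trial(100, 110, 600))
    print("unrewarded pairing:     ", run_trial(100, 110, None))
```

In this sketch the eligibility trace lets a reward delivered several hundred milliseconds after the spike pairing still reach the synapse that contributed to the rewarded action, which is the role dopamine plays in the target attraction task described above. Without the dopamine pulse the weight stays at its initial value.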

Type
Articles
Copyright
© Cambridge University Press 2019

