
9 - Deep Learning

from Part II - Cognitive Modeling Paradigms

Published online by Cambridge University Press: 21 April 2023

Ron Sun
Affiliation:
Rensselaer Polytechnic Institute, New York

Summary

This chapter introduces deep learning (DL) in the framework of experimentalism, taking inspiration from Pierre Oléron’s explanation of human intellectual activities in terms of long (or deep) circuits. A history of DL is presented, from its origins in the mid-twentieth century to the breakthrough of deep neural networks (DNNs) in recent decades. Architectural and representational issues are then discussed in depth. Convolutional neural networks, the most popular and successful DL architecture to date, are reviewed in detail. Finally, adaptive activation functions in DNNs are presented in the context of homeostatic neuroplasticity, surveyed, and analyzed.

Type
Chapter
Information
Publisher: Cambridge University Press
Print publication year: 2023

References

Abadi, M., Barham, P., Chen, J., et al. (2016). TensorFlow: a system for large-scale machine learning. In Keeton, K., & Roscoe, T., (Eds.), Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (pp. 265–283). USENIX Association.
Agostinelli, F., Hoffman, M. D., Sadowski, P. J., & Baldi, P. (2015). Learning activation functions to improve deep neural networks. In Bengio, Y. & LeCun, Y., (Eds.), 3rd International Conference on Learning Representations, ICLR 2015, Workshop Track Proceedings.
Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: a brief survey. IEEE Signal Processing Magazine, 34(6), 26–38.
Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton, NJ: Princeton University Press.
Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Schölkopf, B., Platt, J., & Hoffman, T., (Eds.), Advances in Neural Information Processing Systems 19 (pp. 153–160). Cambridge, MA: MIT Press.
Bengio, Y., & LeCun, Y. (2007). Scaling Learning Algorithms Towards AI. Cambridge, MA: MIT Press.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Bianchini, M., Frasconi, P., & Gori, M. (1995). Learning in multilayered networks used as autoassociators. IEEE Transactions on Neural Networks, 6(2), 512–515.
Bodyanskiy, Y., Deineko, A., Pliss, I., & Slepanska, V. (2019). Formal neuron based on adaptive parametric rectified linear activation function and its learning. In Kryvinska, N., Izonin, I., Gregus, M., Poniszewska-Maranda, A., & Dronyuk, I., (Eds.), Proceedings of the 1st International Workshop on Digital Content & Smart Multimedia (DCSMart 2019), vol. 2533 of CEUR Workshop Proceedings (pp. 14–22). CEUR-WS.org.
Bohn, B., Griebel, M., & Rieger, C. (2019). A representer theorem for deep kernel learning. Journal of Machine Learning Research, 20, 1–32.
Boring, E. (1950). A History of Experimental Psychology. New York, NY: Appleton-Century-Crofts.
Castelli, I., & Trentin, E. (2011). Supervised and unsupervised co-training of adaptive activation functions in neural nets. In Schwenker, F., & Trentin, E., (Eds.), Partially Supervised Learning – First IAPR TC3 Workshop, PSL 2011, Revised Selected Papers, vol. 7081 of Lecture Notes in Computer Science (pp. 52–61). New York, NY: Springer.
Castelli, I., & Trentin, E. (2014). Combination of supervised and unsupervised learning for training the activation functions of neural networks. Pattern Recognition Letters, 37, 178–191.
Cho, K., Courville, A., & Bengio, Y. (2015). Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11), 1875–1886.
Clevert, D., Unterthiner, T., & Hochreiter, S. (2016). Fast and accurate deep network learning by exponential linear units (ELUs). In Bengio, Y., & LeCun, Y., (Eds.), Proceedings of the 4th International Conference on Learning Representations (ICLR, 2016).
Cortes, C., Gonzalvo, X., Kuznetsov, V., Mohri, M., & Yang, S. (2017). AdaNet: adaptive structural learning of artificial neural networks. In Precup, D., & Teh, Y. W., (Eds.), Proceedings of the 34th International Conference on Machine Learning (vol. 70, pp. 874–883).
Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. Electronic Computers, IEEE Transactions on, 14(3), 326–334.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Dasgupta, S., Stevens, C. F., & Navlakha, S. (2017). A neural algorithm for a fundamental computing problem. Science, 358(6364), 793–796.
Dauphin, Y. N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., & Bengio, Y. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., & Weinberger, K. Q., (Eds.), Advances in Neural Information Processing Systems, vol. 27. New York, NY: Curran Associates, Inc.
Dechter, R. (1986). Learning while searching in constraint-satisfaction-problems. In Proceedings of the AAAI Conference on Artificial Intelligence, pp. 178–183.
Delahunt, C. B., Riffell, J. A., & Kutz, J. N. (2018). Biological mechanisms for learning: a computational model of olfactory learning in the Manduca sexta moth, with applications to neural nets. Frontiers in Computational Neuroscience, 12, 102.
Ducoffe, M., & Precioso, F. (2018). Adversarial active learning for deep networks: a margin based approach. arXiv:1802.09841
Duda, R. O., & Hart, P. E. (1973). Pattern Classification and Scene Analysis. New York, NY: Wiley.
Dushkoff, M., & Ptucha, R. (2016). Adaptive activation functions for deep networks. Electronic Imaging, XVI(5), 1–5.
Elsayed, G. F., Shankar, S., Cheung, B., et al. (2018). Adversarial examples that fool both computer vision and time-limited humans. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 3914–3924. Red Hook, NY: Curran Associates.
Fiori, S. (2000). Blind signal processing by the adaptive activation function neurons. Neural Networks, 13, 597–611.
Flennerhag, S., Yin, H., Keane, J., & Elliot, M. (2018). Breaking the activation function bottleneck through adaptive parameterization. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., & Garnett, R., (Eds.), Advances in Neural Information Processing Systems 31 (pp. 7739–7750). New York, NY: Curran Associates.
Fuchs, E., & Flügge, G. (2014). Adult neuroplasticity: more than 40 years of research. Neural Plasticity, 541870, 1–10.
Fukushima, K. (1975). Cognitron: a self-organizing multilayered neural network. Biological Cybernetics, 20(3–4), 121–136.
Fukushima, K. (1980). Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Fukushima, K. (2019). Recent advances in the deep CNN neocognitron. Nonlinear Theory and Its Applications, IEICE, 10(4), 304–321.
Godfrey, L. B. (2019). An evaluation of parametric activation functions for deep learning. In Proceedings of the 2019 IEEE International Conference on Systems, Man and Cybernetics, pp. 3006–3011.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al. (2014a). Generative adversarial nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., & Weinberger, K. Q., (Eds.), Advances in Neural Information Processing Systems, vol. 27. New York, NY: Curran Associates.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., et al. (2014b). Generative adversarial nets. In Ghahramani, Z. et al., (Eds.), Advances in Neural Information Processing Systems, 27, 2672–2680.
Gori, M., & Scarselli, F. (1998). Are multilayer perceptrons adequate for pattern recognition and verification? IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11), 1121–1132.
Håstad, J. (1987). Computational Limitations of Small-Depth Circuits. Cambridge, MA: MIT Press.
Haykin, S. (1999). Neural Networks: A Comprehensive Foundation. Hoboken, NJ: Prentice Hall.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In Proceedings of the 2015 IEEE International Conference on Computer Vision (pp. 1026–1034). IEEE Computer Society, USA.
He, X., Zhao, K., & Chu, X. (2021). AutoML: a survey of the state-of-the-art. Knowledge-Based Systems, 212, 106622.
Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York, NY: Wiley.
Hinton, G. E., & Osindero, S. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 2006.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (pp. 2261–2269).
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The Journal of Physiology, 148(3), 574.
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106.
Hubel, D. H., & Wiesel, T. N. (1977). Ferrier lecture: functional architecture of macaque monkey visual cortex. Proceedings of the Royal Society of London. Series B. Biological Sciences, 198(1130), 1–59.
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man, and Cybernetics, 1(4), 364–378.
Ivakhnenko, A. G., & Lapa, V. G. (1965). Cybernetic Predicting Devices. New York, NY: CCM Information Corporation.
Jagtap, A. D., Kawaguchi, K., & Karniadakis, G. E. (2020). Adaptive activation functions accelerate convergence in deep and physics-informed neural networks. Journal of Computational Physics, 404, 109136.
Kell, A. J., Yamins, D. L., Shook, E. N., Norman-Haignere, S. V., & McDermott, J. H. (2018). A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3), 630–644.
Kipf, T. N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations. OpenReview.net.
Klambauer, G., Unterthiner, T., Mayr, A., & Hochreiter, S. (2017). Self-normalizing neural networks. In Guyon, I. et al., (Eds.), Advances in Neural Information Processing Systems 30 (pp. 971–980).
Kriegeskorte, N. (2015). Deep neural networks: a new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1(1), 417–446.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
Kunc, V., & Kléma, J. (2019). On transformative adaptive activation functions in neural networks for gene expression inference. bioRxiv
LeCun, Y., Boser, B., Denker, J. S., et al. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pp. 2278–2324.
Lee, H., & Fu, K. (1974). Grammatical inference for syntactic pattern recognition. In Tou, J., (Ed.), Information Systems (pp. 425–449). Boston, MA: Springer.
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 609–616).
LeNail, A. (2019). NN-SVG: publication-ready neural network architecture schematics. The Journal of Open Source Software, 4(33), 747.
Li, D., Chen, X., Becchi, M., & Zong, Z. (2016). Evaluating the energy efficiency of deep convolutional neural networks on CPUs and GPUs. In 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (pp. 477–484).
Lippmann, R. P., & Gold, B. (1987). Neural classifiers useful for speech recognition. In IEEE Proceedings of the First International Conference on Neural Networks, vol. IV (pp. 417–422). San Diego, CA.
Liu, B., Yu, X., Yu, A., Zhang, P., Wan, G., & Wang, R. (2018). Deep few-shot learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 57(4), 2290–2304.
Marra, G., Zanca, D., Betti, A., & Gori, M. (2018). Learning neuron non-linearities with kernel-based deep neural networks. CoRR, abs/1807.06302
Michels, F., Uelwer, T., Upschulte, E., & Harmeling, S. (2019). On the vulnerability of capsule networks to adversarial attacks. arXiv:1906.03612
Minsky, M., & Papert, S. A. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.
Mozzachiodi, R., & Byrne, J. (2010). More than synaptic plasticity: role of nonsynaptic plasticity in learning and memory. Trends in Neurosciences, 33(1), 17–26.
Oléron, P. (1963). Les activités intellectuelles. In P. Oléron, J. Piaget, B. Inhelder, & P. Gréco, (Eds.), Traité de psychologie expérimentale VII. L’Intelligence (pp. 1–70). Paris: Presses Universitaires de France.
Oléron, P., Piaget, J., Inhelder, B., & Gréco, P. (1963). Traité de psychologie expérimentale VII. L’Intelligence. Paris: Presses Universitaires de France.
Olson, R. S., Cava, W. G. L., Orzechowski, P., Urbanowicz, R. J., & Moore, J. H. (2017). PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Mining, 10(1), 36:1–36:13.
Paszke, A., Gross, S., Massa, F., et al. (2019). PyTorch: an imperative style, high-performance deep learning library. In Wallach, H. et al. (Eds.), Advances in Neural Information Processing Systems 32 (pp. 8024–8035). New York, NY: Curran Associates.
Peterson, J. C., Abbott, J. T., & Griffiths, T. L. (2018). Evaluating (and improving) the correspondence between deep neural networks and human representations. Cognitive Science, 42(8), 2648–2669.
Qian, S., Liu, H., Liu, C., Wu, S., & Wong, H.-S. (2018). Adaptive activation functions in convolutional neural networks. Neurocomputing, 272, 204–212.
Roy, S., Unmesh, A., & Namboodiri, V. P. (2018). Deep active learning for object detection. In 29th British Machine Vision Conference (p. 91).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986a). Learning representations by back-propagating errors. Nature, 323, 533–536.
Rumelhart, D. E., McClelland, J. L., & PDP Research Group (1986b). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Cambridge, MA: MIT Press.
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. In Proceedings of the 31st International Conference on Neural Information Processing Systems (pp. 3859–3869).
Scardapane, S., Vaerenbergh, S. V., & Uncini, A. (2019). Kafnets: kernel-based non-parametric activation functions for neural networks. Neural Networks, 110, 19–32.
Shawahna, A., Sait, S. M., & El-Maleh, A. (2019). FPGA-based accelerators of deep learning networks for learning and classification: a review. IEEE Access, 7, 7823–7859.
Shen, Y., Dasgupta, S., & Navlakha, S. (2020). Habituation as a neural algorithm for online odor discrimination. Proceedings of the National Academy of Sciences, 117(22), 12402–12410.
Siddoway, B., Hou, H., & Xia, H. (2014). Molecular mechanisms of homeostatic synaptic downscaling. Neuropharmacology, 78, 38–44.
Siu, K.-Y., Roychowdhury, V., & Kailath, T. (1995). Discrete Neural Networks. Hoboken, NJ: Prentice Hall.
Solazzi, M., & Uncini, A. (2004). Regularising neural networks using flexible multivariate activation function. Neural Networks, 17(2), 247–260.
Steinkraus, D., Simard, P. Y., & Buck, I. (2005). Using GPUs for machine learning algorithms. In Proceedings of the 8th International Conference on Document Analysis and Recognition (pp. 1115–1119). IEEE Computer Society.
Szegedy, C., Zaremba, W., Sutskever, I., et al. (2014). Intriguing properties of neural networks. In 2nd International Conference on Learning Representations.
Tanay, T., & Griffin, L. (2016). A boundary tilting perspective on the phenomenon of adversarial examples. arXiv e-prints arXiv–1608
Tramèr, F., Papernot, N., Goodfellow, I., Boneh, D., & McDaniel, P. (2017). The space of transferable adversarial examples. arXiv:1704.03453
Trentin, E. (1998). Learning the amplitude of activation functions in layered networks. In Marinaro, M., & Tagliaferri, R. (Eds.), Neural Nets - WIRN Vietri 98, vol. 7081 of Lecture Notes in Computer Science (pp. 138–144). Berlin: Springer.
Trentin, E. (2001). Networks with trainable amplitude of activation functions. Neural Networks, 14(4–5), 471–493.
Turrigiano, G. G., & Nelson, S. B. (2000). Hebb and homeostasis in neuronal plasticity. Current Opinion in Neurobiology, 10(3), 358–364.
Vanschoren, J., van Rijn, J. N., Bischl, B., & Torgo, L. (2013). OpenML: networked science in machine learning. SIGKDD Explorations, 15(2), 49–60.
Vecci, L., Piazza, F., & Uncini, A. (1998). Learning and approximation capabilities of adaptive spline activation function neural networks. Neural Networks, 11(2), 259–270.
Viroli, C., & Mclachlan, G. J. (2019). Deep Gaussian mixture models. Statistics and Computing, 29(1), 43–51.
Ward Jr., J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.
Werbos, P. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Department of Applied Mathematics, Harvard University.
Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.
Wiener, N. (1958). Nonlinear Problems in Random Theory. New York, NY: John Wiley.
Xian, Y., Lampert, C. H., Schiele, B., & Akata, Z. (2018). Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9), 2251–2265.
Xu, B., Wang, N., Chen, T., & Li, M. (2015). Empirical evaluation of rectified activations in convolutional network. arXiv:1505.00853v2
Yang, M., Sheth, S. A., Schevon, C. A., McKhann, G. M., & Mesgarani, N. (2015). Speech reconstruction from human auditory cortex with deep neural networks. In Proceedings of INTERSPEECH 2015, ISCA (pp. 1121–1125).
Zhang, L., Xiang, T., & Gong, S. (2017). Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2021–2030).