Fixing the problems of deep neural networks will require better training data and learning algorithms

Drew Linsley; Thomas Serre

doi:10.1017/S0140525X23001589

Published online by Cambridge University Press: 06 December 2023

Drew Linsley: Affiliation:
Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA https://sites.brown.edu/drewlinsley
Thomas Serre: Affiliation:
Department of Cognitive, Linguistic & Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA https://serre-lab.clps.brown.edu

Abstract

Bowers et al. argue that deep neural networks (DNNs) are poor models of biological vision because they often learn to rival human accuracy by relying on strategies that differ markedly from those of humans. We show that this problem is worsening as DNNs become larger-scale and more accurate, and we prescribe methods for building DNNs that can reliably model biological vision.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 46, 2023, e400

DOI: https://doi.org/10.1017/S0140525X23001589
Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press
