Fixing the problems of deep neural networks will require better training data and learning algorithms
Published online by Cambridge University Press: 06 December 2023
Abstract
Bowers et al. argue that deep neural networks (DNNs) are poor models of biological vision because they often learn to rival human accuracy by relying on strategies that differ markedly from those of humans. We show that this problem is worsening as DNNs grow larger and more accurate, and we prescribe methods for building DNNs that can reliably model biological vision.
- Type: Open Peer Commentary
- Copyright © The Author(s), 2023. Published by Cambridge University Press
Target article: Deep problems with neural network models of human vision