
Models of vision need some action

Published online by Cambridge University Press: 06 December 2023

Constantin Rothkopf
Affiliation:
Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Germany; Frankfurt Institute for Advanced Studies, Goethe-Universität Frankfurt, Frankfurt am Main, Germany; Center for Mind, Brain and Behavior, University of Marburg and Justus Liebig University Giessen, Giessen, Germany; HMWK-Clusterproject The Adaptive Mind, Hesse, Germany, https://www.theadaptivemind.de/
Frank Bremmer
Affiliation:
Center for Mind, Brain and Behavior, University of Marburg and Justus Liebig University Giessen, Giessen, Germany; HMWK-Clusterproject The Adaptive Mind, Hesse, Germany, https://www.theadaptivemind.de/; Applied Physics and Neurophysics, University of Marburg, Marburg, Germany
Katja Fiehler
Affiliation:
Center for Mind, Brain and Behavior, University of Marburg and Justus Liebig University Giessen, Giessen, Germany; HMWK-Clusterproject The Adaptive Mind, Hesse, Germany, https://www.theadaptivemind.de/; Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
Katharina Dobs
Affiliation:
Center for Mind, Brain and Behavior, University of Marburg and Justus Liebig University Giessen, Giessen, Germany; HMWK-Clusterproject The Adaptive Mind, Hesse, Germany, https://www.theadaptivemind.de/; Experimental Psychology, Justus Liebig University Giessen, Giessen, Germany
Jochen Triesch
Affiliation:
Frankfurt Institute for Advanced Studies, Goethe-Universität Frankfurt, Frankfurt am Main, Germany; Center for Mind, Brain and Behavior, University of Marburg and Justus Liebig University Giessen, Giessen, Germany; HMWK-Clusterproject The Adaptive Mind, Hesse, Germany, https://www.theadaptivemind.de/

Abstract

Bowers et al. focus their criticisms on research that compares behavioral and brain data from the ventral stream with a class of deep neural networks for object recognition. While they are right to identify issues with current benchmarking research programs, they overlook a much more fundamental limitation of this literature: its disregard for the importance of action and interaction for perception.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

