
Psychophysics may be the game-changer for deep neural networks (DNNs) to imitate the human vision

Published online by Cambridge University Press:  06 December 2023

Keerthi S. Chandran
Affiliation:
Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India; Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
Amrita Mukherjee Paul
Affiliation:
Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India; Applied Sciences, IIIT Allahabad, Prayagraj, UP, India
Avijit Paul
Affiliation:
Biomedical Engineering, Tufts University, Medford, MA, USA
Kuntal Ghosh
Affiliation:
Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India

Abstract

Psychologically faithful deep neural networks (DNNs) could be constructed by training with psychophysics data. Moreover, conventional DNNs are based mostly on monocular vision, whereas the human brain relies mainly on binocular vision. DNNs developed as smaller vision agent networks, each associated with a fundamental and less intelligent visual activity, can be combined to simulate the more intelligent visual activities performed by the biological brain.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

In keeping with what Turing proposed for the imitation game (Turing, 1950), a good brain-computational model (Kriegeskorte & Douglas, 2018) would not be the one that performs a particular task with equal or greater accuracy than a human being, but rather the one that is indistinguishable from a human being vis-à-vis input and output. Psychophysics, interestingly, is also about input and output, with the brain as a black box in between (Read, 2015). Bowers et al. provide a comprehensive presentation of the incongruence between deep neural networks (DNNs) and the visual brain, but fail to note this relevant connection of psychophysics to neuroscience for brain-computational modeling (Read, 2015).

Psychophysics is “the analysis of perceptual processes by studying the effect on a subject's experience or behavior of systematically varying the properties of a stimulus along one or more physical dimensions” (Bruce, Green, & Georgeson, 2003). The psychophysics stimulus for vision can be an image or a video, and a DNN, being an information-processing system, may model the subject's response to the stimulus using supervised learning. David Marr proposed that an information-processing system should be understood at three levels: computational, algorithmic, and implementation. The psychophysics task describes the computational-level problem, a DNN that performs the same task in silico would represent the algorithmic level, and the electrophysiological or fMRI data obtained during the task would be a by-product of the implementation of the algorithm in the biological brain. If the DNN is considered as an equivalent mapping between input and output, as in a psychophysics experiment, then the input can be represented by a tensor, whether it is an image, a video, a sound signal, or a spatially invariant visual stimulus like flicker. The output would also have a numerical representation which, in the case of psychophysics experiments, could be a classification; a perceived brightness, color, shape, size, motion, or intensity at a particular location in the input signal; or a comparison between two such perceived sensations at different locations of the stimulus, separated by space, time, or both. The algorithm that transforms the stimulus input into the output will not be evident from psychophysics experiments, but a DNN can construct that algorithm without the programmer knowing it exactly.
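The input-to-output mapping described above can be illustrated with a deliberately tiny stand-in for a DNN: a single logistic unit trained by supervised learning on stimulus-response pairs. This is a minimal sketch under stated assumptions, not a method taken from the cited studies. The observer is simulated, and `simulate_subject`, `train_logistic_unit`, and the threshold and slope values are hypothetical names and numbers chosen only for illustration.

```python
import math
import random


def simulate_subject(intensity, threshold=0.5, slope=10.0, rng=random):
    """Simulated observer (an assumption, not real data): 1 = "seen", 0 = "not seen"."""
    p_seen = 1.0 / (1.0 + math.exp(-slope * (intensity - threshold)))
    return 1 if rng.random() < p_seen else 0


def train_logistic_unit(stimuli, responses, lr=0.5, epochs=2000):
    """Fit response ~ sigmoid(w * stimulus + b) by batch gradient descent
    on the cross-entropy loss."""
    w, b = 0.0, 0.0
    n = len(stimuli)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(stimuli, responses):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Build a dataset by varying one physical parameter (stimulus intensity).
rng = random.Random(0)
stimuli = [i / 199 for i in range(200)]
responses = [simulate_subject(x, rng=rng) for x in stimuli]

# Learn the mapping from stimulus to response without being told the rule.
w, b = train_logistic_unit(stimuli, responses)

# The intensity at which the model predicts "seen" half of the time.
estimated_threshold = -b / w
print(f"estimated detection threshold: {estimated_threshold:.2f}")
```

The trained unit never sees the rule that generated the responses; it recovers an approximate detection threshold from the stimulus-response pairs alone, which is the sense in which a network can construct the algorithm without the programmer specifying it.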

The dataset can be prepared by manipulating the physical parameters associated with the stimulus and recording the subject's response to each stimulus. Psychophysics data for the same stimuli can differ between human subjects (Read, 2015), so it is a better strategy to train and test a DNN on the psychophysics data of the same subject. Kubota, Hiyama, and Inami (2021) used psychophysics data obtained from brightness illusions to train DNNs, and showed that human perception can be compared with the output obtained by this methodology. A DNN may also be tested on a stimulus completely different from the one it was trained on, provided its output representation is similar to that required for the new stimulus. Recently, Ghosh and Chandran (2021) proposed such a technique for a flicker stimulus. The intermediate outputs of a DNN can be compared with electrophysiological brain signals, as done by Zipser and Andersen (1988), and more recently by Chandran and Ghosh (2021, 2022) with EEG. We argue that more testable models can be constructed by training on tasks that are less computationally intensive than object classification into thousands of classes. For instance, a convolutional neural network (CNN) trained for low-level visual tasks is deceived by brightness and color illusions (Gomez-Villa, Martín, Vazquez-Corral, Bertalmío, & Malo, 2020). DNNs have also been put forth to solve tasks used in experimental psychology, such as Raven's progressive matrices (Jahrens & Martinetz, 2020). New network models, different from the engineering goal-oriented image classification DNNs, could be constructed for the purpose, as Zipser and Andersen (1988) previously did to model how the monkey brain finds the head-centered coordinates of external objects. It could be easier to correlate brain signals with the intermediate-layer outputs of a network that has fewer neurons and layers than with those of a complex network.
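One simple way to compare intermediate outputs of a small network with a recorded brain signal is to correlate each unit's activation trace with the signal. The sketch below assumes Pearson correlation as the similarity measure, which is an illustrative choice and not necessarily what the cited studies used. The traces are made-up placeholder numbers, and `pearson_r` and `best_matching_unit` are hypothetical helper names.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


def best_matching_unit(unit_activations, brain_signal):
    """Return the unit whose trace has the largest absolute correlation
    with the signal, together with all correlation scores."""
    scores = {
        name: pearson_r(trace, brain_signal)
        for name, trace in unit_activations.items()
    }
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Placeholder data (not measurements): one signal, two hidden-unit traces.
brain_signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
unit_activations = {
    "unit_a": [0.1, 0.9, 0.1, -0.8, 0.0, 1.1, -0.1, -1.0],
    "unit_b": [1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
}

best, scores = best_matching_unit(unit_activations, brain_signal)
print(best, {name: round(r, 3) for name, r in scores.items()})
```

With only a handful of units, every score can be inspected by hand, which is the practical advantage of small networks suggested above.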

Bowers et al. mention that DNNs trained on ImageNet do not encode three-dimensional (3D) features of objects or their depth, as opposed to human vision. Such DNNs are trained with datasets prepared from cameras with monocular vision. The mammalian brain, however, gets information from two eyes, and it is known that human subjects with one eye are less efficient at depth perception (Westlake, 2001). Robots that combine stereo cameras with DNNs can perform tasks like calculating the position of a detected fruit (Onishi et al., 2019). Stereo vision can enable autonomous vehicles to perform tasks like object detection, 3D information acquisition, and depth perception (Fan, Wang, Junaid Bocus, & Pitas, 2023). The mammalian brain has had input from two eyes throughout its evolutionary history, so training DNNs on stereo camera data might be needed to develop the equivalents of many circuits in the brain.
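To make concrete why two views carry depth information that a single view lacks, the standard pinhole stereo relation for a rectified camera pair can be sketched as follows. The rectified-pair geometry is an assumption of this illustration, the function name `depth_from_disparity` is hypothetical, and the numerical values are placeholders rather than figures from the cited work.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two camera centers in meters
    disparity_px: horizontal shift of the same point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only: nearer points produce larger disparities.
near = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=42.0)
far = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=10.5)
print(f"near point: {near:.2f} m, far point: {far:.2f} m")
```

A monocular image provides no disparity term at all, so a network trained only on such images has no direct geometric cue of this kind to learn from.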

To conclude, psychophysics with DNNs could be used to construct many of the smaller agents that compose the human mind, as proposed by Minsky (1988). The vision agents that compose the mind need likewise to be constructed via DNNs; these agents may be associated with fundamental activities like brightness perception, motion detection, and depth perception, or with even less intelligent activities in the parallel visual pathways. Neural networks for more complex tasks can then be built by combining smaller DNNs through shared layers, or by using the output of some layers of one DNN as the input to layers of another.
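The two ways of combining smaller networks mentioned above, shared layers and feeding one network's output into another, can be sketched structurally as follows. This is only an architectural illustration: the weights are untrained placeholders, and the agent names (`brightness_agent`, `contrast_agent`, `combined_agent`) are hypothetical rather than taken from any cited model.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Placeholder (untrained) weights, chosen only to make the sketch runnable.
SHARED_W = [
    [0.5, -0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, -0.5],
    [0.25, 0.25, 0.25, 0.25],
]
SHARED_B = [0.0, 0.0, 0.0]
BRIGHTNESS_W = [[0.2, 0.2, 1.0]]
BRIGHTNESS_B = [0.0]
CONTRAST_W = [[1.0, 1.0, 0.0]]
CONTRAST_B = [0.0]
COMBINED_W = [[0.7, 0.3]]
COMBINED_B = [0.0]


def shared_features(stimulus):
    """Layer shared by both low-level agents."""
    return dense(stimulus, SHARED_W, SHARED_B)


def brightness_agent(features):
    """Small head for one low-level activity."""
    return dense(features, BRIGHTNESS_W, BRIGHTNESS_B)


def contrast_agent(features):
    """Second small head reading the same shared features."""
    return dense(features, CONTRAST_W, CONTRAST_B)


def combined_agent(stimulus):
    """Higher-level network whose input is the output of the two agents."""
    features = shared_features(stimulus)  # computed once, used by both heads
    low_level = brightness_agent(features) + contrast_agent(features)
    return dense(low_level, COMBINED_W, COMBINED_B)


output = combined_agent([0.2, 0.8, 0.5, 0.5])
print(output)
```

In a trained system each agent could first be fitted to its own psychophysics task, with the combining layer trained afterwards on the more complex task.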

Financial support

This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.

Competing interest

None.

References

Bruce, V., Green, P. R., & Georgeson, M. A. (2003). Visual perception: Physiology, psychology, & ecology. Psychology Press.
Chandran, K. S., & Ghosh, K. (2021). Recurrent convolutional neural networks trained by psychophysics data can predict EEG response to flicker. Perception, 50(ECVP2021 Supplement), 1–244. https://doi.org/10.1177/03010066211059887
Chandran, K. S., & Ghosh, K. (2022). An in-silica computation of alpha oscillations from apparently unrelated psychophysics data. https://doi.org/10.21203/rs.3.rs-1862596/v1
Fan, R., Wang, L., Junaid Bocus, M., & Pitas, I. (2023). Computer stereo vision for autonomous driving: Theory and algorithms. Studies in Computational Intelligence, 41–70. https://doi.org/10.1007/978-3-031-18735-3_3
Ghosh, K., & Chandran, K. S. (2021). A low-cost device and technique for generating big data in visual psychophysics to train brain models. Perception, 50(ECVP2021 Supplement), 1–244. https://doi.org/10.1177/03010066211059887
Gomez-Villa, A., Martín, A., Vazquez-Corral, J., Bertalmío, M., & Malo, J. (2020). Color illusions also deceive CNNs for low-level vision tasks: Analysis and implications. Vision Research, 176, 156–174. https://doi.org/10.1016/j.visres.2020.07.010
Jahrens, M., & Martinetz, T. (2020). Solving Raven's progressive matrices with multi-layer relation networks. In 2020 International joint conference on neural networks (IJCNN). Jointly organized by the IEEE Computational Intelligence Society (CIS) and the International Neural Network Society (INNS), Glasgow, UK (pp. 1–6). https://doi.org/10.1109/ijcnn48605.2020.9207319
Kriegeskorte, N., & Douglas, P. K. (2018). Cognitive computational neuroscience. Nature Neuroscience, 21(9), 1148–1160. https://doi.org/10.1038/s41593-018-0210-5
Kubota, Y., Hiyama, A., & Inami, M. (2021). A machine learning model perceiving brightness optical illusions: Quantitative evaluation with psychophysical data. In Proceedings of the Augmented Humans International Conference 2021 (AHs '21). Association for Computing Machinery, New York, NY, USA (pp. 174–182). https://doi.org/10.1145/3458709.3458952
Minsky, M. (1988). Prologue. In The society of mind (p. 17). Simon & Schuster.
Onishi, Y., Yoshida, T., Kurita, H., Fukao, T., Arihara, H., & Iwai, A. (2019). An automated fruit harvesting robot by using deep learning. ROBOMECH Journal, 6(1), 13. https://doi.org/10.1186/s40648-019-0141-2
Read, J. C. A. (2015). The place of human psychophysics in modern neuroscience. Neuroscience, 296, 116–129. https://doi.org/10.1016/j.neuroscience.2014.05.036
Turing, A. M. (1950). I. – Computing machinery and intelligence. Mind; A Quarterly Review of Psychology and Philosophy, LIX(236), 433–460. https://doi.org/10.1093/mind/lix.236.433
Westlake, W. (2001). Is a one eyed racing driver safe to compete? Formula one (eye) or two? British Journal of Ophthalmology, 85(5), 619–624. https://doi.org/10.1136/bjo.85.5.619
Zipser, D., & Andersen, R. A. (1988). A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature, 331(6158), 679–684. https://doi.org/10.1038/331679a0