
Modelling human vision needs to account for subjective experience

Published online by Cambridge University Press:  06 December 2023

Marcin Koculak
Affiliation: Centre for Brain Research, Jagiellonian University, Krakow, Poland; Consciousness Lab, Institute of Psychology, Jagiellonian University, Krakow, Poland. https://c-lab.pl
Michał Wierzchoń
Affiliation: Centre for Brain Research, Jagiellonian University, Krakow, Poland; Consciousness Lab, Institute of Psychology, Jagiellonian University, Krakow, Poland. https://c-lab.pl

Abstract

Vision is inseparably connected to perceptual awareness, which can be seen as the culmination of sensory processing. Studies on conscious vision reveal that object recognition is just one of the means through which our representation of the world is built. We propose an operationalization of subjective experience in the context of deep neural networks (DNNs) that could encourage a more thorough comparison of human and artificial vision.

Type: Open Peer Commentary
Copyright: © The Author(s), 2023. Published by Cambridge University Press

The target article comprehensively deconstructs common misconceptions, such as the idea that models of human vision can be reduced to mechanisms of object recognition, or that useful analogies between neuronal and artificial architectures can be drawn solely from accuracy scores and their correlations with brain activity. We fully agree that such oversimplifications need to be avoided if deep neural networks (DNNs) are to be considered accurate models of vision. Troubles stemming from similar oversimplifications are well known in consciousness research. One of the main obstacles for the field is separating the mechanisms that process visual information from those that transform it into the conscious activity of seeing. Here, we offer a high-level outlook on human vision from this perspective. We believe it could serve as a guiding principle for building more ecologically valid artificial models. It would also lead to better testing criteria for assessing the similarities and differences between humans and DNNs that go beyond object recognition.

When presented with an object, we seem to first see it in all of its details and only then recognize it. However, experimental evidence suggests that, under carefully controlled conditions, individuals can correctly categorize objects while denying seeing them (Lamme, 2020). The discrepancy between objective performance (i.e., correct categorization) and the subjective experience of seeing convincingly illustrates the presence of unconscious processing of perceptual information (Mudrik & Deouell, 2022). It also highlights that categorization may rely on different neural processes depending on the type of object. Identification of faces is a common example of fast, automatic processing of a complex set of features that allows us to easily recognize each other. It also demonstrates the problems with taking brain activity as an indicator of successful perception. The fusiform gyrus is selectively activated when participants are presented with images of faces (Fahrenfort et al., 2012; Haxby, Hoffman, & Gobbini, 2000). However, this activation can be found even if the participant reports no perception (Axelrod, Bar, & Rees, 2015). Similar specific neural activations can be observed in response to other complex stimuli (e.g., one's name) during sleep (Andrillon & Kouider, 2020). Therefore, while behavioural responses and brain activity can provide insights into the extent of processing evoked by certain stimuli, they do not equate to conscious vision.

Feature extraction and object categorization are not the only visual processes that can occur without consciousness. There is evidence of interactions between already differentiated objects that alter each other's neural responses when placed closely in the visual field (Lamme, 2020). This includes illusions like the Kanizsa triangle, which require the integration of multiple objects (Wang, Weng, & He, 2012). However, these processes seem to be restricted to local features and are not present when processing requires information integration from larger parts of the visual scene. This is precisely the moment when conscious perception starts to play a role, enabling the organization of distinct elements in the visual field into a coherent scene (e.g., figure-ground differentiation; Lamme, Zipser, & Spekreijse, 2002). Experimental evidence suggests that conscious vision allows for better integration of spatially or temporally distributed information, as well as higher precision of visual representations (Ludwig, 2023). A coherent scene can then be used to guide adequate actions and predict future events. From this perspective, while object recognition is an essential part of the visual processing pipeline, it cannot fulfil the representational function of vision alone.

Another notion that complicates comparisons between humans and DNNs is temporal integration. Our perception is trained from birth on continuous perceptual input that is highly temporally correlated: scenes are not part of a randomized stream of unrelated snapshots. Temporal integration enables our visual system to augment the processing of stimuli with information extracted from the immediate past, for example, changes in the relative position of individuals or objects. This leads to one of the crucial discrepancies between human and artificial vision (the target article identifies aspects of it in sects. 4.1.1–4.1.7). DNNs are built to classify ensembles of pixels in a digital image, while human brains interpret retinal input as two-dimensional (2D) projections of three-dimensional (3D) objects. This imposes restrictions on possible interpretations of perceptual stimuli (which can lead to mistakes) but ultimately allows the visual system not to rely solely on immediate physical stimulation, which in turn makes perception more stable and useful in the context of interactions with the environment. These processes may occur without human-like consciousness. However, consciousness seems to increase the temporal integration of stimuli, strongly shaping the outcome of visual processing.
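To make this contrast concrete, a minimal PyTorch sketch is given below. It is purely illustrative: every module, dimension, and name in it (the two classifier classes, the choice of a GRU, the clip shape) is our own assumption, not anything proposed in the target article. It only shows how a standard per-frame classifier differs from one whose predictions are conditioned on the immediate past.

```python
# Illustrative sketch: per-frame classification vs. temporal integration over a clip.
# All module choices, sizes, and names are assumptions made for this example only.
import torch
import torch.nn as nn


class FramewiseClassifier(nn.Module):
    """Classifies every frame independently: no temporal integration."""
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone            # any image encoder mapping frames to feat_dim vectors
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, clip):                # clip: (batch, time, channels, height, width)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))       # (b*t, feat_dim)
        return self.head(feats).view(b, t, -1)          # per-frame logits, each frame in isolation


class TemporallyIntegratedClassifier(nn.Module):
    """Adds a recurrent module so each prediction depends on preceding frames."""
    def __init__(self, backbone: nn.Module, feat_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone
        self.gru = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, clip):
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1)).view(b, t, -1)
        integrated, _ = self.gru(feats)                 # carries information from the immediate past
        return self.head(integrated)                    # logits informed by preceding frames
```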

In this commentary, we aimed to justify why consciousness should be taken into account when modelling human vision with DNNs. Similar inspirations from cognitive science have proven very successful in the recent past, as in the case of attention (Vaswani et al., 2017), and some researchers have already proposed consciousness-like mechanisms (Bengio, 2019). However, even in healthy humans, reliable measurement of consciousness is difficult both theoretically (Seth, Dienes, Cleeremans, Overgaard, & Pessoa, 2008) and methodologically (Wierzchoń, Paulewicz, Asanowicz, Timmermans, & Cleeremans, 2014). The task is even more challenging if one were to implement such measurement in artificial neural networks (Timmermans, Schilbach, Pasquali, & Cleeremans, 2012). Nevertheless, probing the capabilities of DNNs in realizing functions connected to conscious vision might prove necessary for comparing DNNs and humans. To make such a comparison more feasible, we propose a rudimentary operationalization of subjective experience as "context dependence." In the case of visual perception, context can be defined very broadly as all the spatially or temporally distant elements of a visual scene that alter its processing. This also suggests that the global integration of perceptual features is a good approximation of the unifying function of conscious vision. Interestingly, we note that most of the phenomena mentioned in sect. 4.2 of the target article can be reformulated as examples of some form of context dependence, making this overarching principle easy to convey. Showing that DNNs are similar to humans, that is, that they are selectively susceptible to illusions, alter categorization based on other objects in the scene, or demonstrate object invariance, would be a strong argument in favour of functional similarity.
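As a rough illustration of how such context dependence could be probed in a DNN, the sketch below compares a model's output distribution for an object shown in isolation with its output for the same object embedded in the full scene. The function name, the blank-background manipulation, and the use of a KL divergence are our own hypothetical choices for this example, not an established protocol.

```python
# Hypothetical probe: does the surrounding scene change how an object is categorized?
import torch
import torch.nn.functional as F


def context_dependence_score(model, scene, object_box):
    """Higher values mean the scene context shifts the model's categorization of the object."""
    model.eval()
    x0, y0, x1, y1 = object_box                              # pixel coordinates of the object
    isolated = torch.zeros_like(scene)
    isolated[..., y0:y1, x0:x1] = scene[..., y0:y1, x0:x1]   # same object on a blank background

    with torch.no_grad():
        log_p_scene = F.log_softmax(model(scene.unsqueeze(0)), dim=-1)
        p_alone = F.softmax(model(isolated.unsqueeze(0)), dim=-1)

    # KL(p_alone || p_scene): divergence between the in-context and out-of-context responses.
    return F.kl_div(log_p_scene, p_alone, reduction="batchmean").item()
```

A large score under such a probe would indicate that the categorization of an object is altered by the rest of the scene, which is one crude behavioural signature of the context dependence discussed above.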

Competing interest

None.

References

Andrillon, T., & Kouider, S. (2020). The vigilant sleeper: Neural mechanisms of sensory (de)coupling during sleep. Current Opinion in Physiology, 15, 47–59. https://doi.org/10.1016/j.cophys.2019.12.002
Axelrod, V., Bar, M., & Rees, G. (2015). Exploring the unconscious using faces. Trends in Cognitive Sciences, 19(1), 35–45. https://doi.org/10.1016/j.tics.2014.11.003
Bengio, Y. (2019). The consciousness prior. arXiv, arXiv:1709.08568. http://arxiv.org/abs/1709.08568
Fahrenfort, J. J., Snijders, T. M., Heinen, K., van Gaal, S., Scholte, H. S., & Lamme, V. A. F. (2012). Neuronal integration in visual cortex elevates face category tuning to conscious face perception. Proceedings of the National Academy of Sciences of the United States of America, 109(52), 21504–21509. https://doi.org/10.1073/pnas.1207414110
Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233. https://doi.org/10.1016/s1364-6613(00)01482-0
Lamme, V. A. F. (2020). Visual functions generating conscious seeing. Frontiers in Psychology, 11, 83. https://doi.org/10.3389/fpsyg.2020.00083
Lamme, V. A. F., Zipser, K., & Spekreijse, H. (2002). Masking interrupts figure-ground signals in V1. Journal of Cognitive Neuroscience, 14(7), 1044–1053. https://doi.org/10.1162/089892902320474490
Ludwig, D. (2023). The functions of consciousness in visual processing. Neuroscience of Consciousness, 2023(1), niac018. https://doi.org/10.1093/nc/niac018
Mudrik, L., & Deouell, L. Y. (2022). Neuroscientific evidence for processing without awareness. Annual Review of Neuroscience, 45(1), 403–423. https://doi.org/10.1146/annurev-neuro-110920-033151
Seth, A. K., Dienes, Z., Cleeremans, A., Overgaard, M., & Pessoa, L. (2008). Measuring consciousness: Relating behavioural and neurophysiological approaches. Trends in Cognitive Sciences, 12(8), 314–321. https://doi.org/10.1016/j.tics.2008.04.008
Timmermans, B., Schilbach, L., Pasquali, A., & Cleeremans, A. (2012). Higher order thoughts in action: Consciousness as an unconscious re-description process. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1594), 1412–1423. https://doi.org/10.1098/rstb.2011.0421
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 6000–6010. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Wang, L., Weng, X., & He, S. (2012). Perceptual grouping without awareness: Superiority of Kanizsa triangle in breaking interocular suppression. PLoS ONE, 7(6), e40106. https://doi.org/10.1371/journal.pone.0040106
Wierzchoń, M., Paulewicz, B., Asanowicz, D., Timmermans, B., & Cleeremans, A. (2014). Different subjective awareness measures demonstrate the influence of visual identification on perceptual awareness ratings. Consciousness and Cognition, 27, 109–120. https://doi.org/10.1016/j.concog.2014.04.009