
Really radical?

Published online by Cambridge University Press:  08 May 2023

Karl Friston*
Affiliation:
The Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK. [email protected] https://www.fil.ion.ucl.ac.uk/~karl/

Abstract

I enjoyed reading this compelling account of Conviction Narrative Theory (CNT). As a theoretical neurobiologist, I recognised – and applauded – the tenets of CNT. My commentary asks whether its claims could be installed into a Bayesian mechanics of decision-making, in a way that would enable theoreticians to model, reproduce and predict decision-making.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

Conviction Narrative Theory (CNT) (target article) is both a “theory of narratives” and a “narrative theory” that precludes mathematical or numerical analysis. This commentary reviews the commitments of CNT through the lens of active inference and self-evidencing (Hohwy, 2016), asking whether CNT could lend itself to a formal (Bayesian) treatment.

Box 1 summarises the fundaments of active inference, as it relates to decision-making under uncertainty. With these fundaments, one can simulate the kind of decision-making addressed by CNT. For example, active inference reproduces decision-making under unknowable circumstances (Friston et al., 2016); it dissolves the exploration–exploitation dilemma and provides a principled account of affordances (Schwartenbeck et al., 2019). It can model the spread of ideas (Albarracin, Demekas, Ramstead, & Heins, 2022) and has been applied to cultural niche construction and social norms (Veissiere, Constant, Ramstead, Friston, & Kirmayer, 2019).

Box 1. Active inference

Recent trends in theoretical neurobiology, machine learning and artificial intelligence converge on a single imperative that explains both sense-making and decision-making in self-organising systems, from cells (Friston, Levin, Sengupta, & Pezzulo, 2015) to cultures (Veissiere et al., 2019). This imperative is to maximise the evidence (a.k.a. marginal likelihood) for generative (a.k.a. world) models of how observations are caused. This imperative can be expressed as minimising an evidence bound called variational free energy (Winn & Bishop, 2005) that comprises complexity and accuracy (Ramstead et al., 2022):

$${\rm Free\ energy} = {\rm model\ complexity} - {\rm model\ accuracy}$$

Accuracy corresponds to goodness of fit, while complexity scores the divergence between prior beliefs (before seeing outcomes) and posterior beliefs (afterwards). In short, complexity scores the information gain or cost of changing one's mind (in an information theoretic and thermodynamic sense, respectively). This means Bayesian belief updating is about finding an accurate explanation that is minimally complex (cf. Occam's principle). In an enactive setting – apt for explaining decision-making – beliefs about “which plan to commit to” are based on the free energy expected under a plausible plan. This implicit planning as inference can be expressed as minimising expected free energy (Friston, Daunizeau, Kilner, & Kiebel, 2010):

$${\rm Expected\ free\ energy} = {\rm risk\ (expected\ complexity)} + {\rm ambiguity\ (expected\ inaccuracy)}$$

Risk is the divergence between probabilistic predictions about outcomes, given a plan, and prior preferences. Ambiguity is the expected inaccuracy. An alternative decomposition is especially interesting from the perspective of CNT:

$${\rm Expected\ free\ energy} = {\rm expected\ cost} - {\rm expected\ information\ gain}$$

The expected information gain underwrites the principles of optimal Bayesian design (Lindley, 1956), while expected cost underwrites Bayesian decision theory (Berger, 2011). However, there is a twist that distinguishes active inference from expected utility theory. In active inference, there is no single, privileged outcome that furnishes a utility or cost function. Rather, utilities are replaced by preferences, quantified by the (log) likelihood of encountering every aspect of an observable outcome. In short, active inference appeals to two kinds of Bayes optimality and subsumes information and preference-seeking behaviour under a single objective.
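The two decompositions of expected free energy in Box 1 can be checked numerically on a toy model. The following is a minimal sketch, assuming a discrete generative model with two hidden states and two outcomes; the likelihood matrix `A`, the prior preferences `C`, the plan names and the function `efe_terms` are all hypothetical and chosen purely for illustration.

```python
import numpy as np

def efe_terms(A, qs, C):
    """Terms of expected free energy for one plan (all values in nats).

    A  : likelihood matrix, A[o, s] = p(o | s), strictly positive (assumed)
    qs : predicted hidden states under the plan, q(s | plan)
    C  : prior preferences over outcomes, p(o)
    """
    qo = A @ qs                                    # predicted outcomes q(o | plan)
    risk = np.sum(qo * (np.log(qo) - np.log(C)))   # KL[q(o | plan) || p(o)]
    entropy_per_state = -np.sum(A * np.log(A), axis=0)
    ambiguity = entropy_per_state @ qs             # expected entropy of p(o | s)
    expected_cost = -np.sum(qo * np.log(C))        # negative expected log preference
    info_gain = -np.sum(qo * np.log(qo)) - ambiguity  # mutual information of s and o
    return {"risk": risk, "ambiguity": ambiguity,
            "expected_cost": expected_cost, "info_gain": info_gain}

# Hypothetical toy model: two hidden states, two outcomes.
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
C = np.array([0.8, 0.2])                           # outcome 0 is preferred
plans = {"plan_a": np.array([0.9, 0.1]),
         "plan_b": np.array([0.5, 0.5])}

terms = {name: efe_terms(A, qs, C) for name, qs in plans.items()}
G = np.array([t["risk"] + t["ambiguity"] for t in terms.values()])

# Beliefs about which plan to commit to: a softmax of negative expected free energy.
plan_probs = np.exp(-G) / np.exp(-G).sum()

for name, t in terms.items():
    print(name, {k: round(float(v), 4) for k, v in t.items()})
print("plan probabilities:", plan_probs.round(4))
```

In this sketch, risk plus ambiguity equals expected cost minus expected information gain for every plan, and both cost and information gain are measured in nats, so they can be traded off against each other directly.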

What prevents CNT from using active inference for simulation, scenario modelling or computational phenotyping (Parr, Rees, & Friston, 2018)? One answer is that the requisite generative models are too complex and difficult to specify. However, there may be some commitments of CNT that could be usefully dismantled, enabling its claims to be substantiated with simulations and implicit proof of principle.

In active inference, narratives feature as prior beliefs. Indeed, the plans – that underwrite policy selection – are often described as narratives (Friston, Rosch, Parr, Price, & Bowman, 2017b). So, could one cast CNT in terms of narrative (i.e., policy) selection and “planning as inference” (Attias, 2003; Botvinick & Toussaint, 2012; Matsumoto & Tani, 2020)? In what follows, five arguments against formalising CNT in this fashion are considered and countered.

(1) Radical uncertainty does not admit any Bayesian mechanics because the requisite probabilities do not have a well-defined outcome space.

Radical uncertainty rests upon an unknowable outcome (e.g., John Kay's wheel example). However, outcomes are known quantities that are observed. Technically, radical uncertainty refers to unknowable (i.e., hidden) causes of outcomes. However, finding the right causal explanation just is the problem of Bayesian inference. So what is radical about radical uncertainty? The answer might lie in the hierarchical nature of belief updating and implicit generative models. Given the parameters of a generative model, I can be uncertain about hidden states generating my observations. However, I can also be uncertain about the parameters, given a model. Finally, I can have uncertainty about my model. Radical uncertainty seems to concern the model structure.

Resolving the three kinds of uncertainty above corresponds to inference, learning and model selection, respectively. All entail maximising marginal likelihood or minimising free energy, with respect to posteriors over states, parameters and models, respectively. Model selection is known as structure learning in radical constructivism (Salakhutdinov, Tenenbaum, & Torralba, 2013; Tenenbaum, Kemp, Griffiths, & Goodman, 2011; Tervo, Tenenbaum, & Gershman, 2016). Structure learning is a partly solved problem, through Bayesian model reduction (Friston, Parr, & Zeidman, 2018), where redundant components are removed from an overly expressive model to maximise model evidence (e.g., Smith, Schwartenbeck, Parr, & Friston, 2020). This reductive approach complements non-parametric Bayes, which formalises the inclusion of new narratives (Gershman & Blei, 2012). In light of Bayesian model selection, one could argue that radical uncertainty admits a Bayesian mechanics.
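Uncertainty about the model itself can be illustrated with a toy comparison of two candidate models by their marginal likelihood. The following is a minimal sketch, assuming Bernoulli observations with conjugate Beta priors; the model names, the priors and the data are hypothetical. It shows plain Bayesian model comparison, not the Bayesian model reduction scheme of the cited work.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_evidence(k, n, a, b):
    """Log marginal likelihood of one particular sequence with k successes
    in n Bernoulli trials, under a Beta(a, b) prior on the success rate."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)

def model_posterior(log_evidences):
    """Posterior over models, assuming a flat prior over models."""
    m = max(log_evidences)
    weights = [exp(le - m) for le in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data: 9 successes in 10 trials.
k, n = 9, 10

# Two candidate models that differ only in their prior over the parameter.
models = {
    "narrow": (50.0, 50.0),   # confident that the success rate is near 0.5
    "broad": (1.0, 1.0),      # uniform prior over the success rate
}

log_ev = {name: log_evidence(k, n, a, b) for name, (a, b) in models.items()}
post = dict(zip(models, model_posterior(list(log_ev.values()))))

for name in models:
    print(f"{name}: log evidence = {log_ev[name]:.3f}, p(model | data) = {post[name]:.3f}")
```

Here the lopsided data favour the broad model, because the narrow model pays for a confident prior that the data contradict. Inference over the success rate (learning) and inference over models (model selection) use the same quantity, the evidence, at different levels.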

(2) The utilities of different kinds of outcomes cannot be compared in a meaningful way.

This is only a problem if one subscribes to Bayesian decision theory as a complete account. Active inference vitiates this objection because to be Bayes optimal is to resolve uncertainty in the context of securing preferred outcomes (i.e., minimise expected free energy; see Box 1). Crucially, expected utility and information gain (Howard, 1966; Kamar & Horvitz, 2013; Moulin & Souchay, 2015) share the same currency; namely, natural units (when using natural logarithms of prior preferences). This lends a quantitative and comparable meaning to the value of information and preferences.

(3) Certain outcomes are so fuzzy they are impossible to predict and therefore one has to use heuristics.

Knowing something is unpredictable is itself an informative prior that can be installed into hierarchical generative models: cf. Jaynes' maximum entropy principle (Jaynes, 1957; Kass & Raftery, 1995; Sakthivadivel, 2022). So, how do “fast and frugal” heuristics fit into active inference? Heuristics are generally considered as priors that comply with complexity minimising imperatives (Box 1), for example, habitisation (FitzGerald, Dolan, & Friston, 2014) or the minimisation of perceptual prediction errors (Mansell, 2011). In short, heuristics are exactly what active inference – under hierarchical generative models – is there to explain. On this reading of active inference, self-evidencing just is satisficing (Gerd Gigerenzer, personal communication).

(4) But people don't behave as if they were rational, or even with bounded rationality.

Many careful studies in cognitive neuroscience are concerned with how people deviate from Bayes optimality. However, this overlooks the complete class theorem (Brown, 1981; Wald, 1947). The complete class theorem says that for any pair of choice behaviours and cost functions, there are some priors that render the decisions Bayes-optimal. This has the fundamental implication that Bayesian mechanics cannot prescribe optimal (i.e., rational) decision-making. It can only describe rationality in terms of the priors a subject brings to the table. This insight underwrites the emerging field of computational psychiatry, where the game is to estimate the prior beliefs of patients that best explain their decision-making (Schwartenbeck & Friston, 2016; Smith, Khalsa, & Paulus, 2021).
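The logic of recovering priors from choices can be sketched in a few lines. The following is a minimal illustration, assuming two hidden states, three actions and a fixed cost function; the loss matrix, the action labels and the function `rationalising_priors` are hypothetical. It searches a grid of priors for those under which an observed choice minimises expected loss, which is a toy stand-in for the model fitting used in computational phenotyping and not the method of the cited studies.

```python
import numpy as np

def rationalising_priors(loss, chosen, grid=1001):
    """Return the prior probabilities p(state 0), on a regular grid, under
    which the action `chosen` minimises expected loss.

    loss : array with loss[a, s], the cost of action a when the state is s
    """
    p0 = np.linspace(0.0, 1.0, grid)
    priors = np.stack([p0, 1.0 - p0])         # shape (2, grid)
    expected = loss @ priors                  # expected loss, shape (actions, grid)
    best = expected.min(axis=0)               # lowest expected loss for each prior
    is_optimal = np.isclose(expected[chosen], best)
    return p0[is_optimal]

# Hypothetical cost function over three actions and two states.
loss = np.array([[0.0, 10.0],    # action 0: bet on state 0
                 [1.0, 1.0],     # action 1: hedge
                 [8.0, 0.0]])    # action 2: bet on state 1

# Suppose a subject is observed to hedge. Which priors make hedging Bayes-optimal?
hedge_priors = rationalising_priors(loss, chosen=1)
print("hedging is Bayes-optimal for p(state 0) in "
      f"[{hedge_priors.min():.3f}, {hedge_priors.max():.3f}]")
```

Under these assumptions, hedging looks irrational only to an observer who assumes the subject is nearly certain about the state. The recovered interval of priors describes the subject; it does not prescribe what they should have done.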

(5) But the dimensionality and numerics of belief updating in realistic generative models are beyond the capacity of any computer, human or otherwise.

This argument rests on the use of sampling procedures to approximate posterior distributions, for example, likelihood-free methods or approximate Bayesian computation (Chatzilena, van Leeuwen, Ratmann, Baguelin, & Demiris, 2019; Cornish & Littenberg, 2007; Girolami & Calderhead, 2011; Ma, Chen, & Fox, 2015; Silver & Veness, 2010). However, active inference rests on variational schemes found in physics, high-end machine learning (Marino, 2021) and (probably) the brain (Friston, Parr, & de Vries, 2017a). Variational Bayes eschews sampling by committing to a functional form for posterior beliefs; thereby converting an impossible marginalisation problem into an optimisation problem; namely, minimising variational free energy (Feynman, 1972). In summary, some people may think generative models with realistic narratives cannot be inverted; however, they (i.e., these people) are existence proofs that such models can be inverted.
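The conversion of marginalisation into optimisation can be seen in a toy case. The following is a minimal sketch, assuming a three-state discrete model with one observed outcome; the prior, the likelihood, the learning rate and the use of plain gradient descent on softmax logits are illustrative assumptions, not the message-passing schemes of the cited work. Free energy is written as complexity minus accuracy, as in Box 1.

```python
import numpy as np

def free_energy(q, prior, lik):
    """Variational free energy (in nats) of beliefs q over hidden states."""
    complexity = np.sum(q * (np.log(q) - np.log(prior)))   # KL[q || prior]
    accuracy = np.sum(q * np.log(lik))                      # E_q[ln p(o | s)]
    return complexity - accuracy

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def minimise_free_energy(prior, lik, steps=2000, rate=0.5):
    """Gradient descent on the logits of q; no sampling, no explicit sum for p(o)."""
    z = np.zeros_like(prior)
    for _ in range(steps):
        q = softmax(z)
        h = np.log(q) - np.log(prior) - np.log(lik)
        z -= rate * q * (h - q @ h)        # gradient of free energy w.r.t. logits
    return softmax(z)

# Hypothetical model: prior over three hidden states, likelihood of the observed outcome.
prior = np.array([0.5, 0.3, 0.2])
lik = np.array([0.1, 0.6, 0.3])            # p(o | s) for the outcome actually observed

q_opt = minimise_free_energy(prior, lik)

# Exact answer by marginalisation, available here only because the model is tiny.
evidence = np.sum(prior * lik)
exact = prior * lik / evidence

print("variational posterior:", q_opt.round(4))
print("exact posterior:      ", exact.round(4))
print("free energy:", round(float(free_energy(q_opt, prior, lik)), 4),
      " -ln evidence:", round(float(-np.log(evidence)), 4))
```

In this toy case the optimised beliefs match the exact posterior, and the minimum of free energy equals the negative log evidence. In realistic models the assumed functional form for the posterior makes the bound loose, but the problem remains one of optimisation and not of exhaustive summation.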

Financial support

KF is supported by funding for the Wellcome Centre for Human Neuroimaging (Ref: 205103/Z/16/Z) and a Canada-UK Artificial Intelligence Initiative (Ref: ES/T01279X/1).

Competing interest

None.

References

Albarracin, M., Demekas, D., Ramstead, M. J. D., & Heins, C. (2022). Epistemic communities under active inference. Entropy, 24(4).
Attias, H. (2003). Planning by probabilistic inference. Proc. of the 9th Int. Workshop on Artificial Intelligence and Statistics.
Berger, J. O. (2011). Statistical decision theory and Bayesian analysis. Springer.
Botvinick, M., & Toussaint, M. (2012). Planning as inference. Trends in Cognitive Sciences, 16, 485–488.
Brown, L. D. (1981). A complete class theorem for statistical problems with finite-sample spaces. Annals of Statistics, 9, 1289–1300.
Chatzilena, A., van Leeuwen, E., Ratmann, O., Baguelin, M., & Demiris, N. (2019). Contemporary statistical inference for infectious disease models using Stan. Epidemics, 29, 100367.
Cornish, N. J., & Littenberg, T. B. (2007). Tests of Bayesian model selection techniques for gravitational wave astronomy. Physical Review D, 76(8).
Feynman, R. P. (1972). Statistical mechanics. Benjamin.
FitzGerald, T., Dolan, R., & Friston, K. (2014). Model averaging, optimal inference, and habit formation. Frontiers in Human Neuroscience, 8, 457. doi: 10.3389/fnhum.2014.00457 [KF]
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O'Doherty, J., & Pezzulo, G. (2016). Active inference and learning. Neuroscience and Biobehavioral Reviews, 68, 862–879.
Friston, K., Levin, M., Sengupta, B., & Pezzulo, G. (2015). Knowing one's place: A free-energy approach to pattern regulation. Journal of the Royal Society, Interface, 12, 105.
Friston, K., Parr, T., & de Vries, B. (2017a). The graphical brain: Belief propagation and active inference. Network Neuroscience, 1, 381–414.
Friston, K., Parr, T., & Zeidman, P. (2018). Bayesian model reduction. arXiv preprint arXiv:1805.07092.
Friston, K. J., Daunizeau, J., Kilner, J., & Kiebel, S. J. (2010). Action and behavior: A free-energy formulation. Biological Cybernetics, 102, 227–260.
Friston, K. J., Rosch, R., Parr, T., Price, C., & Bowman, H. (2017b). Deep temporal models and active inference. Neuroscience and Biobehavioral Reviews, 77, 388–402.
Gershman, S. J., & Blei, D. M. (2012). A tutorial on Bayesian nonparametric models. Journal of Mathematical Psychology, 56, 1–12.
Girolami, M., & Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2), 123–214.
Hohwy, J. (2016). The self-evidencing brain. Noûs, 50, 259–285.
Howard, R. (1966). Information value theory. IEEE Transactions on Systems, Science and Cybernetics SSC, 2, 22–26.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review Series II, 106, 620–630.
Kamar, E., & Horvitz, E. (2013). Light at the end of the tunnel: a Monte Carlo approach to computing value of information. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-agent Systems, pp. 571–578.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. Annals of Mathematical Statistics, 27, 986–1005.
Ma, Y.-A., Chen, T., & Fox, E. B. (2015). A complete recipe for stochastic gradient MCMC. arXiv:1506.04696.
Mansell, W. (2011). Control of perception should be operationalized as a fundamental property of the nervous system. Topics in Cognitive Science, 3, 257–261.
Marino, J. (2021). Predictive coding, variational autoencoders, and biological connections. Neural Computation, 34, 1–44.
Matsumoto, T., & Tani, J. (2020). Goal-directed planning for habituated agents by active inference using a variational recurrent neural network. Entropy, 22, 564.
Moulin, C., & Souchay, C. (2015). An active inference and epistemic value view of metacognition. Cognitive Neuroscience, 6, 221–222.
Parr, T., Rees, G., & Friston, K. J. (2018). Computational neuropsychology and Bayesian inference. Frontiers in Human Neuroscience, 12, 61.
Ramstead, M. J. D., Sakthivadivel, D. A. R., Heins, C., Koudahl, M., Millidge, B., Da Costa, L., … Friston, K. J. (2022). On Bayesian mechanics: A physics of and by beliefs. arXiv:2205.11543.
Sakthivadivel, D. A. R. (2022). A constraint geometry for inference and integration. arXiv:2203.08119.
Salakhutdinov, R., Tenenbaum, J. B., & Torralba, A. (2013). Learning with hierarchical-deep models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1958–1971.
Schwartenbeck, P., & Friston, K. (2016). Computational phenotyping in psychiatry: A worked example. eNeuro, 3, 0049-16.2016.
Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. eLife, 8, e41703.
Silver, D., & Veness, J. (2010). Monte-Carlo planning in large POMDPs. Proceedings of the Conference on Neural Information Processing Systems.
Smith, R., Khalsa, S. S., & Paulus, M. P. (2021). An active inference approach to dissecting reasons for nonadherence to antidepressants. Biological Psychiatry-Cognitive Neuroscience and Neuroimaging, 6, 919–934.
Smith, R., Schwartenbeck, P., Parr, T., & Friston, K. J. (2020). An active inference approach to modeling structure learning: Concept learning as an example case. Frontiers in Computational Neuroscience, 14, 41.
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331, 1279–1285.
Tervo, D. G., Tenenbaum, J. B., & Gershman, S. J. (2016). Toward the neural implementation of structure learning. Current Opinion in Neurobiology, 37, 99–105.
Veissiere, S. P. L., Constant, A., Ramstead, M. J. D., Friston, K. J., & Kirmayer, L. J. (2019). Thinking through other minds: A variational approach to cognition and culture. The Behavioral and Brain Sciences, 43, e90.
Wald, A. (1947). An essentially complete class of admissible decision functions. The Annals of Mathematical Statistics, 18, 549–555.
Winn, J., & Bishop, C. M. (2005). Variational message passing. Journal of Machine Learning Research, 6, 661–694.