
The meta-learning toolkit needs stronger constraints

Published online by Cambridge University Press:  23 September 2024

Erin Grant*
Affiliation:
UCL Gatsby Unit, Sainsbury Wellcome Centre, University College London, London, UK [email protected] https://eringrant.github.io/
*Corresponding author.

Abstract

The implementation of meta-learning targeted by Binz et al. inherits benefits and drawbacks from its nature as a connectionist model. Drawing from historical debates around bottom-up and top-down approaches to modeling in cognitive science, we should continue to bridge levels of analysis by constraining meta-learning and meta-learned models with complementary evidence from across the cognitive and computational sciences.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

Meta-learning as a model allows researchers to posit how human and other biological learning systems might learn from experience in a structured manner, including by relating experiences across timescales or latent causes non-uniformly. Meta-learning as a tool allows researchers to posit learning algorithms as computational models of human learning that are more flexible and data-driven than those readily expressed by machine learning algorithms such as gradient descent with canonical parameters, or by inference in a Bayesian model in which exact inference is tractable. These senses of a “meta-learning” and a “meta-learned” model align with the dichotomy employed in Binz et al.

Meta-learning in both senses, when it uses the implementation that Binz et al. focus on (a recurrent neural network), further inherits characteristics of connectionism: Universal approximation, ease of specification, manipulability (including of complexity), and integration of neuroscientific findings, which Binz et al. rightly note as positives. However, this implementation of meta-learning also inherits the challenges of a connectionist approach: Lack of interpretability (the ease with which humans can understand the workings and outputs of a system) and controllability (the ability to modify a model's behavior or learning process to achieve specific outcomes).

These benefits and drawbacks of the bottom-up, emergentist approach of connectionism have been discussed at length, including in this journal (Smolensky, 1988). As a result of these discussions, a common ground between these and top-down structured approaches such as Bayesian cognitive modeling has emerged: That models posed in different description languages may not be at odds simply because they are posed at different levels of analysis, and in fact should be tested for complementarity (Rogers & McClelland, 2008; Griffiths, Vul, & Sanborn, 2012).

It is this integrative approach that I view as the most fruitful in examining the validity of meta-learning and meta-learned cognitive models precisely because (1) it allows us to address the challenges of working within a single paradigm (say, the lack of interpretability of a connectionist approach) at the same time as (2) providing stronger grounds on which to refute a cognitive model (say, by its inconsistency with evidence from neural recordings, or its inability to account for how an ecological task is solved). Making use of the former benefit is especially critical, as the meta-learned models commonly employed, including by Binz et al., have the potential to be even more inscrutable than a connectionist model initialized in a data-agnostic way.

Binz et al. discuss two studies of meta-learning and meta-learned models that bridge levels of analysis in this manner: Firstly, a meta-learning algorithm has been tested against experimental neuroscience findings in prefrontal cortex (Wang et al., 2018). Secondly, a meta-learned recurrent neural network can approximate the posterior predictive distribution picked out as optimal by a Bayesian approach (Ortega et al., 2019). Connecting neuroscientific findings with computational-level analysis via an algorithm is an exciting result. However, as Binz et al. note, the goodness of fit of the meta-learned approximation employed in both studies is not guaranteed, and has been empirically demonstrated to be poor.
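To make the second result concrete, the following Python sketch illustrates, in a deliberately idealized setting, why a predictor trained across many sampled tasks can come to match a Bayesian posterior predictive. It is not the recurrent network of the cited studies: a lookup table over sufficient statistics stands in for the network, the tasks are coin flips with a bias drawn from a uniform prior, and the names `meta_train` and `bayes_posterior_predictive` are hypothetical. Because the table has one free entry per context, the fit is close by construction, which says nothing about how well a trained recurrent network approximates the optimum in richer settings.

```python
import random
from collections import defaultdict


def meta_train(num_tasks=50_000, seq_len=4, seed=0):
    """Fit a tabular next-outcome predictor across many sampled tasks.

    Assumption of this sketch: each task is a coin whose bias is drawn from
    a uniform prior. The predictor is indexed by (flips seen, heads seen).
    For each context, the prediction that minimizes log loss is the empirical
    frequency of heads observed right after that context.
    """
    rng = random.Random(seed)
    heads_next = defaultdict(int)
    visits = defaultdict(int)
    for _ in range(num_tasks):
        bias = rng.random()  # sample a task from the task distribution
        heads = 0
        for n in range(seq_len):
            outcome = 1 if rng.random() < bias else 0
            visits[(n, heads)] += 1
            heads_next[(n, heads)] += outcome
            heads += outcome
    return {ctx: heads_next[ctx] / visits[ctx] for ctx in visits}


def bayes_posterior_predictive(n, heads):
    """Exact posterior predictive of heads under a uniform Beta(1, 1) prior."""
    return (heads + 1) / (n + 2)


predictor = meta_train()
max_gap = max(
    abs(p - bayes_posterior_predictive(n, h)) for (n, h), p in predictor.items()
)
print(f"largest gap to the Bayes-optimal prediction: {max_gap:.4f}")
```

The gap shrinks as `num_tasks` grows, because the empirical frequencies converge to the conditional probabilities that define the posterior predictive. A function approximator with limited capacity or training would carry no such guarantee.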

As a contrast to an approach that makes use of approximation, our work (Grant, Finn, Levine, Darrell, & Griffiths, 2018) draws a formal connection between a connectionist implementation of meta-learning and inference in a hierarchical Bayesian model by making precise the prior, likelihood, and parameter estimation procedure implied in the use of the meta-learning implementation. Equivalently, this result describes a way to implement a rational solution to a problem of learning-to-learn in a connectionist architecture (though there are likely to be many equivalent implementations). A formal integration across levels like this is tighter than an approximation approach, and therefore provides a firmer footing for integrative constraints across levels of analysis.
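A well-understood special case of this kind of correspondence can be checked numerically. In linear regression with squared-error loss, a few steps of gradient descent from a given initialization yield exactly the maximum a posteriori estimate under a Gaussian prior centered on that initialization. The Python sketch below verifies this for a single scalar weight. It is a minimal illustration under stated assumptions (unit noise variance, a step size small enough that the implied precision is positive) and not a reproduction of the derivation in the cited work; all function names and data values are hypothetical.

```python
def gradient_descent(xs, ys, w_init, step_size, num_steps):
    """Run gradient descent on 0.5 * sum((y - w * x)**2), starting at w_init."""
    w = w_init
    for _ in range(num_steps):
        grad = sum(x * (w * x - y) for x, y in zip(xs, ys))
        w -= step_size * grad
    return w


def implied_prior_precision(xs, step_size, num_steps):
    """Precision q of the prior N(w_init, 1/q) implied by early stopping.

    Derived for this scalar case by matching the two estimators.
    Requires 0 < step_size * sum(x**2) < 1 so that q is positive.
    """
    a = sum(x * x for x in xs)
    shrink = (1.0 - step_size * a) ** num_steps
    return a * shrink / (1.0 - shrink)


def map_estimate(xs, ys, w_init, precision):
    """MAP weight under a unit-variance Gaussian likelihood and N(w_init, 1/precision) prior."""
    a = sum(x * x for x in xs)
    b = sum(x * y for x, y in zip(xs, ys))
    return (b + precision * w_init) / (a + precision)


# Hypothetical task data and a hypothetical (meta-learned) initialization.
xs, ys = [0.5, -1.0, 1.5], [1.0, -2.1, 2.9]
w_init, step_size, num_steps = -1.0, 0.1, 3

w_gd = gradient_descent(xs, ys, w_init, step_size, num_steps)
q = implied_prior_precision(xs, step_size, num_steps)
w_map = map_estimate(xs, ys, w_init, q)
print(f"gradient descent: {w_gd:.6f}  MAP: {w_map:.6f}  implied precision: {q:.6f}")
```

In this reading the initialization plays the role of the prior mean, while the step size and number of steps jointly set how strongly the prior constrains the task-specific estimate: fewer steps imply a larger precision and an estimate closer to the initialization.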

Follow-up investigations have made use of this connection between computational-level and algorithmic-level approaches. For example, in McCoy, Grant, Smolensky, Griffiths, and Linzen (2020), we used an analogous setup to Grant et al. (2018) to meta-learn a syllable typology in a limited data setting akin to an impoverished language learning environment. To better accommodate the complex dynamics of learning, we relaxed some constraints on the meta-learning algorithm, thus for the moment doing away with the tight connection between the algorithmic and computational levels. However, in sticking with methods – namely tuning the gradient-based initialization for learning in a neural network – for which ongoing research in machine learning is formally characterizing how prior knowledge (Dominé, Braun, Fitzgerald, & Saxe, 2023), including data-driven prior knowledge (Lindsey & Lippl, 2023), interacts with the learning algorithm and environment, my view is that these approaches will soon benefit from tighter connections between the algorithmic and computational levels, echoing the connection derived in Grant et al. (2018).

Absent these connections, because meta-learning and meta-learned models are underconstrained and data-driven, it is challenging to evaluate the validity and implications of these models for our understanding of how experience shapes learning. Thus, scientists interested in the place of meta-learning and meta-learned models in cognitive science should work to make precise the constraints that these models imply across levels of analysis, including by making use of analytical techniques from machine learning, while also seeking complementary constraints from experimental neuroscience and ecologically relevant environments. Given that so many aspects remain open, it is an exciting time to be working with and on the meta-learning toolkit.

Acknowledgments

N/A.

Financial support

This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.

Competing interest

None.

References

Dominé, C. C., Braun, L., Fitzgerald, J. E., & Saxe, A. M. (2023). Exact learning dynamics of deep linear networks with prior knowledge. Journal of Statistical Mechanics: Theory and Experiment, 2023(11), 114004. https://doi.org/10.1088/1742-5468/ad01b8
Griffiths, T. L., Vul, E., & Sanborn, A. N. (2012). Bridging levels of analysis for probabilistic models of cognition. Current Directions in Psychological Science, 21(4), 263–268. https://doi.org/10.1177/0963721412447619
Lindsey, J. W., & Lippl, S. (2023). Implicit regularization of multi-task learning and finetuning in overparameterized neural networks. arXiv preprint arXiv:2310.02396. https://doi.org/10.48550/arXiv.2310.02396
McCoy, R. T., Grant, E., Smolensky, P., Griffiths, T. L., & Linzen, T. (2020). Universal linguistic inductive biases via meta-learning. In Proceedings of the Annual Meeting of the Cognitive Science Society. https://doi.org/10.48550/arXiv.2006.16324
Ortega, P. A., Wang, J. X., Rowland, M., Genewein, T., Kurth-Nelson, Z., Pascanu, R., … Legg, S. (2019). Meta-learning of sequential strategies. arXiv preprint arXiv:1905.03030. https://doi.org/10.48550/arXiv.1905.03030
Rogers, T. T., & McClelland, J. L. (2008). A simple model from a powerful framework that spans levels of analysis. Behavioral and Brain Sciences, 31(6), 729–749. https://doi.org/10.1017/S0140525X08006067
Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11(1), 1–23. https://doi.org/10.1017/S0140525X00052432
Wang, J. X., Kurth-Nelson, Z., Kumaran, D., Tirumala, D., Soyer, H., Leibo, J. Z., … Botvinick, M. (2018). Prefrontal cortex as a meta-reinforcement learning system. Nature Neuroscience, 21, 860–868. https://doi.org/10.1038/s41593-018-0147-8