Binz et al. (argument 2) advocate for the superiority of meta-learned models over Bayesian inference for addressing large world problems (Savage, 1972). Our commentary questions what we see as fallacies in their arguments.
First, although we recognize that "Identifying the correct set of assumptions becomes especially challenging once we deal with more complex problems," we point out that meta-learned models also require specific assumptions. Examples are the selection of samples from the data-generating distribution, the choice of optimizer, weight initializations, or constraints to mimic bounded rationality. These decisions, too, can be conceived of as priors and require a certain level of justification. Binz et al. themselves emphasized the importance of making these choices appropriately (sect. 4, "Intricate Training Processes"). Bayesian or not, prior knowledge is a necessary condition for both modeling procedures. We therefore contend that Bayesian and meta-learned models present similar challenges from a rational perspective. Why, then, should it be "hard to justify" prior assumptions for Bayesian models but not for meta-learned models? For instance, one could reconsider the critiques leveled at Lucas, Griffiths, Williams, and Kalish (2015). To account for the bias toward expecting linear relationships between continuous variables, the authors assigned lower prior probabilities to quadratic and radial relationships than to linear ones (Lucas et al., 2015). Binz et al. raise the issue that the chosen prior might not reflect all possible functions (and their associated probabilities). However, similar concerns arise in the context of meta-learned models. What justifies the selection of the training data? How does one determine which functions to employ in the tasks used for training the model? Moreover, on which tasks should the model be trained? Are these decisions easier to justify from a rational perspective than their Bayesian counterparts? If the definition of priors is considered a main obstacle to applying Bayesian inference to large world problems, the same challenge extends to the decisions mentioned above, which determine the initial parameterization of meta-learned models and can be conceived of as an equivalent "prior" (Griffiths et al., 2019). Finally, if the main obstacle in large world problems is the impossibility to "have access to a prior or a likelihood," what is "the unique feature of meta-learned models" compared to other Bayesian methods that construct their own empirical priors (e.g., hierarchical models and empirical Bayes; Friston & Stephan, 2007) or that bypass evaluation of the likelihood function (e.g., approximate Bayesian computation: Beaumont, 2010; likelihood-free inference: Papamakarios, Nalisnick, Rezende, Mohamed, & Lakshminarayanan, 2021; simulation-based inference: Cranmer, Brehmer, & Louppe, 2020)?
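To make the parallel concrete, consider a minimal sketch (our illustration, not code from the target article) in which the task-generating distribution of a meta-learner encodes an implicit prior over function classes, mirroring the explicit prior of Lucas et al. (2015); the mixture weights and function forms below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture weights over function classes: choosing them is a
# modeling decision that demands the same justification as a Bayesian prior.
CLASS_PRIOR = {"linear": 0.6, "quadratic": 0.2, "radial": 0.2}

def sample_task(n_points=20):
    """Draw one meta-training task (x, y pairs) from the generating distribution."""
    cls = rng.choice(list(CLASS_PRIOR), p=list(CLASS_PRIOR.values()))
    x = rng.uniform(-1.0, 1.0, size=n_points)
    if cls == "linear":
        y = rng.normal() * x + rng.normal()
    elif cls == "quadratic":
        y = rng.normal() * x**2 + rng.normal() * x + rng.normal()
    else:  # radial: a bump centered at a random location
        y = np.exp(-((x - rng.normal()) ** 2))
    return x, y + rng.normal(scale=0.1, size=n_points)  # observation noise

# Any meta-learner trained on these tasks inherits CLASS_PRIOR implicitly:
# changing the weights changes the "prior" baked into its parameters.
tasks = [sample_task() for _ in range(1000)]
```

Any model trained on tasks drawn this way inherits these weights as a de facto prior, and shifting them calls for exactly the kind of justification demanded of the Bayesian modeler.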
Second, it should be noted that meta-learning is not exclusive to meta-learned models in machine learning; it can also be achieved using hierarchical Bayesian models (Grant, Finn, Levine, Darrell, & Griffiths, 2018; Griffiths et al., 2019; Kemp, Perfors, & Tenenbaum, 2007; Li, Callaway, Thompson, Adams, & Griffiths, 2023). Hence, if meta-learning is what enables computational models to face large world problems, it cannot serve as an argument in favor of meta-learned models over hierarchical Bayesian inference.
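As a minimal formal sketch of this point (our notation, not the target article's), a hierarchical Bayesian learner "learns to learn" by inferring a hyperparameter $\eta$ shared across tasks $t = 1, \dots, T$ and reusing it as a prior for a new task:

$$
p(\eta \mid \mathcal{D}_{1:T}) \;\propto\; p(\eta) \prod_{t=1}^{T} \int p(\mathcal{D}_t \mid \theta_t)\, p(\theta_t \mid \eta)\, d\theta_t,
$$

$$
p(\theta_{\text{new}} \mid \mathcal{D}_{\text{new}}, \mathcal{D}_{1:T}) \;\propto\; p(\mathcal{D}_{\text{new}} \mid \theta_{\text{new}}) \int p(\theta_{\text{new}} \mid \eta)\, p(\eta \mid \mathcal{D}_{1:T})\, d\eta.
$$

This is the structure under which Grant et al. (2018) recast gradient-based meta-learning as hierarchical Bayes.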
Putting our first two concerns together, we think that a fairer comparison between meta-learned models (as defined in the target article) and hierarchical (approximate) Bayesian models would have been necessary to assert that meta-learned models contain "unique" features for addressing large world problems.
A further concern regards meta-learning as a solution to large world problems. Following Binmore (2007), the distinction between small and large worlds can be interpreted as that between making decisions under risk and under uncertainty, respectively. In the first case, decision makers know all contingencies of the problem and can apply Bayes' rule in full to make the optimal decision. Large world problems are situations characterized by uncertainty about the causes and the likelihood of events. In other words, large world problems can be conceived of as situations in which previously acquired environmental assumptions do not hold. However, if meta-learned models need to be retrained whenever environmental assumptions differ from those encountered during training, it follows that their use can be justified only in small worlds, where previous knowledge can be used to make choices.
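To spell out the distinction (a standard formulation, in our notation): in a small world the decision maker can enumerate the states $s \in S$, knows the prior $p(s)$ and the likelihood $p(d \mid s)$, and can therefore choose

$$
a^{*} = \arg\max_{a \in A} \sum_{s \in S} u(a, s)\, p(s \mid d), \qquad p(s \mid d) = \frac{p(d \mid s)\, p(s)}{\sum_{s'} p(d \mid s')\, p(s')}.
$$

In a large world, $S$, $p(s)$, or $p(d \mid s)$ are unavailable, so this maximization is simply undefined; neither a fixed prior nor retraining on a fixed task distribution fills that gap.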
Finally, it should be highlighted that the target article grounds meta-learned models in the rational analysis framework (Anderson, 1991), given that they approximate Bayes-optimal solutions. However, Savage's and Binmore's argument was that there is no special justification for rational Bayesian solutions to large world problems. In our opinion, if one wants to adhere to this rational perspective, neither Bayesian nor meta-learned models can be considered suitable for modeling decision making under uncertainty. A possible way out of this impasse, however, can come from psychological and cognitive research that has investigated decision making under uncertainty. Theoretical frameworks such as the free-energy principle (Friston et al., 2023) or reinforcement learning (Dimitrakakis & Ortner, 2022; Kochenderfer, 2015) have investigated how learning under uncertainty occurs and how it is used to construct beliefs that guide decisions when the causes of events are unknown. In our opinion, incorporating ideas from these frameworks into the models is a promising way to address large world problems.
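As a minimal sketch of the kind of mechanism we have in mind (our illustration; the two-armed bandit and its payoff probabilities are hypothetical), a reinforcement-learning agent can construct its beliefs from experience rather than starting from a fully specified model of the environment:

```python
import numpy as np

rng = np.random.default_rng(1)
true_payoffs = np.array([0.3, 0.7])     # unknown to the agent
alpha, beta = np.ones(2), np.ones(2)    # uniform Beta(1, 1) beliefs per arm

for t in range(500):
    sampled = rng.beta(alpha, beta)     # sample a world from current beliefs
    arm = int(np.argmax(sampled))       # act as if that sampled world were true
    reward = float(rng.random() < true_payoffs[arm])
    alpha[arm] += reward                # update beliefs from experience
    beta[arm] += 1.0 - reward

print(alpha / (alpha + beta))           # posterior mean payoff estimates
```

The agent still makes parametric assumptions (here, Bernoulli rewards with Beta beliefs), but it acquires its model of the environment online, which is the property we consider promising for large worlds.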
In conclusion, we do not think that Binz et al. have provided convincing support for the claim that "The ability to construct Bayes-optimal learning algorithms for large world problems is a unique feature of the meta-learning framework." We suggest that grounding in the rational analysis of cognition framework is not sufficient for modeling decisions in large worlds, and that exploring and integrating other theoretical frameworks could offer valuable insights to advance their research program.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interest
None.