The authors discuss meta-learning as a flexible and computationally efficient tool to generate cognitive models from training data and thereby to avoid the need for handcrafting cognitive biases, as is usually done in current cognitive architectures or Bayesian learning. They provide four supporting arguments as motivation for a systematic research program on meta-learning, which they diagnose as so far largely missing. While we agree with this stance, we propose that a deeper understanding of meta-learning would benefit from complementing the focus on learning with an equally strong focus on structure, that is, from addressing the question: What are the meta-structures that are decisive in shaping meta-learning?
The reasoning for our proposal derives from the authors' “Argument 3,” where they argue that meta-learning makes it easy to manipulate a learning algorithm's complexity in order to construct resource-rational models of learning. By admitting complexity as an important control for model formation, the authors introduce structural discriminations between meta-learners. But as a scalar measure, complexity cannot avoid “collapsing” qualitatively different structures whenever these are assigned the same complexity. Therefore, we suggest extending the research program beyond scalar orderings such as complexity measures: viewing meta-structures as patterns of higher-order structure that are qualitatively different from each other and that offer structural-functional “modules” that can be constructed as entities in their own right and flexibly used by a meta-learning system. This view draws close inspiration from the advocated neuroscience perspective (their “Argument 4”) of how constraints of the neurobiological substrate determine the emergence of specific control structures. Meta-structures are thus abstractable structural principles guiding the development of “substrate-level” structures in a meta-learning system. While meta-learning summarizes many learning trajectories into an overarching base learner (that can quickly specialize), meta-structure summarizes many learning priors into an overarching “base prior” (that then guides meta-learning efficiently).
Examples of such guiding meta-structures can be found in biological neural network models. As a first meta-structure, we consider hierarchical organization: the decomposition of actions into sub-actions at different levels of a hierarchy enables flexible recombination into different behaviors. Hierarchical organization is an established principle of biological motor control that has been applied successfully in Deep Reinforcement Learning (DRL) (Merel, Botvinick, & Wayne, 2019; Neftci & Averbeck, 2019). As a benefit, hierarchical organization enables a form of higher-level learning in which the learner can recombine modular policies into new behaviors without the need to always learn all details from scratch.
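The recombination benefit of hierarchical organization can be made concrete with a toy sketch (our own illustration, not a model from the cited work): fixed low-level motor primitives are reused across tasks, and only a small high-level selector changes when a new goal is pursued. All names and the one-dimensional state encoding are illustrative assumptions.

```python
# Hypothetical low-level "motor primitives": each maps a state to the next
# state. The 1-D integer state is an illustrative assumption.
def primitive_step_forward(state):
    return state + 1

def primitive_step_backward(state):
    return state - 1

def primitive_hold(state):
    return state

PRIMITIVES = {
    "forward": primitive_step_forward,
    "backward": primitive_step_backward,
    "hold": primitive_hold,
}

def high_level_policy(goal, state):
    """Select which primitive to invoke; only this selector needs
    (re)learning when primitives are recombined for a new goal."""
    if state < goal:
        return "forward"
    if state > goal:
        return "backward"
    return "hold"

def run_episode(goal, state, max_steps=20):
    """Run the two-level controller until the goal is reached."""
    trajectory = [state]
    for _ in range(max_steps):
        name = high_level_policy(goal, state)
        state = PRIMITIVES[name](state)
        trajectory.append(state)
        if state == goal:
            break
    return trajectory
```

The same primitives are recombined to reach different goals (`run_episode(3, 0)` versus `run_episode(-2, 0)`) without relearning any low-level details.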
As a second example of a meta-structure, decentralization serves the parallelization of modules' actions, the decoupling of subtasks, and the factorization of state spaces. Decentralization is well investigated in animal motor control, for example, in low-level reflexes, and it is widely acknowledged that decentralized oscillation-generating neuronal circuits are essential for locomotion (cf. Dickinson et al., 2000). While decentralization is often characterized merely as a strategy to cope with slow sensory processing, we emphasize how it facilitates meta-learning. In a study on learning of motor control featuring decentralized modules for a four-legged walker, we showed how decentralization positively affected reinforcement learning on two levels (Schilling, Melnik, Ohl, Ritter, & Hammer, 2021). First, on the basic learning level, decentralization remedies the exponential increase of required training runs that traditional DRL systems suffer as the action space becomes more complex: it restricts each module's action space to the much smaller set of local actuators, thereby reducing the dimensionality. Without the need for a single centralized controller to coordinate all control signals, the decentralized network learned stable behaviors much faster. Second, on the level of meta-learning, the trained decentralized controller appeared to learn a different, more robust mapping than a standard centralized controller: the decentralized control structure transferred previously learned aspects of motor control to entirely new terrains without further context-specific training. Thus, with respect to meta-learning, this structural prior (meta-structure) of decentralization proved beneficial for behavioral extrapolation and appears to yield mappings better suited to a broader range of tasks.
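The dimensionality argument can be illustrated with a minimal sketch (an assumption-laden toy of our own, not the controller from the cited study): a centralized controller must search the joint action space, which grows exponentially with the number of legs, whereas decentralized modules each search only their local action set; a simple local phase rule already yields a coordinated gait.

```python
from itertools import product

LOCAL_ACTIONS = ["swing", "stance"]  # toy per-leg action set (assumption)
NUM_LEGS = 4

# Centralized: one controller searches the joint action space, |A|^n.
joint_actions = list(product(LOCAL_ACTIONS, repeat=NUM_LEGS))

# Decentralized: each module searches only its local actions, n * |A|.
local_search = NUM_LEGS * len(LOCAL_ACTIONS)
# 16 joint combinations versus 8 local evaluations here; the gap grows
# exponentially with the number of legs.

class LegModule:
    """One decentralized controller: it sees only its own leg's phase."""
    def __init__(self, phase_offset):
        self.phase_offset = phase_offset

    def act(self, global_phase):
        # Local rule: alternate swing/stance based on the local phase only.
        return "swing" if (global_phase + self.phase_offset) % 2 == 0 else "stance"

# Phase offsets chosen to produce a trot-like diagonal pattern (assumption).
legs = [LegModule(offset) for offset in (0, 1, 1, 0)]

def gait(global_phase):
    """Collect the four independent local decisions into one gait step."""
    return [leg.act(global_phase) for leg in legs]
```

No module ever observes or coordinates the others' actions; the coordination emerges from the shared phase signal, mirroring how decentralized circuits decouple subtasks.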
A further example of a meta-structure can be identified in reversal learning. In reversal learning, an agent initially learns a mapping, for example, between certain situations and corresponding appropriate responses, and then finds itself in a situation that requires a different stimulus-response mapping to achieve its behavioral goals. While standard DRL agents learn new mappings at a reversal point from scratch, biological organisms typically solve reversal problems more effectively (Happel et al., 2014): already during the initial learning phase, they create hierarchically organized representation structures that can be reused efficiently for the newly required mapping instead of learning it from scratch (Jarvers et al., 2016).
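How a reusable representation makes reversal cheap can be sketched in a few lines (a deliberately minimal toy under assumed encodings, not the biological model of the cited studies): a learned feature is kept across the reversal point, and only a readout sign flips.

```python
# Toy reversal task: stimuli are 2-bit patterns; "go" responses initially
# follow the first bit, then the mapping reverses. Encodings are assumptions.
STIMULI = [(0, 0), (0, 1), (1, 0), (1, 1)]

def feature(stim):
    """Task-relevant representation built during initial learning
    (here simply the first bit, recoded to -1/+1) and kept afterwards."""
    return 2 * stim[0] - 1

def respond(stim, readout_weight):
    """Response = sign of a linear readout on the kept feature."""
    return "go" if readout_weight * feature(stim) > 0 else "no-go"

# Initial mapping: positive readout weight on the feature.
initial = [respond(s, +1) for s in STIMULI]

# At the reversal point only the readout changes (one sign flip);
# the representation itself is reused, so relearning is cheap.
reversed_map = [respond(s, -1) for s in STIMULI]
```

An agent without the kept feature would have to rediscover which stimulus dimension matters; with it, the reversal reduces to adjusting a single parameter.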
The examples imply that a common mechanism by which meta-structures support meta-learning is that of enabling a learning agent to build context. Context-building is naturally introduced by meta-learning itself, for example, as conceived in the target article: an inner-loop learner operates at a fast time scale, with temporally short-ranged contexts, while an “outer-loop process,” which could be implemented either as dedicated network modules or as processes made possible by decentralized structures, works at time scales and higher levels of abstraction that allow tuning of the inner learner's adaptivity (Schilling, Hammer, Ohl, Ritter, & Wiskott, 2023). This meta-structure can be imagined as recursively extendable, leading to an “onion-like” architecture that provides a principled stratification of an overall learning process into a layered hierarchy of learners operating at different levels of granularity (or abstraction), with correspondingly scaled scopes of context. Such concepts can contribute to our understanding of the sophisticated learning capabilities needed by embodied agents that continuously adapt the interactions of their body with the environment, involving contexts at different temporal and spatial scales.
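The nesting of a fast inner learner inside a slower outer process can be sketched as a toy two-level scheme of our own (all rates, targets, and the finite-difference meta-gradient are illustrative assumptions): the inner loop adapts to individual tasks, while the outer loop tunes the inner learner's adaptivity, here its learning rate, across tasks.

```python
def inner_loop(target, lr, steps=10):
    """Fast adaptation within one task/context: gradient descent on
    the squared error (estimate - target)^2. Returns the final error."""
    estimate = 0.0
    for _ in range(steps):
        estimate -= lr * 2 * (estimate - target)
    return abs(estimate - target)

def outer_loop(tasks, lr=0.1, meta_lr=0.05, rounds=30):
    """Slow process that adjusts the inner learner's adaptivity
    (its learning rate) based on performance across many tasks."""
    for _ in range(rounds):
        err = sum(inner_loop(t, lr) for t in tasks) / len(tasks)
        err_up = sum(inner_loop(t, lr + 0.01) for t in tasks) / len(tasks)
        # Finite-difference meta-gradient on the learning rate.
        lr -= meta_lr * (err_up - err) / 0.01
        lr = max(1e-3, min(lr, 0.5))  # keep the inner loop stable
    return lr

# The outer loop discovers a learning rate that lets the inner loop
# adapt quickly to any task drawn from this family.
tuned_lr = outer_loop([1.0, 2.0, 3.0])
```

Stacking further such loops, each tuning the adaptivity of the level below over a longer time scale and a broader scope of context, is one way to read the “onion-like” recursive extension described above.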
The discussion of meta-structures illustrates that meta-learning is based on abstractable structural principles that support the generation of a compositional semantics linking the functions of modules that emerge from learning in a given context. Meta-structures provide structural preconditions for the establishment of almost arbitrary high-level compositional semantics that can be flexibly reused and serve as building blocks of higher-level abstractions for novel problem solutions without requiring specific training in novel contexts. Together, these aspects underscore the significance of meta-structure for a research program on meta-learning.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing interests
None.