Impact Statement
We confront the challenge of designing efficient and robust algorithms for grid operation with the paradigm of learning-augmented algorithms, which leverage the excellent performance of machine learning while provably guaranteeing worst-case performance bounds and providing significant computational speedups over standard algorithms. Beyond the setting of generation dispatch, we propose that learning-augmented algorithms hold promise for safely utilizing the power of machine learning in a broader array of energy and sustainability-related control and optimization tasks.
1. Introduction
The need to reduce greenhouse gas emissions to mitigate the impacts of anthropogenic climate change is driving an energy transition characterized by large amounts of renewable generation resources being added to the grid. The variability of solar and wind energy and a relative scarcity of large-scale energy storage require the operation of dispatchable resources, such as fossil fuel generation, to balance out the fluctuations in renewable energy production and maintain reliable grid operation. However, conventional fossil fuel generators incur significant additional costs from the frequent cycling and ramping they must perform under high penetration of renewables, due both to decreased fuel efficiency and to the increased maintenance required when operating in this regime (Troy et al., 2010). Moreover, most dispatchable resources are limited in their ramp rate, and thus under high penetrations of renewables, such resources must be operated in a manner that anticipates system ramp needs, taking into account the high costs of frequent ramping while ensuring adequate supply to meet demand.
Operating generation optimally requires minimizing fuel costs while accounting for the intertemporal coupling of decisions, including both ramp costs and ramp limits. A natural approach for this problem is model predictive control (MPC), an algorithm that utilizes near-term forecasts of demand and other conditions to choose decisions that minimize aggregate cost over a fixed lookahead horizon (Morari & Lee, 1999). In addition to theoretical work confirming its good performance (e.g., Lin et al., 2021), MPC performs well in practice and has been deployed in a number of energy and sustainability-related domains, including control of wind turbines and solar photovoltaics (Lio et al., 2014; Sultana et al., 2017), smart buildings (Carli et al., 2020; Killian & Kozek, 2016), energy storage (Meng et al., 2015; Morstyn et al., 2018), and electric vehicle charging (Lee et al., 2018). Moreover, several regional power system operators in the US use MPC to settle the real-time electricity market (Ela & O’Malley, 2016), and it is widely understood that such lookahead algorithms will play an increasingly important role in enabling power systems to reliably absorb renewable energy volatility (Hua et al., 2019; Zhao et al., 2020).
However, MPC suffers from computational complexity that grows exponentially with the lookahead horizon if the system model or costs are nonconvex, as in the case of combined cycle generators (Bjelogrlic, 2000). In practice, it is thus typically necessary to use heuristic solvers that operate on a faster timescale but generally produce suboptimal decisions (Cao et al., 2017; Deng et al., 2015; Kelman et al., 2013). One promising avenue for overcoming the computational complexity of nonconvex MPC is the development of machine learning (ML) models that imitate its behavior, bypassing the need to solve an independent nonconvex optimization problem to generate each decision. This approach of “learning to control/optimize” has seen significant recent interest in the ML, control, and power systems communities (Fioretto et al., 2020; Kotary et al., 2021; Nellikkath & Chatzivasileiadis, 2022; Pan et al., 2022; Pon Kumar et al., 2018; Spielberg et al., 2017; Zhang et al., 2023). While several works develop approaches to guarantee constraint satisfaction for ML controllers (e.g., Zhang et al., 2023), none of these learning-based approaches comes with a priori guarantees on its cost under distribution shift or on out-of-sample problem instances, jeopardizing performance at deployment time. To counter this potential for poor performance and enable the reliable deployment of learned proxies for MPC in real-world settings, this work proposes an algorithm to robustify the behavior of such an ML proxy. We follow the paradigm of the emerging literature on learning-augmented algorithms (e.g., Lee et al., 2021; Lykouris & Vassilvitskii, 2018; Purohit et al., 2018), specifically building upon the line of work designing algorithms for online optimization with switching costs, a generalization of the dispatch problem, that can exploit the performance of an ML algorithm while providing worst-case guarantees on cost (Antoniadis et al., 2020; Christianson, Handina, et al., 2022; Christianson et al., 2023; Li et al., 2022; Rutten et al., 2023).
1.1. Contributions
Our contributions are twofold. First, we propose a learning-augmented algorithm RobustML for online optimization with switching costs that achieves the best deterministic performance bound in the setting of general nonconvex cost functions. Specifically, when provided with an ML algorithm for the problem as well as a heuristic baseline algorithm, for any desired $\varepsilon, \delta > 0$, our algorithm achieves a cost at most $\left(1 + \varepsilon + \delta\right)$ times the cost incurred by the ML algorithm while maintaining a worst-case cost bound of $\mathcal{O}\left(\frac{\mathrm{C}}{\delta} + \frac{D}{\varepsilon}\right)$, where $\mathrm{C}$ is the cost of the heuristic baseline and $D$ is the diameter of the decision space. In other words, RobustML achieves performance close to that of ML, yet robust relative to the chosen baseline algorithm. This is the best-known tradeoff for deterministic algorithms, as all prior deterministic learning-augmented algorithms with bounded worst-case robustness paid at least three times the cost of the ML algorithm (Antoniadis et al., 2020).

Second, we empirically evaluate the performance of RobustML on a realistic model of a combined cycle cogeneration power plant under increasing penetration of renewable energy. We find that an ML-based approach can improve computation time over MPC by three orders of magnitude, and while the performance of a pure ML approach suffers under distribution shift, our algorithm RobustML takes advantage of the good performance of ML in the nominal case (improving over the baseline by 10–15%) while preserving robustness to distribution shift. Moreover, RobustML can achieve better performance than both the pure ML approach and the baseline approach when renewable penetration increases.
Our work has the potential for both direct and downstream impact on the problem of climate change. Our results suggest that using RobustML for real-world grid operation could yield modest but tangible efficiency improvements, leading to reduced emissions. Moreover, RobustML’s robustness guarantees and use of lookahead could enable greater penetration of renewables while maintaining grid reliability. More generally, we see great promise in using learning-augmented algorithms like RobustML to achieve efficiency improvements without sacrificing robustness in other energy and sustainability-related domains where MPC is widely used (Carli et al., 2020; Killian & Kozek, 2016; Lee et al., 2018; Lio et al., 2014; Meng et al., 2015; Morstyn et al., 2018; Sultana et al., 2017).
2. Model and Preliminaries
We consider the problem of operating a combined cycle cogeneration power plant to meet both electricity and steam demand in the presence of exogenous variable renewable generation; see Figure 1 for a schematic. Specifically, we consider a plant with three gas turbines and a single steam turbine; we index the gas and steam turbines as $\left\{1, 2, 3\right\}$ and $\left\{4\right\}$, respectively. At each time $t \in \left\{1, \dots, T\right\}$, the plant operator observes an electricity demand (net of renewables) $d_t \in \mathbb{R}_{+}$ and a steam demand $q_t \in \mathbb{R}_{+}$. In response, they choose energy dispatch setpoints $p_t^{(i)} \in \left[\underline{p}_i, \overline{p}_i\right]$ and steam dispatch setpoints $s_t^{(i)} \in \left[\underline{s}_i, \overline{s}_i\right]$ for all the turbines $i = 1, \dots, 4$, where $\underline{p}_i, \overline{p}_i$ are the lower and upper bounds, respectively, for the energy dispatch, and $\underline{s}_i, \overline{s}_i$ are the lower and upper bounds, respectively, for the steam dispatch. Note that while the gas turbines $i = 1, 2, 3$ produce steam, and thus have positive steam dispatches, the steam turbine $i = 4$ consumes steam, so $s_t^{(4)} < 0$. The plant operator’s goal is to minimize the total cost of its dispatch decisions and its cost for ramping electricity generation while producing sufficient electricity and steam to meet demand. Formally, the plant operator faces the following constrained minimization problem:
$$\min_{\left\{\mathbf{p}_t, \mathbf{s}_t\right\}_{t=1}^{T}} \; \sum_{t=1}^{T} \left[ f_t\left(\mathbf{p}_t, \mathbf{s}_t\right) + \alpha \left\| \mathbf{p}_t - \mathbf{p}_{t-1} \right\| \right] \qquad (2.1)$$

$$\text{subject to} \quad \sum_{i=1}^{4} p_t^{(i)} \ge d_t \quad \text{for all } t, \qquad (2.2)$$

$$\phantom{\text{subject to}} \quad \sum_{i=1}^{4} s_t^{(i)} \ge q_t \quad \text{for all } t, \qquad (2.3)$$

$$\phantom{\text{subject to}} \quad p_t^{(i)} \in \left[\underline{p}_i, \overline{p}_i\right], \; s_t^{(i)} \in \left[\underline{s}_i, \overline{s}_i\right] \quad \text{for all } i, t,$$
where $\mathbf{p}_t = \left(p_t^{(i)}\right)_{i=1}^{4}$, $\mathbf{s}_t = \left(s_t^{(i)}\right)_{i=1}^{4}$, $f_t$ is a per-round fuel cost function that may depend on ambient conditions such as temperature, pressure, and humidity, and $\alpha$ is a parameter determining the extent to which ramping energy generation is penalized. Importantly, the demands $d_t, q_t$ and the cost functions $f_t$ are not all known in advance, so the plant operator cannot solve problem (2.1) all at once. Instead, the plant operator knows only the current timestep’s problem parameters $f_t, d_t$, and $q_t$ exactly, though they may have (possibly noisy) access to predictions of these parameters in a short lookahead window.
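To make the model concrete, the sketch below encodes one timestep’s cost and feasibility check implied by (2.1)–(2.3). It is a minimal illustration only: the bounds, the ramp weight `ALPHA`, and the use of the 1-norm for the ramp term are hypothetical stand-ins for plant-specific quantities, and `fuel_cost` is a placeholder for the fuel cost model $f_t$.

```python
import numpy as np

# Hypothetical per-turbine dispatch bounds; real values are plant-specific.
# Turbines 1-3 are gas turbines, turbine 4 is the steam turbine (s[3] < 0).
P_MIN, P_MAX = np.array([20.0, 20.0, 20.0, 10.0]), np.array([80.0, 80.0, 80.0, 50.0])
S_MIN, S_MAX = np.array([5.0, 5.0, 5.0, -60.0]), np.array([40.0, 40.0, 40.0, -5.0])
ALPHA = 10.0  # ramp-cost weight (hypothetical)

def step_cost(p, s, p_prev, fuel_cost):
    """One term of objective (2.1): fuel cost plus ramp penalty on energy dispatch."""
    return fuel_cost(p, s) + ALPHA * np.linalg.norm(p - p_prev, ord=1)

def feasible(p, s, d, q):
    """Capacity limits and the supply-demand constraints (2.2)-(2.3)."""
    return (
        np.all((P_MIN <= p) & (p <= P_MAX))
        and np.all((S_MIN <= s) & (s <= S_MAX))
        and p.sum() >= d      # energy balance (2.2)
        and s.sum() >= q      # steam balance (2.3); s[3] < 0 is steam consumed
    )
```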

Figure 1. Schematic of the cogeneration power plant model. The plant operator chooses how much electricity (yellow arrow) and steam (blue arrow) the three gas turbines (left cooling tower) produce, as well as how much steam is directed to the steam turbine (right cooling tower) to produce additional electricity. At each time $t$, ambient conditions (e.g., temperature) together with electricity and steam demand are represented by the vector $\boldsymbol{\theta}_t$, and electricity and steam dispatch decisions are represented by the vector $\mathbf{x}_t$.
The cogeneration dispatch problem we consider can be formulated as a specific instance of the more general problem of online optimization with switching costs (Chen et al., 2016). Abstractly, online optimization with switching costs can be considered as a game in which at each time $t \in \left\{1, \dots, T\right\}$, a decision-maker receives a vector $\boldsymbol{\theta}_t \in \mathbb{R}^n$ parametrizing a cost function $f\left(\cdot\,; \boldsymbol{\theta}_t\right) : \mathbb{R}^d \to \mathbb{R}_{+}$. In response, the decision-maker must choose a decision $\mathbf{x}_t \in \mathbb{R}^d$ and incurs the hitting cost $f\left(\mathbf{x}_t; \boldsymbol{\theta}_t\right)$ as well as the switching cost $\left\| \mathbf{x}_t - \mathbf{x}_{t-1} \right\|$ resulting from that decision, where $\left\| \cdot \right\|$ is some norm. We assume that the decision $\mathbf{x}_t$ does not impact future parameters $\boldsymbol{\theta}_\tau$ for $\tau > t$. In the context of the cogeneration dispatch application, $\boldsymbol{\theta}_t$ is a vector containing all ambient conditions such as temperature, pressure, and humidity, together with the electricity and steam demands $d_t, q_t$ at time $t$; $\mathbf{x}_t$ is a vector including the plant operator’s energy and steam dispatch decisions at time $t$; and the function $f$ maps ambient conditions and generator dispatches to a fuel cost while penalizing violation of constraints including capacity limits and supply–demand balance. The switching term $\left\| \mathbf{x}_t - \mathbf{x}_{t-1} \right\|$ represents the ramping cost. The problem is online, meaning that the decision-maker has access only to the parameters $\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_t$ that have been revealed by time $t$ when making the decision $\mathbf{x}_t$. However, the decision-maker may have access to (possibly inaccurate) forecasts $\hat{\boldsymbol{\theta}}_{t+1 \mid t}, \dots, \hat{\boldsymbol{\theta}}_{t+w \mid t}$ of parameters within a lookahead window of length $w \in \mathbb{N}$, which can help them anticipate and reduce future switching costs. Such forecasts could be obtained using standard ML methods for predicting near-term weather or energy demand.
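As a minimal illustration of this online protocol (using the cost notation $\mathrm{C}_{\mathrm{ALG}}$ defined at the end of this section), the loop below plays an arbitrary decision rule against a revealed parameter sequence and totals its hitting and switching costs; `policy` and `hitting_cost` are hypothetical placeholders, and the 1-norm is assumed for the switching cost.

```python
import numpy as np

def total_cost(policy, hitting_cost, thetas, x0):
    """Play the online game: at time t the policy sees only theta_1..theta_t
    and returns x_t; the return value is C_ALG(1, T), the sum of hitting
    costs f(x_t; theta_t) and switching costs ||x_t - x_{t-1}||."""
    x_prev, cost = np.asarray(x0, dtype=float), 0.0
    for t, theta in enumerate(thetas):
        x = policy(thetas[: t + 1], x_prev)        # online: no future parameters
        cost += hitting_cost(x, theta)             # hitting cost
        cost += np.linalg.norm(x - x_prev, ord=1)  # switching (ramping) cost
        x_prev = x
    return cost
```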
We consider two standard algorithms for the problem of online optimization with switching costs. The first, Greedy, is a myopic algorithm that simply chooses the decision $\mathbf{x}_t$ that minimizes $f\left(\cdot\,; \boldsymbol{\theta}_t\right)$ at each time $t$. This algorithm has worst-case cost guarantees under mild assumptions on the structure of the cost function $f$ (Zhang et al., 2021), and it resembles the single-stage dispatch algorithm widely used by power system operators. Its behavior is characterized formally as follows:
$$\mathrm{GREEDY}\left(\boldsymbol{\theta}_t\right) := \operatorname*{arg\,min}_{\mathbf{x} \in \mathbb{R}^d} f\left(\mathbf{x}; \boldsymbol{\theta}_t\right).$$
That is, Greedy can be viewed as a function that, when provided with parameter vector $\boldsymbol{\theta}_t \in \mathbb{R}^n$, returns the minimizer of $f\left(\cdot\,; \boldsymbol{\theta}_t\right)$ as a dispatch decision. Note that, since we assume that $f\left(\cdot\,; \boldsymbol{\theta}_t\right)$ penalizes all constraints from the original cogeneration problem formulation, Greedy will necessarily satisfy these constraints.
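A sketch of Greedy under these assumptions is below; it uses SciPy’s SLSQP solver (the solver the experiments in Section 4 use for Greedy and MPC), warm-started at the previous decision, with `f` a placeholder for the penalized hitting cost.

```python
import numpy as np
from scipy.optimize import minimize

def greedy(theta, x_prev, f):
    """Myopic dispatch: minimize only the current hitting cost f(x; theta).
    The warm start at x_prev helps the local solver converge but does not
    change the objective, which ignores switching costs entirely."""
    res = minimize(lambda x: f(x, theta),
                   x0=np.asarray(x_prev, dtype=float),
                   method="SLSQP")
    return res.x
```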
The second algorithm we consider is model predictive control (MPC). MPC solves a lookahead optimization problem using near-term predictions of parameters to choose a decision. Formally, at time $t$, given the perfectly known current parameter vector $\boldsymbol{\theta}_t$ and forecasts $\hat{\boldsymbol{\theta}}_{t+1 \mid t}, \dots, \hat{\boldsymbol{\theta}}_{t+w \mid t}$ of parameters over the next $w$ timesteps, MPC chooses its decision as the optimal solution of the following optimization problem:
$$\operatorname*{arg\,min}_{\mathbf{x}, \mathbf{y}_1, \dots, \mathbf{y}_w \in \mathbb{R}^d} \; f\left(\mathbf{x}; \boldsymbol{\theta}_t\right) + \left\| \mathbf{x} - \mathbf{x}_{t-1} \right\| + \sum_{\tau = 1}^{w} \left[ f\left(\mathbf{y}_\tau; \hat{\boldsymbol{\theta}}_{t+\tau \mid t}\right) + \left\| \mathbf{y}_\tau - \mathbf{y}_{\tau - 1} \right\| \right], \qquad (2.4)$$
where $\mathbf{y}_0 := \mathbf{x}$ and we have concatenated the parameter vectors $\left(\boldsymbol{\theta}_t, \hat{\boldsymbol{\theta}}_{t+1 \mid t}, \dots, \hat{\boldsymbol{\theta}}_{t+w \mid t}\right)$ into a single entity $\Theta \in \left(\mathbb{R}^n\right)^{w+1}$ for brevity. Note that only the minimizer $\mathbf{x}$ corresponding to the decision made for time $t$ is binding, i.e., the chosen decision $\mathbf{x}_t$ is simply the optimal $\mathbf{x}$ in the above optimization; all of the other variables $\mathbf{y}_1, \dots, \mathbf{y}_w$ are discarded after the solution is obtained. Note that, similar to Greedy, MPC produces decisions that satisfy the constraints of the original cogeneration problem.
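The following sketch mirrors this receding-horizon scheme under the same assumptions as the Greedy sketch above: it solves (a local approximation of) problem (2.4) over the stacked variables $\left(\mathbf{x}, \mathbf{y}_1, \dots, \mathbf{y}_w\right)$ and keeps only the first block.

```python
import numpy as np
from scipy.optimize import minimize

def mpc(Theta, x_prev, f, w, dim):
    """Solve the w-step lookahead problem (2.4) and return only the binding
    first decision x; the trailing y_1..y_w are discarded. Theta stacks the
    current parameters theta_t and the w forecasts, one per row."""
    x_prev = np.asarray(x_prev, dtype=float)

    def lookahead_cost(z):
        xs = z.reshape(w + 1, dim)  # rows are [x, y_1, ..., y_w]
        prev, cost = x_prev, 0.0
        for tau in range(w + 1):
            cost += f(xs[tau], Theta[tau])                 # hitting cost
            cost += np.linalg.norm(xs[tau] - prev, ord=1)  # switching cost
            prev = xs[tau]
        return cost

    z0 = np.tile(x_prev, w + 1)  # warm start: hold the previous decision
    res = minimize(lookahead_cost, z0, method="SLSQP")
    return res.x.reshape(w + 1, dim)[0]
```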
As previously discussed, MPC can be computationally prohibitive if $f\left(\cdot\,; \boldsymbol{\theta}\right)$ is nonconvex. However, there has been a great deal of recent work on learning proxies for power systems-related constrained optimization problems that operate more quickly than traditional optimization solvers (Donti et al., 2020; Park & Van Hentenryck, 2023; Zhang et al., 2023). Following this approach, in our work, we train an ML model to approximate the input–output behavior of MPC in an unsupervised fashion. That is, given a dataset $\mathcal{D} = \left\{\Theta_i\right\}_{i=1}^{N}$ of cost function parameters, we train a neural network $\mathrm{ML} : \left(\mathbb{R}^n\right)^{w+1} \to \left(\mathbb{R}^d\right)^{w+1}$ to minimize MPC’s lookahead objective (2.4) on average over this dataset. We employ the method of Zhang et al. (2023) and include a gauge map in the output layer of the network to ensure that the network always produces decisions that satisfy the capacity limits and supply–demand balance constraints for both energy (2.2) and steam (2.3).
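A minimal sketch of this unsupervised training loop is below, assuming PyTorch, a generic feedforward architecture, and a differentiable `lookahead_cost` implementing objective (2.4); the gauge-map output layer of Zhang et al. (2023), which enforces constraints (2.2)–(2.3), is elided and marked where it would appear.

```python
import torch
import torch.nn as nn

def train_proxy(lookahead_cost, dataset, n, d, w, epochs=50, lr=1e-3):
    """Unsupervised imitation of MPC: the network maps stacked parameters
    Theta in R^{(w+1)n} to decisions in R^{(w+1)d} and is trained to
    minimize the lookahead objective (2.4) on average, with no labels."""
    net = nn.Sequential(
        nn.Linear((w + 1) * n, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, (w + 1) * d),  # a gauge-map layer would go here
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for Theta in dataset:  # Theta: tensor of shape ((w + 1) * n,)
            decisions = net(Theta).reshape(w + 1, d)
            loss = lookahead_cost(decisions, Theta)  # differentiable cost
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```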
While this unsupervised training approach may yield low empirical error on the training set, and the gauge map ensures constraint satisfaction, neither property guarantees that the ML algorithm will perform well on out-of-sample instances or under distribution shift. In the next section, we introduce an algorithm that addresses this challenge and provides provable cost guarantees for the learned proxy ML.
We conclude this section with some notation. For an algorithm Alg producing decisions $\mathbf{x}_1, \dots, \mathbf{x}_T$, define $\mathrm{ALG}_t := \mathbf{x}_t$ as Alg’s decision at time $t$, and define $\mathrm{C}_{\mathrm{ALG}}\left(s, t\right) := \sum_{\tau = s}^{t} \left[ f\left(\mathbf{x}_\tau; \boldsymbol{\theta}_\tau\right) + \left\| \mathbf{x}_\tau - \mathbf{x}_{\tau - 1} \right\| \right]$ as Alg’s cost from time $s$ through $t$. For brevity, we write the total cost as $\mathrm{C}_{\mathrm{ALG}} := \mathrm{C}_{\mathrm{ALG}}\left(1, T\right)$.
3. Algorithm
We propose in Algorithm 1 a novel algorithm, RobustML, that “robustifies” the algorithm ML, giving a sequence of dispatch decisions with cost that can be bounded in terms of the costs incurred by both ML and Greedy. RobustML behaves as follows: it begins by following ML’s decisions, but if ML surpasses a cost threshold and Greedy is performing relatively well, RobustML will switch to following Greedy’s decisions (line 8). However, if Greedy begins to perform worse relative to ML, then RobustML will switch back to following ML (line 15). The specific thresholds for switching are determined by the parameters $\varepsilon, \delta > 0$, which reflect the algorithm’s confidence in ML. That is, when $\varepsilon$ and $\delta$ are small (close to $0$), RobustML will spend more time following the decisions of ML; on the other hand, when $\varepsilon$ and $\delta$ are large, RobustML will spend more of its time following the decisions of Greedy. Tuning $\varepsilon$ and $\delta$ thus enables the plant operator to choose the particular tradeoff they wish to achieve between exploiting the (typically excellent) performance of ML and obtaining worst-case performance guarantees comparable to those of Greedy, even in cases of distribution shift.
Algorithm 1: The algorithm $\mathrm{RobustML}\left(\varepsilon, \delta\right)$. Note that all algorithms are assumed to begin in the same initial state $\mathbf{x}_0$, so $\mathbf{a}_0 = \mathbf{r}_0 = \mathbf{x}_0$ (where $\mathbf{a}_0 = \mathrm{ML}_0$ and $\mathbf{r}_0 = \mathrm{GREEDY}_0$).
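Since the pseudocode of Algorithm 1 is not reproduced here, the sketch below reconstructs its switching logic from the description above and from the conditions cited in the proof of Theorem 3.1 (lines 6, 8, 12, and 14–15 of Algorithm 1); the exact ordering of the checks is our assumption. Both reference trajectories are simulated in parallel, so each jump between them incurs a switching cost of at most the diameter $D$.

```python
import numpy as np

def robust_ml(ml_policy, greedy_policy, hitting_cost, thetas, x0, D, eps, delta):
    """Hedged reconstruction of RobustML(eps, delta). Follow ML until the
    current ML phase's cost C_ML(s, t) reaches 2D/eps while Greedy is doing
    relatively well (C_GREEDY < delta * C_ML); then follow Greedy until
    C_GREEDY >= delta * C_ML, and switch back."""
    a_prev = r_prev = np.asarray(x0, dtype=float)
    c_ml = c_gr = phase_ml = 0.0  # cumulative costs and current ML-phase cost
    following_ml = True
    decisions = []
    for t, theta in enumerate(thetas):
        a = ml_policy(thetas[: t + 1], a_prev)      # ML's decision a_t
        r = greedy_policy(thetas[: t + 1], r_prev)  # Greedy's decision r_t
        step_ml = hitting_cost(a, theta) + np.linalg.norm(a - a_prev, ord=1)
        step_gr = hitting_cost(r, theta) + np.linalg.norm(r - r_prev, ord=1)
        c_ml, c_gr, phase_ml = c_ml + step_ml, c_gr + step_gr, phase_ml + step_ml
        if following_ml and phase_ml >= 2 * D / eps and c_gr < delta * c_ml:
            following_ml = False                 # switch ML -> Greedy (line 8)
        elif not following_ml and c_gr >= delta * c_ml:
            following_ml, phase_ml = True, 0.0   # switch Greedy -> ML (line 15)
        decisions.append(a if following_ml else r)
        a_prev, r_prev = a, r
    return decisions
```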
Our main analytic result on the performance of RobustML is the following bound on its cost. Note that we assume that the decision space has diameter $D$, so $\left\| \mathrm{ML}_t - \mathrm{GREEDY}_t \right\| \le D$ for all $t \in \left\{1, \dots, T\right\}$.
Theorem 3.1. The algorithm RobustML (Algorithm 1) achieves cost bounded as
$$\mathrm{C}_{\mathrm{ROBUSTML}} \le \min\left\{ \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}, \; \left(1 + \frac{1 + \varepsilon}{\delta}\right) \mathrm{C}_{\mathrm{GREEDY}} + \left(1 + \frac{2}{\varepsilon}\right) D \right\}.$$
We present a proof of Theorem 3.1 in Section 6. In particular, the theorem tells us that by selecting $\varepsilon, \delta$ arbitrarily small, RobustML can achieve performance arbitrarily close to that achieved by ML, at the cost of possibly worse performance relative to Greedy. However, by selecting moderate $\varepsilon, \delta$, it is possible to trade off the exploitation of ML against robustness with respect to Greedy’s performance. For instance, taking $\varepsilon = \delta = 0.1$ guarantees cost at most $1.2\, \mathrm{C}_{\mathrm{ML}}$, but only a weak worst-case bound of $12\, \mathrm{C}_{\mathrm{GREEDY}} + 21 D$. Moreover, rather than selecting fixed $\varepsilon, \delta$ a priori to achieve a particular, fixed tradeoff, it is possible to apply techniques from online learning to optimally tune these parameters in an adaptive fashion (see, e.g., Khodak et al., 2022; Li et al., 2022; Zeynali et al., 2021).
It is also important to note that the above result does not depend in any crucial way on the choice of the algorithm Greedy. That is, the plant operator can choose any strategy as their baseline “robust” algorithm, including heuristic methods that are currently employed in practice, and the performance bound given by Theorem 3.1 on Algorithm 1 will hold, with $\mathrm{C}_{\mathrm{GREEDY}}$ replaced by the cost of the operator’s chosen strategy. Thus, Algorithm 1 gives a method for power plant operators to take advantage of machine learning’s good performance while maintaining worst-case guarantees relative to their existing strategies, whatever they may be.
4. Experimental Results and Discussion
We investigate the performance of our algorithm for cogeneration dispatch on a modified form of the CogenEnv environment from SustainGym, a reinforcement learning benchmark suite (Yeh et al., 2023); all of the code, models, and data employed in conducting the experiments can be found at https://doi.org/10.5281/zenodo.13623809. The cogeneration plant’s behavior is modeled in a black-box fashion via neural networks, and thus the fuel cost is differentiable as a function of the dispatch decisions. We train an ML model to replicate the behavior of MPC with lookahead $w = 6$ in an unsupervised fashion as described in the previous section. The model is trained on a dataset of 100 days of ambient conditions (temperature, pressure, humidity) and municipal demands for energy and steam on a 15-minute basis. To solve for the Greedy and MPC decisions, we use sequential least-squares programming (SLSQP) in SciPy (Virtanen et al., 2020).
We begin by comparing the runtimes of Greedy, MPC, and ML to produce a day’s worth of dispatch decisions on an AWS EC2 instance with 4 virtual CPU cores and an NVIDIA T4 GPU (Figure 2). We find that while Greedy can produce a day of dispatches in about 30 seconds, MPC with lookahead $w = 6$ takes nearly 100 times as long ($\sim$45 minutes). On the other hand, ML is three orders of magnitude faster than MPC, returning a day’s worth of decisions in under two seconds on average. This is unsurprising: while Greedy and MPC require solving optimization problems to produce dispatch decisions, ML requires only a neural network’s forward pass.

Figure 2. Number of seconds (mean and standard deviation) for each algorithm to produce a day’s worth of dispatch decisions.
We next examine the performance of ML, Greedy, and RobustML on the baseline system (no renewables) when the lookahead predictions are unreliable, as in the case of distribution shift. That is, we compare the setting of perfect predictions (i.e., $\hat{\boldsymbol{\theta}}_{t+1 \mid t} = \boldsymbol{\theta}_{t+1}, \dots, \hat{\boldsymbol{\theta}}_{t+w \mid t} = \boldsymbol{\theta}_{t+w}$) to settings with i.i.d. Gaussian noise of increasing standard deviation $\sigma$ on the predictions (i.e., $\hat{\boldsymbol{\theta}}_{t+1 \mid t} = \boldsymbol{\theta}_{t+1} + \mathbf{z}_{t+1 \mid t}, \dots, \hat{\boldsymbol{\theta}}_{t+w \mid t} = \boldsymbol{\theta}_{t+w} + \mathbf{z}_{t+w \mid t}$, with $\mathbf{z}_{\tau \mid t} \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}\left(\mathbf{0}, \sigma \mathbf{I}\right)$). We show the results in Figure 3 for several choices of the RobustML parameters $\varepsilon, \delta$. In particular, we observe that while ML performs better than Greedy when predictions are good ($\sigma \le 30$), its performance degrades significantly as the noise grows ($\sigma \to 100$). Nonetheless, for the larger choices of $\varepsilon$ and $\delta$ (specifically, $\varepsilon = \delta = 0.7$ and $\varepsilon = \delta = 1.0$), RobustML gracefully transitions from the good performance of ML for small $\sigma$ to nearly matching or beating the performance of Greedy in the large-$\sigma$ regime. Thus, even though the quality of predictions is unknown a priori, RobustML exploits the good performance of ML when predictions are good while preserving robustness in the worst case. Notably, however, this is not the case when $\varepsilon = \delta$ is small, as can be seen in the top row of Figure 3; in this case, RobustML stays too close to the ML algorithm, and its performance suffers as a result.
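For reference, a minimal sketch of the perturbation used to generate the noisy forecasts in this experiment, assuming the true future parameters are stacked row-wise in an array:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility

def noisy_forecasts(theta_future, sigma):
    """Add i.i.d. Gaussian noise with standard deviation sigma to the true
    parameters theta_{t+1}, ..., theta_{t+w} (rows of theta_future)."""
    return theta_future + rng.normal(0.0, sigma, size=theta_future.shape)
```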

Figure 3. Cost of each algorithm under increasing noise $\sigma$ on the lookahead predictions, normalized by Greedy’s cost. Curves indicate mean cost and shaded regions cover $\pm$ one standard deviation. Several choices of parameters are shown for RobustML: $\varepsilon = \delta = 0.1$ (top left), $\varepsilon = \delta = 0.4$ (top right), $\varepsilon = \delta = 0.7$ (bottom left), and $\varepsilon = \delta = 1.0$ (bottom right).
We further examine the performance of the algorithms under increasing penetration of wind energy, displaying the results in Figure 4. We find that the efficiency improvement of ML over Greedy widens for moderate wind penetration (up to 200 MW), highlighting the value of using lookahead alongside ML to increase efficiency when renewable generation leads to more frequent ramp events. Notably, however, ML’s performance gets closer to that of Greedy when wind penetration nears 400 MW, possibly reflecting the challenge of adequately anticipating future ramp needs in a high-renewables environment even when equipped with moderate lookahead. Nonetheless, we find that when $\varepsilon = \delta = 1.0$, RobustML yields uniformly improved performance over both ML and Greedy and thus enables better efficiency even in challenging high-renewables scenarios. While surprising, this improvement highlights that adaptive switching between the ML and Greedy algorithms enables RobustML to exploit each algorithm when it performs well, leading to better average performance than either algorithm alone.

Figure 4. Cost of each algorithm under increasing wind penetration, normalized by Greedy’s cost. Curves indicate mean cost and shaded regions cover $\pm$ one standard deviation. Several choices of parameters are shown for RobustML: $\varepsilon = \delta = 0.1$ (top left), $\varepsilon = \delta = 0.4$ (top right), $\varepsilon = \delta = 0.7$ (bottom left), and $\varepsilon = \delta = 1.0$ (bottom right).
5. Conclusions
In this work, we proposed a learning-augmented algorithm, RobustML, for the operation of cogeneration plants with ramp costs, finding that it effectively balances exploitation of the good performance of machine-learned algorithms with worst-case robustness in cases of distribution shift. In particular, RobustML allows for taking advantage of lookahead predictions without suffering the computational complexity of MPC, and depending on how its parameters are tuned, it can yield performance that uniformly improves on both a pure ML approach and existing baseline algorithms used in practice.
Several interesting avenues remain open for future work. First, it would be interesting to apply RobustML to larger energy systems with more generators to examine how its performance gains scale. Second, while RobustML accounts for ramping costs in its performance guarantee, it does not ensure satisfaction of strict ramp limits, which constrain the operation of many dispatchable generators in the real world. Extending the algorithm to allow for such strict limits would thus be a valuable step toward real-world deployment. Finally, while in this work we trained the ML model to mimic MPC, there are numerous variants of MPC explicitly designed for greater robustness, e.g., Bemporad and Morari (1999), Christianson, Werner, et al. (2022), Langson et al. (2004), and Schwarm and Nikolaou (1999). It would be interesting to examine how such robustified MPC variants perform when used as input to a learning-augmented algorithm such as RobustML, and whether they can improve both average and worst-case performance.
6. Proof of Theorem 3.1
We begin by showing $\mathrm{C}_{\mathrm{ROBUSTML}} \le \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}$. Note that the algorithm consists of phases in which RobustML first coincides with ML, then switches to following Greedy, before switching back to ML, and so on. We will assume that RobustML ends the instance coinciding with ML, so $\mathbf{x}_T = \mathbf{a}_T$; the case in which RobustML ends at $\mathbf{r}_T$ is similar. Let $t_i$ denote the timestep in which RobustML switches from Greedy back to ML for the $i$th time, with $t_0 := 1$ since RobustML always begins by following ML. Similarly, let $m_i$ denote the timestep in which RobustML switches from ML to Greedy for the $i$th time. Clearly, we have $1 = t_0 < m_1 < t_1 < \cdots < m_k < t_k \le T$ for some $k \in \mathbb{N}$. Even though RobustML ends at ML, define $m_{k+1} := T + 1$ for notational simplicity. Then the cost of RobustML may be written as
$$
\begin{aligned}
\mathrm{C}_{\mathrm{ROBUSTML}} ={}& \sum_{i=1}^{k+1} \mathrm{C}_{\mathrm{ML}}\left(t_{i-1}, m_i - 1\right) + \sum_{i=1}^{k} \mathrm{C}_{\mathrm{GREEDY}}\left(m_i, t_i - 1\right) \\
& + \sum_{i=1}^{k} \left( \left\| \mathbf{r}_{m_i} - \mathbf{a}_{m_i - 1} \right\| - \left\| \mathbf{r}_{m_i} - \mathbf{r}_{m_i - 1} \right\| + \left\| \mathbf{a}_{t_i} - \mathbf{r}_{t_i - 1} \right\| - \left\| \mathbf{a}_{t_i} - \mathbf{a}_{t_i - 1} \right\| \right) \\
\le{}& \sum_{i=1}^{k+1} \mathrm{C}_{\mathrm{ML}}\left(t_{i-1}, m_i - 1\right) + \sum_{i=1}^{k} \mathrm{C}_{\mathrm{GREEDY}}\left(m_i, t_i - 1\right) + \sum_{i=1}^{k} \left( \left\| \mathbf{r}_{m_i - 1} - \mathbf{a}_{m_i - 1} \right\| + \left\| \mathbf{a}_{t_i - 1} - \mathbf{r}_{t_i - 1} \right\| \right) && (6.1) \\
\le{}& \sum_{i=1}^{k+1} \mathrm{C}_{\mathrm{ML}}\left(t_{i-1}, m_i - 1\right) + \sum_{i=1}^{k} \mathrm{C}_{\mathrm{GREEDY}}\left(m_i, t_i - 1\right) + 2kD && (6.2) \\
\le{}& \left(1 + \varepsilon\right) \mathrm{C}_{\mathrm{ML}} + \sum_{i=1}^{k} \mathrm{C}_{\mathrm{GREEDY}}\left(m_i, t_i - 1\right) && (6.3) \\
\le{}& \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}, && (6.4)
\end{aligned}
$$
where (6.1) follows from the triangle inequality applied to $\left\| \mathbf{r}_{m_i} - \mathbf{a}_{m_i - 1} \right\|$ and $\left\| \mathbf{a}_{t_i} - \mathbf{r}_{t_i - 1} \right\|$, and (6.2) follows by the diameter bound. The inequality (6.3) follows by line 6 of the algorithm, which states that the algorithm will switch from following ML to following Greedy at time $t$ only if $\mathrm{C}_{\mathrm{ML}}\left(s, t\right) \ge \frac{2D}{\varepsilon}$. Noting that at the start of any timestep $t$, $s$ is exactly
$$s = 1 + \max\left\{ m_i : 0 \le i \le k, \; m_i < t \right\}$$
(with $m_0 := 0$ for notational convenience), it follows that for each $i \in \left[k\right]$, $\mathrm{C}_{\mathrm{ML}}\left(m_{i-1} + 1, m_i\right) \ge \frac{2D}{\varepsilon}$. Thus
$$2kD \le \varepsilon \sum_{i=1}^{k} \mathrm{C}_{\mathrm{ML}}\left(m_{i-1} + 1, m_i\right) \le \varepsilon\, \mathrm{C}_{\mathrm{ML}}.$$
Finally, (6.4) follows from
$$\sum_{i=1}^{k} \mathrm{C}_{\mathrm{GREEDY}}\left(m_i, t_i - 1\right) \le \mathrm{C}_{\mathrm{GREEDY}}\left(1, t_k - 1\right) < \delta\, \mathrm{C}_{\mathrm{ML}}\left(1, t_k - 1\right) \le \delta\, \mathrm{C}_{\mathrm{ML}},$$
since by definition, $\mathbf{x}_{t_k - 1} = \mathbf{r}_{t_k - 1}$, which by line 12 of the algorithm means that $\mathrm{C}_{\mathrm{GREEDY}}\left(1, t_k - 1\right) < \delta \cdot \mathrm{C}_{\mathrm{ML}}\left(1, t_k - 1\right)$. Thus, we have proved the desired bound $\mathrm{C}_{\mathrm{ROBUSTML}} \le \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}$.
We now turn to showing $\mathrm{C}_{\mathrm{ROBUSTML}} \le \left(1 + \frac{1 + \varepsilon}{\delta}\right) \mathrm{C}_{\mathrm{GREEDY}} + \left(1 + \frac{2}{\varepsilon}\right) D$. First, suppose RobustML finishes the instance coinciding with ML, so $\mathbf{x}_T = \mathbf{a}_T$. Let $\tau \in \left\{0, \dots, T - 1\right\}$ denote the last time at which RobustML coincided with Greedy, i.e., the largest $\tau$ such that $\mathbf{x}_\tau = \mathbf{r}_\tau$. Then the cost can be bounded as
$$
\begin{aligned}
\mathrm{C}_{\mathrm{ROBUSTML}} &\le \mathrm{C}_{\mathrm{ROBUSTML}}\left(1, \tau\right) + D + \mathrm{C}_{\mathrm{ML}}\left(\tau + 1, T\right) \\
&\le \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}\left(1, \tau\right) + D + \mathrm{C}_{\mathrm{ML}}\left(\tau + 1, T\right) && (6.5) \\
&\le \frac{1 + \varepsilon + \delta}{\delta}\, \mathrm{C}_{\mathrm{GREEDY}} + D + \frac{2D}{\varepsilon} && (6.6) \\
&= \left(1 + \frac{1 + \varepsilon}{\delta}\right) \mathrm{C}_{\mathrm{GREEDY}} + \left(1 + \frac{2}{\varepsilon}\right) D, && (6.7)
\end{aligned}
$$
where (6.5) follows via the previously proved inequality $\mathrm{C}_{\mathrm{ROBUSTML}} \le \left(1 + \varepsilon + \delta\right) \mathrm{C}_{\mathrm{ML}}$, and (6.6) follows by the fact (according to line 14 of the algorithm) that RobustML switching from Greedy to ML at time $\tau + 1$ means that $\mathrm{C}_{\mathrm{GREEDY}} \ge \delta \cdot \mathrm{C}_{\mathrm{ML}}\left(1, \tau + 1\right)$, as well as from the following observation: since RobustML coincides with ML between times $\tau + 1$ and $T$, line 6 of the algorithm tells us that either $\mathrm{C}_{\mathrm{ML}}\left(\tau + 2, T\right) < \frac{2D}{\varepsilon}$ or $\mathrm{C}_{\mathrm{GREEDY}} \ge \delta \cdot \mathrm{C}_{\mathrm{ML}}$.
Finally, suppose RobustML finishes the instance coinciding with Greedy, so $\mathbf{x}_T = \mathbf{r}_T$. Let $\sigma \in \left\{0, \dots, T - 1\right\}$ denote the last time at which RobustML coincided with ML, i.e., the largest $\sigma$ such that $\mathbf{x}_\sigma = \mathbf{a}_\sigma$. By the previous case’s inequality (6.7), we have
$$\mathrm{C}_{\mathrm{ROBUSTML}} \le \left(1 + \frac{1 + \varepsilon}{\delta}\right) \mathrm{C}_{\mathrm{GREEDY}} + \left(1 + \frac{2}{\varepsilon}\right) D,$$

completing the proof of Theorem 3.1.
Acknowledgments
The authors thank James Chen for helpful discussions.
Author contribution
Conceptualization: all authors; Methodology: N.C., C.Y.; Data curation: N.C., M.H., M.T.R., A.G.; Formal analysis: N.C.; Data visualization: N.C.; Writing—original draft: N.C.; Writing—review and editing: all authors.
Competing interest
During the completion of this work, M.H., M.T.R., and A.G. are or were employed at Beyond Limits, which funded a portion of this work.
Data availability statement
The code, models, and data used in conducting the experiments presented in this work are publicly available at https://doi.org/10.5281/zenodo.13623809.
Ethical standard
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Funding statement
This research was supported by NSF grants CNS-2146814, CPS-2136197, CNS-2106403, and NGSDI-2105648, Amazon AWS, the Caltech Resnick Sustainability Institute, and Beyond Limits. N.C. was supported by an NSF Graduate Research Fellowship (DGE-1745301). Tongxin Li was supported by the start-up funding UDF01002773 of CUHK-Shenzhen. The NSF, Amazon AWS, and the Caltech Resnick Sustainability Institute had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Beyond Limits had a role in the study design, data collection, and preparation of the final manuscript.