1 Introduction
Modelling the transition to sustainable energy systems poses several challenges in Mathematical Optimization, especially in the long term. For an electricity system to deliver mostly renewable energy by 2050, capacity planning models must take into account several sources of uncertainty. With the progressive decommissioning of coal- and gas-fueled power, wind and solar energy become fundamental for achieving the 2050 goal. Such technologies are intermittent and not storable, a feature that makes investment in renewable capacity expansion riskier. In countries with hydro generation, like Brazil and Norway, water reservoirs serve as storage to transfer energy from periods of high availability of renewable sources to periods of low availability. However, the intermittency of wind and solar power puts the hydro-reservoirs under stress. As explained in luna-sagastizabal-silva-2021, Brazil’s storage capacity decreased by more than 10% in the past decade, making the system less flexible and more vulnerable to uncertainty.
When designing capacity expansion models, taking into account the considerations above is crucial. Stochastic Programming models are not suitable to provide sound decision-making because uncertain future market conditions (supply and demand for electricity, generation and investment costs) are only indirectly observable, for example through samples. If the probability values used in the model are incorrect, the stochastic program can give suboptimal results and lead to wrong investment decisions (in a risky environment!). To address this issue, the more recent area of Distributionally Robust Stochastic Optimization (DRSO) considers the probability as an additional decision variable, to be chosen among many distributions in a certain ambiguity set, see for instance the works by [Reference Pflüg and Wozabal24] and [Reference Wozabal33].
A second major concern for the decision maker is how to deal with random realisations that are influenced by the decisions. For capacity expansion problems, such is the case when the Government subsidises the cost of equipment related to wind or solar power (a large future investment in renewables could result in the Government lowering its subsidised fraction). Stochastic Programming models with decision-dependent uncertainty were considered by [Reference Jonsbråten, Wets and Woodruff16,Reference Goel and Grossmann11] and, more recently, [Reference Hellemo, Barton and Tomasgard15]. Robust formulations of problems with decision-dependent uncertainty are shown to be NP-complete by [Reference Nohadani and Sharma21]. Both [Reference Goh and Sim12] and [Reference Postek, den Hertog and Melenberg26] confirm that the well-known conservatism of robust optimisation approaches is mitigated thanks to the consideration of ambiguity sets.
With a DRSO model, endogenous probability distributions imply a variation of the ambiguity set with the decision variable. This is the model explored in this work: we focus on decision-dependent distributionally robust two-stage stochastic optimization (ddDR2SO). A ddDR2SO problem outputs the optimal probability distribution together with the optimal decisions. The connection between the endogenous probability and the decision variable can be given explicitly, through constraints in the ambiguity set, or indirectly, by the fact that the optimization is performed jointly on both types of variables; see (2.4) below.
Regarding ambiguity sets, several different specifications have been considered in the literature. One possibility is to look for probability distributions whose first- and second-order moments are close to those of some exogenous empirical estimation. To cite a couple of works among many others, this is the approach in [Reference Bertsimas, Doan, Natarajan and Teo4] and [Reference Delage and Ye7], where the wording ‘distributionally robust’ was first coined. Another line of research considers probability distributions that are close, in some metric, to a nominal probability taken as a reference. Typical ways of measuring the closeness of measures are phi-divergences and Wasserstein balls, analysed by [Reference Ben-Tal and Teboulle3] and [Reference Pflüg and Wozabal24], respectively. Phi-divergences, introduced in [Reference Calafiore6] for ambiguity sets, are employed in [Reference Wang, Glynn and Ye31] to include distributions that make observed historical data achieve a certain level of likelihood. Bayraksan and Love [Reference Bayraksan and Love1] extended the concept to two-stage stochastic linear programs.
DRSO is usually considered to lie between Stochastic Programming and Robust Optimization. In the spectrum of possible choices for the probability distribution that enters the optimization problem, the modelling paradigms of Stochastic Programming and DRSO could be seen as positioned at opposite extremes with respect to a certain distance to a nominal probability distribution. The latter takes the worst case over the ambiguity set of probabilities, while with the former the ambiguity set shrinks to a singleton (the nominal probability). From the point of view of decision-making, this amounts to adopting either a fully optimistic or a somewhat pessimistic view of how well the probability distribution fits the random nature of the world.
We propose an in-between paradigm that is both optimistic and pessimistic to certain degrees. For ambiguous two-stage stochastic programs, our ddDR2SO model defines a robustified expected recourse function using probabilities in a Wasserstein ball. The novelty is that, instead of taking a nominal probability, the ball center is considered variable. This additional variable is minimized in the first stage over a simple convex set, for example the convex hull of several nominal probabilities taken as a reference.
When considering discrete distributions, with a finite number of scenarios, the new model can be reformulated as a bilinear programming problem amenable to the solver BARON, for instance; see Lemma 3.1 below. The structure is suitable for decomposition: bilinearity appears only in the objective function, and the feasible set has as many convex nonsmooth constraints as there are scenarios in the problem. Indeed, as shown below, the convex nonsmooth recourse functions of the original two-stage problem appear in the constraint set of our reformulation. Similarly to the L-shaped method by [Reference Van Slyke and Wets30], the solution of scenario subproblems provides cuts that either linearize the recourse functions or cut off infeasible first-stage points. The algorithm then proceeds by incorporating feasibility and optimality cuts into a master program that still has a bilinear objective function, but now a polyhedral feasible set. If the method employed to solve the master programs can find global solutions, the procedure converges to a global solution. In our case study, this is ensured by using the solver BARON, by [Reference Tawarmalani and Sahinidis28].
The interest of the decomposition in a ddDR2SO framework is that, instead of solving one large, difficult problem with many bilinear terms (and a nonsmooth feasible set), each iteration solves a much easier problem, having fewer bilinear terms, fewer variables, and a polyhedral feasible set. These features are crucial for computational efficiency. The methodology is illustrated on a case study for planning investments in energy generation under uncertain market conditions. The simplified model considers the whole of Europe over the horizon 2020–2050, taking into account the progressive decommissioning of thermal power plants and the increasing proportions of renewable technologies that are foreseeable for the power mix. Ambiguity sets are particularly suited, because the long-term horizon vastly complicates the assignment of a unique probability to an uncertain event. Our approach gives an indication of whether the investment return is stable with respect to the parameters of the problem.
The work is organised as follows. The new paradigm, which deals with ambiguity from a perspective placed between the optimistic and the pessimistic models, is given in Section 2. Tractability for Wasserstein balls, shown in Section 3, is exploited in Section 4 to derive the decomposition method for ddDR2SO two-stage linear programs. A thorough numerical assessment, validating the approach for the energy investment problem, is performed in Section 5. The paper ends with some concluding comments and remarks.
2 Partly optimistic and partly pessimistic settings
Consider a vector of decision variables $x \in \mathbb{R}^n$ with feasible set $\mathcal{X} \subset \mathbb{R}^n$ . Uncertain parameters are modelled by a measurable mapping $\xi:\Omega\to\Xi\subset\mathbb{R}^m$ and, given a $\sigma$ -algebra $\mathcal F$ , the sample space is $(\Omega,\mathcal{F})$ . Given a function $C_x:\mathbb{R}^n\to\mathbb{R}$ and a cost-to-go mapping $\mathfrak{Q}:\mathbb{R}^n\times\Xi\to\mathbb{R}\cup\{\pm\infty\}$ , the two-stage problem attached to the probability distribution p is
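(assuming the standard two-stage form, a deterministic first-stage cost plus the expected recourse under p, consistent with the expected recourse function written out in Section 3):

$$ \min_{x\in\mathcal{X}}\;\; C_x(x)+\mathbb{E}_p\big[\mathfrak{Q}(x,\xi(\omega))\big]. \qquad\qquad (2.1) $$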
In the objective function, we make explicit the deterministic cost $C_x(x)$ to exploit two-stage structural properties favorable for decomposition methods, like those in [Reference Sagastizábal27], see also [Reference Love and Bayraksan18].
In order to formalize our approach, a nonempty set of probability distributions $\mathcal{P}$ is chosen within the set $\mathcal{M}$ of measures on the sample space. If selecting the distribution $p \in \mathcal{P}$, among all the possible probability distributions, entails a cost $C_p:\mathcal{M}\to\mathbb{R}$, our first model of a distributionally ambiguous problem has the form
The cost for selecting a probability distribution can be used in diverse ways. For instance, it can be a Euclidean distance to a nominal probability, or it can be a measure of entropy. It can also be used to prevent the model from selecting probability distributions that assign high likelihoods only to the best scenarios. The format (2.2) also encompasses the robust optimisation paradigm by [Reference Ben-Tal, Ghaoui and Nemirovski2], taking as ambiguity set the full space of measures. Continuing with connections to other models, notice that the DRSO problem (2.2) resolves the ambiguity present in the probability distribution by adopting a pessimistic view, as in [Reference Wiesemann, Tsoukalas, Kleniati and Rustem32] and [Reference Kuhn, Esfahani, Nguyen and Shafieezadeh-Abadeh17, Section 2.1]. Specifically, introducing the worst-case risk functional,
gives the following problem, equivalent to (2.2):
Another option is to choose the most favorable output, plugging into the optimisation problem the risk functional
that [Reference Dempe, Dutta and Mordukhovich8] called the optimistic solution in bilevel programming.
Note that $\Theta_{\textrm{pess}} (x)$ and $\Theta_{\textrm{opt}} (x)$ are associated with two different bilevel problems expressing, respectively, a pessimistic and an optimistic view on risk aversion. They are not related to the pessimistic and optimistic solutions of the same bilevel problem.
Our proposal, set amid these two alternatives, is to consider problems of the form
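(a sketch of the intended formulation: the ambiguity set $\mathcal{P}$ and the selection cost of (2.2), combined with a ball of probabilities near the selected $p$):

$$ \min_{x\in\mathcal{X},\,p\in\mathcal{P}}\;\; C_x(x)+C_p(p)+\sup_{q\in\mathbb{B}_\kappa(p)}\mathbb{E}_q\big[\mathfrak{Q}(x,\xi(\omega))\big], \qquad\qquad (2.3) $$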
where $\mathbb{B}_\kappa(p) \subset \mathcal{M}$ is a ball about p of radius $\kappa$. Tractability of the new approach depends on the choice of these balls, an issue considered in detail in Section 3. For now we just mention that with the Wasserstein metric the balls are defined by a system of affine inequalities, a convenient feature when it comes to computational implementation. In particular, proceeding in a manner similar to the presentation by [Reference Noyan, Rudolf and Lejeune22], we use duality arguments and the Wasserstein balls, in Lemma 3.1, to rewrite the supremum defining the risk functional as a minimum, thereby avoiding certain technical issues that arise with the pessimistic formulation.
Note that there are two ambiguity sets, namely, $\mathcal{P}$ and $\mathbb{B}_\kappa(p)$. Since in DRSO ambiguity sets are designed by the user (so that they are not too complex, but still provide good out-of-sample performance), one might ask whether a formulation with two ambiguity sets requires too much information from the user. Fortunately, this is not the case in our setting because the second ambiguity set has a rather specific structure: the effect of ambiguity about p can be handled explicitly in the optimisation problem (see Lemma 3.1). Thanks to this feature, our formulation amounts to a problem like (2.3), with only one ‘optimistic’ ambiguity set $\mathcal{P}$, and additional terms that represent the impact of $\mathbb{B}_\kappa(p)$.
Since in (2.3), letting $\mathbb{B}_\kappa(p)=\lbrace p \rbrace$ (intuitively, $\kappa=0$) and $\mathbb{B}_\kappa(p)=\mathcal{P}$ (intuitively, $\kappa=\infty$) gives, respectively, the optimistic and pessimistic approaches, problem (2.3) can be thought of as being cast in a robustified optimistic setting. The parameter $\kappa$, called the robustification ratio, determines the extent to which the risk functional in (2.3) bends towards an optimistic or a pessimistic view. The resulting problem takes into account some ambiguity, while hedging against estimation errors on the probability values. Additionally, because the optimization process outputs the ‘nominal’ probability p together with the decision variable x, our model (2.3) is also suitable for problems with decision-dependent probabilities.
When comparing optimal values, our formulation lies between the optimistic and pessimistic models. To make this clear, observe that $\Theta_\kappa(x) \leq \Theta_{\kappa + \eta}(x)$ for all $\kappa, \eta \geq 0$ . Therefore, the inequality
holds. It follows that for $\kappa = 0$ the optimal value of (2.3) is optimistic and for $\kappa = \infty$ , the optimal value is pessimistic. In this sense, our approach is between the optimistic and the pessimistic.
When the sampling space is discrete, say given by scenarios in the set $\lfloor {S}\rfloor:=\left\{1,\ldots,S\right\}$, only discrete measures are considered and $\mathcal{P}\subset\mathbb{R}^S$ is assumed to be a bounded set. In this context, problem (2.3) can be written as follows:
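(in sketched form, the discrete counterpart of (2.3), with the expectation written as a finite sum):

$$ \min_{x\in\mathcal{X},\,p\in\mathcal{P}}\;\; C_x(x)+C_p(p)+\max_{q\in\mathbb{B}_\kappa(p)}\;\sum_{s\in\lfloor S\rfloor} q_s\,\mathfrak{Q}_s(x), \qquad\qquad (2.4) $$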
where we use the shorter notation $\mathfrak{Q}_s(\cdot):=\mathfrak{Q}(\cdot,\xi_s(\omega))$ for the recourse functions.
When the ambiguity set $\mathbb{B}_\kappa(p)$ is a Wasserstein-type ball of nearby probabilities, the resulting structure is exploited in Lemma 3.1 to derive a dualization formula, inspired by [Reference Noyan, Rudolf and Lejeune22]. As demonstrated in Section 4, this yields an explicit expression that makes our ddDR2SO problem tractable. To this end, it is convenient to consider separately the q-decision variable in the maximum and write the problem in the following form:
In problem (2.5), the probability $p \in \mathcal{P}$ is a decision variable. This modelling device appears useful in settings like the one considered in our numerical experiments. For cases where assigning a distribution to the uncertainty is not straightforward, basing decisions on a set of probabilities provides hedging. Thanks to the second ambiguity set, $\mathbb{B}_\kappa(p)$, the expected value in the objective of (2.5) is robustified (hedged) against estimation errors on $\mathcal{P}$. The set $\mathcal{P}$ may be chosen as a vicinity of some empirical distribution or by other means. In our case, since we consider a long-term investment problem looking 30 years into the future, the probabilities would essentially be chosen by specialists, or even by trial and error, to evaluate possible investment decisions of the model. For very long horizons, the decision maker might actually be interested in assessing the best and worst costs induced by choices of probabilities, in which case the systematic calibration of $\kappa$ is useful. Finally, our reformulation of (2.3) makes it clear that the impact of changes in $\kappa$ cannot be reproduced using only one ambiguity set $\mathcal{P}$.
3 Wasserstein balls and likelihood robustification
Robustification operations are based on defining a vicinity of the quantity being robustified, an expectation or a variance for instance, and then taking the supremum over that vicinity. The idea can be applied quite generally, with varying degrees of practical computational use, and the procedure depends on the specific form of the balls considered in (2.5): each form of vicinity gives rise to a different robustification strategy. Tractability of ambiguous problems with Wasserstein balls is discussed in [Reference Postek, den Hertog and Melenberg26] for discrete measures and in [Reference Hanasusanto and Kuhn14] for continuous distributions. The latter work, in particular, derives a copositive reformulation for two-stage robust and DRSO linear programs based on $\ell_2$-Wasserstein balls. The same authors show that when the ambiguity set is centered at a discrete distribution and there are no support constraints, the $\ell_1$-Wasserstein ball gives linear programming problems. More recently, complexity bounds for two-stage DRSO problems with the $\ell_\infty$-norm were given in [Reference Xie34].
We focus on Wasserstein-distance-based sets because their polyhedral structure makes them particularly suitable for algorithmic developments. The basic workhorse to arrive at tractable reformulations is Fenchel duality. To this aim, in our ddDR2SO problem (2.5), a first step is to specify the concept of closeness of the discrete probability $(q_1,\ldots,q_S)$ to the ball center $(p_1,\ldots,p_S)$. As explained in [Reference Noyan, Rudolf and Lejeune22], this can be done by considering the distance between the random vectors $\xi(\omega_r)$ and $\xi(\omega_{s})$, or by directly considering the values $p_r$ and $p_s$. The first approach, adopted by [Reference Pflüg, Pichler and Wozabal25] and [Reference Esfahani and Kuhn9] to deal with continuous measures, is what [Reference Noyan, Rudolf and Lejeune22] call a continuous robustification. The specific stage structure of the risk functional in (2.5) is more suitable for the second option, which directly compares the vectors p and q through the recourse functions. We see this second approach as a robustification of the likelihood (the model is called a discrete robustification in [Reference Noyan, Rudolf and Lejeune22]).
For discrete measures in the set
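(taken here to be the probability simplex over the S scenarios)

$$ \mathcal{M}^S:=\Big\{p\in\mathbb{R}^S:\ p_s\geq 0 \ \text{ for } s\in\lfloor S\rfloor,\ \ \sum_{s\in\lfloor S\rfloor} p_s=1\Big\}, $$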
the notion of Wasserstein distance depends on certain weighted costs $\delta^{\mathbb D}_{rs}>0$ for $r,s\in \lfloor {S}\rfloor$, related to transporting the probability mass $p_r$ to $q_s$, using a general distance function $\mathbb D$. For our two-stage setting, the cost $\delta^{\mathbb D}$ could measure the similarity between recourse functions using the $\ell_1$-norm. However, for simplicity, we take the total variation distance between probabilities. Examples of how to compute such values from problem data with $\mathbb{D}(\cdot)=\|\cdot\|_1$ are given below, after problem (4.2).
The Wasserstein distance between p and $q\in\mathcal{M}^S$, defined below, is denoted by $\Delta(p,q)$, where we drop the dependence on the distance $\mathbb D$ to alleviate notation. Its value is given by the optimal transport plan
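(the usual discrete optimal transport problem, which we assume here):

$$ \Delta(p,q):=\min_{z\geq 0}\Big\{\sum_{r,s\in\lfloor S\rfloor}\delta^{\mathbb{D}}_{rs}\,z_{rs}\ :\ \sum_{s\in\lfloor S\rfloor} z_{rs}=p_r\ \ \forall\, r\in\lfloor S\rfloor,\quad \sum_{r\in\lfloor S\rfloor} z_{rs}=q_s\ \ \forall\, s\in\lfloor S\rfloor\Big\}, \qquad\qquad (3.1) $$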
where $z_{rs}$ represents the amount of mass of $p_r$ that is moved to $q_s$ . The function $\Delta(p,q)$ is convex in both $p\,,q \in \mathcal{M}^S$ . The notion is a distance (and not a mere semi-distance) if the transportation cost has the properties $r \neq s \Rightarrow \delta^{{\mathbb D}}_{rs} > 0$ and $\delta^{{\mathbb D}}_{ss} = 0$ (which implies that $\Delta(p,q) = 0$ if and only if $p = q$ ). When the transportation cost is defined using the $\ell_1$ -norm, the Kantorovich–Rubinstein theorem stated in [Reference Pflüg, Pichler and Wozabal25, Section 1.3] gives an equivalent dual expression for (3.1).
For the reformulation of our ddDR2SO problem (2.5), the following technical result, similar to a statement in [Reference Noyan, Rudolf and Lejeune22, Section 5], is useful. The direct proof given below is based on Lagrangian relaxation.
Lemma 3.1 (Support function of Wasserstein balls). Given the distance defined in (3.1), consider the associated $\ell_1$ -Wasserstein ball
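(that is, the set of probabilities within $\Delta$-distance at most $\kappa$ from the centre p)

$$ \mathbb{B}_\kappa(p):=\big\{q\in\mathcal{M}^S:\ \Delta(p,q)\leq\kappa\big\}. $$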
Then, its support function, defined as $\sigma_{\mathbb{B}_\kappa(p)}(d):=\max\left\{d^{\scriptscriptstyle \top} q:q\in\mathbb{B}_\kappa(p)\right\}$ for any $d\in\mathbb{R}^S$, has the equivalent expression
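(in the form delivered by the Lagrangian-relaxation argument below, with v and $\tau$ playing the role of the multipliers there):

$$ \sigma_{\mathbb{B}_\kappa(p)}(d)=\min_{v\in\mathbb{R}^S,\ \tau\geq 0}\Big\{\,p^{\scriptscriptstyle\top}v+\kappa\tau\ :\ v_r+\tau\,\delta^{\mathbb{D}}_{rs}\geq d_s\ \ \forall\, r,s\in\lfloor S\rfloor\,\Big\}. $$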
Proof. Introducing the S-dimensional vector $\textbf{1}$ with all components equal to 1, write the maximum as a minimization problem:
Also, for notational convenience, we write the relations in (3.1) considering $\delta^{{\mathbb D}}$ and z vectors in $\mathbb{R}^{S^2}$ and introduce $S\times S^2$ matrices $M_p$ and $M_q$ so that
Then, in the resulting problem
This is a linear program, and there is no duality gap. Next, introduce Lagrange multipliers $\eta\in\mathbb{R}$ for the first constraint, $0\leq\mu\in\mathbb{R}^S$ for the second constraint, $\tau\geq 0$ for the constraint $z^{\scriptscriptstyle \top} \delta^{{\mathbb D}}\leq \kappa$ , and $\lambda\in\mathbb{R}^S$ for the last constraint. Since the corresponding Lagrangian
is separable, minimising each component $z_{rs}$ over the set $z_{rs}\geq 0$ gives a solution $z^*_{rs}=0$ as long as
This yields the dual formulation
Taking $v:=\eta\textbf{1}+\lambda$ , discarding the redundant variable $\mu$ , and reformulating the problem in minimization form concludes the proof.
When applied to (2.5) written with ambiguity set equal to the $\ell_1$ -Wasserstein ball, this result implies that
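(applying the lemma with $d_s=\mathfrak{Q}_s(x)$ for each fixed $x$):

$$ \sup_{q\in\mathbb{B}_\kappa(p)}\ \sum_{s\in\lfloor S\rfloor} q_s\,\mathfrak{Q}_s(x)\;=\;\min_{v\in\mathbb{R}^S,\ \tau\geq 0}\Big\{\sum_{r\in\lfloor S\rfloor} p_r v_r+\kappa\tau\ :\ v_r+\tau\,\delta^{\mathbb{D}}_{rs}\geq \mathfrak{Q}_s(x)\ \ \forall\, r,s\in\lfloor S\rfloor\Big\}. \qquad\qquad (3.2) $$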
In terms of support functions, this boils down to the identity
Tractability of (2.5) then results from a direct application of Lemma 3.1, since
We note, in passing, that it is possible to include a cost for selecting a given probability in the robustification; the only change would be to add the vector $C_p$ to the recourse function $\mathfrak{Q}_s(x)$ in the constraints.
We now give an interpretation of (3.2), in terms of the original two-stage stochastic problem, that justifies our naming of likelihood robustification. Specifically, in (2.1), for each fixed $x \in \mathcal{X}$ and on the sampling space $\Omega^S$, the random variable $\mathfrak{Q}(\xi(\omega), x)$ has realizations $\mathfrak{Q}_s(x)$. The usual expected recourse function therein would be
the best expected cost that can be computed when the probability distribution is known exactly. In comparison, the value in (3.2) gives the worst (largest) value of the expectation when p is replaced by nearby probabilities: it considers all the distributions that are at distance at most $\kappa > 0$ from the probability p, taken as reference. Having this interpretation in mind, the value in (3.2) represents a likelihood robustification of that expectation, considering a ball with radius $\kappa \geq 0$. Note, however, that (3.2) has only p in the formulation; this is because Lemma 3.1 has already been applied to obtain an equivalent form of the worst-case expectation over nearby probabilities.
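To make the identity (3.2) concrete, the following small script checks it numerically: the left-hand side is computed by solving the transport linear program in the variables $z_{rs}$ directly, and the right-hand side by solving the reformulation in $(v,\tau)$. The data (the ball centre, the values standing in for $\mathfrak{Q}_s(x)$, the radius and the 0–1 transportation costs) are illustrative choices, not taken from the case study; both linear programs are solved with SciPy.

```python
import numpy as np
from scipy.optimize import linprog

S = 5
rng = np.random.default_rng(42)
p = rng.dirichlet(np.ones(S))          # ball centre (a probability vector)
Q = rng.uniform(10.0, 50.0, size=S)    # stand-ins for the recourse values Q_s(x) at a fixed x
kappa = 0.15                           # robustification ratio
delta = 1.0 - np.eye(S)                # 0-1 transportation costs (total variation distance)

# Left-hand side of (3.2): maximise sum_s Q_s q_s with q_s = sum_r z_rs,
# over transport plans z >= 0 with row sums p_r and total cost <= kappa.
c_primal = -np.repeat(Q[None, :], S, axis=0).ravel()        # z flattened row-major: z[r, s]
A_eq = np.zeros((S, S * S))
for r in range(S):
    A_eq[r, r * S:(r + 1) * S] = 1.0                        # sum_s z_rs = p_r
A_ub = delta.ravel()[None, :]                               # sum_rs delta_rs z_rs <= kappa
primal = linprog(c_primal, A_ub=A_ub, b_ub=[kappa],
                 A_eq=A_eq, b_eq=p, bounds=(0, None), method="highs")
lhs = -primal.fun

# Right-hand side of (3.2): minimise p^T v + kappa * tau
# subject to v_r + tau * delta_rs >= Q_s for all r, s, with tau >= 0.
c_dual = np.concatenate([p, [kappa]])                       # variables: (v_1..v_S, tau)
A_dual = np.zeros((S * S, S + 1))
b_dual = np.zeros(S * S)
for r in range(S):
    for s in range(S):
        A_dual[r * S + s, r] = -1.0                         # -v_r
        A_dual[r * S + s, S] = -delta[r, s]                 # -tau * delta_rs
        b_dual[r * S + s] = -Q[s]                           # ... <= -Q_s
dual = linprog(c_dual, A_ub=A_dual, b_ub=b_dual,
               bounds=[(None, None)] * S + [(0, None)], method="highs")
rhs = dual.fun

print(f"worst-case expectation (primal) : {lhs:.6f}")
print(f"dual reformulation (3.2)        : {rhs:.6f}")       # the two values coincide
```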
Noyan et al. [Reference Noyan, Rudolf and Lejeune22] analyse numerous reformulations of DRSO problems. They also show an asymptotic result explaining why the continuous robustification may not be suitable for discrete sampling spaces. More precisely, with the discrete robustification (our setting), Section 5 of that work shows that, when the radius $\kappa$ grows, the likelihood robustified expectation (3.2) approaches the value of the worst-case scenario. This property, confirmed numerically in our case study, follows from observing that, for $\kappa$ sufficiently large, any probability is feasible for the max-operation in (2.4); in particular, all the probability mass can be assigned to the worst scenario. The continuous robustification, by contrast, can far exceed the worst-case scenario value: when the radius $\kappa$ grows, the corresponding ball eventually includes any distribution for the cost realizations.
Lemma 3.1, inspired by a discussion in [Reference Noyan, Rudolf and Lejeune22], is the key to develop our new decomposition method, presented below.
4 Decomposition method for two-stage stochastic linear programs
When problem (2.1) is a two-stage stochastic linear program, its nonsmooth convex recourse functions
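(assuming the standard form of linear recourse, with the notation introduced next)

$$ \mathfrak{Q}_s(x):=\min_{y\geq 0}\big\{\, d_s^{\scriptscriptstyle\top} y\ :\ W_s\,y = h_s - T_s\,x \,\big\},\qquad s\in\lfloor S\rfloor, \qquad\qquad (4.1) $$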
can take the values $\pm \infty$ (given vectors $d_s\,, h_s$ and matrices $W_s\,, T_s$ , of appropriate dimensions).
Suppose, in addition, the first-stage feasible set is the polyhedron
and the costs for both x and p are linear. Also, recall that the likelihood robustification ratio is the maximal distance from the nominal probability at which the nearby probabilities used to robustify the expectation are allowed to lie. In this setting, given a likelihood robustification ratio $\kappa \geq 0$ and using (3.2), problem (2.5) can be expressed as follows:
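(a sketch of the reformulation, writing the linear first-stage cost as $c^{\scriptscriptstyle\top}x$, the linear probability-selection cost as $C_p^{\scriptscriptstyle\top}p$, and using (3.2) to replace the robustified expectation):

$$ \begin{array}{lll} \displaystyle\min_{x,\,p,\,v,\,\tau} & c^{\scriptscriptstyle\top}x + C_p^{\scriptscriptstyle\top}p + \displaystyle\sum_{r\in\lfloor S\rfloor} p_r v_r + \kappa\tau & \\[2mm] \ \ \text{s.t.} & x\in\mathcal{X},\quad p\in\mathcal{P},\quad \tau\geq 0, & \\[1mm] & v_r+\tau\,\delta^{\mathbb{D}}_{rs}\geq \mathfrak{Q}_s(x), & \forall\, r,s\in\lfloor S\rfloor. \end{array} \qquad\qquad (4.2) $$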
When $\mathcal{P} = \lbrace p \rbrace$ , this is a well-known convex two-stage stochastic programming problem, for which useful solution procedures based on scenario decomposition are available, including the ones in [Reference Fàbián and Szöke10] and [Reference Oliveira, SagastizÁbal and Scheimberg23].
When the set $\mathcal{P}$ is not a singleton and p becomes a variable, the bilinear terms $p_r v_r$ in the objective function lead to clear computational difficulties. The format of $\mathcal{P}$ is left unspecified for the moment. The only requirement is that it should be a simple convex set, for instance the convex hull of certain probability vectors taken as reference, so that the feasible set in (4.2) remains convex.
Before proceeding further, as announced before (3.1), we discuss possible choices for the transportation costs. A first option is to take
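(a 0–1 cost, the choice consistent with the total variation distance mentioned next):

$$ \delta^{\mathbb{D}}_{rs}:=\left\{\begin{array}{ll} 1, & r\neq s,\\ 0, & r=s, \end{array}\right.\qquad r,s\in\lfloor S\rfloor. \qquad\qquad (4.3) $$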
The corresponding measure $\Delta(p,q)$ is the total variation distance between the probabilities p and q. This is the distance used in our case study. Note that, even though such transportation costs do not measure how dissimilar the underlying scenarios actually are, this does not restrict the applicability of (3.2), since the values of $\delta_{rs}$ play no role in the validity of the duality employed to obtain (3.2).
For other options, denote all the uncertain data of the problem by
and notice that this random variable determines the value for $\mathfrak{Q}_s(x)$ , via the solution of the sth second-stage problem defined by (4.1). Then, we could take $\delta^{{\mathbb D}}_{rs}$ as
Notwithstanding, since in (3.2) the transportation cost appears in the constraints, the latter choice is not suitable in practice: it would yield a ddDR2SO problem with nonlinearities that are hard to deal with computationally.
Our solution procedure replaces the convex functions $\mathfrak{Q}_s(x)$ by lower-bounding cutting-plane models and uses global bilinear optimization software to solve a sequence of approximating master problems. The optimal values of these approximating problems estimate from below the true optimal value of the nonsmooth and nonconvex master problem (4.2).
For efficiency, it is crucial to compute optimality and feasibility cuts for the recourse functions $\mathfrak{Q}_s(x)$ , as introduced by [Reference Van Slyke and Wets30]. In what follows, we explain how to compute these cuts. For a fixed first-stage decision $x_k \in \mathcal{X}$ , we can solve the problem defining $\mathfrak{Q}_s(x_k)$ , concluding either that $\mathfrak{Q}_s(x_k) \in \lbrace \pm\infty \rbrace$ or obtaining a solution together with an associated Lagrange multiplier for the equality constraints. Each case is explained below.
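As an illustration of this step, the sketch below solves one second-stage linear program of the form $\mathfrak{Q}_s(x)=\min\{d_s^{\scriptscriptstyle\top} y:\ W_s y=h_s-T_s x,\ y\geq 0\}$ at a fixed point $x_k$ and returns either an optimality cut (a supporting hyperplane of $\mathfrak{Q}_s$ built from the equality-constraint multipliers) or, if the subproblem is infeasible, a feasibility cut obtained from a Farkas certificate, computed here by maximising $\lambda^{\scriptscriptstyle\top}(h_s-T_s x_k)$ subject to $W_s^{\scriptscriptstyle\top}\lambda\leq 0$ and a box normalisation, one of several equivalent ways to produce the cuts of [Reference Van Slyke and Wets30]. The helper name and the toy data are illustrative, not taken from the case study.

```python
import numpy as np
from scipy.optimize import linprog

def second_stage_cut(d_s, W_s, h_s, T_s, x_k):
    """Solve the s-th subproblem at x_k and return data for one cut.

    Returns ("optimality", lam) with  Q_s(x) >= lam @ (h_s - T_s @ x),  tight at x_k,
    or      ("feasibility", lam) with lam @ (h_s - T_s @ x) <= 0  required for feasibility.
    """
    b = h_s - T_s @ x_k
    res = linprog(d_s, A_eq=W_s, b_eq=b, bounds=(0, None), method="highs")
    if res.status == 0:
        # Dual multipliers of the equality constraints (exposed by the HiGHS-based methods).
        return "optimality", np.asarray(res.eqlin.marginals)
    if res.status == 2:
        # Infeasible subproblem: find a Farkas certificate lam with W_s^T lam <= 0
        # and lam @ (h_s - T_s x_k) > 0, normalised by the box |lam_i| <= 1.
        m, n = W_s.shape
        ray = linprog(-b, A_ub=W_s.T, b_ub=np.zeros(n),
                      bounds=[(-1.0, 1.0)] * m, method="highs")
        return "feasibility", np.asarray(ray.x)
    raise RuntimeError(f"unexpected subproblem status: {res.status}")

# Tiny illustrative data (not from the case study): one capacity variable, one balance constraint.
W = np.array([[1.0, -1.0]])   # generation minus a penalised slack
h = np.array([2.0])           # demand in this scenario
T = np.array([[1.0]])         # installed capacity enters the right-hand side
d = np.array([3.0, 100.0])    # generation cost and penalty on unmet demand

kind, lam = second_stage_cut(d, W, h, T, x_k=np.array([0.5]))
print(kind, lam)              # optimality cut: Q_s(x) >= lam @ (h - T @ x)
```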
Having computed optimality and feasibility cuts for the functions $\mathfrak{Q}_s(x)$, we are in a position to define our sequence of lower-bounding approximating master problems for (4.2).
We assume given disjoint sets of optimality indices $O_{sk}$ and feasibility indices $F_{sk}$ such that
as well as the history of first-stage iterates and associated Lagrange multipliers of the second-stage problems,
The sets $O_{sk}$ and $F_{sk}$ are used to organise the information for the generated cuts, noting that whether $\lambda_{sk}$ refers to a feasibility or an optimality cut depends on the value of $\mathfrak{Q}_s(x_k)$. The kth approximate master problem is
As stated, the nonconvexity of the model comes from the terms $p_r v_r$. Additional box constraints, with a sufficiently large constant $M > 0$, are commonly used in cutting-plane methods. With those constraints, the approximate master problem (4.5) always has a solution (which is not guaranteed using only optimality and feasibility cuts, especially at the initial iterations, when the cuts for a given k represent the true recourse functions $\mathfrak{Q}_s(x)$ poorly).
The solution procedure itself consists in defining a stopping test and rules to manage the sets $O_{sk}$ and $F_{sk}$, and in computing a sequence of iterates converging to a solution of (4.2). Thanks to the well-known lower-bounding property of (4.5), which amounts to satisfaction of the subgradient inequality (4.4) for convex functions, we are able to compute an estimate of the optimality gap, as long as the solver computes a global solution of (4.5) using bilinear global optimization. Putting all this information together and using standard arguments, it is easy to show that the following algorithm approximates arbitrarily well a global solution to our ddDR2SO problem (4.2).
From the standpoint of computational burden, the proposed algorithm also has advantages when we compare the number of bilinear terms present in (4.5) with the number of bilinear terms in the deterministic equivalent associated with (4.2), obtained by dropping the robustification terms and setting $v_r = \mathfrak{Q}_r(x)$ directly in the objective. In this case, our algorithm solves many problems with a smaller number of bilinear terms, whereas the deterministic equivalent approach solves one large problem with many bilinear terms.
General purpose solvers might fail if used with out-of-the-box configurations. Therefore, it is important to investigate if proper configurations can help the solver at hand. Problem (4.5) can be solved with the package BARON by [Reference Tawarmalani and Sahinidis28], provided some attention is given to details. General hints on how this was achieved in our case study are listed below:
1. It is important to warm-start the calculations using previous iterates.
2. Constraints in (4.5) are convex, a fact that should be communicated to BARON.
3. The branching priorities should be adjusted experimentally. For instance, there is no need to branch on $x \in \mathcal{X}$.
4. We obtained better performance by relaxing the default stopping tolerances. The default absolute and relative tolerances of BARON can be too tight for the application at hand, which might make the solver branch over ‘numerical trash’.
5 Case study
As mentioned, we are interested in capacity expansion problems leading to sustainable energy systems that generate mostly, if not fully, renewable power. We consider a modification of the model in [Reference Birge and Louveaux5, Section 1.3] that represents Europe over the horizon 2020–2050, using ambiguity sets for the probabilities of the uncertain market conditions.
The notation is the following. Investments are planned for time steps $t \in \lfloor {T}\rfloor$, deciding how much to invest in each type of technology, indexed by $i\in\lfloor {I}\rfloor$. The new amount of technology i made available at time t is denoted by $x_i^t$. The accumulated capacity of technology i at time t is denoted by $w_i^t$. The cost to install $x_i^t$ is $c_i^t x_i^t$, and the maintenance cost of the accumulated production capacity $w_i^t$ is $\eta_i^t w_i^t$. A technology i decided at time t takes $B^t_i$ years to build and has a lifetime of $L^t_i$ years.
Regarding generation, costs are uncertain because they depend on uncertain market conditions. The cost of generating electricity given the installed capacity w at scenario $s\in\lfloor {S}\rfloor$ is denoted by $\mathfrak{Q}_s(w)$ . In the original version of [Reference Birge and Louveaux5], each scenario s is endowed with an exogenous probability $p_s$ .
The decision problem is stated below, where $w_i^0 = 0$ and $x_i^t = 0$ for $t < 0$ :
To understand the contribution of ambiguity sets for the probabilities, note first that, because making real-life decisions over the long term is hard, the investor cannot assign a single probability distribution to the uncertain events in the problem. Rather, the investor prefers to compute what would be the best possible cost, taking into account that the plausible probability distributions lie in some convex region $\mathcal{P}$. The goal of the DRSO problem is to check whether, in the best case, the investment return is stable with respect to the chosen probabilities. This is a necessary condition when investing in risky assets, especially in the long term.
After careful consideration of the political and technological situations, the decision maker defines some reference probability vectors for the outcomes of operational costs:
(each $p^l$ is a different probability distribution in $\mathbb{R}^S$ ). For this finite number of probabilities, the investor knows that it is enough to solve L different instances of problem (5.1) and compute the minimum. However, for a more comprehensive analysis, it can be preferable to take decisions considering an infinite number of probabilities, for instance by setting
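(the convex hull of the reference vectors, matching the simplicial variables introduced next):

$$ \mathcal{P}:=\operatorname{conv}\big\{p^1,\ldots,p^L\big\}=\Big\{\sum_{l\in\lfloor L\rfloor}\alpha_l\,p^l\ :\ \alpha_l\geq 0,\ \sum_{l\in\lfloor L\rfloor}\alpha_l=1\Big\}. $$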
Introducing the simplicial variables $\alpha_l$ for $l\in\lfloor {L}\rfloor$ , this gives the following nonsmooth and nonconvex problem, with decision-dependent probabilities:
We next explain how the likelihood robustification radius $\kappa \geq 0$ comes into play. Having the mentioned stability goal in mind, our investor also wants to find out how the cost would change if, instead of only best cases, a worst case were also considered. However, in order not to exaggerate the conservatism, this should be done in a continuous manner, for different degrees of pessimism, depending on the value of the parameter $\kappa$: the value $\kappa = 0$ represents no pessimism, while $\kappa = \infty$ represents total pessimism. Ultimately, because the plausible probabilities taken as reference may be slightly inaccurate, the investor wants to know how stable the decisions are with respect to the probabilities assigned to the events. For this case study, selecting probabilities comes at no cost, so $C_p \equiv 0$ and the ddDR2SO formulation (4.2) for problem (5.1) is given by
The recourse functions defining the values $\mathfrak{Q}_s(w)$ correspond to second-stage generation problems, for the accumulated capacity w, given the market condition s. Specifically, each time index $t\in\lfloor {T}\rfloor$ is divided into m modes of operation. A mode of operation j at time t is characterised by a duration $\tau_{j}^t$ and a demand $D_{sj}^t$ for electricity. The cost of generating energy with technology i at time t in scenario s is $d_{si}^t$. For technology i at time t, there are deterministic values of existing capacity $g_i^t$ and decommissioned amounts $u_i^t$. In scenario s, the capacity $g_i^t$ is affected by a stochastic availability factor $A_{sij}^t$. The demand $D_{sj}^t$ for electricity is satisfied with the generation $y_{sij}^t$, produced by technology i at time t, in scenario s and mode of operation j. Altogether, this gives the recourse function below:
Let us now turn to the practical aspects of the problem. We use the model above to perform a simplified investment planning exercise for Europe, with yearly decisions from 2020 to 2050, using data from [Reference del Granado, Skar, Doukas and Trachanas13] for costs, existing installed capacity, yearly demand, lifetimes and building times. The technologies considered are coal-based generation, solar PV and onshore wind. The parameters for each technology, as well as the projected demand, are shown in Tables 1–5.
In general, stochastic optimization is used to achieve a diversification effect in the portfolio of electricity generation equipment. However, due to our simplified modelling, this diversification does not appear naturally. For this purpose, the complementarity between wind and solar generation is built into the model using m different modes of generation. More precisely, each time period has two modes, one only for wind (60%) and another only for solar (40%); the mode durations are taken proportionally. The availability factor for wind has mean $0.35$ and that for solar has mean $0.5$; both have variance equal to 50% of the mean. The mean demand is reported in Table 5, and its variance is 30% of the mean. The scenarios are generated by sampling the respective quantities from a normal distribution, $\mathcal{N}$, with the corresponding means and variances.
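For concreteness, the snippet below reproduces this scenario-generation step under our reading of the text: availability factors and demands are drawn from normal distributions whose variances (not standard deviations) equal the stated fractions of the means. The demand means, the clipping of availability factors to $[0,1]$ and the array shapes are illustrative assumptions, since the actual values come from Table 5 and from modelling choices not spelled out here.

```python
import numpy as np

rng = np.random.default_rng(2020)
S, T = 30, 31                      # scenarios and yearly periods 2020-2050

def sample(mean, var_fraction, size):
    """Draw normal samples whose variance equals var_fraction * mean (as stated in the text)."""
    std = np.sqrt(var_fraction * mean)
    return rng.normal(loc=mean, scale=std, size=size)

# Availability factors per scenario and year (variance = 50% of the mean);
# clipping to [0, 1] is our assumption, the text does not specify it.
avail_wind  = np.clip(sample(0.35, 0.5, (S, T)), 0.0, 1.0)
avail_solar = np.clip(sample(0.50, 0.5, (S, T)), 0.0, 1.0)

# Demand per scenario, year and mode (variance = 30% of the mean);
# the yearly means below are placeholders for the values of Table 5.
demand_mean = np.linspace(3.0, 4.0, T)                                    # illustrative only
demand = np.stack([sample(m, 0.3, (S, 2)) for m in demand_mean], axis=1)  # shape (S, T, 2)
```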
Also because of the simplified modelling, in the master problem we have to add a linear constraint limiting the investment at each time period to at most 12% of the total installed capacity. The lifetime of coal-based generation is 30 years, while for onshore wind it is 20 years and for solar PV it is 25 years. The decommissioning rate is assumed to be linear, so that at the end of the horizon all thermal power plants are decommissioned. The initial installed capacity of coal-based power is 0.92 TW/h, and that of the other technologies is assumed to be zero.
As mentioned, we use the total variation distance for the transportation costs, as in (4.3). All experiments are run on a notebook with an Intel i7 1.90GHz processor, running under Ubuntu 18.04.3 LTS, and using CPLEX 12.10 and Julia 1.1.1; see [Reference Lubin and Dunning19]. The global bilinear solver offered by BARON needs a mixed-integer solver to be provided, which in our case is CPLEX 12.10.
Below, results are reported only until 2040 because, as is often the case in this setting, the model exhibits abnormal behaviour when approaching the end of the horizon.
As explained at the end of Section 3, the likelihood robustified expectation (3.2) approaches the value of the worst-case scenario for sufficiently large $\kappa$. For this reason, we computed the min-max expansion planning for $S=30$ scenarios, that is, the expansion that minimises the highest cost over all scenarios. The min-max expansion planning is shown in Figure 1. This output, known to be cost-conservative, is used as a reference for comparison. We expect the (less conservative) ddDR2SO variants to make more investments.
The comparison is done with the expansions obtained with the following two sets $\mathcal{P} $ as in (5), with
The probabilities $p_l$ used for the Opt-robust variant are obtained as perturbations of the $\left({1 \over S}, \dots, {1 \over S}\right)$ vector, normalised afterwards so that the entries of the perturbed vectors sum to one. Note that for a long-term investment problem, like the one we have, basically any set of probabilities we come up with is arguably fictitious. The resulting mixes, obtained by varying the likelihood robustification radius $\kappa \in\{0, 0.1, 0.2\}$, are given in Figures 2, 3 and 4, respectively. All experiments take less than one minute to finish, using the decomposition method.
As seen in Figure 2, the opt-robust strategy makes fewer investments than the simple one, but more than the min-max. This is expected from the definitions of the strategies themselves. Notice also that, in all the runs, the amount of coal-based generation decreases linearly due to decommissioning, and because it is not cost-effective.
As the ratio $\kappa$ increases, we observe that the alternative strategies get closer to the min-max, shown in Figure 1. This is clear for the opt-robust variant in Figure 3, with $\kappa=0.1$ , even though the investment with opt-robust remains higher than with min-max. The simple strategy did not change with respect to $\kappa=0$ .
For $\kappa=0.2$ (Figure 4), both the simple and the opt-robust strategies approach the min-max solution of Figure 1. We believe that the good agreement with the min-max strategy, already clear for small $\kappa$, is due to the choice of ambiguity set $\mathcal{P}$, which is narrow for both the simple and opt-robust variants. For other choices of $\mathcal{P}$, the behaviour is likely to be different.
Notice that Figures 2–4 are for ‘in-sample’ uncertainty. They illustrate how the investment model progressively replaces coal-based technologies along the time horizon. An ‘out-of-sample’ performance assessment could be obtained by making a statistical analysis of the production costs required to meet demand, using another set of scenarios. This is a typical construct in stochastic programming. However, for electricity markets, generation cost is not the only indicator to take into account. Many other factors need to be evaluated, such as system reliability, satisfactory behaviour of ancillary services, and even the overall carbon footprint. Given the complexity of such an analysis, we chose not to perform out-of-sample analyses (any conclusion not taking into account all the involved aspects could be deemed artificial). Notwithstanding, an interesting feature that can be noted in Figures 2–4 is that the investment profile (the amount built each year) is stable across the years, with more dependency on the probabilities only at the end of the horizon.
Figure 5, with the evolution of costs as a function of $\kappa$, reveals the opt-robust option as the most expensive one, except for $\kappa = 0$. The ambiguity set only affects the probabilities for small $\kappa$; for large $\kappa$, the event with the worst outcome receives all the probability mass, as shown by [Reference Noyan, Rudolf and Lejeune22].
Additionally, notice in Figure 5 the smooth behaviour of the cost when the robustification ratio $\kappa$ varies. This confirms the expectation that our approach brings some stability to the decision-making process (with respect to variations in the probability distribution defining the problem).
To conclude, we examine whether our decomposition algorithm improves solution times. The advantage of decomposition is that the sequence of problems solved has either a much smaller number of bilinear terms or a much smaller number of variables. As illustrated by [Reference van Ackooij, Frangioni and de Oliveira29], decomposition methods, which typically scale well, have a strong advantage when the solution times of combinatorial or nonconvex problems increase dramatically with the dimension of the problem. This is confirmed by our results in the tables below. Another remark is that, since BARON uses a nonlinear programming solver in its local search procedure, issues related to precision, robustness and time tend to affect the deterministic equivalent of the problem more strongly as the number of variables grows. In Tables 6 and 7, we show statistics for the solution times, varying the number of scenarios, with ten probabilities ($L=10$) and several values of $\kappa$. Each experiment is repeated four times. The structure of the problem is the same for all values of $\kappa$, and therefore the decomposition method behaves just the same, because the solution time of the nonconvex master problem dominates the execution.
In summary, our likelihood robustified optimistic problem provides more conceptual flexibility than the min-max (minimization of the highest realization of the cost) and min-max regret (minimization of the largest deviation from the best cost of each scenario) approaches. This should provide better out-of-sample operational performance on a given set of scenarios, after the expansion plan is decided. In practice, the value of $\kappa$ can be selected having these remarks in mind, besides being useful to check the stability of the current decisions. Specifically, in our case, the more technology is built, the smaller the out-of-sample cost to meet the demand (because less slack generation needs to be activated, as the dispatch follows the merit order). In other words, the belief in ambiguity sets of probabilities determines how much technology should be built, and the more technology is built, the lower the out-of-sample operational cost.
6 Final comments
Generating power with a minimal carbon footprint over a horizon of 30 years is the goal established by many countries worldwide, especially in Europe. To this aim, it is important to design capacity expansion models that take into account both the progressive decommissioning of fossil-fuel technologies and the uncertainty brought into the system by the intermittency of wind and solar power generation. To address these issues, we propose a new approach for decisions under probability ambiguity, placed between the well-known optimistic and pessimistic paradigms. The interest of our approach is illustrated by a case study whose numerical solution relies on scenario-wise decomposition to solve the resulting nonsmooth and nonconvex problem to global optimality.
Sensitivity with respect to probabilities can be mitigated by a large volume of data. However, when the underlying planning task is of a long-term nature, like capacity and transmission expansion in energy systems, this issue becomes more important, as changes in the probability values induce qualitative changes in the decisions suggested by planning models. With this perspective, trying to identify nearby problem data that would change the qualitative aspect of the decisions is appealing (say, to determine whether a change in the probabilities drives a profitable activity to a non-profitable one). This type of analysis can be done within our proposed framework, by means of the ambiguity sets.
Strategies based on min-max and min-max regret are often used in the capacity and transmission expansion planning of power systems, for instance in [Reference del Granado, Skar, Doukas and Trachanas13]. Those measures are well known for their conservatism, but they do not consider robustness with respect to probability distributions. Ambiguity sets, endowed with the technique introduced in this work for likelihood robustification, could be used to robustify the decision-making process of models based on min-max, min-max regret, CVaR, and other risk measures.
Convex combinations of probability distributions were employed in [Reference Hellemo, Barton and Tomasgard15] for models where the parameters of probability distributions can be influenced by investment decisions. Models with endogenous uncertainty are usually nonconvex, and their computational solution might be time-consuming or unreliable. Both the ddDR2SO model (4.2) and the decomposition Algorithm 2 remain applicable if the robustification parameter or the probability set depends on the first-stage variable, as long as the latter remains a convex set (in (4.2), $\kappa=\kappa(x)$ or, in (5.2), $\mathcal{P}=\mathcal{P}(x)$).
When compared to the settings discussed by [Reference Noyan, Rudolf and Lejeune22] and [Reference Kuhn, Esfahani, Nguyen and Shafieezadeh-Abadeh17], our main algorithmic contribution is in exploiting the specific structure of the two-stage problem to design an efficient decomposition method. Finally note that, while most of the numerical schemes for distributionally robust optimization entail the solution of convex models, our approach is nonconvex but remains computationally tractable, thanks to decomposition.
We also note that it would be possible to deal with discrete distributions having an unequal number of scenarios, by assigning probability zero to ‘excess’ scenarios. For example, if one distribution has outcomes $\lbrace 1, 2 \rbrace$ and another has outcomes $\lbrace 1 \rbrace$, we can simply consider that the second distribution also has outcome ‘2’, with a very small (or zero) probability assigned to it.
Acknowledgments
This work was supported by the TACEMM project (Trans-Atlantic Cooperation on Energy Market Models). Research of the second author is partly funded by CNPq Grant 306089/2019-0, CEPID CeMEAI, and FAPERJ in Brazil. The third author is supported by CNPq Grant 303913/2019-3, FAPERJ Grant E-26/202.540/2019, and PRONEX–Optimization.