
A distributionally ambiguous two-stage stochastic approach for investment in renewable generation

Published online by Cambridge University Press:  12 May 2022

PEDRO BORGES
Affiliation:
Instituto de Matemática Pura e Aplicada, Rio de Janeiro, RJ, Brazil emails: [email protected]; [email protected]
CLAUDIA SAGASTIZÁBAL
Affiliation:
IMECC/UNICAMP, Campinas, São Paulo, Brazil email: [email protected]
MIKHAIL SOLODOV
Affiliation:
Instituto de Matemática Pura e Aplicada, Rio de Janeiro, RJ, Brazil emails: [email protected]; [email protected]
ASGEIR TOMASGARD
Affiliation:
Norwegian University of Science and Technology, Trondheim, Trondelag, Norway email: [email protected]

Abstract

The optimal expansion of a power system with reduced carbon footprint entails dealing with uncertainty about the distribution of the random variables involved in the decision process. Optimisation under ambiguity sets provides a mechanism to suitably deal with such a setting. For two-stage stochastic linear programs, we propose a new model that is between the optimistic and pessimistic paradigms in distributionally robust stochastic optimisation. When using Wasserstein balls as ambiguity sets, the resulting optimisation problem has nonsmooth convex constraints depending on the number of scenarios and a bilinear objective function. We propose a decomposition method along scenarios that converges to a solution, provided a global optimisation solver for bilinear programs with polyhedral feasible sets is available. The solution procedure is applied to a case study on expansion of energy generation that takes into account sustainability goals for 2050 in Europe, under uncertain future market conditions.

Type
Papers
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1 Introduction

Modelling the transition to sustainable energy systems poses several challenges in Mathematical Optimization, especially in the long term. For an electricity system to deliver mostly renewable energy by 2050, the capacity planning model must take into account several sources of uncertainty. With the progressive decommissioning of coal- and gas-fuelled power, wind and solar energy become fundamental for achieving the 2050 goal. Such technologies are intermittent and not storable, a feature that makes investment in renewable capacity expansion riskier. In countries with hydrogeneration, like Brazil and Norway, water reservoirs serve as storage to transfer energy from/to periods with high/low availability of renewable sources. However, the intermittency of wind and solar power puts the hydro-reservoirs under stress. As explained by Luna, Sagastizábal and Silva (2021), Brazil’s storage capacity decreased by more than 10% in the past decade, making the system less flexible and more vulnerable to uncertainty.

When designing capacity expansion models, taking the considerations above into account is crucial. Stochastic Programming models alone are not suitable for sound decision-making because uncertain future market conditions (supply and demand for electricity, generation and investment costs) are only indirectly observable, for example through samples. If the probability values used in the model are incorrect, the stochastic program can give suboptimal results and lead to wrong investment decisions (in a risky environment!). To address this issue, the more recent area of Distributionally Robust Stochastic Optimization (DRSO) treats the probability as an additional decision variable, to be chosen among the distributions in a certain ambiguity set; see for instance the works by [Reference Pflüg and Wozabal24] and [Reference Wozabal33].

A second major concern for the decision maker is how to deal with random realisations that are influenced by the decisions. For capacity expansion problems, such is the case when the Government subsidises the cost of equipment related to wind or solar power (a large future investment in renewables could result in the Government lowering its subsidised fraction). Stochastic Programming models with decision-dependent uncertainty were considered by [Reference Jonsbråten, Wets and Woodruff16,Reference Goel and Grossmann11] and, more recently, [Reference Hellemo, Barton and Tomasgard15]. Robust formulations of problems with decision-dependent uncertainty are shown to be NP-complete by [Reference Nohadani and Sharma21]. Both [Reference Goh and Sim12] and [Reference Postek, den Hertog and Melenberg26] confirm that the well-known conservatism of robust optimisation approaches is mitigated by the consideration of ambiguity sets.

With a DRSO model, endogenous probability distributions imply a variation of the ambiguity set with the decision variable. This is the model explored in this work: we focus on decision-dependent distributionally robust two-stage stochastic optimization (ddDR2SO). A ddDR2SO problem outputs the optimal value for the probability distribution together with the optimal decisions. The connection between the endogenous probability and the decision variable can be given explicitly through constraints in the ambiguity set, or indirectly, by the fact that the optimization is performed jointly on both types of variables; see (2.4) below.

Regarding ambiguity sets, several different specifications have been considered in the literature. One possibility is to look for probability distributions whose first- and second-order moments are close to those of some exogenous empirical estimation. To cite a couple of works among many others, this is the approach in [Reference Bertsimas, Doan, Natarajan and Teo4] and [Reference Delage and Ye7], where the wording ‘distributionally robust’ was first coined. Another line of research considers probability distributions that are close, in some metric, to a nominal probability taken as a reference. Typical ways of measuring such closeness are phi-divergences and Wasserstein distances, respectively analysed by [Reference Ben-Tal and Teboulle3] and [Reference Pflüg and Wozabal24]. Phi-divergences, introduced in [Reference Calafiore6] for ambiguity sets, are employed in [Reference Wang, Glynn and Ye31] to include distributions that make the observed historical data achieve a certain level of likelihood. Bayraksan and Love [Reference Bayraksan and Love1] extended the concept to two-stage stochastic linear programs.

Usually, DRSO is considered to lie between Stochastic Programming and Robust Optimization. In the spectrum of possible choices for the probability distribution that enters the optimisation problem, the modelling paradigms of Stochastic Programming and DRSO could be seen as positioned at opposite extremes with respect to a certain distance to a nominal probability distribution. The latter takes the worst case over the ambiguity set of probabilities, while for the former the ambiguity set shrinks to a singleton (the nominal probability). From the point of view of decision-making, this amounts to adopting either a fully optimistic or a somewhat pessimistic view of how well the probability distribution fits the random nature of the world.

We propose an in-between paradigm that is both optimistic and pessimistic to certain degrees. For ambiguous two-stage stochastic programs, our ddDR2SO model defines a robustified expected recourse function using probabilities in a Wasserstein ball. The novelty is that, instead of taking a nominal probability, the ball center is considered variable. This additional variable is optimised in the first stage over a simple convex set, for example the convex hull of several nominal probabilities taken as a reference.

When considering discrete distributions, with a finite number of scenarios, the new model can be reformulated as a bilinear programming problem amenable to the solver BARON, for instance; see Lemma 3.1 below. The structure is suitable for decomposition: bilinearity appears only in the objective function, and the feasible set has as many convex nonsmooth constraints as there are scenarios in the problem. Indeed, as will become apparent, the convex nonsmooth recourse functions of the original two-stage problem appear in the constraint set of our reformulation. Similarly to the L-shaped method by [Reference Van Slyke and Wets30], the solution of scenario subproblems provides cuts that either linearize the recourse function or cut off infeasible first-stage points. The algorithm then proceeds by incorporating feasibility and optimality cuts into a master program that still has a bilinear objective function, but now a polyhedral feasible set. If the solution method employed to solve the master programs can find global solutions, the procedure converges to a global solution. In our case study, this is ensured by using the solver BARON, by [Reference Tawarmalani and Sahinidis28].

The interest of the decomposition in a ddDR2SO framework is that, instead of solving one large, difficult problem with many bilinear terms (and a nonsmooth feasible set), each iteration solves a much easier problem, having fewer bilinear terms, fewer variables, and a polyhedral feasible set. These features are crucial for computational efficiency. The methodology is illustrated on a case study for planning investments in energy generation, under uncertain market conditions. The simplified model considers the whole of Europe over the horizon 2020–2050, taking into account the progressive decommissioning of thermal power plants and the increasing proportions of renewable technologies that are foreseeable for the power mix. Ambiguity sets are particularly well suited here, because the long-term horizon vastly complicates the assignment of a unique probability to an uncertain event. Our approach indicates whether the investment return is stable relative to the parameters of the problem.

The work is organised as follows. The new paradigm, which deals with ambiguity from a perspective placed between the optimistic and the pessimistic models, is given in Section 2. Tractability for Wasserstein balls, shown in Section 3, is exploited in Section 4 to derive the decomposition method for ddDR2SO two-stage linear programs. A thorough numerical assessment, validating the approach for the energy investment problem, is performed in Section 5. The paper ends with some concluding comments and remarks.

2 Partly optimistic and partly pessimistic settings

Consider a vector of decision variables $x \in \mathbb{R}^n$ with feasible set $\mathcal{X} \subset \mathbb{R}^n$ . Uncertain parameters are modelled by a measurable mapping $\xi:\Omega\to\Xi\subset\mathbb{R}^m$ and, given a $\sigma$ -algebra $\mathcal F$ , the sample space is $(\Omega,\mathcal{F})$ . Given a function $C_x:\mathbb{R}^n\to\mathbb{R}$ and a cost-to-go mapping $\mathfrak{Q}:\mathbb{R}^n\times\Xi\to\mathbb{R}\cup\{\pm\infty\}$ , the two-stage problem attached to the probability distribution p is

(2.1) \begin{equation}\min_{x\in\mathcal X}\Bigl\{ C_x(x)+\mathbb{E}_p\left[\mathfrak{Q}(x,\xi(\omega)) \right]\Bigr\} \,.\end{equation}

In the objective function, we make explicit the deterministic cost $C_x(x)$ to exploit two-stage structural properties favorable for decomposition methods, like those in [Reference Sagastizábal27], see also [Reference Love and Bayraksan18].

In order to formalize our approach, a nonempty set of probability distributions $\mathcal{P}$ is taken within the set $\mathcal{M}$ of measures on the sample space. If selecting the distribution $p \in \mathcal{P}$ , among all the possible probability distributions, entails a cost $C_p:\mathcal{M}\to\mathbb{R}$ , our first model of a distributionally ambiguous problem has the form

(2.2) \begin{equation}\min_{x\in\mathcal X}\Bigg\lbrace C_x(x)+ \sup_{p\in\mathcal P} \Bigl\{C_p(p)+\mathbb{E}_p\left[\mathfrak{Q}(x,\xi(\omega)) \right]\Bigr\}\Bigg\rbrace \,.\end{equation}

The cost for selecting a probability distribution can be used in diverse ways. For instance, it can be a Euclidean distance to a nominal probability or it can be a measure of entropy. It can also be used to prevent the model from selecting only probability distributions that assign higher likelihoods to the best scenarios. The format (2.2) also encompasses the robust optimisation paradigm by [Reference Ben-Tal, Ghaoui and Nemirovski2], taking as ambiguity set the full space of measures. Continuing with connections with other models, notice that the DRSO problem (2.2) resolves the ambiguity present in the probability distribution by adopting a pessimistic view, as in [Reference Wiesemann, Tsoukalas, Kleniati and Rustem32] and [Reference Kuhn, Esfahani, Nguyen and Shafieezadeh-Abadeh17, Section 2.1]. Specifically, introducing the worst-case risk functional,

\begin{equation*} \Theta_{\textrm{pess}} (x) := \sup_{p \in \mathcal{P}} \Bigl\{ C_p(p)+\mathbb{E}_p\left[\mathfrak{Q}(x,\xi(\omega)) \right]\Bigr\}\,,\end{equation*}

gives the following problem, equivalent to (2.2):

\begin{equation*}\min_{x\in\mathcal X} \Bigl\{C_x(x)+ \Theta_{\textrm{pess}}(x)\Bigr\}\,. \end{equation*}

Another option is to choose the most favourable output, plugging into the optimisation problem the risk functional

\begin{equation*} \Theta_{\textrm{opt}} (x) := \inf_{p \in \mathcal{P}} \Bigl\{ C_p(p)+\mathbb{E}_p\left[\mathfrak{Q}(x,\xi(\omega)) \right]\Bigr\}\,,\end{equation*}

which [Reference Dempe, Dutta and Mordukhovich8] call the optimistic solution in bilevel programming.

Note that $\Theta_{\textrm{pess}} (x)$ and $\Theta_{\textrm{opt}} (x)$ are associated with two different bilevel problems expressing, respectively, a pessimistic and an optimistic view on risk aversion. They are not related to the pessimistic and optimistic solutions of the same bilevel problem.

Our proposal, set amid these two alternatives, is to consider problems of the form

(2.3) \begin{equation} \min_{x\in\mathcal X} \Bigl\{C_x(x)+ \Theta_{\kappa}(x)\Bigr\}\quad\text{ for }\Theta_{\kappa}(x):= \inf_{p \in \mathcal{P}} \Bigl\{ \sup_{q \in \mathbb{B}_\kappa{{(p)}}} \Bigl\{C_p(q)+\mathbb{E}_q\left[\mathfrak{Q}(x,\xi(\omega)) \right]\Bigr\}\Bigr\}\,,\end{equation}

where $\mathbb{B}_\kappa{{(p)}} \subset \mathcal{M}$ is a ball about p of radius $\kappa$ . Tractability of the new approach depends on the choice of these balls, an issue considered in detail in Section 3. For now we just mention that with the Wasserstein metric the balls are defined by a system of affine inequalities, a convenient feature when it comes to computational implementation. In particular, proceeding in a manner similar to the presentation by [Reference Noyan, Rudolf and Lejeune22], we use duality arguments and the Wasserstein balls, in Lemma 3.1, to rewrite the supremum defining the risk functional as a minimum, therefore avoiding certain technical issues that arise with the pessimistic formulation.

Note that there are two ambiguity sets, namely, $\mathcal{P}$ and $\mathbb{B}_\kappa(p)$ . Since in DRSO ambiguity sets are designed by the user (so that they are not too complex, but still provide good out-of-sample performances), one might ask whether a formulation with two ambiguity sets does not require too much information from the user. Fortunately, this is not the case in our setting because the second ambiguity set has a rather specific structure: the effect of ambiguity about p can be handled explicitly in the optimisation problem (see Lemma 3.1). Thanks to this feature, our formulation amounts to a problem like (2.3), with only one ‘optimistic’ ambiguity set $\mathcal{P}$ , and additional terms that represent the impact of $\mathbb{B}_\kappa(p)$ .

Since in (2.3), letting $\mathbb{B}_\kappa{{(p)}}=\lbrace p \rbrace$ (intuitively, $\kappa=0$ ) and $\mathbb{B}_\kappa{{(p)}}=\mathcal{P}$ (intuitively, $\kappa=\infty$ ), respectively, gives the optimistic and pessimistic approaches, problem (2.3) can be thought of as being cast in a robustified optimistic setting. The parameter $\kappa$ , called the robustification ratio, determines to which extent the risk functional in (2.3) bends towards an optimistic or a pessimistic view. The resulting problem takes into account some ambiguity, while hedging against estimation errors on the probability values. Additionally, because the optimization process outputs the ‘nominal’ probability p together with the decision variable x, our model (2.3) is also suitable for problems with decision-dependent probabilities.

When comparing optimal values, our formulation lies between the optimistic and pessimistic models. To make this clear, observe that $\Theta_\kappa(x) \leq \Theta_{\kappa + \eta}(x)$ for all $\kappa, \eta \geq 0$ , because $\mathbb{B}_\kappa(p)\subseteq\mathbb{B}_{\kappa+\eta}(p)$ for every p. Therefore, the inequality

\begin{equation*}\min_{x\in\mathcal X} \Bigl\{C_x(x)+ \Theta_\kappa(x)\Bigr\} \quad \leq \quad \min_{x\in\mathcal X} \Bigl\{C_x(x)+ \Theta_{\kappa + \eta}(x)\Bigr\}\end{equation*}

holds. It follows that for $\kappa = 0$ the optimal value of (2.3) is the optimistic one and for $\kappa = \infty$ it is the pessimistic one. In this sense, our approach is between the optimistic and the pessimistic models.
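For a concrete, hypothetical illustration (not taken from the case study), consider two scenarios with recourse values $\mathfrak{Q}_1(x)=0$ and $\mathfrak{Q}_2(x)=10$ at some fixed x, no selection cost ( $C_p\equiv 0$ ), ambiguity set $\mathcal{P}=\textrm{conv}\left\{(0.8,0.2),(0.2,0.8)\right\}$ and, anticipating the total variation distance adopted later in the paper, balls in which at most $\kappa$ of probability mass can be moved. The optimistic value is $\Theta_0(x)=\min_{p\in\mathcal P}10\,p_2=2$ and the pessimistic one is $\sup_{q\in\mathcal P}10\,q_2=8$ , while for $\kappa=0.3$ at most $0.3$ of probability mass can be shifted towards the worst scenario, so that

\begin{equation*}\Theta_{0.3}(x)=\min_{p\in\mathcal P}\,10\,\min\{1,p_2+0.3\}=10\times 0.5=5\,,\end{equation*}

a value strictly between the optimistic and the pessimistic ones.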

When the sampling space is discrete, say given by scenarios in the set $\lfloor {S}\rfloor:=\left\{1,\ldots,S\right\}$ , only discrete measures are considered and $\mathcal{P}\subset\mathbb{R}^S$ is assumed to be a bounded set. In this context, problem (2.3) can be written as follows:

(2.4) \begin{equation}\min_{x\in\mathcal X} \left\{C_x(x)+\left[\min_ {p\in\mathcal{P}\subset\mathbb{R}^S}\left( C_p(p)+\max_ {q \in \mathbb{B}_\kappa{{(p)}}\subset\mathbb{R}^S}\sum_{s=1}^S q_s \mathfrak{Q}_s(x)\right)\right]\right\}\,,\end{equation}

where we use the shorter notation $\mathfrak{Q}_s(\cdot):=\mathfrak{Q}(\cdot,\xi_s(\omega))$ for the recourse functions.

When the ambiguity set $\mathbb{B}_\kappa{{(p)}}$ is a Wasserstein-type ball of nearby probabilities, the resulting structure is exploited in Lemma 3.1 to derive a dualization formula, inspired by [Reference Noyan, Rudolf and Lejeune22]. As demonstrated in Section 4, this yields an explicit expression that makes our ddDR2SO problem tractable. To this end, it is convenient to consider separately the q-decision variable in the maximum and write the problem in the following form:

(2.5) \begin{equation}\min_{x\in\mathcal{X},p\in\mathcal{P}} \Bigl\{C_x(x)+C_p(p)+ {\mathbb{E}}_{{\mathbb{B}_\kappa(p)}}\left[\mathfrak{Q}(x)\right] \Bigr\},\;\mbox{ where }\quad {\mathbb{E}}_{{\mathbb{B}_\kappa(p)}}\left[\mathfrak{Q}(x)\right] := \max_{q\in\mathbb{B}_\kappa{{(p)}}} \sum_{s=1}^S q_s \mathfrak{Q}_s(x)\,.\end{equation}

In problem (2.5), the probability $p \in \mathcal{P}$ is a decision variable. This modelling device appears useful in settings like the one considered in our numerical experiments. For cases where assigning a distribution to uncertainty is not straightforward, basing decisions on a set of probabilities provides hedging. Thanks to the second ambiguity set, $\mathbb{B}_\kappa(p)$ , the expected value in the objective of (2.5) is robustified (hedged) against estimation errors on $\mathcal{P}$ . The set $\mathcal{P}$ may be chosen as a vicinity of some empirical distribution or by other means. In our case, since we consider a long-term investment problem looking 30 years into the future, the probabilities would essentially be chosen by specialists, or even by trial and error, to evaluate possible investment decisions of the model. For very long horizons, the decision maker might actually be interested in assessing the best and worst costs induced by choices of probabilities, in which case the systematic calibration of $\kappa$ is useful. Finally, our reformulation (2.3) makes it clear that the impact of changes in $\kappa$ cannot be reproduced using only one ambiguity set $\mathcal{P}$ .

3 Wasserstein balls and likelihood robustification

Robustification operations are based on defining vicinities for the quantities being robustified, and then taking the supremum over that vicinity. The underlying quantity can be an expectation or a variance. The idea can be applied in general with varying degrees of practical computational use, while the procedure depends on the specific form of the balls considered in (2.5). Each specific form of vicinity gives rise to a different robustification strategy. Tractability of ambiguous problems with Wasserstein balls is discussed in [Reference Postek, den Hertog and Melenberg26] for discrete measures and in [Reference Hanasusanto and Kuhn14] for continuous distributions. The latter work, in particular, derives a copositive reformulation for two-stage robust and DRSO linear programs from $\ell_2$ -Wasserstein balls. The same authors show that when the ambiguity set is centered at a discrete distribution and there are no support constraints, the $\ell_1$ -Wasserstein ball gives linear programming problems. More recently, complexity bounds for two-stage DRSO problems with the $\ell_\infty$ -norm were given in [Reference Xie34].

We focus on Wasserstein-distance-based sets because their polyhedral structure makes them particularly suitable for algorithmic developments. The basic workhorse to arrive at tractable reformulations is Fenchel duality. To this aim, in our ddDR2SO problem (2.5), a first step is to specify the concept of closeness of the discrete probability $(q_1,\ldots,q_S)$ to the ball center, $(p_1,\ldots,p_S)$ . As explained in [Reference Noyan, Rudolf and Lejeune22], this can be done by considering the distance between random vectors $\xi(\omega_r)$ and $\xi(\omega_{s})$ , or directly considering the values $p_r$ and $p_s$ . The first approach, adopted by [Reference Pflüg, Pichler and Wozabal25] and [Reference Esfahani and Kuhn9] to deal with continuous measures, is what [Reference Noyan, Rudolf and Lejeune22] called a continuous robustification. The specific stage structure in our risk functional in (2.5) is more suitable for the second option that directly compares the vectors p and q through the recourse functions. We see this second approach as being a robustification of the likelihood (the model is called discrete robustification in [Reference Noyan, Rudolf and Lejeune22]).

For discrete measures in the set

\begin{equation*}\mathcal{M}^S:=\left\{p\in\mathbb{R}^S: p\geq 0\,,\sum_{r=1}^S p_r=1\right\}\,,\end{equation*}

the notion of Wasserstein distance depends on certain weighted costs $\delta^{{\mathbb D}}_{rs}>0$ for $r,s\in \lfloor {S}\rfloor$ , related to transporting the probability mass $p_r$ to $q_s$ , using a general distance function $\mathbb D$ . For our two-stage setting, the cost $\delta^{{\mathbb D}}$ could measure the similarity between recourse functions using the $\ell_1$ -norm. However, for simplicity, we take the total variation distance between probabilities. Examples of how to compute such values from problem data with $\mathbb{D}(\cdot)=\|\cdot\|_1$ are given below, after problem (4.2).

The Wasserstein distance between p and $q\in\mathcal{M}^S$ , defined below, is denoted by $\Delta(p,q)$ , where we drop dependence on the distance $\mathbb D$ to alleviate notation. Its value is given by the optimal transport plan

(3.1) \begin{equation}\Delta(p,q) := \underset{z}{\text{min}} \left\{ \sum_{s,r=1}^S \delta^{\mathbb{D}}_{rs} z_{rs} \quad : \quad \sum_{s=1}^S z_{rs} = p_r, \quad \sum_{r=1}^S z_{rs} = q_s, \quad z_{rs} \geq 0\,,\text{for }r\, ,s\in \lfloor S \rfloor \right\},\end{equation}

where $z_{rs}$ represents the amount of mass of $p_r$ that is moved to $q_s$ . The function $\Delta(p,q)$ is convex in both $p\,,q \in \mathcal{M}^S$ . The notion is a distance (and not a mere semi-distance) if the transportation cost has the properties $r \neq s \Rightarrow \delta^{{\mathbb D}}_{rs} > 0$ and $\delta^{{\mathbb D}}_{ss} = 0$ (which implies that $\Delta(p,q) = 0$ if and only if $p = q$ ). When the transportation cost is defined using the $\ell_1$ -norm, the Kantorovich–Rubinstein theorem stated in [Reference Pflüg, Pichler and Wozabal25, Section 1.3] gives an equivalent dual expression for (3.1).
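For illustration only, the sketch below computes $\Delta(p,q)$ for two small discrete distributions by solving the transportation linear program (3.1) with off-the-shelf software; the scenario values and the $\ell_1$ ground cost are arbitrary choices made for the example, not problem data.

```python
# A minimal sketch of the optimal-transport problem (3.1) for discrete
# distributions, using scipy.optimize.linprog.  The scenario values `xi`
# and the l1 ground cost are illustrative, not data from the paper.
import numpy as np
from scipy.optimize import linprog

def wasserstein_distance(p, q, delta):
    """Solve (3.1): min <delta, z> s.t. row sums = p, column sums = q, z >= 0."""
    S = len(p)
    c = delta.reshape(-1)                      # cost over z_{rs}, stored at index r*S + s
    A_eq = np.zeros((2 * S, S * S))
    for r in range(S):
        A_eq[r, r * S:(r + 1) * S] = 1.0       # sum_s z_{rs} = p_r
    for s in range(S):
        A_eq[S + s, s::S] = 1.0                # sum_r z_{rs} = q_s
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

xi = np.array([1.0, 2.0, 4.0, 7.0])            # hypothetical scenario values
delta = np.abs(xi[:, None] - xi[None, :])      # l1 transportation cost
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.3, 0.4])
print(wasserstein_distance(p, q, delta))
```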

For the reformulation of our ddDR2SO problem (2.5), the following technical result, similar to a statement in [Reference Noyan, Rudolf and Lejeune22, Section 5], is useful. The direct proof given below is based on Lagrangian relaxation.

Lemma 3.1 (Support function of Wasserstein balls). Given the distance defined in (3.1), consider the associated $\ell_1$ -Wasserstein ball

\begin{equation*}\mathbb{B}_\kappa{{(p)}}:=\left\{q\in\mathcal{M}^S: \Delta(p,q)\leq\kappa \!\right\}\,.\end{equation*}

Then, its support function, defined as ${\sigma}_{{\mathbb{B}_\kappa(p)}}(d):=\max\left\{d^{\scriptscriptstyle \top} q:q\in\mathbb{B}_\kappa{{(p)}}\right\}$ for any $d\in\mathbb{R}^S$ , has the equivalent expression

\begin{equation*}{\sigma}_{{\mathbb{B}_\kappa(p)}}(d)=\left\{\begin{array}{ll}\min&\tau\kappa +\displaystyle\sum_{s=1}^S p_s v_s \\[5pt] \textrm{s.t.}&\tau\in\mathbb{R}\,, \tau\geq 0,\\[5pt] &v\in\mathbb{R}^S,\\[5pt] & v_s \geq d_r-\tau\delta^{{\mathbb D}}_{sr} \quad\mbox{ for }r\,,s\in\lfloor {S}\rfloor \,.\end{array}\right. \end{equation*}

Proof. Introducing the S-dimensional vector $\textbf{1}$ with all components equal to 1, write the maximum as a minimization problem:

\begin{equation*}-{\sigma}_{{\mathbb{B}_\kappa(p)}}(d)=\min\left\{-d^{\scriptscriptstyle \top} q:\textbf{1}^{\scriptscriptstyle \top} q=1\,,q\geq 0\,,q\in\mathbb{B}_\kappa{{(p)}}\right\}\,.\end{equation*}

Also, for notational convenience, we write the relations in (3.1) considering $\delta^{{\mathbb D}}$ and z vectors in $\mathbb{R}^{S^2}$ and introduce $S\times S^2$ matrices $M_p$ and $M_q$ so that

\begin{equation*}\Delta(p,q)=\min\left\{z^{\scriptscriptstyle \top} \delta^{{\mathbb D}} : M_p z=p\,,M_q z=q\,,z\geq 0\right\}\,.\end{equation*}

Then, in the resulting problem

\begin{equation*}-{\sigma}_{{\mathbb{B}_\kappa(p)}}(d)=\left\{\begin{array}{ll}\min&-d^{\scriptscriptstyle \top} q \\[5pt] {\textrm{s.t.}}&\textbf{1}^{\scriptscriptstyle \top} q=1,\\[5pt]&q\geq 0,\\[5pt]&z\geq0,\\[5pt]&z^{\scriptscriptstyle \top} \delta^{{\mathbb D}}\leq\kappa,\\[5pt]& M_p z=p,\\[5pt]&M_q z=q\,.\end{array}\right. {\textrm{eliminate the variable}} {\,} {q:} \left\{\begin{array}{ll}\min&-d^{\scriptscriptstyle \top} M_q z\\[5pt]{\textrm{s.t.}}&\textbf{1}^{\scriptscriptstyle \top} M_qz=1,\\[5pt]&M_qz\geq 0,\\[5pt]&z\geq0,\\[5pt]&z^{\scriptscriptstyle \top} \delta^{{\mathbb D}}\leq\kappa,\\[5pt]& M_p z=p\,.\end{array}\right. \end{equation*}

This is a linear program, and there is no duality gap. Next, introduce Lagrange multipliers $\eta\in\mathbb{R}$ for the first constraint, $0\leq\mu\in\mathbb{R}^S$ for the second constraint, $\tau\geq 0$ for the constraint $z^{\scriptscriptstyle \top} \delta^{{\mathbb D}}\leq \kappa$ , and $\lambda\in\mathbb{R}^S$ for the last constraint. Since the corresponding Lagrangian

\begin{equation*}L(z,\eta,\mu,\tau,\lambda)=z^{\scriptscriptstyle \top}\Bigr(M_q^{\scriptscriptstyle \top}(-d+\textbf{1}\eta-\mu)+\tau\delta^{{\mathbb D}}+M_p^{\scriptscriptstyle \top}\lambda\Bigr) -\eta-\tau\kappa-p^{\scriptscriptstyle \top}\lambda\end{equation*}

is separable, minimising each component $z_{rs}$ over the set $z_{rs}\geq 0$ gives a solution $z^*_{rs}=0$ as long as

\begin{equation*}M_q^{\scriptscriptstyle \top}(-d+\textbf{1}\eta-\mu)+\tau\delta^{{\mathbb D}}+M_p^{\scriptscriptstyle \top}\lambda\geq0\iff-d_r+\eta-\mu_r+\tau\delta^{{\mathbb D}}_{rs}+\lambda_s\geq 0\,. \end{equation*}

This yields the dual formulation

\begin{equation*}-{\sigma}_{{\mathbb{B}_\kappa(p)}}(d)=\left\{\begin{array}{ll}\max& -\eta-\tau\kappa-p^{\scriptscriptstyle \top}\lambda\\[5pt]{\textrm{s.t.}}&\eta\,,\tau\in\mathbb{R}\,,\mu\,,\lambda\in\mathbb{R}^S,\\[5pt]&\tau\geq 0\,,\mu\geq0,\\[5pt]& \eta+\lambda_s\geq \mu_r+ d_r-\tau\delta^{{\mathbb D}}_{sr} \mbox{ for }r\,,s\in\lfloor {S}\rfloor \,.\end{array}\right. \end{equation*}

Taking $v:=\eta\textbf{1}+\lambda$ , discarding the redundant variable $\mu$ , and reformulating the problem in minimization form concludes the proof.

When applied to (2.5) written with ambiguity set equal to the $\ell_1$ -Wasserstein ball, this result implies that

\begin{equation*}{\mathbb{E}}_{{\mathbb{B}_\kappa(p)}}\left[\mathfrak{Q}(x)\right] = \max_{q\geq0}\left\{ \sum_{s=1}^S q_s \mathfrak{Q}_s(x):\sum_{s=1}^S q_s=1\,,\;\Delta(p,q)\leq\kappa\right\}\,, \end{equation*}

with $\Delta(p,q)$ as defined in (3.1).

In terms of support functions, this boils down to the identity

\begin{equation*}{\mathbb{E}}_{{\mathbb{B}_\kappa(p)}}\left[\mathfrak{Q}(x)\right] = {\sigma}_{{\mathbb{B}_\kappa(p)}}(\mathfrak{Q}(x))\,.\end{equation*}

Tractability of (2.5) then results from a direct application of Lemma 3.1, since

(3.2) \begin{equation} {\mathbb{E}}_{{\mathbb{B}_\kappa(p)}}\left[\mathfrak{Q}(x)\right] =\left\{\begin{array}{ll}\min&\tau\kappa +\displaystyle\sum_{r=1}^S p_r v_r\\[5pt]{\textrm{s.t.}}&\tau\in\mathbb{R}\,, \tau\geq 0,\\[5pt]&v\in\mathbb{R}^S,\\[5pt]& v_r\geq \mathfrak{Q}_s(x)-\tau\delta^{{\mathbb D}}_{rs} \quad\mbox{ for }r\,,s\in\lfloor {S}\rfloor \,.\end{array}\right. \end{equation}

We note, in passing, that it is possible to include a cost for selecting a given probability in the robustification; the only change would be to add the vector $C_p$ to the recourse function $\mathfrak{Q}_s(x)$ in the constraints.
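As a purely numerical illustration of Lemma 3.1, the sketch below evaluates (3.2) as a linear program for fixed, hypothetical recourse values $\mathfrak{Q}_s(x)$ and the total variation transportation cost introduced later in (4.3); it can be checked that $\kappa=0$ recovers the plain expectation, while large $\kappa$ yields the worst recourse value, in line with the discussion at the end of this section.

```python
# A minimal numerical check of (3.2), assuming fixed recourse values Q_s(x)
# and the total-variation transportation cost (delta_rs = 1 for r != s).
# Decision variables are ordered (tau, v_1, ..., v_S).
import numpy as np
from scipy.optimize import linprog

def robustified_expectation(p, Q, delta, kappa):
    S = len(p)
    c = np.concatenate([[kappa], p])                  # objective: kappa*tau + sum_r p_r v_r
    # constraints v_r >= Q_s - tau*delta_rs  <=>  -delta_rs*tau - v_r <= -Q_s
    A_ub, b_ub = [], []
    for r in range(S):
        for s in range(S):
            row = np.zeros(1 + S)
            row[0] = -delta[r, s]
            row[1 + r] = -1.0
            A_ub.append(row)
            b_ub.append(-Q[s])
    bounds = [(0, None)] + [(None, None)] * S         # tau >= 0, v free
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.fun

Q = np.array([10.0, 20.0, 50.0])                      # hypothetical recourse values
p = np.array([0.5, 0.3, 0.2])
delta = 1.0 - np.eye(len(Q))                          # total variation cost (4.3)
for kappa in [0.0, 0.1, 0.5, 2.0]:
    print(kappa, robustified_expectation(p, Q, delta, kappa))
# kappa = 0 recovers the plain expectation sum_s p_s Q_s;
# large kappa approaches the worst scenario value max_s Q_s.
```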

We now give an interpretation of (3.2) in terms of the original two-stage stochastic problem, which justifies the name ‘likelihood robustification’. Specifically, in (2.1), for each fixed $x \in \mathcal{X}$ , and on the sampling space $\Omega^S$ , the random variable $\mathfrak{Q}(x,\xi(\omega))$ has realizations $\mathfrak{Q}_s(x)$ . The usual expected recourse function therein would be

\begin{equation*}\mathbb{E}_p\left[\mathfrak{Q}(x) \right]=\sum_{s=1}^S p_s \mathfrak{Q}_s(x)\,,\end{equation*}

that is, the expected cost computed when the probability distribution is known exactly. In comparison, the value in (3.2) gives the worst-case value of the expectation when p is replaced by nearby probabilities. It considers all the distributions that are within distance $\kappa > 0$ of the probability p, taken as reference. With this interpretation in mind, the value in (3.2) represents a likelihood robustification of such an expectation, over a ball of radius $\kappa \geq 0$ . Note, however, that (3.2) involves only p in its formulation; this is because Lemma 3.1 has already been applied to obtain an equivalent form of the worst-case expectation over nearby probabilities.

Noyan et al. [Reference Noyan, Rudolf and Lejeune22] analyse numerous reformulations of DRSO problems. They also show an asymptotic result explaining why the continuous robustification may not be suitable for discrete sampling spaces. More precisely, with the discrete robustification (our setting), Section 5 in that work shows that, as the radius $\kappa$ grows, the likelihood robustified expectation (3.2) approaches the value of the worst-case scenario. This property, confirmed numerically in our case study, follows from observing that, for $\kappa$ sufficiently large, any probability is feasible for the max-operation in (2.4). This means, in particular, that all the probability mass can be placed on the worst scenario. The continuous robustification, by contrast, can far exceed the worst-case scenario: when the radius $\kappa$ grows, the corresponding ball includes any distribution for the cost realizations.

Lemma 3.1, inspired by a discussion in [Reference Noyan, Rudolf and Lejeune22], is the key to develop our new decomposition method, presented below.

4 Decomposition method for two-stage stochastic linear programs

When problem (2.1) is a two-stage stochastic linear program, its nonsmooth convex recourse functions

(4.1) \begin{equation}\mathfrak{Q}_s(x) = \inf \lbrace d_s^{\scriptscriptstyle \top} y : W_s y = h_s - T_s x, \quad y \geq 0 \rbrace\,, \end{equation}

can take the values $\pm \infty$ (given vectors $d_s\,, h_s$ and matrices $W_s\,, T_s$ , of appropriate dimensions).

Suppose, in addition, the first-stage feasible set is the polyhedron

\begin{equation*} \mathcal X:=\{x\geq 0: Ax=b\}\,,\end{equation*}

and the costs for both x and p are linear. Also, recall that the likelihood robustification ratio is the maximal distance from the nominal probability allowed for the nearby probabilities used to robustify the expectation. In this setting, given a likelihood robustification ratio $\kappa \geq 0$ and using (3.2), problem (2.5) can be expressed as follows:

(4.2) \begin{align}\left\{\!\!\!\!\!\begin{array}{l@{\quad}l@{\quad}l@{\quad}l} & \underset{x, p, \tau, v}{\text{min}}& & C_x^{\scriptscriptstyle \top} x + C_p^{\scriptscriptstyle \top} p+ \sum_{r=1}^S p_r v_r + \kappa \tau \\[5pt] & \text{s.t.}& & x \geq 0, \quad Ax = b, \quad p \in \mathcal{P}, \\[5pt] & & & v_r \geq \mathfrak{Q}_s(x) - \tau \delta^{{\mathbb D}}_{rs} \quad \forall s,r \in \lfloor{S}\rfloor, \\[5pt] & & & \tau \geq 0. \end{array}\right.\end{align}

When $\mathcal{P} = \lbrace p \rbrace$ , this is a well-known convex two-stage stochastic programming problem, for which useful solution procedures based on scenario decomposition are available, including the ones in [Reference Fàbián and Szöke10] and [Reference Oliveira, SagastizÁbal and Scheimberg23].

When the set $\mathcal{P}$ is not a singleton and p becomes a variable, the bilinear terms $p_r v_r$ in the objective function lead to clear computational difficulties. The format of $\mathcal{P}$ is left unspecified for the moment. The only requirement is that it should be a simple convex set, for instance the convex hull of certain probability vectors taken as reference, so that the feasible set in (4.2) remains convex.

Before proceeding further, as announced before (3.1), we discuss possible choices for the transportation costs. A first option is to take

(4.3) \begin{equation}\delta^{{\mathbb{D}}}_{rs} = 1 \;\text{ if }\; r \neq s, \quad\text{and}\quad \delta^{{\mathbb D}}_{rs} = 0 \;\text{ otherwise.}\end{equation}

The corresponding measure $\Delta(p,q)$ is the total variation distance between the probabilities p and q. This is the distance used in our case study. Note that the costs (4.3) disregard how far apart the underlying scenario realisations are, so the resulting metric is rather coarse. However, this does not restrict the applicability of (3.2), since the values of $\delta^{{\mathbb D}}_{rs}$ play no role in the validity of the duality employed to obtain (3.2).

For other options, denote all the uncertain data of the problem by

\begin{equation*}\xi_s := (d_s, W_s, h_s, T_s)\,,\end{equation*}

and notice that this random variable determines the value for $\mathfrak{Q}_s(x)$ , via the solution of the sth second-stage problem defined by (4.1). Then, we could take $\delta^{{\mathbb D}}_{rs}$ as

\begin{equation*}\mbox{$\delta^{{\mathbb D}}_{rs} = \| \xi_s - \xi_r \|_1$ or $\delta^{{\mathbb D}}_{rs}(x) = |\mathfrak{Q}_s(x) - \mathfrak{Q}_r(x)|$.}\end{equation*}

However, since in (3.2) the transportation cost appears in the constraints, the latter choice is not suitable in practice: it would yield a ddDR2SO problem with nonlinearities that are hard to deal with computationally.
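For concreteness, the short sketch below builds the two practical cost matrices just discussed, the 0/1 cost of (4.3) and $\delta^{{\mathbb D}}_{rs}=\|\xi_s-\xi_r\|_1$ , from hypothetical scenario data (the vectors below are random placeholders, not problem data).

```python
# A small sketch of the two practical choices of transportation cost:
# the total-variation cost (4.3) and the l1-distance between the flattened
# scenario data vectors xi_s.  The xi values are illustrative only.
import numpy as np

S = 4
xi = np.random.default_rng(0).normal(size=(S, 6))    # placeholder for (d_s, h_s, ...) flattened

delta_tv = 1.0 - np.eye(S)                            # (4.3): 1 if r != s, 0 otherwise
delta_l1 = np.abs(xi[:, None, :] - xi[None, :, :]).sum(axis=2)  # ||xi_s - xi_r||_1

print(delta_tv)
print(np.round(delta_l1, 2))
```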

Our solution procedure replaces the convex functions $\mathfrak{Q}_s(x)$ by lower-bounding cutting-plane models and uses global bilinear optimization software to solve a sequence of approximating master problems. The optimal values of these approximating problems estimate from below the true optimal value of the nonsmooth and nonconvex master problem (4.2).

For efficiency, it is crucial to compute optimality and feasibility cuts for the recourse functions $\mathfrak{Q}_s(x)$ , as introduced by [Reference Van Slyke and Wets30]. In what follows, we explain how to compute these cuts. For a fixed first-stage decision $x_k \in \mathcal{X}$ , we can solve the problem defining $\mathfrak{Q}_s(x_k)$ , concluding either that $\mathfrak{Q}_s(x_k) \in \lbrace \pm\infty \rbrace$ or obtaining a solution together with an associated Lagrange multiplier $\lambda_{sk}$ for the equality constraints. In the latter case, convexity of $\mathfrak{Q}_s$ gives the subgradient inequality

(4.4) \begin{equation}\mathfrak{Q}_s(x) \geq \mathfrak{Q}_s(x_k) - \lambda_{sk}^{\scriptscriptstyle \top} T_s (x - x_k)\quad\mbox{ for all }x\,,\end{equation}

which provides an optimality cut. When $\mathfrak{Q}_s(x_k)=+\infty$ , an auxiliary feasibility problem yields a positive value $U_s(x_k)$ and a multiplier $\lambda_{sk}$ defining a feasibility cut, $0 \geq U_s(x_k) - \lambda_{sk}^{\scriptscriptstyle \top} T_s (x - x_k)$ , that cuts off the infeasible point $x_k$ .
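A possible implementation of this cut computation is sketched below, under our own naming and with SciPy’s HiGHS interface; it solves the scenario LP (4.1) at $x_k$ and, if that problem is infeasible, a standard phase-one problem. The dual sign conventions follow SciPy and may need adjustment for other solvers.

```python
# A minimal sketch of the cut computation for one scenario s at a trial point
# x_k, assuming dense numpy arrays (d_s, W_s, h_s, T_s) for the second-stage
# data.  Requires a recent SciPy (dual marginals via res.eqlin); the case
# Q_s(x_k) = -infinity is not handled in this sketch.
import numpy as np
from scipy.optimize import linprog

def scenario_cut(d_s, W_s, h_s, T_s, x_k):
    """Return ('opt', Q_s(x_k), subgradient) or ('feas', U_s(x_k), subgradient)."""
    rhs = h_s - T_s @ x_k
    res = linprog(d_s, A_eq=W_s, b_eq=rhs, bounds=(0, None), method="highs")
    if res.status == 0:                                # finite Q_s(x_k): optimality cut
        lam = res.eqlin.marginals                      # duals of W_s y = h_s - T_s x_k
        return "opt", res.fun, -T_s.T @ lam            # subgradient of Q_s at x_k, see (4.4)
    # infeasible second stage: phase-one problem measuring constraint violation,
    # min 1^T (u_plus + u_minus) s.t. W_s y + u_plus - u_minus = rhs, all vars >= 0
    m, n = W_s.shape
    c = np.concatenate([np.zeros(n), np.ones(2 * m)])
    A_eq = np.hstack([W_s, np.eye(m), -np.eye(m)])
    res = linprog(c, A_eq=A_eq, b_eq=rhs, bounds=(0, None), method="highs")
    lam = res.eqlin.marginals
    return "feas", res.fun, -T_s.T @ lam               # U_s(x_k) > 0 and its subgradient
```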

Having computed optimality and feasibility cuts for the functions $\mathfrak{Q}_s(x)$ , we are in position to define our sequence of lower-bounding approximating master problems to (4.2).

We assume given disjoint sets of optimality indices $O_{sk}$ and feasibility indices $F_{sk}$ such that

\begin{equation*}O_{sk} \cup F_{sk} = \lbrace 1, \ldots, k \rbrace,\end{equation*}

as well as the history of first-stage iterates and associated Lagrange multipliers of the second-stage problems,

\begin{equation*}\lbrace x_1, \ldots, x_k \rbrace\mbox{ and }\lbrace \lambda_{si}=\lambda_s(x_i) : s\in\lfloor {S}\rfloor, \quad i\in\lfloor {k}\rfloor \rbrace\,.\end{equation*}

The sets $O_{sk}$ and $F_{sk}$ are used to organise the information for the generated cuts, noting that whether $\lambda_{sk}$ gives a feasibility or an optimality cut depends on the value of $\mathfrak{Q}_s(x_k)$ . The kth approximate master problem is

(4.5) \begin{align} \left\{\begin{array}{l@{\quad}l@{\quad}l@{\quad}l}& \underset{x, p, \tau, v}{\text{min}}& & c^{\scriptscriptstyle \top} x + \sum_{r=1}^S p_r v_r + \kappa \tau \\[5pt]& \text{s.t.}& & x \geq 0, \quad Ax = b, \quad p \in \mathcal{P}, \quad \tau \geq 0, \\[5pt]& & & v_r \geq \mathfrak{Q}_s(x_i) - \lambda_{si}^{\scriptscriptstyle \top} T_s (x - x_i) - \tau \delta^{{\mathbb D}}_{rs} \quad \forall s,r \in \lfloor{S}\rfloor, \quad \forall i \in O_{sk}, \\[5pt]& & & 0 \geq U_s(x_i) - \lambda_{si}^{\scriptscriptstyle \top} T_s (x - x_i) \quad \forall s \in \lfloor{S}\rfloor, \quad \forall i \in F_{sk}, \\[5pt]& & & \|x\|_\infty, \|p\|_\infty, |\tau|, \|v\|_\infty \leq M. \end{array}\right.\end{align}

As stated, the nonconvexity of the model comes from the terms $p_r v_r$ . Additional box constraints are commonly used in cutting-plane methods with a sufficiently large constant $M > 0$ . With those constraints the approximate master problem (4.5) always has a solution (which is not guaranteed using only optimality and feasibility cuts, especially at the initial iterations when the cuts for a given k represent poorly the true recourse functions $\mathfrak{Q}_s(x)$ ).

The solution procedure itself consists in defining a stopping test and rules to manage the sets $O_{sk}$ and $F_{sk}$ , and computing a sequence of iterates converging to a solution of (4.2). Thanks to the well-known lower-bounding property of (4.5), which amounts to satisfaction of the subgradient inequality (4.4) for convex functions, we are able to compute an estimate of the optimality gap, as long as the solver computes a global solution of (4.5) using bilinear global optimization. Putting all this information together, using standard arguments, it is easy to show that the resulting algorithm approximates arbitrarily well a global solution to our ddDR2SO problem (4.2). In outline, it alternates between a global solve of the current master (4.5) and the cut computations described above.
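A schematic sketch of this loop is given below, under our own naming: the callables `solve_master` and `scenario_cut` are placeholders for, respectively, the global bilinear solve of (4.5) (done with BARON in the case study) and the per-scenario cut routine sketched after (4.4). It is an outline of the logic, not the exact published algorithm.

```python
# A structural sketch of the decomposition loop.  `scenarios` is a list of
# (d_s, W_s, h_s, T_s) tuples; `solve_master` returns a global solution of the
# cut-based master (4.5) together with its optimal value (a lower bound);
# `objective` evaluates the true objective at (x_k, p_k) given the exact
# recourse values (an upper bound once all scenarios are feasible).
def decomposition(solve_master, scenario_cut, objective, scenarios,
                  tol=1e-4, max_iter=100):
    cuts = []                                          # accumulated optimality/feasibility cuts
    best = None
    for k in range(max_iter):
        x_k, p_k, tau_k, lower = solve_master(cuts)    # lower bound from the master
        values, all_feasible = [], True
        for s, data_s in enumerate(scenarios):         # evaluate recourse and generate cuts
            kind, val, grad = scenario_cut(*data_s, x_k)
            cuts.append((kind, s, x_k, val, grad))
            if kind == "feas":
                all_feasible = False
            else:
                values.append(val)
        if all_feasible:
            upper = objective(x_k, p_k, tau_k, values)  # upper bound at the current iterate
            if best is None or upper < best[2]:
                best = (x_k, p_k, upper)
            if upper - lower <= tol * max(1.0, abs(upper)):  # optimality gap small enough
                break
    return best
```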

From a computational burden standpoint, the proposed algorithm also has advantages when the number of bilinear terms in (4.5) is compared with that of the deterministic equivalent associated with (4.2), obtained by dropping the robustification terms and replacing each $v_r$ in the objective directly by the second-stage problem defining $\mathfrak{Q}_r(x)$ . In that formulation, the products of $p_r$ with the second-stage variables generate many bilinear terms. Our algorithm instead solves many problems, each with a small number of bilinear terms, while the deterministic equivalent requires solving one large problem with many bilinear terms.

General purpose solvers might fail if used with out-of-the-box configurations. Therefore, it is important to investigate whether proper configurations can help the solver at hand. Problem (4.5) can be solved with the package BARON by [Reference Tawarmalani and Sahinidis28], provided some attention is given to details. General hints on how this was achieved in our case study are listed below:

  1. It is important to warm-start the calculations using previous iterates.

  2. Constraints in (4.5) are convex, a fact that should be declared to BARON.

  3. The branching priorities should be adjusted experimentally. For instance, there is no need to branch on $x \in \mathcal{X}$.

  4. We obtained better performance by relaxing the default stopping tolerances. The default absolute and relative tolerances of BARON can be too tight for the application at hand, which might make the solver branch over ‘numerical trash’.

5 Case study

As mentioned, we are interested in capacity expansion problems leading to sustainable energy systems, generating mostly renewable power. We consider a modification of the model in [Reference Birge and Louveaux5, Section 1.3] that represents Europe over the horizon 2020–2050, using ambiguity sets for the probabilities of the uncertain market conditions.

The notation is the following. Investments are planned for time steps $t \in \lfloor {T}\rfloor$ , deciding how much to invest for each type of technology indexed by $i\in\lfloor {I}\rfloor$ . The new amount of technology i, made available at time t, is denoted by $x_i^t$ . The accumulated capacity of technology i at time t is denoted by $w_i^t$ . The cost to install $x_i^t$ is $c_i^t x_i^t$ , and the maintenance cost of the accumulated production capacity $w_i^t$ is $\eta_i^t w_i^t$ . A technology i decided at time t takes $B^t_i$ years to build and has a lifetime of $L^t_i$ .

Regarding generation, costs are uncertain because they depend on uncertain market conditions. The cost of generating electricity given the installed capacity w at scenario $s\in\lfloor {S}\rfloor$ is denoted by $\mathfrak{Q}_s(w)$ . In the original version of [Reference Birge and Louveaux5], each scenario s is endowed with an exogenous probability $p_s$ .

The decision problem is stated below, where $w_i^0 = 0$ and $x_i^t = 0$ for $t < 0$ :

(5.1) \begin{align}\left\{\!\!\!\!\begin{array}{l@{\quad}l@{\quad}l@{\quad}l}& \underset{x, w}{\text{min}}& & \sum_{t=1}^T \sum_{i=1}^I c_i^t x_i^t + \sum_{t=1}^T \sum_{i=1}^I \eta_i^t w_i^t + \sum_{s=1}^S p_s \mathfrak{Q}_s(w) \\[5pt]& \text{s.t.}& & w_i^t = w_i^{t-1} + x_i^{t- B^t_i} - x_i^{t- B^t_i - L^t_i}, \quad x, w \geq 0.\end{array} \right.\end{align}

To understand the contribution of ambiguity sets for the probabilities, note first that, because long-term decision-making of this kind is genuinely hard, the investor cannot assign a single probability distribution to the uncertain events in the problem. Rather, the investor prefers to compute what would be the best possible cost, taking into account that the plausible probability distributions lie in some convex region $\mathcal{P}$ . The goal of the DRSO problem is to check whether, in the best case, the investment return is stable relative to the probabilities chosen. This is a necessary condition when investing in risky assets, especially in the long term.

After careful consideration of the political and technological situations, the decision maker defines some reference probability vectors for the outcomes of operational costs:

\begin{equation*}\left\{p^1, \ldots, p^L\right\}\subset (0,1)^S\end{equation*}

(each $p^l$ is a different probability distribution in $\mathbb{R}^S$ ). For this finite number of probabilities, the investor knows that it is enough to solve L different instances of problem (5.1) and compute the minimum. However, for a more comprehensive analysis, it can be preferable to take decisions considering an infinite number of probabilities, for instance by setting

(5.2) \begin{equation}\mathcal{P} = \text{conv} \left\{p^1, \ldots, p^L\right\}\subset (0,1)^S\,.\end{equation}

Introducing the simplicial variables $\alpha_l$ for $l\in\lfloor {L}\rfloor$ , this gives the following nonsmooth and nonconvex problem, with decision-dependent probabilities:

\begin{align*}\left\{\!\!\!\! \begin{array}{l@{\quad}l@{\quad}l@{\quad}l}& \underset{x, w, \alpha}{\text{min}}& & \sum_{t=1}^T \sum_{i=1}^I c_i^t x_i^t + \sum_{t=1}^T \sum_{i=1}^I \eta_i^t w_i^t +\sum_{s=1}^S \left\lbrace \sum_{l=1}^L\alpha_l p_s^l \right\rbrace \mathfrak{Q}_s(w) \\[5pt]& \text{s.t.}& & w_i^t = w_i^{t-1} + x_i^{t- B^t_i} - x_i^{t- B^t_i - L^t_i}, \quad \sum_{l=1}^L \alpha_l = 1, \quad x, w, \alpha \geq 0. \end{array} \right.\end{align*}

We next explain how the likelihood robustification radius $\kappa \geq 0$ comes into play. Having the mentioned stability goal in mind, our investor also wants to find out how the cost would change if, besides the best case, a worst-case view were also considered. However, so as not to exaggerate the conservatism, this should be done in a continuous manner, for different degrees of pessimism, depending on the value of the parameter $\kappa$ : the value $\kappa = 0$ represents no pessimism, while $\kappa = \infty$ represents total pessimism. Ultimately, because the plausible probabilities taken as reference may be slightly inaccurate, the investor wants to know how stable the decisions are with respect to the probabilities assigned to the events. For this case study, selecting probabilities comes at no cost, so $C_p \equiv 0$ and the ddDR2SO formulation (4.2) for problem (5.1) is given by

\begin{align*}\left\{\!\!\!\! \begin{array}{l@{\quad}l@{\quad}l@{\quad}l}& \underset{x, w, \alpha, v, \tau}{\text{min}}& & \sum_{t=1}^T \sum_{i=1}^I c_i^t x_i^t + \sum_{t=1}^T \sum_{i=1}^I \eta_i^t w_i^t + \sum_{s=1}^S \left\lbrace \sum_{l=1}^L\alpha_l p_s^l \right\rbrace v_s + \kappa \tau \\[5pt]& \text{s.t.}& & w_i^t = w_i^{t-1} + x_i^{t- B^t_i} - x_i^{t- B^t_i - L^t_i}, \quad \sum_{l=1}^L \alpha_l = 1, \quad x, w, \alpha, \tau \geq 0, \\[5pt] & & & v_s \geq \mathfrak{Q}_r(w) - \tau \delta^{{\mathbb D}}_{rs} \quad s,r\in S \,. \end{array}\right. \end{align*}

The recourse functions defining the values $\mathfrak{Q}_s(w)$ correspond to second-stage generation problems, for the accumulated capacity w, given the market condition s. Specifically, a time index $t\in\lfloor {T}\rfloor$ is divided into m modes of operation. A mode of operation j at time t is characterised by a duration $\tau_{j}^t$ and a demand $D_{sj}^t$ for electricity. The cost of generating energy with technology i at time t in scenario s is $d_{si}^t$ . For technology i at time t, there are deterministic values of existing capacity $g_i^t$ and decommissioned amounts $u_i^t$ . In scenario s, the capacity $g_i^t$ is affected by a stochastic availability factor $A_{sij}^t$ . The demand $D_{sj}^t$ for electricity is satisfied with the generation $y_{sij}^t$ , produced by technology i, at time t, in scenario s and mode of operation j. Altogether, this gives the recourse function below:

\begin{align*}\mathfrak{Q}_s(w):= \left\{\!\!\!\! \begin{array}{l@{\quad}l@{\quad}l@{\quad}l}& \underset{y}{\text{min}}& & \sum_{t=1}^T \sum_{i=1}^I \sum_{j=1}^m d_{si}^t \tau_{j}^t y_{sij}^t \\[5pt] & \text{s.t.}& & \sum_{i=1}^I y_{sij}^t = D_{sj}^t, \quad y_{sij}^t \leq A_{sij}^t (g_i^t - u_i^t + w_i^{t}), \quad y \geq 0. \end{array}\right.\end{align*}
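To make the second-stage structure concrete, the sketch below solves one such generation problem for a single time step with made-up numbers (three technologies, two modes of operation); all data are illustrative and not taken from Tables 1–5.

```python
# A minimal sketch of one second-stage generation block of Q_s(w) for a single
# time step, using scipy.optimize.linprog.  All numbers are placeholders.
import numpy as np
from scipy.optimize import linprog

def generation_cost(costs, durations, demand, capacity, availability):
    """One (scenario, time) block: I technologies, m modes of operation."""
    I, m = len(costs), len(demand)
    c = np.repeat(costs, m) * np.tile(durations, I)          # d_si * tau_j per y_ij
    # demand balance: sum_i y_ij = D_j for each mode j
    A_eq = np.zeros((m, I * m))
    for j in range(m):
        A_eq[j, j::m] = 1.0
    # capacity: y_ij <= A_ij * (g_i - u_i + w_i), here aggregated into `capacity`
    ub = (availability * capacity[:, None]).reshape(-1)
    res = linprog(c, A_eq=A_eq, b_eq=demand,
                  bounds=list(zip(np.zeros(I * m), ub)), method="highs")
    return res.fun

cost = generation_cost(costs=np.array([30.0, 1.0, 1.5]),       # coal, solar, wind
                       durations=np.array([0.4, 0.6]),         # solar mode, wind mode
                       demand=np.array([800.0, 1000.0]),
                       capacity=np.array([900.0, 500.0, 700.0]),
                       availability=np.array([[1.0, 1.0], [0.5, 0.0], [0.0, 0.35]]))
print(cost)
```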

Let us now explain the practical matters of the problem. We use the model above to make a simplified investment planning exercise for Europe, with yearly decisions from 2020 to 2050, using data from [Reference del Granado, Skar, Doukas and Trachanas13] for costs, existing installed capacity, yearly demand, lifetimes and building times. The technologies considered are coal-based, solar PV and wind onshore. The parameters for each technology, as well as the projected demand, are shown in Tables 1–5.

Table 1. Investment costs (Euro/KWh)

Table 2. Maintenance costs (Euro/KWh)

Table 3. Variable generation costs (Euro/KWh)

Table 4. Building time (Years)

Table 5. Aggregate demand (TW/h)

In general, stochastic optimization is used to achieve a diversification effect for the portfolio of electricity generation equipment. However, due to our simplified modelling, this diversification does not appear naturally. For this purpose, the complementarity between wind and solar generation is built into the model using m different modes of generation. More precisely, each time period has two modes, one only for wind (60%) and another only for solar (40%). The mode duration is taken proportionally. The availability factor for wind has mean $0.35$ and for solar has mean $0.5$ . Both have variance equal to 50% of the mean. The mean for the demand is reported in Table 5, and its variance is 30% of the mean. The scenarios are generated by sampling the respective quantities from normal distributions with the corresponding means and variances.
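The scenario generation just described can be mimicked by the short sketch below; reading ‘variance equal to 50% (respectively 30%) of the mean’ literally is our interpretation, as are the clipping of availability factors to $[0,1]$ and the placeholder demand path.

```python
# A minimal sketch of the scenario sampling described above, with independent
# normal draws for availability factors and demand; the demand means are
# placeholders, not the Table 5 values.
import numpy as np

rng = np.random.default_rng(1)
S, T = 30, 31                                    # scenarios, yearly periods 2020-2050

mean_avail = {"wind": 0.35, "solar": 0.5}
demand_mean = np.linspace(3000.0, 4000.0, T)     # hypothetical aggregate demand path

scenarios = []
for s in range(S):
    avail = {tech: rng.normal(m, np.sqrt(0.5 * m), size=T)        # variance = 50% of mean
             for tech, m in mean_avail.items()}
    demand = rng.normal(demand_mean, np.sqrt(0.3 * demand_mean))  # variance = 30% of mean
    scenarios.append({"availability": {k: np.clip(v, 0.0, 1.0) for k, v in avail.items()},
                      "demand": np.maximum(demand, 0.0)})
```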

Also because of the simplified modelling, in the master problem we have to add a linear constraint limiting the investment at each time period to at most 12% of total installed capacity. The lifetime of coal-based generation is 30 years, while for wind onshore it is 20 years and for solar PV it is 25 years. The decommissioning rate is assumed to be linear, so that at the end of the period all thermal power plants are decommissioned. The initial installed capacity of coal-based power is 0.92 TW/h, and the other ones are assumed to be zero.

As mentioned, we use the total variation distance for the transportation costs, as in (4.3). All experiments are run on a notebook with Intel i7 1.90GHz processor, running under Ubuntu 18.04.3 LTS, and using CPLEX 12.10 and Julia 1.1.1; see [Reference Lubin and Dunning19]. The global bilinear solver offered by BARON needs a mixed-integer solver to be provided, which in our case, is CPLEX 12.10.

Below, results are reported only until 2040 because, as is often the case in this setting, the model exhibits abnormal behaviour when approaching the end of the horizon.

As explained at the end of Section 3, the likelihood robustified expectation (3.2) approaches the value of the worst-case scenario for sufficiently large $\kappa$ . For this reason, we computed the min-max expansion planning for $S=30$ scenarios, which yields the expansion minimising the highest cost over all scenarios. The min-max expansion planning is shown in Figure 1. This output, known to be cost-conservative, is used as a reference for comparison. We expect the (less conservative) ddDR2SO variants to make more investments.

Figure 1. Resulting mix with the min-max expansion.

The comparison is done with the expansions obtained with the following two sets $\mathcal{P}$ as in (5.2), with

\begin{equation*}\begin{array}{rclll}\mathcal{P} = \left\{ \left(\frac{1}{S}, \dots, \frac{1}{S}\right) \right\}, & \mbox{denoted by Simple, and}\\[5pt] \mathcal{P} = \text{conv} \lbrace p_1, \dots, p_L \rbrace, & \mbox{denoted by Opt-robust}\,.\end{array}\end{equation*}

The probabilities $p_l$ used for the Opt-robust variant are obtained as perturbations of the $\left({1 \over S}, \dots, {1 \over S}\right)$ vector, which are normalised afterwards so that the entries of the perturbed vectors sum to one. Note that for a long-term investment problem like the one we have, basically any set of probabilities we come up with is arguably fictitious. The resulting mixes, obtained by varying the likelihood robustification radius $\kappa \in\{0, 0.1, 0.2\}$ , are given in Figures 2, 3 and 4, respectively. All experiments take less than one minute to finish, using the decomposition method.
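A small sketch of one way to produce such perturbed reference probabilities is given below; the perturbation size and the random mechanism are our own choices, since they are not specified above.

```python
# Perturbations of the uniform probability vector, re-normalised to sum to one;
# the scale of the perturbation is an arbitrary illustrative choice.
import numpy as np

def perturbed_probabilities(S, L, scale=0.3, seed=0):
    rng = np.random.default_rng(seed)
    base = np.full(S, 1.0 / S)
    ps = []
    for _ in range(L):
        p = np.maximum(base * (1.0 + scale * rng.standard_normal(S)), 1e-6)
        ps.append(p / p.sum())                 # normalise so the entries sum to one
    return ps

P_vertices = perturbed_probabilities(S=30, L=10)
```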

Figure 2. Mix with the likelihood robustification radius $\kappa =0$ .

Figure 3. Mix with the likelihood robustification radius $\kappa =0.1$ .

Figure 4. Mix with the likelihood robustification radius $\kappa =0.2$ .

As is seen in Figure 2, the Opt-robust strategy makes fewer investments than Simple, but more investments than the min-max. This is expected from the definitions of the strategies themselves. Notice also that, in all the runs, the amount of coal-based generation decreases linearly due to decommissioning, and because it is not cost-effective.

As the ratio $\kappa$ increases, we observe that the alternative strategies get closer to the min-max, shown in Figure 1. This is clear for the opt-robust variant in Figure 3, with $\kappa=0.1$ , even though the investment with opt-robust remains higher than with min-max. The simple strategy did not change with respect to $\kappa=0$ .

For $\kappa=0.2$ , both the Simple and Opt-robust strategies approach the min-max solution of Figure 1. We believe that the good agreement with the min-max strategy, clear already for small $\kappa$ , is due to the choice of ambiguity set $\mathcal{P}$ , which is narrow both for the Simple and Opt-robust variants. For other choices of $\mathcal{P}$ , the behaviour is likely to be different.

Notice that Figures 2–4 are for ‘in-sample’ uncertainty. They illustrate how the investment model progressively replaces coal-based technologies along the time horizon. An ‘out-of-sample’ performance could be obtained by making a statistical analysis of the production costs required to meet demand, using another set of scenarios. This is a typical construct in stochastic programming. However, for electricity markets, generation cost is not the only indicator to take into account. Many other factors need to be evaluated, such as system reliability, satisfactory behaviour of ancillary services, and even the overall carbon footprint. Given the complexity of such an analysis, we chose not to perform out-of-sample analyses (any conclusion not taking into account all the involved aspects could be deemed artificial). Notwithstanding, an interesting feature that can be noted in Figures 2–4 is that the investment profile (amount built each year) is stable across the years, with more dependency on probabilities only at the end of the horizon.

Figure 5, with the evolution of costs as a function of $\kappa$ , reveals the Opt-robust option as the most expensive one, except for $\kappa = 0$ . The ambiguity set only affects the choice of probabilities for small $\kappa$ . For large $\kappa$ , the event with the worst outcome receives total probability, as shown by [Reference Noyan, Rudolf and Lejeune22].

Figure 5. The evolution of costs as a function of $\kappa$ .

Additionally, notice in Figure 5 the smooth behaviour of the cost when the robustification ratio $\kappa$ varies. This confirms the expectation that our approach brings some stability to the decision-making process (with respect to variations in the probability distribution defining the problem).

To conclude, we examine whether our decomposition algorithm improves solution times. The advantage of decomposition is that the sequence of problems solved has either a much smaller number of bilinear terms or a much smaller number of variables. As illustrated by [Reference van Ackooij, Frangioni and de Oliveira29], decomposition methods, which typically scale well, have a strong advantage when solution times of combinatorial or nonconvex problems increase dramatically with the dimension of the problem. This is confirmed by our results in the tables below. Another remark is that, since BARON uses a nonlinear programming solver in the local search procedure, issues related to precision, robustness and time tend to affect the deterministic equivalent of the problem more strongly as the number of variables grows. In Tables 6 and 7, we show statistics for the solution times, varying the number of scenarios, with ten probabilities ( $L=10$ ) and several values of $\kappa$ . Each experiment is repeated four times. The structure of the problem is the same for all values of $\kappa$ and, therefore, the decomposition method behaves similarly in all cases, because the solution times of the nonconvex master problems dominate the execution.

In summary, our likelihood robustified optimistic problem provides more conceptual flexibility than the min-max (minimisation of the highest realisation of the cost) and min-max regret (minimisation of the largest deviation from the best cost of each scenario) approaches. This should provide better out-of-sample operational performance on a given set of scenarios, after the expansion plan is decided. In practice, the value of $\kappa$ can be selected having these remarks in mind, besides also being useful to check the stability of the current decisions. Specifically, in our case, the more technology is built, the smaller the out-of-sample cost to meet the demand (because less slack generation needs to be activated, as the dispatch follows the merit order). In other words, the belief in ambiguity sets of probabilities determines how much technology should be built, and the more technology is built, the lower the out-of-sample operational cost.

6 Final comments

Generating power with minimal carbon footprint over a horizon of 30 years is the goal established by many countries worldwide, especially in Europe. To this aim, it is important to design capacity expansion models that take into account both the progressive decommissioning of fossil-fuel technologies and the uncertainty brought into the system by the intermittency of wind and solar power generation. To address these issues, we propose a new approach for decisions under probability ambiguity, placed between the well-known optimistic and pessimistic paradigms. The interest of our approach is illustrated by a case study whose numerical solution relies on scenario-wise decomposition to solve the resulting nonsmooth and nonconvex problem to global optimality.

Table 6. Comparison of solution times (in seconds) of the decomposition method and the full deterministic equivalent of the problem, for $L=10$ and $\kappa=0.1$

Table 7. Comparison of solution times (in seconds) of the decomposition method and the full deterministic equivalent of the problem, for $L=10$ and $\kappa=0.2$

Sensitivity with respect to probabilities can be mitigated by a large volume of data. However, if the underlying planning task is of a long-term nature, like capacity and transmission expansion in energy systems, relevant data are scarce and this issue becomes more important, as changes in the probability values induce qualitative changes in the decisions suggested by planning models. From this perspective, it is appealing to try to identify nearby problem data that would change the qualitative aspect of the decisions (say, to determine whether a change in the probabilities turns a profitable activity into a non-profitable one). This type of analysis can be carried out with our proposed framework, by means of the ambiguity sets.
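A schematic version of such an analysis is sketched below; the candidate plans, their per-scenario costs and the nominal probabilities are made-up numbers rather than the case-study data. The scan perturbs the probability vector within neighbourhoods of increasing size and records whether the plan with the smallest expected cost changes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up per-scenario total costs of three candidate expansion plans.
# Rows = plans, columns = scenarios (illustrative numbers only).
plan_costs = np.array([[100.0, 140.0, 180.0],
                       [120.0, 130.0, 150.0],
                       [160.0, 150.0, 145.0]])
p_nominal = np.array([0.5, 0.3, 0.2])

def best_plan(p):
    """Index of the plan with the smallest expected cost under probabilities p."""
    return int(np.argmin(plan_costs @ p))

def perturb(p, radius, rng):
    """Random probability vector whose l1 distance to p is roughly 'radius'
    (clipping and renormalisation below may change it slightly)."""
    shift = rng.uniform(-1.0, 1.0, size=p.size)
    shift -= shift.mean()                                  # components sum to zero
    shift *= radius / max(np.abs(shift).sum(), 1e-12)      # scale to the target l1 size
    q = np.clip(p + shift, 0.0, None)
    return q / q.sum()

for radius in (0.0, 0.1, 0.2, 0.4):
    picks = {best_plan(perturb(p_nominal, radius, rng)) for _ in range(200)}
    verdict = "stable" if len(picks) == 1 else f"changes among plans {sorted(picks)}"
    print(f"radius {radius:.1f}: decision {verdict}")
```

In this toy instance the nominal decision is only marginally better than its closest competitor, so already a small perturbation of the probabilities flips the choice; this is exactly the kind of qualitative change that the ambiguity sets make visible.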

Strategies based on min-max and min-max regret are often used in the capacity and transmission expansion planning of power systems, for instance in [Reference del Granado, Skar, Doukas and Trachanas13]. Those measures are well known for their conservatism; moreover, they do not account for robustness with respect to probability distributions. Ambiguity sets, combined with the likelihood-robustification technique introduced in this work, could be used to robustify the decision-making process of models based on min-max, min-max regret, CVaR and other risk measures.

Convex combinations of probability distributions were employed in [Reference Hellemo, Barton and Tomasgard15] for models in which the parameters of probability distributions can be influenced by investment decisions. Models with endogenous uncertainty are usually nonconvex, and their computational solution can be time-consuming or unreliable. Both the ddDR2SO model (4.2) and the decomposition Algorithm 2 remain applicable if the robustification parameter or the probability set depends on the first-stage variable ($\kappa=\kappa(x)$ in (4.2) or $\mathcal{P}=\mathcal{P}(x)$ in (5.2)), as long as the probability set remains convex.
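For instance, and only as a sketch (the exact formulations (4.2) and (5.2) are not repeated here), one simple decision-dependent variant that keeps the probability set polyhedral for every fixed first-stage decision $x\ge 0$ is
$$ \mathcal{P}(x) = \Big\{\, p\in\mathbb{R}^S \;:\; p\ge 0,\ \sum_{s=1}^{S} p_s = 1,\ \|p-p^0\|_1 \le \kappa_0 + a^{\top} x \,\Big\}, $$
with $a\ge 0$, so that larger investments enlarge the set of admissible probabilities around the nominal distribution $p^0$.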

When compared to the settings discussed by [Reference Noyan, Rudolf and Lejeune22] and [Reference Kuhn, Esfahani, Nguyen and Shafieezadeh-Abadeh17], our main algorithmic contribution is to exploit the specific structure of the two-stage problem to design an efficient decomposition method. Note also that, while most numerical schemes for distributionally robust optimisation entail the solution of convex models, our approach is nonconvex but remains computationally tractable, thanks to decomposition.

Finally, note that it would be possible to deal with discrete distributions with an unequal number of scenarios, by assigning probability zero to ‘excess’ scenarios. For example, if one distribution has outcomes $\lbrace 1, 2 \rbrace$ and another has outcomes $\lbrace 1 \rbrace$ , we can just consider that the second distribution also has outcome ‘2’ with a very small (or zero) probability assigned to it.
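This padding is a simple bookkeeping step; a minimal sketch (with made-up outcomes and probabilities) is given below.

```python
import numpy as np

def pad_to_common_support(outcomes_a, probs_a, outcomes_b, probs_b):
    """Extend two discrete distributions to the union of their supports,
    assigning probability zero to outcomes a distribution does not contain."""
    support = sorted(set(outcomes_a) | set(outcomes_b))
    pa = dict(zip(outcomes_a, probs_a))
    pb = dict(zip(outcomes_b, probs_b))
    return (support,
            np.array([pa.get(o, 0.0) for o in support]),
            np.array([pb.get(o, 0.0) for o in support]))

support, qa, qb = pad_to_common_support([1, 2], [0.7, 0.3], [1], [1.0])
print(support, qa, qb)   # [1, 2] [0.7 0.3] [1. 0.]
```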

Acknowledgments

This work was supported by the TACEMM project (Trans-Atlantic Cooperation on Energy Market Models). Research of the second author is partly funded by CNPq Grant 306089/2019-0, CEPID CeMEAI, and FAPERJ in Brazil. The third author is supported by CNPq Grant 303913/2019-3, FAPERJ Grant E-26/202.540/2019, and PRONEX–Optimization.

References

Bayraksan, G. & Love, D. K. (2015) Data-driven stochastic programming using Phi-divergences. In: The Operations Research Revolution, INFORMS, pp. 1–19. https://doi.org/10.1287/educ.2015.0134
Ben-Tal, A., Ghaoui, L. E. & Nemirovski, A. (2009) Robust Optimization, Princeton University Press, Princeton, NJ. https://doi.org/10.1515/9781400831050
Ben-Tal, A. & Teboulle, M. (1987) Penalty functions and duality in stochastic programming via $\phi$-divergence functionals. Math. Oper. Res. 12(2), 224–240.
Bertsimas, D., Doan, X. V., Natarajan, K. & Teo, C.-P. (2010) Models for minimax stochastic linear optimization problems with risk aversion. Math. Oper. Res. 35(3), 580–602. https://doi.org/10.1287/moor.1100.0445
Birge, J. R. & Louveaux, F. (1997) Introduction to Stochastic Programming, Springer-Verlag, Berlin/Heidelberg, Germany.
Calafiore, G. C. (2007) Ambiguous risk measures and optimal robust portfolios. SIAM J. Optim. 18(3), 853–877.
Delage, E. & Ye, Y. (2010) Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3), 595–612.
Dempe, S., Dutta, J. & Mordukhovich, B. S. (2007) New necessary optimality conditions in optimistic bilevel programming. Optimization 56(5–6), 577–604.
Esfahani, P. M. & Kuhn, D. (2017) Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Math. Program. 171(1–2), 115–166.
Fábián, C. I. & Szöke, Z. (2006) Solving two-stage stochastic programming problems with level decomposition. Comput. Manag. Sci. 4(4), 313–353.
Goel, V. & Grossmann, I. E. (2004) A stochastic programming approach to planning of offshore gas field developments under uncertainty in reserves. Comput. Chem. Eng. 28(8), 1409–1429.
Goh, J. & Sim, M. (2010) Distributionally robust optimization and its tractable approximations. Oper. Res. 58(4, part 1), 902–917. https://doi.org/10.1287/opre.1090.0795
del Granado, P., Skar, C., Doukas, H. & Trachanas, G. P. (2018) Investments in the EU power system: a stress test analysis on the effectiveness of decarbonisation policies. In: Understanding Risks and Uncertainties in Energy and Climate Policy, Springer International Publishing, New York, pp. 97–122. https://doi.org/10.1007/978-3-030-03152-7_4
Hanasusanto, G. A. & Kuhn, D. (2018) Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Oper. Res. 66(3), 849–869. https://doi.org/10.1287/opre.2017.1698
Hellemo, L., Barton, P. I. & Tomasgard, A. (2018) Decision-dependent probabilities in stochastic programs with recourse. Comput. Manag. Sci. 15(3–4), 369–395.
Jonsbråten, T. W., Wets, R. & Woodruff, D. L. (1998) A class of stochastic programs with decision dependent random elements. Ann. Oper. Res. 82, 83–106.
Kuhn, D., Esfahani, P. M., Nguyen, V. A. & Shafieezadeh-Abadeh, S. (2019) Wasserstein distributionally robust optimization: theory and applications in machine learning. In: Operations Research & Management Science in the Age of Analytics, INFORMS, pp. 130–166. https://doi.org/10.1287/educ.2019.0198
Love, D. & Bayraksan, G. (2013) Two-stage likelihood robust linear program with application to water allocation under uncertainty. In: 2013 Winter Simulations Conference (WSC), IEEE, Hoboken, NJ. https://doi.org/10.1109/wsc.2013.6721409
Lubin, M. & Dunning, I. (2015) Computing in operations research using Julia. INFORMS J. Comput. 27(2), 238–248. https://doi.org/10.1287/ijoc.2014.0623
Luna, J. P., Sagastizábal, C. & Silva, P. J. S. (2021) A discussion on electricity prices, or the two sides of the coin. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 379(2202), 0428.
Nohadani, O. & Sharma, K. (2018) Optimization under decision-dependent uncertainty. SIAM J. Optim. 28(2), 1773–1795.
Noyan, N., Rudolf, G. & Lejeune, M. (2018) Distributionally Robust Optimization with Decision-Dependent Ambiguity Set. Tech. rep., Optimization Online.
Oliveira, W., Sagastizábal, C. & Scheimberg, S. (2011) Inexact bundle methods for two-stage stochastic programming. SIAM J. Optim. 21(2), 517–544.
Pflüg, G. & Wozabal, D. (2007) Ambiguity in portfolio selection. Quant. Finance 7(4), 435–442. https://doi.org/10.1080/14697680701455410
Pflüg, G. C., Pichler, A. & Wozabal, D. (2012) The 1/N investment strategy is optimal under high model ambiguity. J. Banking Finance 36(2), 410–417.
Postek, K., den Hertog, D. & Melenberg, B. (2016) Computationally tractable counterparts of distributionally robust constraints on risk measures. SIAM Rev. 58(4), 603–650. https://doi.org/10.1137/151005221
Sagastizábal, C. (2012) Divide to conquer: decomposition methods for energy optimization. Math. Program. 134(1), 187–222.
Tawarmalani, M. & Sahinidis, N. V. (2005) A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249.
van Ackooij, W., Frangioni, A. & de Oliveira, W. (2016) Inexact stabilized Benders’ decomposition approaches with application to chance-constrained problems with finite support. Comput. Optim. Appl. 65(3), 637–669.
Van Slyke, R. M. & Wets, R. (1969) L-shaped linear programs with applications to optimal control and stochastic programming. SIAM J. Appl. Math. 17(4), 638–663.
Wang, Z., Glynn, P. W. & Ye, Y. (2015) Likelihood robust optimization for data-driven problems. Comput. Manag. Sci. 13(2), 241–261.
Wiesemann, W., Tsoukalas, A., Kleniati, P.-M. & Rustem, B. (2013) Pessimistic bilevel optimization. SIAM J. Optim. 23(1), 353–380.
Wozabal, D. (2010) A framework for optimization under ambiguity. Ann. Oper. Res. 193(1), 21–47.
Xie, W. (2020) Tractable reformulations of two-stage distributionally robust linear programs over the type-$\infty$ Wasserstein ball. Oper. Res. Lett. 48(4), 513–523.