Gradient estimation for smooth stopping criteria

Bernd Heidergott; Yijie Peng

doi:10.1017/apr.2022.7

Gradient estimation for smooth stopping criteria

Part of: Markov processes Classical measure theory

Published online by Cambridge University Press: 15 June 2022

Bernd Heidergott

and

Yijie Peng

Show author details

Bernd Heidergott*: Affiliation:
Vrije Universiteit Amsterdam
Yijie Peng*: Affiliation:
Peking University
*: *Postal address: Department of Operations Analytics, De Boelelaan 1105, 1081 HV Amsterdam. Email address: [email protected]
**Postal address: Guanhua School of Management, 52 Haidian Rd, Beijing. Email address: [email protected]

Article contents

Abstract
Introduction
Differentiation of one-step taboo kernel
Differentiation of the potential kernel
Estimation procedures
Applications
Conclusion
Competing interests
References

Rights & Permissions

Abstract

We establish sufficient conditions for differentiability of the expected cost collected over a discrete-time Markov chain until it enters a given set. The parameter with respect to which differentiability is analysed may simultaneously affect the Markov chain and the set defining the stopping criterion. The general statements on differentiability lead to unbiased gradient estimators.

Keywords

Sensitivity analysis Monte Carlo simulation gradient estimation

MSC classification

Primary: 28A15: Abstract differentiation theory, differentiation of set functions

Secondary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)

Type: Original Article
Information: Advances in Applied Probability , Volume 55 , Issue 1 , March 2023 , pp. 29 - 55

DOI: https://doi.org/10.1017/apr.2022.7 [Opens in a new window]
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Consider a Markov chain $ \{ X_n \} $ with homogeneous transition kernel $ P_\theta $ on a general state-space supported on $\mathbb{R} $ . Moreover, we equip $\mathbb{R}$ with the usual topology, and, if not stated otherwise, measurability is understood with respect to the Borel field. Let $ A_\theta \subset \mathbb{R} $ denote a measurable set, and let $ \tau $ denote the first entry time into $ A_\theta $ when starting the chain in $x_0$ , i.e.,

\begin{equation*}\tau \,:\!=\, \tau (x_0) = \inf \{ n \geq 1 \,:\, X_n \in A_\theta\! \} ,\end{equation*}

with $ X_0=x_0$ , where we set $ \tau = \infty $ if the set on the right-hand side is empty. For technical convenience we assume throughout the paper that $ x_0 \not \in A_\theta $ . We consider the problem of estimating

(1)

\begin{align}\frac{d}{d \theta } \mathbb{E} \left.\left[ \sum_{ i=0}^{\tau-1}h( X_{i} ) \right| X_0 = x_0 \right] ,\end{align}

for some measurable cost function $ h \,:\, \mathbb{R} \mapsto \mathbb{R}$ and $ x_0 \not \in A_\theta $ . Generally speaking, the problem arises naturally in analysing a system, modelled by $ P_\theta $ , that is operating until a certain event occurs, here modelled by $ X_n $ entering $ A_\theta $ , upon which some action is taken. For example, in inventory problems, orders are placed once the stock drops below a certain value; in maintenance, preventive maintenance is performed once the age of a component exceeds some threshold; and in service control, the speed of the server may be adjusted in case observed waiting times exceed some quality threshold.

In analysing a model we are interested not only in evaluating its performance but also in improving it. Let $ \theta $ be a control parameter of the model. For example, in an inventory model $ \theta $ may be the level that triggers the replenishing of stock. Changing the order level not only alters the cost, represented by h in the model, but may also have an impact on the demand, as having a low replenishing level may attract more demand. This illustrates that $ \theta $ may simultaneously affect the boundary of $ A_\theta $ , the instantaneous cost h, and the transition probability of the underlying process. In this paper, we will derive derivative expressions for this general case of $ \theta $ influencing all aspects of the model. As we will show, the overall formulas can be translated into efficient unbiased simulation-based estimators.

With the advent of simulation-driven perturbation analysis techniques, such as perturbation analysis [Reference Cao7, Reference Fu and Hu12, Reference Fu13, Reference Glasserman14, Reference Ho and Cao24], the score function method [Reference Rubinstein and Shapiro37], and weak differentiation [Reference Pflug34], efficient unbiased estimators for the expression in (1) have been found for particular instances of the problem. More specifically, for applications of perturbation analysis to (s, S) inventory models with $ \theta $ being either threshold s, S, or the span $ S-s $ , we refer to [Reference Bashyam and Fu5, Reference Bashyam and Fu4]. Alternatively, a variant of the score function method, called the push-out score function, has been developed for this kind of threshold optimization in inventory models; see [Reference Pflug and Rubinstein35], as well as [Reference Rubinstein36] for an early reference. For a perturbation analysis of a maintenance model where $ \theta $ triggers (preventive) maintenance of system components whose age exceeds $ \theta $ , we refer to [Reference Heidergott and Farenhorst-Yuan19, Reference Heidergott16]. Finally, for applications to financial models, where $ \theta $ is the value of a barrier in some option model, we refer to [Reference Glasserman15, Reference Lyuu and Teng31, Reference Heidergott17, Reference Heidergott, Leahu and Volk-Makarewicz21].

General results for the case where $ A_\theta $ and the instantaneous cost are independent on $ \theta $ , and only the underlying stochastic process depends on $ \theta $ , are provided in [Reference Heidergott and Vázquez-Abad22, Reference Heidergott and Vázquez-Abad23]. Moreover, in the setting of Markov chains with discrete state-space, sensitivity analysis of the expected return time to a specified marked state $i^\ast $ (also called time to absorption) with respect to the entries of the Markov transition probabilities is well-studied. Expressed in our setting, in these studies, we have $ h = 1 $ , $ A = \{ i^\ast \} $ contains the marked state, and the transition probability of the random walk $ X_n $ depends on $ \theta $ . See, for example, [Reference Caswell8, Reference Caswell9] for details.

Before laying out the main steps of our approach for providing an unbiased estimator for the derivative in (1), we provide examples illustrating the relevance of the extension to the case $ A_\theta $ . Without loss of generality, we consider $ \theta $ to be scalar, as for the multidimensional case $ \theta \in \mathbb{R}^k $ , $ k > 1 $ , estimation of a gradient can be reduced to that of estimating the partial derivatives separately.

Example 1. (Service quality.) Customers arrive at a service station according to a renewal point process. The interarrival times $ \{ I_n \,:\, n \in \mathbb{N} \}$ are independent and identically distributed (i.i.d.), with density $ f_I ( x ) $ such that $ 0< E [ I_n ] <\infty$ and $\mathbb{P} (I_n=0)=0$ . Customers are served in order of arrival, and consecutive service times are i.i.d. random variables $ \{ S_n( \theta ) \,:\, n \in \mathbb{N} \}$ with density $ f_S ( x ;\, \theta ) $ , for $ \theta \in \Theta \subset \mathbb{R}$ , such that $ 0< E [ S_n ( \theta) ] <\infty$ and $\mathbb{P} (S_n ( \theta ) =0)=0$ . Interarrival times and service times are assumed to be mutually independent. Consider the process of consecutive waiting times $\{X_n \}$ , denoting the time that the corresponding customer has spent at the service station from arrival to beginning of service. The service system starts initially empty. Consecutive waiting times $ X_n $ follow the recursive relation

(2)

\begin{equation}X_{n}= \max\{0, X_{n-1} + S_{n-1} ( \theta ) - I_{n}\} , \qquad n \geq 2 ,\end{equation}

and $X_1= 0$ . Letting $ A_\theta = [ \theta , \infty ) $ and $ h ( X_i) \equiv 1$ , with

(3)

\begin{equation}\tau = \inf \{ n \,:\, X_n \geq \theta \} , \quad \text{ for } \theta \geq 0 ,\end{equation}

we obtain from (1) the sensitivity of the expected index of the first customer who experiences a waiting time exceeding $ \theta$ .

Example 2. (Investment.) For an investment strategy, a portfolio of m risky assets is considered. The value of asset i after the nth dividend date is denoted by $ Z_{ n } (i) $ , and we let $ X_n = ( Z_n ( 1 ) , \ldots , Z_n ( m ) )$ , $ n\geq 0 $ . From a risk management perspective we are interested in the expected time until the value of the basket drops below $ \theta$ :

\begin{equation*}\tau_\theta = \inf \!\left \{ n \,:\, \sum_{ i=0 }^m Z_{n} (i) \leq \theta \right \} = \inf\! \left \{ n \,:\, h ( X_n ) \leq \theta \right \} ,\end{equation*}

for

\begin{equation*} h ( X_n ) = \sum_{ i=0 }^m Z_{n} (i) .\end{equation*}

The threshold $ \theta $ may be used in an investment strategy where the investor is only inclined to restructure the portfolio once the value of the basket drops below $ \theta $ .

Example 3. (Insurance.) Consider an insurance company where the nth claim $ W_n $ arrives at time $ t_n $ . For simplicity, we assume that there is a premium inflow of c per time unit. The capital at time t, denoted by K(t), is then given by

\begin{equation*}K ( t ) = K_0 + c t - \sum_{i=0}^{ N ( t ) -1 } W_i ,\end{equation*}

where $ \tau_ 0 = 0 $ , N(t) is the number of claims incurred up to time t, and $ K_0 $ is the initial capital. The time until ruin is given by

\begin{equation*}\inf \{ t \,:\, K ( t ) \leq 0 \} .\end{equation*}

The capital K(t) increases between claims, and the only possible time epochs where K(t) can fall below 0 are the jump epochs of N(t). Hence, letting $ X_n = ( \eta_n , W_n , \hat K_n)$ , with $ \eta_n $ the time between the nth claim and the $ (n+1)$ th, $ \hat K_n $ the capital right after the nth claim, and $ X_0 = ( 0 , 0 , K_0)$ , ruin occurs for $ \hat K_n < 0 $ . However, the insurance company will have to act before the zero-threshold is reached; in practice, the insurance company is interested in the time until the capital drops below a certain threshold $ \theta$ . Let

\begin{equation*}\tau_\theta = \inf\! \big\{ n \,:\, \hat K_n < \theta \big\}\end{equation*}

denote the index of the claim causing the capital to drop below $ \theta $ , and let $ \rho_n $ denote the nth jump time of N(t); then $ \rho_{ \tau_\theta} $ yields the time at which the capital drops below $ \theta $ .

For our analysis, we take a Markov chain operator approach. In the following we focus on motivating our approach; formal definitions are postponed to Section 2.1. Specifically, let $ P_\theta $ denote the Markov kernel of $ \{ X_n \} $ . For analysing (1) we make use of the concept of a taboo kernel. More specifically, the taboo kernel of $ P_\theta $ is a truncation of $ P_\theta $ onto the complement of taboo set $A_\theta$ , defined by

\begin{equation*}[ P_\theta]_{A_\theta} ( B ;\,x ) = \left \{\begin{array}{l@{\quad}r} P_\theta \big( B\cap A^c_\theta, x \big) , & \: B \cap A^c_\theta \not= \emptyset ,\\[4pt] 0 & \: \mbox{ otherwise.}\end{array} \right .\end{equation*}

for all measurable sets B. In words, the taboo kernel avoids entering the set $A_\theta$ . When it is clear with respect to which set $ A_\theta $ the taboo kernel is constructed, we simplify the notation by letting

\begin{equation*}{\textbf{T}}_\theta ( B ;\, x ) = [ P_\theta]_{A_\theta} ( B ;\, x ) .\end{equation*}

This implies

\begin{equation*}\int_{\mathbb{R} } h ( y ) {\textbf{T}}_\theta ( d y ;\, x ) = \mathbb{E} [ h ( X_{i+1} ) {\textbf{1}} \{ X_{i+1} \not \in A_\theta \} | X_i = x ] ,\end{equation*}

provided the integral exits, where the transition from $ x = X_i $ to $X_{i+1} $ is governed by $ P_\theta $ . Defining powers of transition kernels in the obvious way by $ Q^n = Q^{n-1} Q $ , with $ Q^0$ denoting the identity, we have

(4)

\begin{equation}\left ( {\textbf{T}}^n_\theta h \right ) (x_0 )= \mathbb{E}\! \left.\left[ \prod_{i=0}^n{\textbf{1}}\{X_i\notin A_{\theta}\} h (X_n ) \right| X_0 = x_0 \right], \quad n \geq 0 ,\end{equation}

for $ x_0 \not \in A_\theta $ and $ {\textbf{T}}^0_\theta $ being the identity operator. For technical convenience we include the indicator $ {\textbf{1}}\{ X_0 \not \in A_\theta \} $ in the indicator product; we may do this without loss of generality, thanks to our assumption that the initial state does not belong to the taboo set $ A_\theta $ .

Since $ x_0 \not \in A_\theta $ , we obtain for the random horizon summation

(5)

\begin{align} \mathbb{E} \left [ \left . \sum_{i=0}^{\tau-1} h(X_i) \right | X_0=x_0 \right ]& = \mathbb{E} \left [ \left . \sum_{i=0}^{\infty } h(X_i) \prod_{j=0}^i {\textbf{1}} \{ X_j \not \in A_\theta \}\right | X_0=x_0 \right ] \nonumber \\ &=\sum_{i = 0}^{\infty} \int_{\mathbb{R}} h(x_i) {\textbf{T}}_\theta^i ( d x_i ;\, x_0 )\nonumber \\ & = \left ( {\textbf{H}}_\theta h \right ) (x_0 ) ,\end{align}

where

\begin{equation*}{\textbf{H}}_\theta =\sum_{n=0}^\infty {\textbf{T}}_\theta^n\end{equation*}

is called the potential of the taboo kernel $ {\textbf{T}}_\theta$ .

The paper is organized as follows. In Section 2 differentiation of the taboo kernel $ {\textbf{T}}_\theta $ is studied. The main result of the paper on the differentiability of the potential $ {\textbf{H}}_\theta $ is provided in Section 3. Section 4 is devoted to translating the overall formulas into gradient estimators. Section 5 addresses applications. We conclude by identifying topics of further research.

2. Differentiation of one-step taboo kernel

2.1. Transition kernels and norms

Let $ ( S , {\mathcal{T}} ) $ be a Polish measurable space, for $ S \subset \mathbb{R}$ . For ease of presentation we restrict our analysis to the one-dimensional case. Let $ {\mathcal{M}} ( S , {\mathcal{T}} ) $ denote the set of finite (signed) measures on $ ( S , {\mathcal{T}}) $ and $ {\mathcal{M}}_1 ( S , {\mathcal{T}}) $ that of probability measures on $ ( S , {{\mathcal{T}} }) $ . Let $ \eta \,:\, \mathbb{R} \rightarrow [ 1 , \infty ) $ be such that $ \inf_x \eta ( x ) =1 $ . The weighted supremum norm with respect to $ \eta $ of a cost function $\psi$ is defined by

\begin{equation*}|| \psi ||_\eta = \sup_{x } \frac{| \psi( x ) |}{\eta ( x ) } ,\end{equation*}

and that of a (signed) measure $ \mu \in {\mathcal{M}} ( S , {\mathcal{T}} ) $ by

\begin{equation*}|| \mu ||_\eta = \int \eta ( x ) | \mu ( dx ) | =\int \eta ( x ) \mu^+ ( d x ) + \int \eta ( x ) \mu^- ( d x ) ,\end{equation*}

with $ \mu^+ , \mu^- $ denoting the Hahn–Jordan decomposition; see [Reference Cohn10]. Typically, $ \mu $ has Lebesgue density $ f_\mu $ and we have

\begin{equation*}|| \mu ||_\eta = \int \eta ( x ) |\, f_\mu ( x) | dx .\end{equation*}

Note that the property $ \eta \geq 1 $ implies

\begin{equation*}| \psi ( x ) | \leq || \psi ||_\eta \eta ( x ) \quad \text{ for all } x ;\end{equation*}

moreover,

\begin{equation*}\left | \int \psi ( x ) f_\mu ( x ) d x \right | \leq\int | \psi ( x ) | \, | f_\mu ( x) | dx \leq \int \left ( \sup_x \frac{ | \psi ( x ) | }{\eta (x )} \right ) \eta ( x ) | f_\mu ( x) | dx= || \psi||_\eta || \mu ||_\eta.\end{equation*}

The mapping $ P \,:\, {{\mathcal{T}} } \times S \rightarrow [0,1] $ is called a (homogeneous) transition kernel on $ ( S , {\mathcal{T}} ) $ if (i) $ P ( \cdot ;\, s ) \in {\mathcal{M}} ( S , {\mathcal{T}} ) $ for all $ s \in S $ , and (ii) $ P ( B ; ) $ is $ {\mathcal{T}} $ measurable for all $ B \in {\mathcal{T}} $ . If, in the condition (i), ${\mathcal{M}} ( S , {\mathcal{T}} ) $ can be replaced by $ {\mathcal{M}}_1 ( S , {\mathcal{T}} ) $ , then P is called a Markov kernel on $ ( S , {\mathcal{T}} ) $ . Denote the set of transition kernels on $ ( S, {\mathcal{T}} ) $ by $ {\mathcal{K}} ( S , {\mathcal{T}} ) $ and the set of Markov kernels on $ ( S , {\mathcal{T}} ) $ by $ {\mathcal{K}}_1 ( S , {\mathcal{T}}) $ . A transition kernel $ P \in {\mathcal{K}} ( S , {\mathcal{T}} ) $ with $ 0 < P (S ;\, s ) < 1 $ for at least one $ s \in S $ is called a defective Markov kernel.

Consider a family of Markov kernels $ (P_\theta \,:\, \theta \in\Theta )$ on $( S, {\mathcal{T}} )$ , with $\Theta\subset \mathbb{R}$ , and let $ {\mathcal{L}}^1 ( P_\theta ;\,\Theta )\subset\mathbb{R}^{S}$ denote the set of measurable mappings $g \,:\, S \rightarrow \mathbb{R} $ such that $ \int_{S} | g ( u )| P_\theta ( du ;\, s ) $ is finite for all $ \theta \in \Theta $ and $ s \in S $ . To simplify the notation, we set

\begin{equation*}( P_\theta g ) ( s ) \triangleq \int_S g ( u ) P_\theta ( d u ;\, s )\end{equation*}

for $ g \in {\mathcal{L}}^1 (P_\theta ;\,\Theta ) $ and $ s \in S $ .

In what follows, we let $ \Theta $ be an open neighbourhood of $ \theta_0 $ and assume that ${{\mathcal{D}}} \subset {\mathcal{L}}^1 ( P_\theta ;\, \Theta )$ .

We call $P_\theta \in {\mathcal{K}} ( S , {\mathcal{T}}) $ differentiable with respect to $ \theta $ for $ {\mathcal{D}} $ , or $ \Theta\hbox{-}{\mathcal{D}} $ -differentiable for short, if $\partial_\theta P_{\theta} \in {\mathcal{K}} ( S , {\mathcal{T}} ) $ exists such that for any $ g \in {\mathcal{D}} $ and any $ s \in S $ ,

(6)

\begin{equation}\frac{d}{d \theta } \int_{S} g ( u ) P_\theta ( du ;\, s ) = \int_{S} g ( u ) \partial_\theta P_\theta ( du ;\, s ).\end{equation}

If the left-hand side of (6) equals zero for all $ g \in {\mathcal{D}} $ , then we say that $ \partial_\theta P_\theta$ is not significant.

In the same vein, we call $P_\theta \in {\mathcal{K}} ( S , {\mathcal{T}}) $ differentiable with respect to x for $ {\mathcal{D}} $ , or X- $ {\mathcal{D}} $ -differentiable for short, if $ \partial_x P_{\theta} \in {\mathcal{K}} ( S , {\mathcal{T}} ) $ exists such that for any $ g \in {\mathcal{D}} $ and any $ s \in S $ ,

(7)

\begin{equation}\frac{d}{d x} \int_{S} \, g ( u ) P_\theta ( du ;\, x ) = \int_{S} g ( u ) \partial_x P_\theta ( du ;\, x ).\end{equation}

If the left-hand side of (7) equals zero for all $ g \in {\mathcal{D}} $ , then we say that $ \partial_x P_\theta$ is not significant.

Assume that $ P_\theta $ has density $ f_\theta ( y ;\, x ) $ , i.e., $ P_\theta ( B ;\, x ) = \int_B f_\theta ( y ;\, x ) dy $ for all $ x \in S $ and $ B \in {\mathcal{T}}$ , and suppose that the partial derivatives of $ f_\theta ( y ;\, x ) $ with respect to x and $ \theta $ , denoted respectively by $ \partial_x f_\theta ( y ;\, x ) $ and $ \partial_\theta f_\theta ( y ;\, x ) $ , both exist. Provided that for all $ x, y \in S $ it holds that $ \partial_y f_\theta ( y ;\, x ) , \partial_x\, f_\theta ( y ;\, x ) , \partial_\theta f_\theta ( y ;\, x ) > 0 $ implies $ f_\theta ( y ;\, x ) > 0 $ , we let

\begin{equation*}{\rm SF}_X ( y ;\, x ) \,:\!=\, {\rm SF}_X^\theta ( y ;\, x ) \,:\!=\, \partial_x \log f_\theta ( y ;\, x ) = \frac{\partial_x f_\theta ( y ;\, x )}{f_\theta ( y ;\, x ) } ,\end{equation*}

\begin{equation*}{\rm SF}_Y ( y ;\, x ) \,:\!=\, {\rm SF}_Y^\theta ( y ;\, x ) \,:\!=\, \partial_y \log f_\theta ( y ;\, x ) = \frac{\partial_y f_\theta ( y ;\, x )}{f_\theta ( y ;\, x ) } ,\end{equation*}

and

\begin{equation*}{\rm SF}_\Theta ( y ;\, x ) \,:\!=\, {\rm SF}_\Theta^\theta ( y ;\, x ) \,:\!=\, \partial_\theta \log f_\theta ( y ;\, x ) = \frac{\partial_\theta f_\theta ( y ;\, x )}{f_\theta ( y ;\, x ) } .\end{equation*}

The mapping $ {\rm SF}_X ( y ;\, x ) $ is called the X-score function acting on the conditioning state (and is explained in more detail in Remark 1 below), the mapping $ {\rm SF}_Y ( y ;\, x ) $ is called the Y-score function acting on the resulting state, and $ {\rm SF}_\Theta ( y ;\, x ) $ is called the $ \Theta$ -score function acting on the system parameter. Under appropriate smoothness conditions,

\begin{equation*}\frac{d}{d x} \int_{S} g ( u ) P_\theta ( du ;\, x ) = \int_{S} g ( u ) \partial_x P_\theta ( du ;\, x ) =\int_S g ( y ){\rm SF}_X ( y ;\, x ) f_\theta ( y ;\, x ) d y ,\end{equation*}

and

\begin{equation*}\frac{d}{d \theta } \int_{S} g ( u ) P_\theta ( du ;\, x ) = \int_{S} g ( u ) \partial_\theta P_\theta ( du ;\, x ) = \int_S g ( y ) {\rm SF}_\Theta ( y ;\, x ) f_\theta ( y ;\, x ) d y .\end{equation*}

We revisit the sojourn time example put forward in Example 1. Denote the transition kernel of the waiting time chain by $ P_\theta $ ; i.e.,

\begin{equation*}P_\theta ( B, x ) = \mathbb{P} ( X_{n+1} ( \theta ) \in B| X_n ( \theta ) = x )\end{equation*}

for $ x \geq 0 $ and $ B \subset ( 0 , \infty ) $ a (Borel) measurable set, or, more formally,

\begin{equation*}P_\theta ( B, x ) = \int_0^\infty \left ( \int_0^{s+x} 1_{ \{ x+s-a \in B \}} f_I ( a ) d a \right ) f_S ( s ) ds ,\end{equation*}

and otherwise, for $ B \in [ 0 , \infty )$ with $ 0 \in B $ ,

\begin{equation*}P_\theta ( B, x ) = \int_0^\infty \left ( \int_0^{s+x} 1_{ \{x+s-a \in B \}} f_I ( a ) d a \right ) f_S ( s ) ds+ \int_0^\infty \left ( \int_{s+x}^\infty f_I ( s ) d s \right ) f_S ( a ) da .\end{equation*}

Inserting $ B = (0 , y ] $ and differentiating with respect to y, we obtain the density of the continuous part of the transition kernel on $ ( 0 ,\infty ) $ as

\begin{equation*}f_\theta ( y ;\, x ) =\int_0^\infty f_I (\!\max \{ x+ s - y , 0 \} ) f_S( s ) d s ,\end{equation*}

$ y > 0 $ , where we assume that $ f_I ( 0 ) = 0 $ , and

\begin{equation*}{\partial_x}\, f_\theta ( y ;\, x ) = \int_0^\infty 1_{ \{ x+s - y > 0 \} } {\partial_z }\, f_I (x + s - y ) f_S ( s ) d s ,\end{equation*}

where $ \delta_z\, f_I ( x+s-y ) $ is shorthand notation for the derivative of $ f_I ( z ) $ with respect to z evaluated at $ z = x + s - y $ . If, for example, interarrival times are exponentially distributed with rate $ \lambda$ , i.e., $ f_I ( x ) = \lambda e^{ - \lambda x }$ , $x \geq 0 $ , then, by $ \delta_x\, f_I ( x ) = - \lambda f_I ( x ) $ ,

\begin{equation*}{\partial_x}\, f_\theta ( y ;\, x ) = - \lambda \int_0^\infty 1_{ \{ x+s - y > 0 \} }\, f_I (x + s - y ) f_S ( s ) d s = - \lambda f_\theta ( y ;\, x ) , \qquad y > 0 ,\end{equation*}

so that in this case

(8)

\begin{equation}\,{\rm SF}_X ( y ;\, x ) = - \lambda .\end{equation}

Following the same line of argument, we have that

(9)

\begin{equation}\,{\rm SF}_Y ( y ;\, x ) = \lambda .\end{equation}

Furthermore, assume that $ S ( \theta ) $ follows a log-normal distribution with location parameter $ \theta $ and scale parameter $ \sigma$ :

\begin{equation*}f_S ( s ) = \frac{1}{s \sigma \sqrt{2 \pi} } \exp\! \left ({-} \frac{ (\!\log\! ( s ) - \theta )^2 }{2 \sigma^2 } \right ) , \quad s > 0 .\end{equation*}

It is well known that the log-normal distribution provides a good fit to the length of calls in a service centre; see, for example, [Reference Brown6]. Then it is easily checked that

\begin{equation*}\partial_\theta f_S ( s ) = \frac{\log\! ( s ) - \theta }{\sigma^2 } f_S (s ),\end{equation*}

and it follows straightforwardly that

\begin{equation*}\partial_\theta f ( y ;\, x ) = \int_0^\infty \frac{\log\! ( s ) - \theta }{\sigma^2 } f_I (\!\max\{x+ s - y , 0 \} ) f_S( s ) d s .\end{equation*}

The above expression cannot be applied in a simulation setting as such. In the following we will therefore rewrite $ \,{\rm SF}_\Theta ( y ;\, x ) $ by conditioning on the interarrival times. More specifically, let z denote the interarrival time and write $ f ( y ;\, x , z ) $ for the density of $ X_{n+1} $ given $X_n=x $ and $ I_{n+1} =z $ ; then

\begin{equation*} f ( y ;\, x , z) = f_S ( y +z -x ) , \quad y > 0 ,\end{equation*}

and

\begin{equation*}\partial_\theta f ( y ;\, x , z) = \frac{\log\! ( y+ z -x ) - \theta }{\sigma^2 } f ( y ;\, x , z )\: \mbox{ and } \:\,{\rm SF}_\Theta ( y ;\, x , z ) = \frac{\log\! ( y+ z -x ) - \theta }{\sigma^2 } .\end{equation*}

When evaluating $ \,{\rm SF}_\Theta $ via simulation, we first observe $ X_n $ , $ X_{n+1} $ , $ I_{n+1} $ , and $ S_n ( \theta ) $ , and we then apply differentiation conditional on $ I_{n+1}$ , which gives

\begin{equation*}\,{\rm SF}_\Theta ( X_{n+1} ;\, X_n , I_{n+1} ) = \frac{\log\! ( S_n ( \theta )) - \theta }{\sigma^2 } , \quad X_{ n+1} > 0 .\end{equation*}

In words, we introduce a sub-Markov chain, where, for given $ X_n $ , first the interarrival time is sampled, giving $ X_n - I_{n+1} $ , and then in a subsequent step the service time is sampled and added.

The $ \eta$ -norm is extended to transition kernels through the operator norm; that is, for $ Q \in {\mathcal{K}} ( S , {\mathcal{T}} ) $ we have

\begin{equation*}||Q||_\eta = \sup_{x \in \mathbb{R}} \frac{1}{\eta (x ) } \int \eta ( y ) | Q( d y , x ) | .\end{equation*}

Note that $ || I ||_\eta = 1 $ for I being the identity operator. By algebra,

(10)

\begin{align}\left | \int g ( y ) Q ( d y ;\, x ) \right | & \leq \int | g ( y ) | \, | Q ( d y ;\, x ) | \nonumber\\ & = \int \frac{| g ( y ) |}{\eta ( y )} \eta ( y ) \, | Q ( d y ;\, x ) | \nonumber\\ & \leq \int \left ( \sup_y \frac{| g ( y ) |}{\eta ( y )} \right ) \eta ( y ) | Q ( d y ;\, x ) | \nonumber\\ & \leq || g ||_\eta \, \eta ( x ) \sup_{x } \frac{1}{ \eta ( x ) } \int \eta ( y ) | Q ( d y ;\, x ) | \nonumber\\ & = || g ||_\eta \, || Q ||_\eta \eta ( x ) ,\end{align}

for all x. In the following we let $ \eta_\alpha ( x ) = \alpha^{ |x |} $ , for $ x \in \mathbb{R}$ , and for $ x \in [ 0 , \infty ) $ it suffices to let $ \eta ( x ) = \alpha^x $ . For a polynomial function $\psi$ , we have $\|\psi\|_{\eta_\alpha}<\infty$ for any $\alpha>1$ . In slight abuse of notation, we will simplify $ || \cdot ||_{\eta_\alpha } $ to $ || \cdot ||_\alpha $ .

Example 4. Let $ A_\theta = [ 0 , \theta ] $ and consider the taboo kernel $ {\textbf{T}}_\theta = [ P_\theta ]_{ [ 0 , \theta ] }$ , with $ P_\theta $ the transition kernel of the waiting times introduced in Example 1. We denote the stability region of the queue by $ \Theta = \{ \theta >0 \,:\, \mathbb{E} [ S ( \theta ) - I ] < 0 \} $ . Then

\begin{align*}|| {\textbf{T}}_\theta ||_{\alpha}& = \sup_{ x \geq 0 } \alpha^{- x}\mathbb{E} \big[ \alpha^{ S ( \theta ) - I + x} 1_{ \{S( \theta )- I +x > \theta \} } \big]\\ & = \sup_{ x \geq 0 }\mathbb{E} \big[ \alpha^{ S ( \theta ) - I} 1_{ S ( \theta )- I +x > \theta } \big]\\ & = \mathbb{E} \big[ \alpha^{ S ( \theta ) - I} \big] .\end{align*}

For $ [ \theta_1 , \theta_2 ] \subset \Theta $ , there exists $ \alpha^\ast $ such that $ \mathbb{E} \big[ \alpha^{ S ( \theta ) - I} \big] $ has negative right-hand-side derivative at $ \alpha = 1 $ and is strictly convex as a mapping in $ \alpha $ on $ [ 1 , \alpha^\ast ) $ , uniformly on $ [ \theta_1 , \theta_2 ]$ . This shows that there exists $ \bar \alpha > 1 $ such that

\begin{equation*}\sup_{\theta \in [ \theta_1 , \theta_2 ] } || {\textbf{T}}_\theta ||_{\bar \alpha} < 1 .\end{equation*}

For details we refer to [Reference Kartashov26, Reference Leahu29].

Note that for given $ S ( \theta ) $ and I, we obtain from the above argument that the expected accumulated cost exists and is finite for all h such that $ || h ||_\alpha < \infty $ , i.e., for all cost functions h that are bounded by $ c \bar \alpha^x $ or some finite constant c. Indeed, letting $ \tau $ denote the first time a waiting time becomes smaller than $ \theta $ , for $ h \geq 0 $ and $ 1 \leq \alpha \leq \bar \alpha $ we obtain, from $ || {\textbf{T}}_\theta||_\alpha < 1 $ together with $ || {\textbf{T}}_\theta^0 ||_\alpha = || {\textbf{T}}_\theta ||_\alpha ^0 = 1 $ and $ || T^i_\theta ||_\alpha \leq || T_\theta ||_\alpha ^i $ for $ i \geq 1 $ , that

\begin{align*}\mathbb{E} \left[ \left . \sum_{i=0}^{ \tau-1} h ( X_i ) \right | X_0 = x \right ]& = \sum_{i=0}^\infty \big( {\textbf{T}}_\theta^i h\big)(x ) \\ & \leq \sum_{i=0}^\infty || {\textbf{T}}_\theta||^i _\alpha || h ||_\alpha \alpha^x\\ & = \alpha^x \frac{|| h ||_\alpha}{1 - || {\textbf{T}}_\theta||_\alpha }\\ & < \infty,\end{align*}

provided that $ || h ||_\alpha $ is finite.

2.2. Differentiability of the taboo kernel

We call a taboo kernel $ {\mathcal{D}} $ -differentiable if a linear operator $ \big ( [P_\theta ]_{{A}_{\theta} }\big )^{\prime} $ exists such that for any $ h \in {\mathcal{D}} $ it holds that

\begin{equation*}\frac{d}{ d \theta}{\textbf{T}}_\theta h =\frac{d}{ d \theta} [P_\theta ]_{{A}_{\theta} } h = \big ( [P_\theta ]_{{A}_{\theta} }\big )^{\prime} h .\end{equation*}

If the above holds, then $ \big( [P_\theta ]_{{A}_{\theta} }\big)^{\prime}$ is the $ \Theta \hbox{-}{{\mathcal{D}}}$ -derivative of $ {\textbf{T}}_\theta $ and we let $ {\textbf{T}}^{\prime}_\theta \,:\!=\, \big( [P_\theta ]_{{A}_{\theta} }\big)^{\prime} $ .

Notice that the derivative of the taboo kernel $ [P_\theta ]_{{A}_{\theta}} $ cannot be straightforwardly obtained in a pathwise sense through differentiating the density as the support of the taboo kernel $A_{\theta}$ is parametrized, which introduces discontinuities. In the same way, the derivative of the taboo kernel $ [P_\theta ]_{{A}_{\theta}}$ cannot be obtained by simply taking the taboo kernel of the measure-valued derivative of $ P_\theta$ , as it is in general not true that $ [ \partial_\theta P_\theta ]_{A_\theta} $ is the derivative of $ [ P_\theta ]_{A_\theta} $ ; in formula, $ [\partial_\theta P_\theta ]_{{A}_{\theta}} \not = \big ( [P_\theta ]_{{A}_{\theta}} \big )^{\prime}$ .

For the analysis provided in this section we require $ A_\theta $ to depend on $ \theta $ in a smooth way. This is expressed in the following.

Assumption 1. (Boundary parametrization.) There exists a boundary mapping $ g_\theta ( x )$ , such that

• for some set $ \Omega $ , independent of $ \theta $ , we have
\begin{equation*}x \in A_\theta^c \quad { if\ and\ only\ if } \quad g_{\theta} (x ) \in \Omega ;\end{equation*}
• the partial derivatives of $ g_\theta$ with respect to $ \theta $ and x exist;
• the set $ \{ x \,:\, \partial_x g_\theta ( x ) = 0 \} $ has Lebesgue measure zero, so that
\begin{align*}\delta_\theta(x) = - \frac{\partial_{\theta} g_\theta(x) }{ \partial_x g_\theta(x)}\end{align*}
is a differentiable and Lebesgue-almost-surely well-defined mapping.

As an illustration of Assumption 1, note that the stopping criterion in (3) is formalized by $ \Omega = [ 0 , 1 ) $ and

\begin{equation*} g_\theta ( x ) = \frac{x}{ \theta} \quad \mbox{ and } \quad \delta_\theta(x) = g_\theta ( x ) . \end{equation*}

In this paper, we show that ${\textbf{T}}^{\prime}_\theta $ can be defined via cost functions $ \psi $ from an appropriate class of functions, to be specified later, so that

\begin{equation*}\frac{\partial}{\partial \theta} ( {\textbf{T}}_\theta \psi) ( x ) =\frac{\partial}{\partial \theta} \int \psi ( y ) {\textbf{T}}_\theta ( dy , x ) =\int \psi ( y ) {\textbf{T}}^{\prime}_\theta ( dy , x )=\big( {\textbf{T}}^{\prime}_\theta \psi\big) ( x ),\end{equation*}

where

(11)

\begin{equation}\begin{aligned}\big( {\textbf{T}}^{\prime}_\theta \psi \big) (x ) = \int_{ A_\theta^c} \big ( \psi(y)\partial_{\theta} f_\theta ( y ;\, x )+ \partial_y\!\left(\psi(y) f_{\theta}(y;\,x) \delta_{\theta}(y)\right) \big ) dy.\end{aligned}\end{equation}

Suppose that $ P_\theta$ is differentiable in an appropriate way, e.g., weakly differentiable [Reference Heidergott, Hordijk and Weisshaupt20, Reference Heidergott and Vázquez-Abad23], with derivative $ P^{\prime}_\theta $ ; then

\begin{equation*}\big[ P^{\prime}_\theta \big]_{A_\theta}\end{equation*}

would be the first part of the estimator representing the derivative of $ P_\theta $ on the complement of the taboo set. In order to get a derivative for $ {\textbf{T}}_\theta $ rather than just $ P_\theta$ , we need an additional term. This compensator collects the state-space derivatives; if we denote this operator by $ {\textbf{D}}$ , Equation (11) reads

\begin{equation*}\big ( {\textbf{T}}_\theta\big )^{\prime} = \big[ P^{\prime}_\theta\big]_{A_\theta} + {\textbf{D}}_\theta ,\end{equation*}

where

\begin{equation*}\Big ( \big[ P^{\prime}_\theta\big]_{A_\theta} \psi \Big ) ( x ) =\int_{ A_\theta^c } \psi ( y ) \partial_{\theta} f_\theta ( y ;\, x ) d y = \int_{A_\theta^c} \psi ( y ) {\rm SF}_\Theta ( y ;\, x) f_\theta ( y ;\, x ) d y ,\end{equation*}

and

\begin{equation*}\big ( {\textbf{D}}_\theta \psi \big ) ( x ) = \int_{ A_\theta^c }\partial_y\!\left(\psi(y) f_{\theta}(y;\,x) \delta_{\theta}(y)\right)\! dy .\end{equation*}

The kernel $ {\textbf{D}}_\theta $ lives on the complement of the taboo set $ A_\theta$ . We provide an explicit form of $ {\textbf{D}}_\theta $ in the following example.

Example 5. Consider the taboo set $ A_\theta = ( \theta , \infty)$ ; then we may let $ \Omega = [ 0 , 1 ]$ with $ g_\theta ( x ) = x / \theta $ , for $ x \geq 0 $ . For $ \psi ( y ) = y^p $ , for some $ p \geq 1$ , we obtain

\begin{equation*}{\textbf{D}}_\theta \psi (x) = \int_0^\theta \frac{y^p}{\theta} \left ( (p +1) + y {\rm SF}_Y ( y ;\, x ) \right ) f_{\theta}( y ;\,x) d y .\end{equation*}

Because of the inner differentiation $ \partial y $ under the integral in $ {\textbf{D}}_\theta $ , the operator $ {\textbf{D}}_\theta $ splits naturally into two parts. Indeed, writing $ {\textbf{D}}_\theta $ in explicit form, we obtain the following for $ {\textbf{T}}^{\prime}_\theta $ :

(12)

\begin{align}\big ( {\textbf{T}}^{\prime}_\theta \psi \big ) ( x ) & = \big(\big[ P^{\prime}_\theta\big]_{A_\theta}\psi\big) (x) + ( {\textbf{D}}_\theta \psi )(x)\nonumber \\ & = \underbrace{\int_{A_\theta^c} \! \! \psi ( y ) {\rm SF}_\Theta ( y ;\, x) f_\theta ( y ;\, x ) d y }_{ =:\, {\textbf{T}}^{\prime}_1} \nonumber \\ & \quad + \underbrace{\int_{A_\theta^c} \psi^{\prime} ( y )\ \delta_{\theta}(y)\ f_\theta ( y ;\, x ) d y }_{ =:\, {\textbf{T}}^{\prime}_2}\nonumber \\ & \quad + \underbrace{\int_{A_\theta^c} \! \! \psi ( y ) \underbrace{\big ( \delta_{\theta}(y) {\rm SF}_Y ( y ;\, x) + \partial_y \delta_{\theta}(y) \big ) }_{ =:\, \nu_\theta ( y ;\, x ) } f_\theta ( y ;\, x ) d y}_{ =:\, {\textbf{T}}^{\prime}_3 } .\end{align}

In words, the derivative of $ {\textbf{T}}_\theta $ splits into three parts: (i) the first part assesses the sensitivity of the Markov chain outside the taboo set $ A_\theta $ ; (ii) the second part measures the knock-on effect of continuing the process on states belonging to the boundary of $ A_\theta$ , where the unperturbed process otherwise would have stopped; and (iii) the third part measures the effect perturbing $ \theta $ has on the performance at the next step reached from the state x, where the perturbation is weighted by the impact of $ \theta $ on the boundary. We explain the meaning of the second part in more detail in the following remark.

Remark 1. Let $ \tilde \psi $ be some cost function and denote the one-step-ahead expected value by

\begin{equation*}\psi ( x ) = \big ( {\textbf{T}}_\theta \tilde \psi \big ) ( x ) .\end{equation*}

Observe that under appropriate smoothness conditions

\begin{align*}\psi^{\prime} (x) & = \partial_x ({\textbf{T}}_\theta \psi \big ) ( x ) = \partial_x \int_{A_\theta^c } \tilde \psi ( y ) f_\theta ( y ;\, x) d y \\ & = \int _{A_\theta^c } \tilde \psi ( y ) \partial_x f_\theta ( y ;\, x) d y = \int _{A_\theta^c } \tilde \psi ( y ) {\rm SF}_X ( y ;\, x) f_\theta ( y ;\, x) d y,\end{align*}

which shows that differentiation of the cost function $ \psi $ can be captured by differentiation of the transition density, provided that $ \psi $ itself can be interpreted as a conditional expectation. This is exactly what happens when products of Markov kernel operators are differentiated. In the following we provide a more general discussion of this phenomenon. Note that, for $ n \geq 1 $ ,

\begin{equation*}\big({\textbf{T}}_\theta^n \psi \big ) ( y) = \mathbb{E} \! \! \left . \left [ \psi ( X_n ) \prod_{i=0}^n{\textbf{1}} \{ X_i \not \in A_\theta \} \right | X_0 =y\right ]\end{equation*}

is the conditional expectation of the reward after n transitions of the taboo chain. Interchanging differentiation and integration leads to

\begin{align*}\partial_y\big ( {\textbf{T}}_\theta^{n} \psi \big ) ( y ) & = \partial_y \big ( {\textbf{T}}_\theta {\textbf{T}}_\theta^{n-1} \psi \big ) ( y ) \\ & = \int_{ A_\theta^c} \big ( {\textbf{T}}_\theta^{n-1} \psi \big ) (z ) \partial_y f_\theta ( z ;\, y ) d z .\end{align*}

Using the X-score function, this derivative can easily be evaluated as follows:

\begin{equation*}\int_{ A_\theta^c} \big ( {\textbf{T}}_\theta^{n-1} \psi \big ) ( y ) \: \partial_y f_\theta ( z ;\, y) d z=\int_{ A_\theta^c} \big ( {\textbf{T}}_\theta^{n-1} \psi \big ) ( z ) {\rm SF}_X ( z, y ) f_\theta ( z ;\, y ) d y .\end{equation*}

In words, the differentiation in the part $ {\textbf{T}}^{\prime}_2 $ of the derivative representation in (12) relates to differentiating a conditional expectation with respect to the conditioning value, which can be dealt with by using the X-score function.

Differentiating a Markov chain with taboo set $ A_\theta $ is related to the differentiation of a distribution with $ \theta$ -dependent support. In the following, we compare the application of our approach to the case of a distribution with $ \theta$ -dependent support.

Example 6. Let $ A_\theta = ( \theta, 1 ] $ , for $0 < \theta < 1 $ , and $ f_\theta ( y ;\, x ) = 1 $ on [ 0, 1 ] and zero otherwise. The taboo kernel can now be written as the defective uniform distribution

\begin{equation*}( {\textbf{T}}_\theta \psi ) (x ) = \int_0^\theta \psi ( y ) d y ,\end{equation*}

which is clearly the expected value with respect to the uniform distribution on $ [ 0 , \theta ] $ scaled by $ \theta $ . This is a classical example that cannot be handled by the score function, as the density $ 1_{ [ 0 , \theta ] } ( y ) $ fails to be Lipschitz continuous; here it is well known that the derivative is given by

(13)

\begin{equation}\frac{\partial}{\partial \theta} ( {\textbf{T}}_\theta \psi ) (x )= \psi ( \theta ) ,\end{equation}

provided that $ \psi $ is continuous.

Let $ g_\theta ( y )= y / \theta $ , $ \Omega = [ 0 , 1 ] $ , and $ f_\theta ( y ;\, x ) = 1_{ [ 0 , 1]} $ . Noticing that $ \partial_\theta f_\theta ( y ;\, x ) = 0 = \partial_y f_\theta ( y ;\, x ) $ and

\begin{equation*}\delta_\theta ( x ) =\frac{x}{\theta} , \qquad \: \partial_x \delta ( x ) = \frac{1}{\theta},\end{equation*}

we obtain from (12) (where we set the score functions to zero)

(14)

\begin{equation}\frac{\partial}{\partial \theta} ( {\textbf{T}}_\theta \psi ) (x )= \int_0^\theta \left ( \partial_y \psi ( y ) \delta_\theta ( y ) + \psi ( y ) \partial_y \delta_\theta ( y ) \right )\! d y .\end{equation}

For example, in the case of $ \psi ( y ) = y^p $ , for some $ p >1 $ , it is easily seen that

\begin{equation*}\frac{\partial}{\partial \theta} ( {\textbf{T}}_\theta \psi ) (x ) = \theta^p,\end{equation*}

which indeed yields the true derivative.

The uniform distribution on $ [ 0 , \theta ] $ can be retrieved by

\begin{equation*}\frac{1}{\theta} ( {\textbf{T}}_\theta \psi ) (x ) =\frac{1}{\theta} \int_0^\theta \psi ( y ) d y .\end{equation*}

Note that, for $ \psi $ continuous, it holds that

\begin{equation*}\frac{\partial}{\partial \theta} \left ( \frac{1}{\theta} \int_0^\theta \psi ( y ) d y \right ) =\frac{\partial}{\partial \theta} \left ( \frac{1}{\theta} ( {\textbf{T}}_\theta \psi ) ( x) \right ) =\frac{1}{\theta } \left ( \psi ( \theta ) - \frac{1}{\theta} \int_0^\theta \psi ( y ) y d y \right )\end{equation*}

for all x, where the right-hand side is an expression for the measure-valued derivative of the uniform distribution. Comparing the above measure-valued derivative with our new one,

\begin{equation*}\frac{\partial}{\partial \theta} \left ( \frac{1}{\theta} ( {\textbf{T}}_\theta \psi ) ( x) \right ) =\frac{1}{\theta } \left ( \int_0^\theta \left ( \partial_y \psi ( y ) \delta_\theta ( y ) + \psi ( y ) \partial_y \delta_\theta ( y ) \right )\! d y - \frac{1}{\theta} \int_0^\theta \psi ( y ) y d y \right ),\end{equation*}

obtained via (14), it is apparent that the resulting estimators are different. However, both estimators have as a common part

\begin{equation*}-\frac{1}{\theta^2} \int_0^\theta \psi ( y ) d y .\end{equation*}

The difference lies in the fact that the (positive part of the) measure-valued derivative concentrates mass on the boundary of the taboo set, whereas our new estimator leads to integration over the complement of the taboo set. Our approach requires $ \psi $ to be differentiable, whereas for measure-valued differentiation only continuity is required.

That our approach requires stronger conditions on the performance function than classical measure-valued differentiation stems from the fact that we develop a differentiation theory for Markov chains (in contrast to the differentiation of unconditional measures), which requires capturing the knock-on effect of a perturbation on the future development of the Markov chain; see Remark 1.

In the following we provide the theoretical development for the estimator in (11). We denote the Euclidean distance between y and $\bar{y}$ by $ \lambda(y,\bar{y}) $ and define the distance from y to a set A by $ \lambda(y,A ) = \inf \{ \lambda(y,\bar{y}) \,:\, \bar y \in A \} $ . Note that $ \lambda(y,A ) = 0$ in the case $ y \in A $ . Recall that $ \Omega $ denotes the base set from which $ A^c_\theta $ is obtained via $ A^c_\theta = \{ x \,:\, g_\theta ( x ) \in \Omega \} $ ; see Assumption 1. We now define $ \epsilon$ sets around the boundary $\partial \Omega=\overline{\Omega}\setminus \Omega^{o}$ through

(15)

\begin{equation}\Omega_{\epsilon}\,:\!=\, \Omega\cup\{ y\in \Omega^c\,:\, \lambda(y,\partial\Omega)\leq \epsilon \}\end{equation}

and

\begin{equation*}\Omega_{-\epsilon}\,:\!=\,\{ y\in \Omega\,:\, \lambda(y,\partial\Omega)\geq \epsilon \} .\end{equation*}

We let

\begin{equation*}\chi_{\epsilon}(y)\,:\!=\, \frac{\lambda(y,\Omega_{-\epsilon})}{\lambda\big(y,\Omega_{\epsilon}^c\big)+\lambda(y,\Omega_{-\epsilon})},\end{equation*}

where, by definition, $\chi_{\epsilon}(y)=0$ for $y\in \Omega_{-\epsilon}$ , $\chi_{\epsilon}(y)=1$ for $y\in \Omega_{\epsilon}^c$ , and $\|\chi_{\epsilon}\|_{\infty}\leq 1$ (where $||\psi||_{\infty}\,:\!=\, \sup_{y\in\mathbb{R}}|\psi(y)|$ for $ \psi\,:\, \mathbb{R} \mapsto \mathbb{R} $ ). It is easy to see that $\chi_{\epsilon}(y)$ converges to ${\textbf{1}}\{y\in\Omega\}$ , except for $ y \in \partial \Omega $ , as $ \epsilon $ tends to zero. Let

\begin{equation*}\bar{\chi}_{\epsilon}(y)\,:\!=\, \int_{y\in \mathbb{R}} \chi_{\epsilon}(z) \phi_{\epsilon}(y-z) dz,\end{equation*}

where $\phi_{\epsilon}$ is the density of a centred normal distribution with variance $\epsilon^2$ , i.e.,

\begin{equation*}\phi_{\epsilon}(y)= \frac{1}{\epsilon\sqrt{2\pi}} \exp\!\left({-}\frac{y^2}{2\epsilon^2}\right).\end{equation*}

In words, $ \bar{\chi}_{\epsilon}(y) $ is a smoothed version of $ {\chi}_{\epsilon}(y) $ using a standard normal density as mollifier. Smoothness of the normal density implies

\begin{align*} \bar{\chi}^{\prime}_{\epsilon}(y) &=\int_{\mathbb{R}} \chi_{\epsilon}(z) \partial_y\phi_{\epsilon}(y-z) dz\\&= \frac{1}{\epsilon \sqrt{2\pi}} \int_{\mathbb{R}} \chi_{\epsilon}(z) \frac{(z-y)}{\epsilon^2}\exp\!\left({-}\frac{(y-z)^2}{2\epsilon^2}\right)\! dz .\end{align*}

The following lemma summarizes properties of $ \bar{\chi}_{\epsilon} $ .

Lemma 1. The functions $\{ \bar{\chi}_{\epsilon} \}$ are smooth and satisfy

\begin{align*}||\bar{\chi}_{\epsilon}||_{\infty}<\infty \quad { and } \quad || \bar{\chi}^{\prime}_{\epsilon} ||_{\infty}<\infty .\end{align*}

Proof. The smoothness of $\bar{\chi}_{\epsilon}(y)$ comes directly from the construction of the convolution, and $||\bar{\chi}_{\epsilon}||_{\infty}\leq || \chi_{\epsilon}||_{\infty}\leq 1$ , which straightforwardly leads to the smoothness and boundedness of $\bar{\chi}_{\epsilon}$ . In addition,

\begin{align*}\left|\left| \bar{\chi}^{\prime}_{\epsilon}\right|\right|_{\infty}&\leq \frac{||\chi_{\epsilon}||_{\infty}}{\epsilon^3\sqrt{2\pi}}\int_{ \mathbb{R}} \ |z-y|\exp\!\left({-}\frac{(y-z)^2}{2\epsilon^2 }\right) dz\\&=\frac{2 ||\chi_{\epsilon}||_{\infty}}{ \epsilon^3 \sqrt{2\pi}}\int_{z=0}^{\infty} \ e^{-z} dz\leq \frac{2}{\epsilon^2 \sqrt{2\pi}},\end{align*}

which leads to the conclusion.

We introduce the following transition kernels:

\begin{align*}&{{\widetilde{\textbf{P}}}_\theta }(B;\, x )\,:\!=\, \int_B \delta_{\theta}(y) f_{\theta}(y;\,x) dy,\quad {{\widehat{\textbf{P}}}_\theta } (B;\, x )\,:\!=\, \int_B \!\left( {\rm SF}_\Theta ( y ;\, x) + \nu_\theta(y;\,x) \right) f_\theta ( y ;\, x ) dy,\\&{\bar{\textbf{P}}}_{\theta}(B;\,x)\,:\!=\, \int_{B} {\rm SF}_X( y ;\,x) f_{\theta}( y;\,x ) dy,\end{align*}

and

\begin{equation*}{\textbf{U}}(B;\,x)\,:\!=\, \int_B \sup_{\theta\in\Theta} f_{\theta}(y;\,x) dy.\end{equation*}

Note that $ {\textbf{U}} $ will be used to establish bounds needed for applying the dominated convergence theorem. The kernels $ {{\widetilde{\textbf{P}}}_\theta } $ , $ {{\widehat{\textbf{P}}}_\theta } $ , and $ {{\bar{\textbf{P}}}_\theta } $ represent different parts of the gradient expression.

Assumption 2. (Boundary integration.) Assume the following:

(i) The sets $\partial A_\theta=\overline{A}_\theta\setminus A_\theta^{o}$ and $ \mathcal{N}_\theta = \{ x\in\mathbb{R}\,:\,\partial_x g_{\theta}(x)=0 \}$ have Lebesgue measure zero. Moreover, we assume that $ \mathcal{N}_\theta $ is measurable for all $ \theta $ .
(ii) It holds for $ \alpha $ that
\begin{equation*}\lim_{y\to \pm \infty}\alpha^{2|y|}f_\theta(y;\,x)=0 .\end{equation*}
(iii) The operator $ {\textbf{U}} $ has bounded norm $ \|{\textbf{U}}\|_{\alpha^2}<\infty $ .
(iv) The Lebesgue measure $\mathscr{L}\,({\cdot})$ of $A^\epsilon_\theta\setminus A_\theta$ converges uniformly to zero, i.e.,
\begin{equation*}\lim_{\epsilon\to0}\sup_{\theta\in\Theta}\mathscr{L}\,\big(A^\epsilon_\theta\setminus A_\theta\big)=0,\end{equation*}
where
\begin{equation*}A^{\epsilon}_{\theta}\,:\!=\, \{ x\in \mathbb{R}\,:\, \: g_\theta(x)\in \Omega_{\epsilon} \} \: .\end{equation*}
(v) We have
\begin{align*}\sup_{\theta \in \Theta} || {\rm SF }_\Theta ||_\alpha &<\infty , \qquad\sup_{\theta \in \Theta} || {\rm SF }_X ||_\alpha<\infty , \\\sup_{\theta \in \Theta} || \nu_\theta ||_\alpha &<\infty , \qquad\sup_{\theta \in \Theta} || \partial_\theta g_\theta ||_\alpha<\infty, \qquad\sup_{\theta \in \Theta} || \delta_\theta ||_\alpha<\infty .\end{align*}

The following results are needed to prove unbiasedness of the derivative estimator. The technical conditions put forward in Assumption 2 guarantee that the mass of the set effected by changing the boundary of $ A_\theta $ is sufficiently smooth; see the conditions (i) and (iv). The remaining conditions establish that all objects needed for the analysis are uniformly (in $ \theta $ ) well-defined in the norm sense. In the case when the process $ X_n $ exhibits some monotonicity in $ \theta $ , these conditions reduce to assuming that the norms are well-defined for the ‘worst’ choice of $ \theta $ .

Lemma 2. Under the conditions (iii)–(v) in Assumption 2, we have

\begin{align*}\sup_{ \theta \in \Theta } \left\| P^{\prime}_\theta\right \|_{\alpha}<\infty, \qquad \sup_{ \theta \in \Theta } \big\| {{\widetilde{\textbf{P}}}_\theta } \big\|_{\alpha}<\infty, \qquad \sup_{ \theta \in \Theta } \big\| {{\widehat{\textbf{P}}}_\theta } \big\|_{\alpha}<\infty, \qquad \sup_{ \theta \in \Theta } \big\| {{\bar{\textbf{P}}}_\theta } \big\|_{\alpha}<\infty,\end{align*}

and

\begin{align*}& \lim_{\epsilon\to 0}\left \| [P_\theta ]_{A^{\epsilon}_\theta } - [P_\theta ]_{A_\theta } \right \|_{\alpha}=0, \lim_{\epsilon\to 0}\sup_{\theta \in \Theta }\left \| \big[ {\widetilde{\textbf{P}}}_\theta \big]_{A^{\epsilon}_\theta } -\big[{\widetilde{\textbf{P}}}_\theta \big]_{A_\theta } \right \|_{\alpha} =0, \\ & \lim_{\epsilon\to 0}\sup_{\theta \in \Theta }\left \| \big[ {\hat{\textbf{P}}}_\theta \big]_{A^{\epsilon}_\theta } -\big[{\hat{\textbf{P}}}_\theta \big]_{A_\theta } \right \|_{\alpha} =0 .\end{align*}

Proof. By definition, (v) implies that for some finite constant c,

\begin{equation*}\sup_{\theta \in \Theta} | {\rm SF }_\Theta ( y ;\, x ) | \leq c \alpha^x \alpha^y \quad \mbox{ for all } x , y.\end{equation*}

This implies

\begin{align*} \sup_{ \theta \in \Theta } \left\| P^{\prime}_\theta\right \|_{\alpha}& \leq \sup_x \frac{ \int \alpha^y \!\left ( \sup_{\theta \in \Theta} | {\rm SF }_\Theta ( y ;\, x ) | f_\theta ( y ;\, x ) \right )\! d y } { \alpha^x } \\ & \leq \sup_x \frac{ \int c \alpha^{2y} \alpha^x \!\left ( \sup_{\theta \in \Theta} f_\theta ( y ;\, x ) \right )\! d y } { \alpha^x } \\ & = \int c \alpha^{2y} \!\left ( \sup_{\theta \in \Theta} f_\theta ( y ;\, x ) \right )\! d y < \infty ,\end{align*}

by (iii). The rest of the results in the first part of the conclusion can be proved similarly. By definition, for some finite constant c,

\begin{equation*}\sup_{\theta \in \Theta} | \delta_\theta ( y ) | \leq c \alpha^y \quad \mbox{ for all } y.\end{equation*}

Then,

\begin{align*}\sup_{\theta \in \Theta }\left \| \big[ {\widetilde{\textbf{P}}}_\theta \big]_{A^{\epsilon}_\theta } -\big[{\widetilde{\textbf{P}}}_\theta \big]_{A_\theta } \right \|_{\alpha}&\leq 2\sup_{ \theta \in \Theta }\int_{A^\epsilon_\theta\setminus A_\theta} \alpha^y |\delta_{\theta}(y)| f_{\theta}(y;\,x) dy\\&\leq 2\sup_{ \theta \in \Theta }\int_{A^\epsilon_\theta\setminus A_\theta} c \alpha^{2y} \!\left ( \sup_{\theta \in \Theta} f_\theta ( y ;\, x ) \right)\! dy.\end{align*}

From the conditions (iii) and (iv),

\begin{align*}\lim_{\epsilon\to0}\sup_{ \theta \in \Theta }\int_{A^\epsilon_\theta\setminus A_\theta} c \alpha^{2y}\! \left ( \sup_{\theta \in \Theta} f_\theta ( y ;\, x ) \right)\! dy=0,\end{align*}

by the absolute continuity of Lebesgue integral [Reference Rudin39]. The rest of the results in the second part of the conclusion can be proved similarly.

Recall the definition of the v-norm $ || \cdot ||_\alpha $ in (23). We denote the set of differentiable mappings with finite $ || \cdot ||_\alpha $ -norm and finite derivative $ || \cdot ||_\alpha $ -norm by

\begin{equation*}\mathcal{D}\,:\!=\, \{ \psi\,:\, \|\psi\|_{\alpha}<\infty \: \mbox{ and } \: \| \partial_x \psi (x) \|_{\alpha}<\infty \} .\end{equation*}

The following theorem establishes our main differentiability result with respect to a single-step transition.

Theorem 1. Under Assumption 1 and Assumption 2, it holds for all $ h \in {\mathcal{D}} $ that

\begin{align*}\frac{d}{d\theta} {\textbf{T}}_\theta h ={\textbf{T}}^{\prime}_\theta h ,\end{align*}

with ${\textbf{T}}^{\prime}_\theta $ as in (11), or, equivalently,

\begin{equation*}{\textbf{T}}^{\prime}_\theta = \big[ {P}^{\prime}_\theta\big]_{A_\theta} + {\textbf{D}}_\theta .\end{equation*}

Proof. First, we have for any h in $ {{\mathcal{D}}} $ that

(16)

\begin{equation}\begin{aligned}& \frac{d}{d\theta} \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ f_{\theta}(y;\,x) d y \\ =\int_{\mathbb{R}}& \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ {\rm SF}_\Theta ( y ;\, x)\ f_{\theta}(y;\,x) dy+\int_{\mathbb{R}}\bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))\ \partial_{\theta} g_{\theta}(y)\ h(y)\ f_{\theta}(y;\,x) d y,\end{aligned}\end{equation}

where the interchange of differentiation and integration in the second equality is justified by the dominated convergence theorem, using that

\begin{align*}\int_{\mathbb{R}}& \sup_{\theta\in\Theta}\Big | \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ {\rm SF}_\Theta ( y ;\, x) \Big | f_{\theta}(y;\,x) dy\leq ||\bar{\chi}_{\epsilon}||_{\infty} \|h\|_{\alpha} \sup_{\theta \in \Theta} || {\rm SF }_\Theta ||_\alpha \|{\textbf{U}}\|_{\alpha^2}<\infty\end{align*}

and

\begin{align*}\int_{\mathbb{R}} \sup_{\theta\in\Theta} \Big | \bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))\ \partial_{\theta} g_{\theta}(y)\ h(y)\ f_{\theta}(y;\,x) \Big | d y\leq ||\bar{\chi}^{\prime}_{\epsilon}||_{\infty}\|h\|_{\alpha} \sup_{\theta \in \Theta} || \partial_\theta g_\theta ||_\alpha \|{\textbf{U}}\|_{\alpha^2}<\infty,\end{align*}

which are justified by Lemma 1 and Assumption 1. By the chain rule, we solve for $ \bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y)) $ from

\begin{equation*}\partial_y\bar{\chi}_{\epsilon}(g_{\theta}(y)) = \bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))\, \partial_y g_{\theta}(y),\end{equation*}

which implies that for every $y\in \mathbb{R}\setminus \mathcal{N}_\theta $ ,

(17)

\begin{equation}\bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))=\left(\partial_y g_{\theta}(y)\right)^{-1} \partial_y\bar{\chi}_{\epsilon}(g_{\theta}(y)).\end{equation}

The value of the function on $\mathcal{N}_\theta $ , which has zero Lebesgue measure by Assumption 2, does not affect the value of the integral. Inserting (17) into (16) yields

(18)

\begin{align}& \int_{\mathbb{R}}\bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))\ \partial_{\theta} g_{\theta}(y)\ h(y)\ f_{\theta}(y;\,x) d y \nonumber \\ & \qquad \qquad =\int_{\mathbb{R}}\, h(y)\, \partial_y\bar{\chi}_{\epsilon}(g_{\theta}(y))\ \partial_{\theta} g_{\theta}(y) \!\left(\partial_y g_{\theta}(y)\right)^{-1} f_{\theta}(y;\,x) d y\nonumber \\ & \qquad \qquad =-\int_{\mathbb{R}} \partial_y\bar{\chi}_{\epsilon}(g_{\theta}(y))\ h (y)\ \delta_\theta( y )\ f_{\theta}(y;\,x) d y .\end{align}

Letting

\begin{equation*}u = \bar{\chi}_{\epsilon}(g_{\theta}(y))\end{equation*}

and

\begin{equation*}v =\, h(y)\, \delta_\theta ( y ) f_{\theta}(y;\,x)\end{equation*}

and applying the formula for integration by parts, $ - \int u^{\prime} v = - u v + \int u v^{\prime}$ , to the right-hand side of (18) yields

\begin{equation*}\begin{aligned}&-\int_{\mathbb{R}}\partial_y\bar{\chi}_{\epsilon}(g_{\theta}(y)) \ h(y)\ \delta_\theta( y )\ f_{\theta}(y;\,x) d y\\ =& - \left.h(y) \bar{\chi}_{\epsilon}(g_{\theta}(y))\ \delta_{\theta}(y)\ f_{\theta}(y;\,x)\right|_{y=-\infty}^{\infty}+\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\, \partial_y\!\left(h(y)\ \delta_{\theta}(y)\ f_{\theta}(y;\,x)\right) d y .\end{aligned}\end{equation*}

By Assumption 1,

\begin{align*}\left|\left.h(y) \bar{\chi}_{\epsilon}(g_{\theta}(y))\ \delta_{\theta}(y)\ f_{\theta}(y;\,x)\right|_{y=-\infty}^{\infty}\right|\leq \|\bar{\chi}_{\epsilon}\|_{\infty}\|h\|_{\alpha} \|\delta_\theta\|_{\alpha}\left. \alpha^{2|y|} f_{\theta}(y;\,x)\right|_{y=-\infty}^{\infty}=0.\end{align*}

To summarize,

\begin{align*}& \frac{d}{d\theta} \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ f_{\theta}(y;\,x) d y \\& =\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ {\rm SF}_\Theta ( y ;\, x)\ f_{\theta}(y;\,x) dy \\& \qquad + \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y)) \left(h^{\prime}(y)\ \delta_{\theta}(y )+h(y)\, \nu_\theta(y;\,x)\right) f_{\theta}(y;\,x) d y,\end{align*}

where the integrability of h ^′ stems from the assumption that $ h^{\prime} \in {\mathcal{D}} $ . To justify the interchange of derivative and $ \epsilon$ -limit, we have

\begin{align*} &\lim_{\epsilon\to0}\sup_{\theta\in\Theta}\left|\int_{\mathbb{R}} \!\left(\bar{\chi}_{\epsilon}(g_{\theta}(y))-{\textbf{1}}\{g_{\theta}(y)\in\Omega\}\right) \gamma_{\theta}(y;\,x) f_\theta(y;\,x) dy\right|\\ \leq & 2 \|h\|_{\alpha} \lim_{\epsilon\to0}\sup_{\theta \in \Theta } \left \| \big[ {\widehat{\textbf{P}}}_\theta \big]_{ A^{\epsilon}_\theta } - \big[ {\widehat{\textbf{P}}}_\theta \big]_{ A_\theta } \right \|_{\alpha} +2\|h^{\prime}\|_{\alpha} \lim_{\epsilon\to0}\sup_{\theta \in \Theta } \left \| \big[{{\widetilde{\textbf{P}}}_\theta } \big] _{ A^{\epsilon}_\theta } - \big[ {{\widetilde{\textbf{P}}}_\theta } \big] _{ A_\theta } \right \|_{\alpha}=0, \end{align*}

where

\begin{equation*}\gamma_{\theta}(y;\,x)\,:\!=\, h^{\prime}(y)\ \delta_{\theta}(y)+h(y) ({\rm SF}_\Theta ( y ;\, x) +\nu_\theta(y;\,x)),\end{equation*}

and

\begin{align*}\lim_{\epsilon\to 0}\left|\int_{\mathbb{R}} \!\left(\bar{\chi}_{\epsilon}(g_{\theta}(y))-{\textbf{1}}\{g_{\theta}(y)\in\Omega\}\right) \, h(y)\, f_\theta(y;\,x) d y\right|\leq 2\|h\|_{\alpha} \lim_{\epsilon\to 0}\left \| [{P_\theta } ]_{ A^{\epsilon}_\theta } - [{P_\theta } ]_{ A_\theta } \right \|_{\alpha}=0.\end{align*}

Summarizing the above analysis, we have

(19)

\begin{align}\frac{d}{d\theta} \int_{\mathbb{R}} {\textbf{1}}\{g_{\theta}(y)\in\Omega\} \ h(y)\ f_{\theta}(y;\,x) d y \nonumber& =\lim_{\epsilon\to 0}\frac{d}{d\theta} \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ h(y)\ f_{\theta}(y;\,x) d y\\ & =\lim_{\epsilon\to 0}\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ \gamma_\theta (y;\,x)\ f_{\theta}(y;\,x) d y \nonumber \\ & =\int_{\mathbb{R}} {\textbf{1}}\{g_{\theta}(y)\in\Omega\} \gamma_\theta\ (y;\,x)\ f_{\theta}(y;\,x) d y,\end{align}

which is equivalent to

\begin{equation*}\frac{d}{d\theta} {\textbf{T}}_{\theta } h = {\textbf{T}}^{\prime}_{\theta } h.\end{equation*}

This proves the theorem.

Remark 2. The theory justifying the interchange of limit, derivative, and integral can be found in [Reference Rudin38, Reference Rudin39]. If h is a constant, the estimator in the theorem goes back to the generalized likelihood ratio method in [Reference Peng, Fu, Hu and Heidergott32].

3. Differentiation of the potential kernel

In this section, we derive the measure-valued derivative of the taboo kernel in an n-step transition, which is used to establish the product rule for the differentiation operation defined by (11).

Theorem 2. Under Assumption 1 and Assumption 2, if $ || P_\theta ||_ \alpha < \infty $ , then it holds for all $ h \in {\mathcal{D}} $ that

\begin{equation*}\frac{d}{d\theta} {\textbf{T}}_\theta^n h = \sum_{k=0}^{n-1}{\textbf{T}}_\theta^{n-k-1} {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k h , \quad n \geq 1 .\end{equation*}

Proof. For $ k \geq 1 $ we have

\begin{equation*}\begin{aligned}& \left.\frac{d}{d\theta}\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y)) \left ( [ P_{\zeta} ]^k_ {A_{\zeta } } \! h \right ) \! (y)\ f_{\theta}(y;\,x) d y\right|_{\zeta=\theta}\\ =&\underbrace{\left. \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y)) \left ( [ P_{\zeta} ]^k_ {A_{\zeta } } \! h \right )\! (y)\ {\rm SF}_\Theta ( y ;\, x)\ f_{\theta}(y;\,x) dy\right|_{\zeta=\theta}}_{ =:\, [a]}\\ &+\underbrace{ \left.\int_{\mathbb{R}}\bar{\chi}^{\prime}_{\epsilon}(g_{\theta}(y))\ \partial_{\theta} g_{\theta}(y) \left ( [ P_{\zeta} ]^k _ {A_{\zeta } } \! h \right )\! (y)\ f_{\theta}(y;\,x) d y\right|_{\zeta=\theta} }_{ =:\, [b]},\end{aligned}\end{equation*}

where the interchange of differentiation and integration is justified similarly as in Theorem 1 by noticing

\begin{equation*} \left | \left ([ P_{\theta}]_ {A_{\theta } } ^k \! h \right )\! (y) \right |\leq \|h\|_{\alpha}\|P_\theta\|_{\alpha}^k \eta_\alpha ( y ) <\infty; \end{equation*}

see (10). Then,

\begin{align*} [b]=& - \underbrace{ \left. \left ([ P_{\theta} ]_ {A_{\theta } } ^k \! h \right )\! (y) \bar{\chi}_{\epsilon}(g_{\theta}(y))\ \delta_{\theta}(y)\ f_{\theta}(y;\,x)\right|_{y=-\infty}^{\infty}}_{=:\, [b_1]}\\ &+ \underbrace{\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\ \partial_y \!\left( \left ( [ P_{\theta}]^k_ {A_{\theta } } \! h \right )\! (y)\ \delta_{\theta}(y)\ f_{\theta}(y;\,x)\right) d y }_{=:\, [b_2]} .\end{align*}

By Assumption 2,

\begin{align*}[b_1]\leq \|\bar{\chi}_{\epsilon}\|_{\infty}\|h\|_{\alpha}\big\|P_\theta\|^k_\alpha \big\|\delta_\theta\|_{\alpha}\left. \alpha^{2|y|} f_{\theta}(y;\,x)\right|_{y=-\infty}^{\infty}=0.\end{align*}

As for the term $b_2$ ,

\begin{align*}[b_2]=& \int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y)) \bigg(\int_{z\in\mathbb{R}}{\textbf{1}}\{g_\theta(z)\in\Omega \} \left( [ P_{\zeta} ]^{k-1}_ {A_{\zeta } } \! h \right) (z) \partial_y f_\zeta(z;\,y) dz \delta_{\theta}(y)\\&\left.+ \left( [ P_{\zeta} ]^k_ {A_{\zeta } } \! h \right) (y) \nu_\theta(y;\,x)\bigg) f_{\theta}(y;\,x) d y\right|_{\zeta=\theta}.\end{align*}

We now bound $ [a]+ [b_2]$ , where we take $ [b_2]$ as on the right-hand side above,

\begin{align*} &\lim_{\epsilon\to0}\sup_{\zeta\in\Theta}\left|\int_{\mathbb{R}} \!\left(\bar{\chi}_{\epsilon}(g_{\zeta}(y))-{\textbf{1}}\{g_{\zeta}(y)\in\Omega\}\right) \tilde{\gamma}_{\zeta,\theta}(y;\,x) f_\zeta(y;\,x) dy\right|\\ \leq \, & 2 \|h\|_{\alpha} \|P_\theta\|_\alpha^{k-1}\big\|{\bar{\textbf{P}}}_\theta\big\|_\alpha \lim_{\epsilon\to0}\sup_{\theta \in \Theta } \left \| [ {{\widehat{\textbf{P}}}_\theta } ]_{ A^{\epsilon}_\theta } - [ {{\widehat{\textbf{P}}}_\theta } ]_{ A_\theta } \right \|_{\alpha} \\ &+ 2\|h^{\prime}\|_{\alpha}\|P_\theta\|_\alpha^k\lim_{\epsilon\to0}\sup_{\theta \in \Theta } \left \| \big[{\widetilde{\textbf{P}}}_\theta \big] _{ A^{\epsilon}_\theta } - \big[{{\widetilde{\textbf{P}}}_\theta } \big]_{ A_\theta } \right \|_{\alpha}=0, \end{align*}

where

\begin{align*}\tilde{\gamma}_{\zeta,\theta}(y;\,x)\,:\!=\, &\int_{z\in\mathbb{R}} {\textbf{1}}\{g_\zeta(z)\in\Omega \}\!\left ( [ P_{\theta}]_ {A_{\theta } } ^{k-1} \! h \right )\! (z)\ {\rm SF}^\zeta_X ( z ;\, y)\ dz \delta_{\zeta}(y)\\&+\left ( [ P_{\theta}] _ {A_{\theta } }^k \! h \right )\! (y)\!\left( {\rm SF}_\Theta^\zeta (y;\,x)+\nu_\zeta(y;\,x)\right),\end{align*}

and

\begin{align*}&\lim_{\epsilon\to 0}\left|\int_{\mathbb{R}} \!\left(\bar{\chi}_{\epsilon}(g_{\theta}(y))-{\textbf{1}}\{g_{\theta}(y)\in\Omega\}\right) \left ([ P_{\zeta}]_ {A_{\zeta } } ^k \! h \right )\! (y)\ f_\theta(y;\,x) d y\right|_{\zeta=\theta}\\ \leq \, & 2\|h\|_{\alpha} \|P_\theta\|_\alpha^k \lim_{\epsilon\to 0}\left \| [ P_\theta ]_{ A^{\epsilon}_\theta } -[{P_\theta } ]_{ A_\theta } \right \|_{\alpha}=0.\end{align*}

Following the line of argument put forward in the proof (see (19)), we arrive at

\begin{align*}&\frac{d}{d\theta}\left.\int_{\mathbb{R}} {\textbf{1}}\{g_{\theta}(y)\in\Omega\} \left ( [ P_{\zeta}]_ {A_{\zeta } }^k \! h \right )\! (y)\ f_{\theta}(y;\,x) d y\right|_{\zeta=\theta} \\ & \qquad=\left.\lim_{\epsilon\to 0}\frac{d}{d\theta}\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y))\left ( [P_{\zeta}]_ {A_{\zeta } } ^k \! h \right ) \! (y)\ f_{\theta}(y;\,x) d y\right|_{\zeta=\theta}\\ & \qquad =\lim_{\epsilon\to 0}\int_{\mathbb{R}} \bar{\chi}_{\epsilon}(g_{\theta}(y)) \tilde{\gamma}_{\theta,\theta} (y;\,x)\ f_{\theta}(y;\,x) d y \\ & \qquad = \int_{\mathbb{R}} {\textbf{1}}\{g_{\theta}(y)\in\Omega\} \tilde{\gamma}_{\theta,\theta} (y;\,x)\ f_{\theta}(y;\,x) d y.\end{align*}

The case of $k=0$ has already been treated in Theorem 1.

By algebra,

\begin{align*} \frac{1}{\Delta} \left ( [ P_{\theta + \Delta}]_ {A_{\theta + \Delta } }^n h- [P_{\theta}]_ {A_{\theta } }^n h\right )& = \sum_{k=0}^{n-1}[ P_{\theta + \Delta} ]_ {A_{\theta + \Delta } }^{n-k-1}\frac{1}{\Delta} \Big ( [P_{\theta + \Delta} ]_ {A_{\theta + \Delta } } - [P_{\theta }]_ {A_{\theta } }\Big ) [ P_{\theta }]_ {A_{\theta } }^k h.\end{align*}

There exists a $\Delta_0>0$ small enough so that for every $|\Delta|<\Delta_0$ , it holds that $\theta+\Delta\in\Theta$ . For $|\Delta|<\Delta_0$ , there exists $\zeta\in\Theta$ such that

\begin{align*}\frac{1}{\Delta} \Big ( [P_{\theta + \Delta}]_ {A_{\theta + \Delta } } - [P_{\theta } ]_ {A_{\theta } }\Big ) [P_{\theta }] _ {A_{\theta } }^k h=\left( \big[{\widetilde{\textbf{P}}}_\zeta \big]_ {A_{\zeta } } \big[ {\bar{\textbf{P}}}_\zeta \big]_ {A_{\zeta } } + \big[ {\widehat{\textbf{P}}}_\zeta\big]_ {A_{\zeta } } [P_{\theta} ]_ {A_{\theta } } \right) [P_{\theta}] _ {A_{\theta } }^{k-1}h\end{align*}

by the mean value theorem. Notice that for $ k \geq 1 $

\begin{align*} &\sup_{\zeta\in\Theta}\left|\left( \big[{\widetilde{\textbf{P}}}_\zeta \big]_ {A_{\zeta } } \big[ {\bar{\textbf{P}}}_\zeta \big]_ {A_{\zeta } } + \big[{\widehat{\textbf{P}}}_\zeta \big]_ {A_{\zeta } } [P_{\theta} ]_ {A_{\theta } } \right) [P_{\theta} ]_ {A_{\theta } }^{k-1}h (y) \right|\\ &\qquad \leq \|h\|_\alpha\|P_\theta\|_\alpha^{k-1} \!\left(\sup_{\theta\in\Theta}\big\|{\widetilde{\textbf{P}}}_\theta\big\|_\alpha\sup_{\theta\in\Theta}\big\|{\bar{\textbf{P}}}_\theta\big\|_\alpha+\|P_\theta\big\|_\alpha\sup_{\theta\in\Theta}\big\|{\widehat{\textbf{P}}}_\theta\|_\alpha\right) \alpha^y <\infty.\end{align*}

By the dominated convergence theorem,

\begin{align*}\lim_{\Delta\to0} \sum_{k=0}^{n-1} [P_{\theta + \Delta} ]^{n-k-1}_ {A_{\theta + \Delta } }\frac{1}{\Delta} \Big ([ P_{\theta + \Delta} ]{_ {A_{\theta + \Delta } } }- [ P_{\theta }] {_ {A_{\theta } } }\Big ) [ P_{\theta }] ^k_ {A_{\theta } } h = \sum_{k=0}^{n-1} [ P_{\theta} ]_ {A_{\theta } }^{n-k-1} [P_\theta]_{{A}^{\prime}_{\theta}} [ P_{\theta }]_ {A_{\theta } }^k h,\end{align*}

which leads to the conclusion.

In order for $( {\textbf{H}_\theta } h) ( x_0 ) $ (the accumulated cost up to the time of hitting the set $ A_\theta $ ) to exist, the process $X_n$ needs to exhibit some kind of drift towards $ A_\theta $ . This is well known from the theory of geometric ergodicity, where the drift towards a recurrent set or state is used to construct a taboo kernel. We refer to [Reference Kartashov26] for details and to Example 4 for an illustration of this principle. In words, $ || {\textbf{T}}_\theta ||_\alpha < 1 $ means that the set $ A_\theta $ carries enough mass of $ P_\theta $ so that removing $ A_\theta $ from the state-space reduces the norm of $ P_\theta $ to less than one.

Theorem 3. Under Assumption 1 and Assumption 2, if

\begin{equation*} \sup_{ \theta \in \Theta } || {\textbf{T}}_\theta ||_\alpha < 1 , \end{equation*}

then it holds for all $ h \in {\mathcal{D}} $ that

\begin{align*}\frac{d}{d\theta} \sum_{n=0}^{\infty} {\textbf{T}}_\theta^{n} h = \sum_{n=0}^{\infty} \sum_{k=0}^{\infty } {\textbf{T}}_\theta^{n} {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k h ,\end{align*}

or, equivalently, $ {\textbf{H}}_\theta $ is $ {\mathcal{D}}$ -differentiable with derivative

\begin{equation*}{\textbf{H}^{\prime}_\theta } = {\textbf{H}_\theta } {\textbf{T}}^{\prime}_\theta {\textbf{H}_\theta} .\end{equation*}

Proof. By Assumption 2 we have

\begin{equation*} \sup_{ \theta \in \Theta } || {\textbf{T}^{\prime}_\theta}||_\alpha =:\, c < \infty . \end{equation*}

Letting

\begin{equation*} \sup_{ \theta \in \Theta } || {\textbf{T}}_\theta ||_\alpha = \rho , \end{equation*}

it follows from (10) and the assumption that $ \rho <1 $ that

(20)

\begin{align}\sum_{n=1}^\infty \sup_{\theta\in\Theta}\left|\sum_{k=0}^{n-1} {\textbf{T}}_\theta ^{n-k-1} {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k h (x) \right| < \sum_{n=1}^\infty \rho^{n-1} c \alpha^x = \frac{c \rho}{1- \rho} \alpha^x . \end{align}

The proof of the proposition above is a simple application of Theorem 2 and the dominated convergence theorem.

In the case $ A_\theta = A $ , Theorem 3 recovers the known results for measure-valued differentiation and the score function; see [Reference Heidergott and Vázquez-Abad22] and the discussion therein.

4. Estimation procedures

In this section we translate the gradient expression put forward in Theorem 3 into simulation estimators. Let

(21)

\begin{equation}g ( x ) = \left ( \sum_{k=0}^\infty {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi \right ) (x ),\end{equation}

where x denotes the initial value of the individual transition operators $ {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k$ .

Suppose that we evaluate $ \big( {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi\big) ( X_i ) $ at $ X_i \not \in A_\theta$ . This kernel represents the event that we start in $ X_i $ and evaluate the expected value of $ \psi$ along a path that does not enter the taboo set, where the first transition of the path is governed by the derivative kernel $ {\textbf{T}}^{\prime}_\theta $ . Elaborating on (12), we can replace $ {\textbf{T}}^{\prime}_\theta $ by $ {\textbf{T}}_\theta $ with appropriate scaling via the score functions and the derivatives of $ \delta_\theta $ and $ \psi $ . This leads to the following estimator for $ \big( {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi\big) ( X_i ) $ at $ X_i \not \in A_\theta$ :

\begin{align*} \left ( {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi \right ) ( X_i) = &\mathbb{E} \left [ \left .\phi_\psi ( X_i ;\, X_{i+1} , \ldots , X_{i+k+1} ) \right | X_i \right ] , \quad k \geq 1,\end{align*}

where

\begin{align*}& \phi_\psi ( X_i ;\, X_{i+1} , \ldots , X_{i+k+1} ) =\\ & \prod_{j=i}^{i+k+1} {\textbf{1}}\{X_j \notin A_{\theta}\} \left ( \underbrace{\psi ( X_{i+1+k} ) {\rm SF}_\Theta ( X_{i+1} ;\, X_i )}_{ {\textbf{T}}^{\prime}_1} +\underbrace{ \psi ( X_{i+1+k}) {\rm SF}_X ( X_{i+2} ;\, X_{i+1}) \delta_{\theta}(X_{i+1}) }_{ {\textbf{T}}^{\prime}_2} \right .\\ & \qquad\qquad\qquad\qquad + \left .\underbrace{ \psi ( X_{i+1+k}) \left ( {\rm SF}_Y ( X_{i+1} ;\, X_{i}) \delta_{\theta}(X_{i+1}) + \partial_y \delta_{\theta}( X_{i+1})\right ) }_{{\textbf{T}}^{\prime}_3} \right ) ,\quad k \geq 1,\end{align*}

and, for $k=0$ , we can estimate $ {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^0 \psi ( X_i ) = {\textbf{T}}^{\prime}_\theta \psi ( X_i ) = \mathbb{E} [ \phi_\psi ( X_i ;\, X_{i+1} ) | X_i ]$ , for $ X_i \not \in A_\theta $ , via

\begin{align*}\phi_\psi ( X_i ;\, X_{i+1} ) = {\textbf{1}}\{ X_i , X_{i+1} \notin A_{\theta}\} & \left (\underbrace{\psi ( X_{i+1} ) {\rm SF}_\Theta ( X_{i+1} ;\, X_i )}_{ {\textbf{T}}^{\prime}_1} +\underbrace{ \psi^{\prime} ( X_{i+1}) \delta_{\theta}(X_{i+1}) }_{ {\textbf{T}}^{\prime}_2} \right . \nonumber\\ & \quad + \left .\underbrace{ \psi ( X_{i+1}) \left ( {\rm SF}_Y ( X_{i+1} ;\, X_{i}) \delta_{\theta}(X_{i+1}) + \partial_y \delta_{\theta}( X_{i+1})\right ) }_{{\textbf{T}}^{\prime}_3} \right ) .\end{align*}

Consider a cycle $ \{ X_0 , X_1 , \ldots , X_{\tau ( x ) -1 }\} $ of the Markov chain, with $ X_0 = x \not \in A_\theta $ , where we have expanded the notation for the first entrance time into $ A_\theta $ , denoted so far by $ \tau $ , by explicitly indicating the initial state. Starting the count from 0, the cycle length is $ \tau ( x ) $ . As the transitions from $ X_0 $ to $ X_1 $ up to the transition from $ X_{\tau_{(x ) }-2 } $ to $ X_{ {\tau ( x )} -1 } $ do not enter the taboo set, they may be considered as having been sampled via $ {\textbf{T}}_\theta $ , whereas the transition from $ X_{ \tau ( x ) -1 } $ to $ X_{ \tau (x)} $ cannot be considered as having been driven by $ {\textbf{T}}_\theta $ , since with this transition the Markov chain enters the taboo set. This shows that there are $ \tau ( x ) -1 $ taboo kernel transitions in a cycle of length $ \tau ( x ) $ , and the indices of the states from which a taboo kernel transition is executed run from 0 to $ \tau ( x ) - 2 $ . Following the line of argument put forward in (5),

\begin{align*} g ( x ) = \sum_{k=0}^\infty \left ( {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi \right ) (x )& = \sum_{k=0}^\infty \mathbb{E} \left.\left[ \prod_{i=0}^{k+1} {\textbf{1}}\{X_i \notin A_{\theta}\} \phi_\psi ( X_0 ;\, X_{1} , \ldots , X_{k+1} ) \right| X_0 = x \right] \nonumber \\[3pt] & = \mathbb{E} \left [ \left . \sum_{i=1}^{\tau ( x ) -1} \phi_\psi ( X_0 ;\, X_{1} , \ldots , X_{i} ) \right | X_0=x \right ]\end{align*}

for $ x \not \in A_\theta $ , where we set the sum to zero in the case where the lower bound is larger than the upper bound (which happens for realizations where $ \tau ( x ) = 1 $ ).

Treating the outer sum in the same way, we obtain from Theorem 3, with $ x_0 \not \in A_\theta $ ,

(22)

\begin{align}\frac{d}{d \theta } \sum_{n=0}^{\infty} \left ( {\textbf{T}}_\theta^{n} \psi \right ) (x_0 )& = \sum_{n=0}^{\infty}\sum_{k=0}^\infty \left ( {\textbf{T}}_\theta^{n} {\textbf{T}}^{\prime}_\theta {\textbf{T}}_\theta^k \psi \right ) (x_0 )\\ & = \sum_{n=0}^\infty \left ( {\textbf{T}}^n_\theta g \right ) (x_0 ) \nonumber\\ & = \sum_{n=0}^\infty \mathbb{E} \left.\left[ \prod_{i=0}^n{\textbf{1}}\{X_i\notin A_{\theta}\} g (X_n ) \right| X_0 = x_0 \right] \nonumber\\ & = \mathbb{E} \left [ \left . \sum_{i=0}^{\tau ( x_0 ) -2} \sum_{k=i+1}^{\tau ( x_0)-1} \phi_\psi ( X_i ;\, X_{i+1} , \ldots , X_{k }) \right | X_0=x_0 \right ]. \nonumber \end{align}

Simulation offers the opportunity to first sample the time horizon $ \tau ( x ) $ . Then, re-initiating the random number stream, we can exactly repeat the stochastic experiment. Given $ \tau ( x )$ , we can sample $ \sigma $ independently from everything else uniformly distributed on $ \{ 0 , 1 , \ldots ,$ $ \tau( x)-2 \} $ . With this definition we can apply the randomization to the outer $ \tau (x_0)$ sum in (22) and obtain the following alternative version of the estimator:

\begin{eqnarray*}& &\left ( \frac{d}{d\theta} \sum_{n=0}^{\infty} {\textbf{T}}_\theta^{n} \psi \right ) ( x_0)=\mathbb{E} \left [ \left . (\tau ( x_0 ) -1) \sum_{k=\sigma+1}^{\tau ( x_0) - 1} \phi_\psi ( X_\sigma ;\, X_{\sigma +1} , \ldots , X_{k } ) \right | X_0=x_0 \right ] .\end{eqnarray*}

Notice that the above estimator contains several likelihood ratio terms, namely, ${\rm SF}_\Theta ( y ;\, x ) $ , $ {\rm SF}_X ( y ;\, x ) $ as a term of $\nu_\theta$ , and ${\rm SF}_Y ( y ;\, x ) $ , which may lead to a large variance of the derivative estimator. Following the standard approach in measure-valued differentiation [Reference Heidergott, Hordijk and Weisshaupt20, Reference Heidergott18, Reference Heidergott and Vázquez-Abad23, Reference Pflug33, Reference Pflug34], we may apply the Hahn–Jordan decomposition,

\begin{align*}\partial_\theta f_{\theta} (y;\,x)=c_\theta(x)\!\left( f^{+}_\theta(y;\,x)-f^{-}_\theta(y;\,x)\right),\end{align*}

where

\begin{align*}&f^{+}_{\theta} (y;\,x)\,:\!=\, \left(\partial_\theta f_{\theta} (y;\,x)\right)^{+}/c_\theta(x),\quad f^{-}_{\theta} (y;\,x)\,:\!=\, \left(\partial_\theta f_{\theta} (y;\,x)\right)^{-}/c_\theta(x),\\& c_\theta(x)\,:\!=\, \int_{\mathbb{R}} \!\left(\partial_\theta f_{\theta} (y;\,x)\right)^{+} d y=\int_{\mathbb{R}} \!\left(\partial_\theta f_{\theta} (y;\,x)\right)^{-} d y,\end{align*}

by noticing that

\begin{equation*}\int_{\mathbb{R}} \!\left(\partial_{\theta} f_{\theta} (y;\,x)\right)^{+} dy-\int_{\mathbb{R}} \!\left(\partial_{\theta} f_{\theta} (y;\,x)\right)^{-} d y=\partial_{\theta} \int_{\mathbb{R}}f_{\theta} (y;\,x) dy=0 . \end{equation*}

The Hahn–Jordan decomposition is often not easy to sample from; typically one can find decompositions that are easier to sample from (see [Reference Heidergott, Hordijk and Weisshaupt20]).

We turn to the state derivative, i.e., the Y -score function $ {\rm SF}_Y ( y ;\, x ) $ . We obtain that

\begin{align*}\partial_y f_{\theta} (y;\,x)=\hat{c}_\theta(x) \big(\hat{f}^{+}(y;\,x)- \hat{f}^{-}(y;\,x)\big),\end{align*}

where

\begin{align*}&\hat{f}^{+}_{\theta} (y;\,x)\,:\!=\, \left(\partial_y f_{\theta} (y;\,x)\right)^{+}/\hat{c}_\theta(x),\quad \hat{f}^{-}_{\theta} (y;\,x)\,:\!=\, \left(\partial_y f_{\theta} (y;\,x)\right)^{-}/\hat{c}_\theta(x),\\& \hat{c}_\theta(x)\,:\!=\,\int_{\mathbb{R}} \!\left(\partial_y f_{\theta} (y;\,x)\right)^{+} d y=\int_{\mathbb{R}} \!\left(\partial_y f_{\theta} (y;\,x)\right)^{-} d y,\end{align*}

by noticing

\begin{equation*}\int_{\mathbb{R}} \!\left(\partial_y f_{\theta} (y;\,x)\right)^{+} dy-\int_{\mathbb{R}} \!\left(\partial_y f_{\theta} (y;\,x)\right)^{-} d y= f_{\theta}(y;\,x)|_{y=-\infty}^{\infty}= 0.\end{equation*}

The estimator also requires differentiation of the conditional expectation of the future costs with respect to the initial value captured by $ {\rm SF}_X ( y ;\, x ) $ . For this we construct the decomposition

\begin{align*}\partial_x f_{\theta} (y;\,x)=\bar{c}_\theta(x)\left( \bar{f}^{+}_\theta(y;\,x)-\bar{f}^{-}_\theta(y;\,x)\right),\end{align*}

where

\begin{align*}&\bar{f}^{+}_{\theta} (y;\,x)\,:\!=\, \left(\partial_x f_{\theta} (y;\,x)\right)^{+}/\bar{c}_\theta(x),\quad \bar{f}^{-}_{\theta} (y;\,x)\,:\!=\, \left(\partial_x f_{\theta} (y;\,x)\right)^{-}/\bar{c}_\theta(x),\\& \bar{c}_\theta(x)\,:\!=\, \int_{\mathbb{R}} \!\left(\partial_x f_{\theta} (y;\,x)\right)^{+} d y=\int_{\mathbb{R}} \!\left(\partial_x f_{\theta} (y;\,x)\right)^{-} d y,\end{align*}

by noticing that

\begin{equation*}\int_{\mathbb{R}} \!\left(\partial_{x} f_{\theta} (y;\,x)\right)^{+} dy-\int_{\mathbb{R}} \!\left(\partial_{x} f_{\theta} (y;\,x)\right)^{-} d y=\partial_{x} \int_{\mathbb{R}}f_{\theta} (y;\,x) dy=0.\end{equation*}

Elaborating on the above decompositions, the score functions in the estimator can be replaced by appropriate splittings of the sample paths, where the positive and negative parts are given by the Markov kernels corresponding to the above densities. We refer to the literature on measure-valued differentiation for details.

Following the framework provided in [Reference Peng, Fu, Hu and Heidergott32], the results presented in this paper can be extended to the multidimensional case, where $ S \subset \mathbb{R}^d $ , for $ d > 1 $ . Specifically, the weight supremum norm is extended to $ \mathbb{R}^d $ by letting

(23)

\begin{equation}\eta_\alpha ( x_1 , \ldots , x_d ) = \alpha^{| x_1 | } \cdots \alpha^{| x_d| } , \quad \alpha>1 .\end{equation}

5. Applications

Consider the following simple capital flow model for an unlimited liability company. Suppose at the nth period a company receives a random profit of $W_n$ . If the company has a positive capital at the nth period, then (i) $rX_n$ is given to the shareholder as a dividend, and (ii) the remaining capital is topped up by $ M > 0 $ , and the excess of capital $\max\!( r X_n - M , 0 ) $ is extracted; otherwise, the company receives $-r X_n$ capital from the shareholders, for some $ r \in ( 0 ,1 ) $ . The company has credit to borrow money up to $\theta\geq 0$ , and if the company’s capital is below $-\theta$ , then it is ruined and no longer has any debt obligations. Then, with $ X_0 = x $ ,

\begin{equation*}\tau \,:\!=\, \tau ( x ) = \inf \{ n \geq 1 \,:\, X_n+\theta \leq 0 \} ,\end{equation*}

and

\begin{equation*}X_n =(1-r)X_{n-1}+W_n,\end{equation*}

for $ X_n \leq M $ and M otherwise. The cash flow of the company is

\begin{equation*}\sum_{n=0}^{\tau-1}r X_n,\end{equation*}

where $ X_0 = x_0 $ gives the initial value. Let $W_n$ , $n\in\mathbb{N}$ , be a sequence of i.i.d. random variables that follow a normal distribution with mean $\mu=0.1$ and variance $\sigma^2=1$ , and set $x_0=0.3$ , $r=0.3$ , and $\theta=0$ . Let $ P_\theta $ denote the transition kernel of the Markov chain $ X_n $ . Then $ (P_\theta) ( dy , x ) $ has a transition density

\begin{equation*}f(y;\,x)=\frac{1}{\sigma \sqrt{2\pi}}\exp\!\left({-}\frac{\left(y-\mu-(1-r)x\right)^2}{2\sigma^2}\right) \end{equation*}

on $ ({-} \infty , M ] $ and a point mass

\begin{equation*}(P_\theta) ( M , x ) = \int_M^\infty f ( y ;\, x ) d y .\end{equation*}

We have $\Omega=({-} \infty , 0 ] $ , $g(x;\,\theta)=x+\theta$ , $ A_\theta = ({-} \infty , - \theta ] $ , and ${\textbf{T}}_\theta = [ P_\theta ]_{ ({-} \infty , - \theta ] }$ .

We are interested in estimating the sensitivity of the total cash flow until ruin,

\begin{equation*}h ( \theta ,x_0 ) = \left ( {\textbf{H}}_\theta \psi \right )\! ( x ) = \sum_{n=0}^\infty \left ( {\textbf{T}}_\theta^n \psi \right )\! ( x )= \mathbb{E}\left [ \left . \sum_{n=0}^{\tau-1}r X_n \right | X_0 = x_0 \right ] . \end{equation*}

We have

\begin{equation*}\| {\textbf{T}} \|_\alpha = \sup_{ x \leq M } \alpha^{-|x|} \left ( \int_{-\theta}^M \alpha^{|y|} f ( y ;\, x ) d y + \alpha^{M}\int_M^\infty f ( y ;\, x ) d y \right ),\end{equation*}

and letting $ \alpha = 1 $ (which we may do as $ |X_n | \leq M $ is bounded for $ | \theta | < M$ and thus $ \psi ( X_n ) $ also is) gives

\begin{equation*}\| {\textbf{T}} \|_1 = \sup_{ x \leq M } \int_{-\theta}^\infty f ( y ;\, x ) d y =\mathbb{P} \bigg ( \sigma Z + \mu + ( 1 - r ) x > - \theta \bigg ) ,\end{equation*}

for Z a standard normal random variable. It follows that

\begin{equation*} \| {\textbf{T}} \|_1 =\mathbb{P} \big ( \sigma Z + \mu + ( 1 - r ) M > - \theta \big ) < 1\end{equation*}

for any $ \theta \leq 0 $ . Obviously, Assumptions 1 and 2 are satisfied, and by Theorem 3 we obtain

\begin{equation*}\partial_\theta h ( \theta ,x_0 ) = \big ( {\textbf{H}}_\theta {\textbf{T}}^{\prime}_\theta {\textbf{H}}_\theta \psi \big ) ( x_0 ) .\end{equation*}

For ease of analysis, we let $ M = \infty $ in the following. This is justified as $ r X_n > M$ is a rare event for the numerical setting of the example, which can be easily justified numerically. Note that $\delta_\theta(x)=-1$ , and

\begin{equation*}{\rm SF}_X ( y ;\, x )= \frac{1-r}{ \sigma^2} \bigg ( y-\mu-(1-r)x \bigg ) ,\quad {\rm SF}_Y ( y ;\, x )= \frac{1}{ \sigma^2} \bigg ( (1-r)x+\mu-y \bigg ) .\end{equation*}

After some algebra, we obtain for the unbiased estimator

\begin{align*} & {-}r(\tau-1)+ \frac{r}{\sigma^2}\left(\sum_{i=1}^{\tau-1}X_i\right)\\& \quad \times \left\{\sum_{i=1}^{\tau-1}\left[(1+(1-r)^2)X_i-(1-r)(X_{i-1}+X_{i+1})-r\mu\right]+X_\tau-(1-r)X_{\tau-1}-\mu\right\}.\end{align*}

We plot the total cash flow and the expected time to ruin in Figure 1. We compare the performance of our estimator to that of a finite-difference estimator (FD) defined by

\begin{align*}\frac{1}{\delta} \left ( \sum_{n=1}^{\tau_{\theta+\delta}-1}r X_n-\sum_{n=1}^{\tau_{\theta}-1}r X_n \right ) ;\end{align*}

see [Reference L’Ecuyer and Perron30]. We denote the FD with perturbation size $0.1$ by FD $(0.1)$ , and the FD with perturbation size $0.01$ by FD $(0.01)$ . The means and standard errors of the three derivative estimators are reported in Table 1, where we show results for $ r =0.3 $ and additionally for $ r =0.9 $ , denoting by n the number of independent experiments used for the estimators.

Table 1. Derivative estimates for the expected cash flow in the ruin problem (mean $\pm$ standard error).

Figure 1. Expected cash flow (left) and time to ruin (right) as a function of $ \theta $ for $ r=0.3$ .

In Table 1, we can see that the estimator in this paper leads to the lowest variance, while FD suffers from the bias–variance dilemma. In addition, the computation time of our estimator is about half that of FD, because the former only needs to simulate one sample path, while the latter needs to simulate two sample paths. The key to applying the common random numbers (CRN) technique is to synchronize the simulations of the two sample paths [Reference Law and Kelton28]. For the random horizon problem, it is not clear how to efficiently synchronize simulation of the sample paths under the perturbed parameter. Thus, it may not be easy to implement CRN to reduce the variance of FD. In this example, there is no need to calculate the Hahn–Jordan decomposition, but we note that one could split the random variables as follows: $\hat{c}=-1/\sqrt{2}$ , $\bar{c}=(1-r)/\sqrt{2\pi}$ , and

\begin{equation*}\widehat{X}_{n}^{+}, \bar{X}_{n}^{+}\sim \mu+(1-r)X_{n-1}+Wei\big(2,\sqrt{2}\sigma\big),\quad \widehat{X}_{n}^{-}, \bar{X}_{n}^{-}\sim \mu+(1-r)X_{n-1}-Wei\big(2,\sqrt{2}\sigma\big),\end{equation*}

where Wei(a, b) denotes a Weibull-distributed random variable with the density

\begin{equation*}\frac{a}{b}\!\left(\frac{x}{b}\right)^{a-1}e^{-(x/b)^a}{\textbf{1}}\{ x\geq 0 \}.\end{equation*}

6. Conclusion

In this paper, we have developed an approach for differentiating random horizon problems, where the parameter of interest may affect the stopping criterion as well as the transition dynamics of the underlying stochastic process. We discuss the multidimensional case and various implementations of the obtained derivative expressions.

Acknowledgements

The authors express their gratitude to Professor Michael Fu for fruitful discussions of an earlier draft of the paper, and to anonymous reviewers and the associate editor for their constructive comments.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process for this article.

References

Asmussen, S. et al. (2008). Asymptotic behavior of total times for jobs that must start over if a failure occurs. Math. Operat. Res. 33, 932–944.CrossRef Google Scholar

Asmussen, S., Lipsky, L. and Thompson, S. (2014). Checkpointing in failure recovery in computing and data transmission. In Analytical and Stochastic Modelling Techniques and Applications (ASMTA 2014), eds B. Sericola, M. Telek and G. Horváth, Springer, Cham, pp. 253–272.CrossRef Google Scholar

Avrachenkov, K., Piunovskiy, A. and Zhang, Y. (2015). Hitting times in Markov chains with restart and their application to network centrality. Methodology Comput. Appl. Prob. 20, 1173–1188.CrossRef Google Scholar

Bashyam, S. and Fu, M. (1994). Application of perturbation analysis to a class of periodic review (s, S) inventory systems. Naval Res. Logistics 41, 47–80.Google Scholar

Bashyam, S. and Fu, M. (1998). Optimization of (s, S) inventory systems with random lead times and a service level constraint. Manag. Sci. 44, 243–256.Google Scholar

Brown, L. et al. (2005). Statistical analysis of a telephone call center. J. Amer. Statist. Assoc. 100, 36–50.CrossRef Google Scholar

Cao, X. (2007). Stochastic Learning and Optimization: a Sensitivity-Based Approach. Springer, New York.CrossRef Google Scholar

Caswell, H. (2013). Sensitivity analysis of discrete Markov chains via matrix calculus. Linear Algebra Appl. 438, 1727–1745.CrossRef Google Scholar

Caswell, H. (2019). Sensitivity Analysis: Matrix Methods in Demography and Ecology. Springer, Cham.CrossRef Google Scholar

Cohn, D. (1980). Measure Theory. Birkhäuser, Stuttgart.CrossRef Google Scholar

Dekker, R. et al. (1998). Maintenance of light-standards—a case-study. J. Operat. Res. Soc. 49, 132–143.CrossRef Google Scholar

Fu, M. and Hu, J. Q. (1997). Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Kluwer, Boston.CrossRef Google Scholar

Fu, M. C. (2006). Gradient estimation. In Handbooks in Operations Research and Management Science, Vol. 13, Simulation, eds S. Henderson and B. Nelson, North Holland, Amsterdam, pp. 575–616.CrossRef Google Scholar

Glasserman, P. (1991). Gradient Estimation via Perturbation Analysis. Kluwer, Boston.Google Scholar

Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. Springer, New York.Google Scholar

Heidergott, B. (2001). A weak derivative approach to optimization of threshold parameters in a multi-component maintenance system. J. Appl. Prob. 38, 386–406.CrossRef Google Scholar

Heidergott, B. (2001). Option pricing via Monte Carlo simulation: a weak derivative approach. Prob. Eng. Inf. Sci. 15, 335–349.CrossRef Google Scholar

Heidergott, B. (2007). Max-Plus Linear Stochastic Models and Perturbation Analysis. Springer, New York.Google Scholar

Heidergott, B. and Farenhorst-Yuan, T. (2010). Gradient estimation for multicomponent maintenance systems with age-replacement policy. Operat. Res. 58, 706–718.CrossRef Google Scholar

Heidergott, B., Hordijk, A. and Weisshaupt, H. (2008). Derivatives of Markov kernels and their Jordan decomposition. J. Appl. Anal. 14, 13–26.CrossRef Google Scholar

Heidergott, B., Leahu, H. and Volk-Makarewicz, W. (2014) A smoothed perturbation analysis of Parisian options. IEEE Trans. Automatic Control 60, 469–474.CrossRef Google Scholar

Heidergott, B. and Vázquez-Abad, F. (2006). Measure-valued differentiation for random horizon problems. Markov Process. Relat. Fields 12, 509–536.Google Scholar

Heidergott, B. and Vázquez-Abad, F. (2008). Measure-valued differentiation for Markov chains. J. Optimization Theory Appl. 136, 187–209.CrossRef Google Scholar

Ho, Y. C. and Cao, X. (1991). Perturbation Analysis of Discrete Event Dynamic Systems. Kluwer, Boston.CrossRef Google Scholar

Kallenberg, O. (2001). Foundations of Modern Probability, 2nd edn. Springer, New York.Google Scholar

Kartashov, N. (1996). Strong Stable Markov Chains. De Gruyter, Zeist.CrossRef Google Scholar

Kulkarni, G., Nicola, V. and Trivedi, S. (1987). The completion time of a job on multimode systems. Adv. Appl. Prob. 19, 932–954.CrossRef Google Scholar

Law, A. and Kelton, D. (2000). Simulation Modeling and Analysis. McGraw-Hill, Boston.Google Scholar

Leahu, H. (2008). Measure-valued differentiations for finite products of measures. Doctoral Thesis, Vrije Universiteit Amsterdam.Google Scholar

L’Ecuyer, P. and Perron, G. (1994). On the convergence rates of IPA and FDC derivative estimators. Operat. Res. 42, 643–656.CrossRef Google Scholar

Lyuu, Y.-D. and Teng, H.-W. (2011). Unbiased and efficient Greeks of financial options. Finance Stoch. 15, 141–181.CrossRef Google Scholar

Peng, Y., Fu, M. C., Hu, J. Q. and Heidergott, B. (2018). A new unbiased stochastic derivative estimator for discontinuous sample performances with structural parameters. Operat. Res. 66, 487–499.CrossRef Google Scholar

Pflug, G. (1992). Gradient estimates for the performance of Markov chains and discrete event processes. Ann. Operat. Res. 39, 173–194.CrossRef Google Scholar

Pflug, G. (1996). Optimisation of Stochastic Models. Kluwer, Boston.CrossRef Google Scholar

Pflug, G. and Rubinstein, R. (2002). Inventory processes: quasi-regenerative property, performance evaluation, and sensitivity estimation via simulation. Stoch. Models 18, 469–496.CrossRef Google Scholar

Rubinstein, R. (1992). Sensitivity analysis of discrete event systems by the ‘push out’ method. Ann. Operat. Res. 39, 229–250.CrossRef Google Scholar

Rubinstein, R. and Shapiro, A. (1993). Discrete Event Systems: Sensitivity Analysis and Optimization by the Score Function Method. John Wiley, Chichester.Google Scholar

Rudin, W. (1964). Principles of Mathematical Analysis. McGraw-Hill, New York.Google Scholar

Rudin, W. (1987). Real and Complex Analysis. McGraw-Hill, New York.Google Scholar

Table 1. Derivative estimates for the expected cash flow in the ruin problem (mean $\pm$ standard error).

Figure 1. Expected cash flow (left) and time to ruin (right) as a function of $ \theta $ for $ r=0.3$.

Article contents

Gradient estimation for smooth stopping criteria

Abstract

Keywords

MSC classification

1. Introduction

2. Differentiation of one-step taboo kernel

2.1. Transition kernels and norms

2.2. Differentiability of the taboo kernel

3. Differentiation of the potential kernel

4. Estimation procedures

5. Applications

6. Conclusion

Acknowledgements

Competing interests

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests