1. Introduction
In recent years there has been substantial interest in using continuous-time piecewise deterministic Markov processes (PDMPs) as the basis for Markov chain Monte Carlo (MCMC) algorithms. These ideas started in the statistical physics literature [Reference Bernard, Krauth and Wilson2, Reference Krauth22, Reference Peters and de With28] and have led to a number of new MCMC algorithms, such as the bouncy particle sampler (BPS) [Reference Bouchard-Côté, Vollmer and Doucet8], the zigzag (ZZ) process [Reference Bierkens, Fearnhead and Roberts5], and the coordinate sampler (CS) [Reference Wu and Robert33], amongst many others. See [Reference Fearnhead, Bierkens, Pollock and Roberts18] and [Reference Vanetti, Bouchard-Côté, Deligiannidis and Doucet31] for an introduction to the area. One potential benefit which is associated with these samplers are that they are non-reversible, and it is known that non-reversible samplers can mix faster than their reversible counterparts [Reference Bierkens3, Reference Chen, Lovász and Pak9, Reference Diaconis, Holmes and Neal13].
Informally, a PDMP process evolves according to a deterministic flow—defined via an ordinary differential equation—for a random amount of time, before exhibiting an instantaneous transition, and then following a (possible different) deterministic flow for another random amount of time, and so on.
Initial PDMP samplers were defined to sample target distributions which were continuously differentiable ( $C^1$ ) on $\mathbb{R}^d$ , but there is interest in extending them to more general target distributions. To date, this has been achieved for sampling from distributions defined on the union of spaces of different dimensions [Reference Bierkens, Grazzi, van der Meulen and Schauer7, Reference Chevallier, Fearnhead and Sutton10] and for sampling from distributions on restricted domains [Reference Bierkens4], binary distributions [Reference Pakman26], and phylogenetic trees [Reference Koskela21]. Here we rigorously consider a further extension to sampling from target distributions on $\mathbb R^d$ which are piecewise $C^1$ . That is, they can be defined by partitioning $\mathbb{R}^d$ into a countable number of regions, with the target density being $C^1$ on each region. We call such densities piecewise smooth. Such target distributions arise in a range of statistical problems, such as latent threshold models [Reference Nakajima and West24], binary classification [Reference Nishimura, Dunson and Lu25], and changepoint models [Reference Raftery and Akman29]. The importance of this extension of PDMP samplers is also indicated by the usefulness of extensions of Hamiltonian Monte Carlo (HMC) to similar problems [Reference Afshar and Domke1, Reference Dinh, Bilge, Zhang and Matsen14, Reference Pakman and Paninski27, Reference Zhou34].
The challenge with extending PDMP samplers to piecewise smooth densities is the need to specify the appropriate dynamics when the sampler hits a discontinuity in the density. Essentially, we need to specify the dynamics so that the PDMP has our target distribution $\mu$ as its invariant distribution. Current samplers are justified based on considering the infinitesimal generator, denoted by A, of the PDMP. Informally, the generator is an operator that acts on functions and describes how the expectation of that function of the state of the PDMP changes over time. The idea is that if we average the generator applied to a function and this is zero for a large set of functions $f\in \mathcal F$ ,
then the distribution $\mu$ must be the invariant distribution. Whilst intuitively this makes sense, many papers use this intuitive reasoning without giving a formal proof that the distribution they average over is in fact the invariant distribution; see for instance [Reference Fearnhead, Bierkens, Pollock and Roberts18, Reference Vanetti, Bouchard-Côté, Deligiannidis and Doucet31]. One important exception is the work of [Reference Durmus, Guillin and Monmarché16], where precise conditions and functional-analytic proofs are given which establish that smooth, compactly supported functions form a core for the generator. This then rigorously establishes $\mu$ -invariance of the PDMP. In this work, we actually take a different approach from [Reference Durmus, Guillin and Monmarché16]: we make use of the particular nature of the PDMPs under consideration and give conditions under which (1) directly holds for the whole domain $\mathcal{D}(A)$ , then show that a large class of functions $\mathcal F \subset \mathcal{D}(A)$ separate measures, without attempting to show that $\mathcal F$ is a core. This avoids the need for highly technical proofs as in [Reference Durmus, Guillin and Monmarché16].
Furthermore, once we introduce discontinuities, this complicates the definition of the generator. The impact of these discontinuities is realized in terms of the domain of the generator, and this necessitates the use of additional arguments which take account of the impact of the discontinuity when applying arguments based on integration by parts.
More specifically, we can see the challenge of dealing with discontinuities and some of the contributions of this paper by considering the previous work of [Reference Bierkens4], who consider designing PDMP samplers when the target distribution is only compactly supported on $\mathbb R^d$ —a special case of our present work. It should be noted that there is a sign error in the condition given by Equation (4) of [Reference Bierkens4], and furthermore their proof was not complete—see Appendix E. In this work, we will provide a full proof.
The paper is structured as follows. In the next section we give a brief introduction to PDMPs and some common PDMP samplers. Then in Section 3 we give general conditions for checking the invariant distribution of a PDMP. The result in this section formalizes the informal argument used by previous authors. We then develop these results for the specific cases where the PDMP has active boundaries—for example, due to a compact support, or when we wish to sample from a mixture of densities defined on spaces of differing dimensions. The results in this section can be used to address some shortcomings of [Reference Bierkens4] for sampling on a bounded domain, and have been used to justify the reversible jump PDMP algorithm of [Reference Chevallier, Fearnhead and Sutton10]. In Section 5 we use our results to provide a simple sufficient condition on the dynamics for a PDMP to admit a specific piecewise smooth density as its invariant density. Various proofs and technical assumptions for these results are deferred to the appendices. We then show how to construct dynamics at the discontinuities for a range of common PDMP samplers, and empirically compare these samplers on some toy examples—with a view to gaining intuition as to their relative merits, particularly when we wish to sample from high-dimensional piecewise smooth densities. The paper ends with a discussion.
2. PDMP basic properties
2.1. General PDMP construction
For this work, we require a general construction of piecewise deterministic Markov processes (PDMPs) in spaces featuring boundaries. We will follow the construction of Davis in [Reference Davis11, p. 57], and largely make use of the notation therein.
Let K be a countable set, and for $k \in K$ , let $E_k^0$ be an open subset of $\mathbb{R}^{d_k}$ . Let $E^0$ be their disjoint union:
For any $k \in K$ , we have a Lipschitz vector field on $E^0_k$ that induces a flow $\Phi_k(t,z)$ .
In this setting, trajectories may reach the boundaries of the state. Hence we define the entrance and exit boundaries using the flow, $\partial^- E_k^0$ and $\partial^+ E_k^0$ respectively (see Figure 1):
Note that it may be possible for a point to be both an entrance and an exit boundary.
We then have $\partial_1 E_k^0 \;:\!=\; \partial^- E_k^0\backslash \partial^+ E_k^0$ as in [Reference Davis11, p. 57], and also
Finally, the full state space is the disjoint union,
The active boundary (that is, the exit boundary) is then defined as
These are points on the boundary that the deterministic flow can hit.
We will denote the state of a PDMP on E at time t by $Z_t \in E$ . A detailed construction of the PDMP is provided by Davis [Reference Davis11, p. 59], but we provide here a summary of the quantities that define a PDMP $(Z_t)$ :
-
(i) An event rate $\lambda(z)$ , with $z \in E$ . An event occurs in $[t,t+h]$ with probability $\lambda(Z_t) h + o(h)$ .
-
(ii) A jump kernel defined for $z \in E \cup \Gamma$ : $Q(\!\cdot\!|z)$ with $Q(\!\cdot\!|z)$ a probability measure on E. At each event time $T_i$ , the state will change according to the jump kernel: $Z_{T_i} \sim Q(\!\cdot\!|Z_{T_i-})$ .
-
(iii) The deterministic flow $\Phi$ which determines the behaviour of $Z_t$ between jumps.
-
(iv) For any trajectory $Z_t$ such that
\begin{align*} \lim_{t\uparrow t_0} Z_t = Z_{t_0-} \in \Gamma, \end{align*}the state will change according to the jump kernel: $Z_{t_0} \sim Q(\!\cdot\!|Z_{t_0-})$ .
Remark 1. The trajectory never enters $\Gamma$ , which is not in the domain.
2.2. PDMP samplers
In the case of most PDMP samplers, the state space is constructed by augmenting the space of interest with an auxiliary velocity space $\mathcal{V}_k$ : $E_k^0 = U_k \times \mathcal{V}_k$ . The deterministic flow is then typically given by free transport, i.e. $\Phi_k(t,x,v) = (x+tv,v)$ , though other examples exist [Reference Bierkens, Grazzi, Kamatani and Roberts6, Reference Terenin and Thorngren30, Reference Vanetti, Bouchard-Côté, Deligiannidis and Doucet31]. The use of such simple dynamics allows for the exact simulation of the process dynamics, without resorting to numerical discretization.
For the purposes of this work, it will be useful to introduce three of the more popular classes of PDMP sampler, which we will then be able to refer back to as running examples. Each of these processes works on a velocity-augmented state space and follows free-transport dynamics, and so they differ primarily in (i) the set of velocities which they use, and (ii) the nature of the jumps in the process. We describe the dynamics for each process if we wish to sample from a density $\pi(x)$ on $\mathbb R^d$ :
-
1. The bouncy particle sampler [Reference Bouchard-Côté, Vollmer and Doucet8] uses a spherically symmetric velocity space, given by either $\mathbb{R}^d$ equipped with the standard Gaussian measure, or the unit sphere equipped with the uniform surface measure. ‘Bounce’ events occur at rate $\lambda(x, v) = \langle v, -\nabla \log \pi (x) \rangle_+$ , and at such events, the velocity deterministically jumps to $v^{\prime} = \left( I - 2 \frac{\left( \nabla \log \pi (x) \right) \left( \nabla \log \pi (x) \right)^\top}{\left( \nabla \log \pi (x) \right)^\top \left( \nabla \log \pi (x) \right)} \right) v$ , i.e. a specular reflection against the level set of $\log \pi$ at x.
-
2. The zigzag process [Reference Bierkens, Fearnhead and Roberts5] uses $\{ \pm 1 \}^d$ as its velocity space, equipped with the uniform measure. There are now d different types of bounce events, corresponding to the coordinates of the velocity vector. Bounces of type i occur at rate $\lambda_i (x, v) = \left( -v_i \partial_i \log \pi(x) \right)_+$ , and at such events, $v_i$ is deterministically replaced by $-v_i$ .
-
3. The coordinate sampler [Reference Harland, Michel, Kampmann and Kierfeld20, Reference Wu and Robert33] uses $\{ \pm e_i \}_{i = 1}^d$ as its velocity space, equipped with the uniform measure, where $e_i$ is the ith coordinate vector. Bounce events again happen at rate $\lambda(x, v) = \langle v, -\nabla \log \pi (x) \rangle_+$ . At such events, the velocity is resampled from the full velocity space, with probability proportional to $\lambda(x, -v^{\prime})$ .
When $d = 1$ , all of these processes are identical. Additionally, all three processes can be supplemented with ‘refreshment’ events, which occur at a rate independent of v, and modify the velocity in a way which leaves its law invariant. This can include full resampling, autoregressive resampling in the case of spherical velocities, coordinate-wise resampling in the case of velocity laws with independent coordinates, and other variations.
From the above definitions, it is easy to see that the event rates only make sense when $\pi$ is sufficiently smooth, and that in the presence of discontinuities, complications in defining the process will arise. Furthermore, it is not a priori clear which processes will work best in the presence of such discontinuities.
2.3. Review: extended generator and semigroup
We collect some basic definitions and facts which will be crucial for our later results.
Let $\mathcal B(E)$ denote the set of bounded measurable functions $E\to \mathbb R$ . For any $f \in \mathcal B(E)$ , we recall the definition of the semigroup $P_t$ associated to the process $Z_t$ :
where $\mathbb E_z$ is the expectation with respect to $\mathbb P_z$ , with $\mathbb P_z$ the probability such that $\mathbb P_z[Z_0 = z] = 1$ .
Proposition 1. The semigroup $P_t$ is a contraction for the sup norm:
for all $f\in \mathcal{B}(E)$ and $t \geq 0$ .
Proof. See [Reference Davis11, p. 28].
The semigroup is said to be strongly continuous for $f\in \mathcal B(E)$ if $\lim_{t\downarrow 0} \|P_t f-f\|_\infty =0$ . Let $\mathcal B_0$ be the set of functions for which $P_t$ is strongly continuous:
Lemma 1. We have that $\mathcal B_0 \subset \mathcal B(E)$ is a Banach space with sup norm $\|\cdot\|_\infty$ , and $P_t$ maps $\mathcal B_0\to \mathcal B_0$ for any $t \ge 0$ .
Proof. See [Reference Davis11, p. 29].
Let us write $(A, \mathcal D(A))$ for the infinitesimal generator (also referred to as the strong generator) of the semigroup $(P_t)$ . By definition, for all $f\in \mathcal D(A)$ ,
with this limit being taken in $\|\cdot\|_\infty$ , with
Since g in (2) is a limit of functions in a Banach space, if such a g exists, it must be unique, and Af is well-defined.
Lemma 2. Let $f\in \mathcal D(A)$ . Then $Af \in \mathcal{B}_0$ . In other words, $A \;:\; \mathcal D(A) \to \mathcal{B}_0$ .
Proof. This is immediate since $P_t$ maps $\mathcal{B}_0\to \mathcal{B}_0$ .
We now define the extended generator : is the set of (potentially unbounded) measurable functions $f\;:\; E \to \mathbb R$ such that there exists a measurable function $h\;:\; E\to \mathbb R$ with $t \mapsto h(Z_t)$ $\mathbb P_z$ -integrable almost surely for each initial point z, and such that the process
is a local martingale; see [Reference Davis11, (14.16)]. For , , for h as in (3).
Proposition 2. The extended generator is an extension of the infinitesimal generator:
-
1. .
-
2. for any $f\in \mathcal D (A)$ .
Proof. See [Reference Davis11, p. 32].
To simplify notation, we will define the action of our probability kernel, Q, on a function, f, as
We will assume throughout this work that the standard conditions of Davis [Reference Davis11, (24.8)] hold. Under this assumption on PDMPs, are fully characterized in [Reference Davis11, (26.14)]. In particular, the set is entirely known, and for all ,
where $\Xi$ is the differential operator associated to the deterministic flow of the PDMP. The PDMP samplers we are interested in (see Section 2.2) have flow corresponding to free transport, with corresponding $\Xi$ operator for continuously differentiable f,
Remark 2. To reiterate, while the domain of the strong generator $\mathcal D(A)$ is not known, the domain is known and .
3. A general framework for the invariant measure of a PDMP
A challenge in the piecewise smooth setting is that the usual approach to constructing and working with PDMPs does not work without changing the topology. In particular, existing results concerning the invariant measure of such processes require the process to be Feller. For PDMPs with boundaries, this is in fact not the case in general [Reference Davis11, (27.5)].
3.1. Strong continuity of the semigroup
First, we give a general result that is not tied to our specific context and is valid for any PDMP that follows Davis’s construction, [Reference Davis11, Section 24, (24.8)].
Let $\mathcal{F}$ be the space of $C^1$ functions contained in with compact support.
Proposition 3. Assume that the deterministic flow and the jump rates are bounded on any compact set, and that Qf has compact support whenever f has compact support. Then $\mathcal{F} \subset \mathcal{B}_0$ . In other words, the semigroup $P_t$ of the process is strongly continuous on $\mathcal{F}$ :
in $\|\cdot\|_\infty$ , as $t\downarrow 0$ .
Proof. Let $f\in \mathcal F$ . Since
,
is a local martingale. Furthermore, by examining the expression for
, one sees that it can be rewritten as
from (4). Since f and Qf have compact support and are bounded, using the assumptions on $\Xi$ and $\lambda$ , we deduce that
is bounded.
Since f and are bounded, $C_t^f$ is bounded for any fixed t, which implies that it is a martingale. More precisely, consider the stopped process $C^f_{t\wedge T}$ for any $T>0$ . This is a uniformly bounded local martingale, and is hence a true martingale.
We have $C_0^f = 0$ ; hence, for any starting point z and $t>0$ ,
Hence
where we used Fubini’s theorem to swap the integral and the expectation. Since $P_t$ is a contraction for the sup norm, we see that
We thus conclude that $P_t f - f \rightarrow 0$ as $t \downarrow 0$ .
Remark 3. The set $\mathcal{F}$ does not capture every function of $\mathcal{B}_0$ , nor is it invariant under $P_t$ . We will not attempt to prove that $\mathcal{F}$ is a core of the infinitesimal generator.
Recall that a set of functions $\mathcal F_0 \subset \mathcal B(E)$ separates measures if for any probability measures $\mu_1, \mu_2$ on E, $\int f\;\textrm{d} \mu_1=\int f \;\textrm{d} \mu_2$ for all $f\in \mathcal F_0$ implies that $\mu_1=\mu_2$ . In order to study the invariant measure through the semigroup and its effect on functions, it is important to consider sets of functions which separate measures. Therefore, we will now show that $\mathcal{F}$ separates measures on $E^0$ .
Proposition 4. Assume that the jump kernel Q is such that for any $z \in \Gamma$ , the measure $Q(\cdot | z)$ is supported on the boundary $\cup_k \partial^- E^0_k$ . Then $\mathcal{F}$ separates measures on $E^0$ .
Proof. Consider the set of $C^1$ functions $f\;:\;E \to \mathbb R$ which are compactly supported on each open set $E_k^0$ . The collection of such functions separates measures on $E^0$ .
We will show that such functions belong to , and hence to $\mathcal{F}$ , by using the explicit characterization of from [Reference Davis11, Theorem 26.14].
Firstly, if f is $C^1$ with compact support, Conditions 1 and 3 of [Reference Davis11, Theorem 26.14] are automatically satisfied.
It remains to check the boundary condition:
However, since we are considering only functions f which are compactly supported on each $E_k^0$ , it holds that $f(z)=0$ for any $z \in \Gamma$ on the boundary. Recalling that $Q(\cdot | z)$ is supported only on the boundary, we have that $Qf(z)=0$ also for any $z \in \Gamma$ . It hence follows that the boundary condition is satisfied.
Remark 4. The assumption of Proposition 4 is restrictive and does not cover, for example, the case of variable selection described in [Reference Chevallier, Fearnhead and Sutton10]. However, it is possible to weaken the assumption to cover the variable selection case. A sketch is available in Appendix A.
3.2. Invariant measure
We now turn to giving conditions for the invariant measure of our PDMP. The following lemma will be important within the proof, as it allows us to ignore contributions from the boundary when calculating expectations over the path of the PDMP.
Lemma 3. For all $z\in E$ , the process starting from z spends a negligible amount of time on the boundary: for any $t>0$ ,
$\mathbb{P}_z$ -almost surely.
Proof. In Davis’s construction, the number of events, including jumps at the boundary, is countable for every trajectory. Hence, for every trajectory of the process, the set of times for which $Z_t \in E\setminus E^0$ is countable and therefore negligible.
Theorem 1. Let $\mu$ be a measure on E. Assume the following conditions hold:
-
1. The vector field of the deterministic flow and the jump rates are bounded on any compact set.
-
2. Qf has compact support whenever f has compact support.
-
3. For any $z \in \Gamma$ , the measure $Q(\!\cdot\!| z)$ is supported on the boundary $\cup_k \partial^- E^0_k$ .
-
4. We have $\mu(E\setminus E^0) = 0$ .
-
5. For all $f \in \mathcal{D}(A)$ , $\int_E Af \;\textrm{d}\mu = 0$ .
Then $\mu$ is invariant.
Proof. Using this fact and Proposition 1.5 from Ethier and Kurtz [Reference Ethier and Kurtz17] (or [Reference Davis11, (14.10)]), we note that for any $f \in \mathcal{B}_0$ and $t > 0$ , we have that $\int_0^t P_s f \;\textrm{d} s \in \mathcal{D}(A)$ , the domain of the strong generator. We also have that
where we have used our assumption that $\int Ag \;\textrm{d} \mu=0$ for any $g \in \mathcal D (A)$ and taken $g = \int_0^t P_s f\;\textrm{d} s$ .
Let $f_B(z) \;:\!=\; f(z) 1_{z\in E\setminus E^0}$ and $f_0(z) \;:\!=\; f(z) 1_{z\in E^0}$ be a decomposition of f with $f = f_0 + f_B$ . Let $1_B(z) = 1_{z\in E\setminus E^0}$ be the indicator function of the boundary $E\setminus E^0$ .
From Lemma 3, $\int_0^t P_s 1_B(z) \;\textrm{d} s = 0$ for all $z\in E$ . Hence we have that $\int_E \big[ \int_0^t P_s 1_B(z) \;\textrm{d} s \big] \;\textrm{d} \mu(z) = 0$ , and by using Fubini’s theorem, that
By the non-negativity of $P_t 1_B$ , there exists a null set $\mathcal{N} \subset \mathbb{R}^+$ such that for all $t\in \mathbb{R}^+\setminus \mathcal{N}$ ,
For all z, $f_B(z) \leq \|f\|_\infty 1_B(z)$ , so for all $t \notin \mathcal N$ , $\int_E\, P_t\; f_B \;\textrm{d} \mu = 0$ . Hence $\int_E\, P_t\, f \;\textrm{d} \mu = \int_E\, P_t\, f_0 \;\textrm{d}\mu$ for all $t \notin \mathcal N$ . Since $\mu$ is supported on $E^0$ , $\int_E\, f \;\textrm{d} \mu = \int_E\, f_0 \;\textrm{d} \mu$ and we deduce using (6) that for all $t\notin \mathcal N$ ,
Let $\mu_t$ be the law $Z_t$ with $Z_0 \sim \mu$ . Let $\mu_t^0$ and $\mu_t^B$ be the measures defined by $\mu_t^0(A) = \mu_t(A\cap E^0)$ and $\mu_t^B(A) = \mu_t(A\cap (E\setminus E^0))$ . Using (7), for all $t\notin \mathcal N$ ,
Since $\mathcal F$ separates measures on $E_0$ by Proposition 4, $\mu_t^0 = \mu^0$ for all $t\notin \mathcal N$ . Furthermore, $\mu(E) = \mu_t(E)$ and $\mu(E) = \mu^0(E)$ ; hence $\mu_t^0(E) = \mu_t(E)$ and $\mu_t^B(E) = 0$ . Thus $\mu_t = \mu_t^0 = \mu^0 = \mu$ for all $t\notin \mathcal N$ .
Let $t_1,t_2\notin \mathcal N$ . Then $\mu_{t_1} = \mu_{t_2} = \mu$ , and for all functions g which are measurable and bounded, it holds that
Hence $\mu_{t_1+t_2} = \mu$ . To conclude, since $\mathcal N$ is a null set, for all $ t > 0$ , there exist $t_1,t_2 \notin \mathcal N$ such that $t = t_1 + t_2$ , and therefore $\mu_t = \mu$ .
4. PDMP samplers with active boundaries
Let $U_k$ be an open set of $\mathbb{R}^{d_k}$ for all $k \in K$ , and let $\mathcal{V}^k \subset \mathbb{R}^{d_k}$ be the velocity space associated to the PDMP sampler on $U_k$ . We consider the state space defined following the construction of Davis described in Section 2.1: first, set
Remark 5. We do not take $E_k^0 = U_k\times \mathcal{V}^k$ because $E_k^0$ must be open and $\mathcal{V}^k$ , the set of velocities of the PDMP sampler, might not be.
Let $\pi$ be a measure on the disjoint union $\cup_k U_k$ with a density $\pi_k$ on each $U_k$ , where $\pi_k$ can be extended continuously to the closure $\bar{U}_k$ . Let $p_k$ be the marginal velocity probability distribution on $\mathbb{R}^{d_k}$ with support on $\mathcal{V}^k$ , and let $\mu(k,x,v) = \pi_k(x) p_k(v)$ be a density on $\cup_{k\in K} U_k \times \mathcal{V}^k$ with respect to a suitable dominating measure (which will have discrete components when $\mathcal V^k$ is a finite set), defining a measure on E.
The core result of this section relies on integration by parts, and as such requires extra assumptions on the sets $U_k$ . For clarity of exposition, we give here an intuitive version of the required assumptions; a detailed version can be found in Assumptions 3 and 4 in the appendix.
Assumption 1. Assumptions 3 and 4 can be informally described as follows:
-
(i) $U_k$ has no interior discontinuities on a $(d_k -1)$ -dimensional subset.
-
(ii) The boundary $\partial U_k$ can be decomposed into a finite union of smooth parts, on each of which the normal is well-defined.
-
(iii) The set of corner points, on which the normals of the boundary are not defined, is negligible in an appropriate sense.
-
(iv) For each $x\in U_k$ , $v\in \mathcal{V}^k$ , there is a finite number of intersections between each line $x+\mathbb{R}v$ and $\partial U_k$ .
-
(v) For each $v \in \mathcal{V}^k$ , the projection of the points on the boundary, which are tangent to the velocity, onto $H_v = \mathrm{span}(v)^{\perp}$ is negligible in an appropriate sense.
Let $N_k$ be the subset of points x on the boundary $\partial U_k$ where the normal n(x) is properly defined (see (10) of the appendix for a precise statement).
Assumption 2.
-
(i) We have $\int |\lambda(z)| \;\textrm{d} \mu < \infty$ .
-
(ii) For all $k\in K$ , $\pi_k$ is $C^1$ in $\bar{U}_k$ .
-
(iii) For any $k \in K$ , and any $v \in \mathcal{V}^k$ , $\nabla \pi_k \cdot v$ is in $L_1(\mathrm{Leb})$ .
Theorem 2. If Assumptions 2, 3, and 4 hold, then for all $f\in \mathcal D(A)$ ,
where f(k, x, v) is defined as $f(k,x,v) \;:\!=\; \lim_{t\downarrow 0} f(k,x-tv,v)$ , for $x\in \partial U_k \cap N_k,v\in \mathbb{R}^{d_k}$ such that $\langle n(x),v\rangle > 0$ , and $\sigma$ is the Lebesgue measure induced on the surface $\partial U_k$ .
Proof (sketch). The outline of the proof is that we first use the definition of the generator acting on a function Af and then rearrange the resulting integral using integration by parts. If our PDMP has no boundary, this will give just the first two terms on the right-hand side [Reference Fearnhead, Bierkens, Pollock and Roberts18, Reference Vanetti, Bouchard-Côté, Deligiannidis and Doucet31]. The effect of the boundaries is to introduce the additional terms when performing integration by parts. For full details of the proof, see Section B.2.
Corollary 1. If Assumptions 2, 3, and 4 hold, then for all $f\in \mathcal D(A)$ ,
Proof. Plugging the boundary condition (5) that must be satisfied by any $f\in\mathcal D(A)$ into the result of Theorem 2 yields the result.
To show that a PDMP has the correct invariant distribution, we need to show that $\int_E Af \;\textrm{d} \mu = 0$ . It is difficult to give simple criteria for this in full generality, so in what follows we will focus on samplers where the position is unchanged at the boundary. This is consistent with the dynamics of current PDMP samplers where trajectories are continuous, and events of the PDMP only change the velocity. A natural approach for constructing other sampling behaviour at the boundary is to work with samplers for which $f(k,x,v) \nabla_x \mu \cdot v \;\textrm{d} x \;\textrm{d} v +\int_{E^0} \lambda(z) [Qf(z) - f(z)] \;\textrm{d} \mu = 0 $ . In this case, we only need to verify that the dynamics at the boundary are such that the remaining term in (8) simplifies to 0: see [Reference Grazzi19] for a condition for a class of moves on the boundary for which this term vanishes. In more generality, such as for the variable selection sampler of [Reference Chevallier, Fearnhead and Sutton10], we can also introduce additional events into the sampler to compensate for the boundary term.
5. PDMP samplers for piecewise continuous densities
Let $\pi$ be a density on $\mathbb{R}^d$ , and let $\{U_k\;:\;k\in K\}$ be a finite collection of disjoint open subsets of $\mathbb{R}^d$ , such that $\overline{\cup_k U_k}=\mathbb R^d$ , which satisfy our technical Assumptions 3 and 4. We assume that $\pi$ is $C^1$ on each $\bar{U}_k$ . We are now in the same setting as in the previous section.
Let $\partial U = \bigcup_{k\in K} (\partial U_k \cap N_k)$ be the union of the boundaries, i.e. the set of discontinuities of $\pi$ . We consider now only points on the boundary where exactly two sets $U_{k_1}, U_{k_2}$ intersect, and where the respective normals are well-defined; the set of points where more sets intersect or the normal is ill-defined form a null set by assumption and thus have no impact on the resulting invariant distribution.
In the following we will restrict ourselves to transition kernels on the boundary that keep the location, x, unchanged and only update the velocity, v.
For each such x in $\partial U$ , there exist $k_1(x)$ and $k_2(x)$ such that $x \in \bar{U}_{k_1}$ and $x \in \bar{U}_{k_2}$ . We will define the ordering of the labels so that $\pi_{k_1}(x)<\pi_{k_2}(x)$ . Let n(x) be the outer normal for $U_{k_1}$ . Thus this is the normal that points to the region $k_2(x)$ , which is the region that has higher density, under $\pi$ , at x.
Let $\mathcal{V}_x^+ = \{v \in \mathcal{V} | \langle v,n(x) \rangle > 0\}$ and $\mathcal{V}_x^- = \{v \in \mathcal{V} | \langle v,n(x) \rangle < 0\}$ . Thus $\mathcal{V}_x^+$ is the set of velocities that would move the position x into $k_2(x)$ , thereby increasing the density under $\pi$ , and $\mathcal{V}_x^-$ is the set of velocities that move the position into $k_1(x)$ .
For $x\in \partial U$ , let $l_x$ be the following (unnormalized) density on $\mathcal{V}$ :
This is just proportional to the density p(v) weighted by the size of the velocity in the direction of the normal n(x) and weighted by the density at x in the region, either $k_1(x)$ or $k_2(x)$ , that the velocity is moving toward.
Consider the transition $(x, v) \to (x', v^{\prime})$ obtained by first flipping the velocity and then applying the Markov kernel Q. Since we assume that Q only changes the velocity, there exists a Markov kernel $Q'_{\!\!x}$ such that this transition can be described as $\delta_x( \;\textrm{d} x' ) Q'_{\!\!x} ( \;\textrm{d} v^{\prime} | v)$ . One can equally define this kernel by $Q'_{\!\!x} (A | v) = Q( \{x\} \times A | x, -v)$ for any measurable A.
Theorem 3. Assume for all $v\in \mathcal{V}_x$ that $p(v)=p(\!-\!v)$ , and that the transition kernel Q only changes the velocity, and define the family of kernels $Q'_{\!\!x}$ for $x\in \partial U$ as above. Furthermore, assume that
and that
Then $\int_E Af \;\textrm{d} \mu = 0$ for all $f \in \mathcal{D}(A)$ , and $\mu$ is the invariant distribution of the process.
Proof. Starting from Theorem 2, for all $f\in \mathcal{D}(A)$ ,
By assumption, $-\int_{E^0} f(k,x,v) \nabla_x \mu \cdot v \;\textrm{d} x \;\textrm{d} v +\int_{E^0} \lambda(z) [Qf(z) - f(z)] \;\textrm{d}\mu = 0$ , and so we simplify the integral of Af to
To simplify notation, in the rest of the proof we write $k_1$ for $k_1(x)$ and $k_2$ for $k_2(x)$ . We rewrite the previous equation as follows:
Using the fact that if $v\in \mathcal{V}_x^-$ then $-v \in \mathcal{V}_x^+$ , we can rewrite the right-hand side as
A sufficient condition for $\int_E Af \;\textrm{d} \mu = 0$ is that for all x the integral over v in the brackets is 0.
Any $f \in \mathcal{D}(A)$ satisfies the boundary condition (5) on $\Gamma$ . For $v \in \mathcal{V}^+_x$ , we have $(k_1,x,v) \in \Gamma$ and $(k_2,x,-v) \in \Gamma$ ; hence,
Thus our sufficient condition for $\int_E Af \;\textrm{d} \mu = 0$ becomes
Using again the fact that if $v\in \mathcal{V}_x^+$ then $-v \in \mathcal{V}_x^-$ , this condition can be rewritten as follows:
We can then write this in terms of $l_x$ and $Q'_{\!\!x}$ by introducing a function f ′(x, v) that is defined as
Then, using $p(v)=p(\!-\!v)$ and the definitions of $l_x$ and $Q'_{\!\!x}$ , our sufficient condition becomes
This is true if $l_x$ is an invariant density of $Q'_{\!\!x}$ .
6. Boundary kernels for usual PDMP samplers
We give here possible Markov kernels for the bouncy particle sampler, the zigzag sampler, and the coordinate sampler. Since the condition of Theorem 3 only depends on the velocity distribution, any two processes that share the same velocity distribution can use the same boundary Markov kernels. We present two approaches to constructing appropriate kernels on the boundary.
6.1. Sampling l using Metropolis–Hastings
Recall from Theorem 3 that a valid kernel for the velocity when we hit a boundary can be constructed as follows: first, construct a transition kernel $Q'_{\!\!x}$ which leaves $l_x$ invariant; then, if the current velocity is v, simulate a new velocity from $Q'_{\!\!x}(\!\cdot\!|-v)$ . That is, we can simulate a new velocity by (i) flipping the velocity and (ii) applying $Q'_{\!\!x}$ .
The simplest choice of $Q'_{\!\!x}$ is just the identity map, i.e. a kernel that keeps the velocity. However, this would correspond to a transition kernel on the boundary which simply flips the velocity, thus forcing the PDMP to retrace its steps. Where possible, we can improve on this by choosing $Q'_{\!\!x}$ to be a kernel which samples from $l_x$ , though it should be noted that this choice may be difficult to implement.
When $\mathcal{V}$ is bounded, an alternative is to define $Q'_{\!\!x}$ as a Metropolis–Hastings kernel [Reference Dunson and Johndrow15] targeting $l_x$ , with proposals from a uniform sampler of $\mathcal{V}$ . The algorithm starting from v then proceeds as follows:
-
1. Sample v ′ uniformly in $\mathcal{V}$ .
-
2. Accept $v^*=v^{\prime}$ with probability $\alpha = \frac{l_x(v^{\prime})}{l_x(v)}$ ; otherwise set $v^*=v$ .
Of course, it is also possible to iterate the Metropolis–Hastings kernel several times to get a good sample from $l_x$ at a reasonable cost.
6.2. Limiting behaviours
A natural strategy for constructing the transition kernel for the velocity at a boundary is to consider the limiting behaviour of the sampler for a family of continuous densities which tend to a piecewise discontinuous density in an appropriate limit. We will do this for a density $\pi$ with one discontinuity on a hyperplane with normal n: $\pi(x) = c_0 1_{\langle x,n\rangle < 0} + c_1 1_{\langle x,n\rangle {\geq} 0}$ with $c_0 < c_1$ . In the following we will assume $c_0>0$ , but the extension of the arguments to the case $c_0=0$ is straightforward.
We can approximate $\pi$ by a continuous density $\pi_k$ such that $\nabla \log(\pi_k)$ is piecewise constant:
where $C=\log(c_1/c_0)$ . As $k\rightarrow \infty$ we can see that $\pi_k$ converges to $\pi$ . In the following we will call the region where $\langle x,n\rangle \in [\!-\!{C}/k,0]$ the boundary region of $\pi_k$ , as this is approximating the boundary defined by the discontinuity in $\pi$ .
The advantage of using the densities $\pi_k$ is that the resulting behaviour of standard PDMP samplers is tractable, and, as we will see, the distribution of the change in velocity from entering to exiting the boundary region of $\pi_k$ will not depend on k. The choice of k does affect the change in position of the PDMP as it moves through the boundary region. The effect of increasing k is just to reduce the time spent passing through the boundary region. In the limit as $k\rightarrow\infty$ this becomes instantaneous, and the PDMP’s position will be unchanged.
We consider this limiting behaviour for the bouncy particle sampler, the coordinate sampler, and the zigzag process. Whilst we derive the transition distribution for the velocity in each case from this limiting behaviour, we will demonstrate that each distribution is valid for the corresponding sampler directly by showing that it satisfies our condition (9). The proofs of the propositions in this section are deferred to Appendix D.
6.2.1. Limiting behaviour of the bouncy particle sampler.
Consider the bouncy particle sampler dynamics for sampling from $\pi_k$ for a trajectory that enters the boundary region, and ignore any refresh events. If the state of the bouncy particle sampler is (x, v), then dynamics are such that events occur at a rate $\max\{0,-\langle v, \nabla \log \pi_k(x) \rangle\}$ , and at an event the velocity is reflected in $\nabla \log \pi_k(x)$ . Whilst in the boundary region, $\nabla \log \pi_k(x) = k n$ .
For any v such that $\langle v,n\rangle > 0$ , it is clear that $\lambda(x,v) = 0$ for all x. Hence, the trajectory through the boundary region will be a straight line.
Let v be such that $\langle v,n\rangle < 0$ . Without loss of generality assume that the trajectory enters the boundary region at $t=0$ with $\langle x_0, n \rangle = 0$ . If no jumps occur, the trajectory will exit the boundary region at some time $t_e$ , where $\langle x_{t_{e}},n\rangle = -C/k$ , which implies $t_{e} = -C/(k \langle v,n\rangle)$ . For such a trajectory, the Poisson rate whilst passing through the boundary region is $\lambda = -k \langle v,n\rangle$ . Remembering that $C=\log(c_1/c_0)$ , we have that the probability of a trajectory passing through the region in a straight line is
which does not depend on k.
Finally, the probability of an event that changes the velocity in the boundary region is thus $1-c_0/c_1$ . If there is an event, then the velocity reflects in the normal and becomes $v^{\prime}=v - 2\langle v,n\rangle$ . As $\langle v^{\prime},n\rangle>0$ , no further events will occur whilst passing through the boundary region.
Hence the probability transition kernel assigns probabilities
If we translate this into the corresponding transition kernel at a general discontinuity, for a trajectory that hits the boundary defined by the continuity at a point with unit normal $n=n(x)$ , then
That is, if the trajectory is moving to the region of lower probability density, then it passes through the discontinuity with a probability proportional to the ratio in the probability densities. Otherwise, it reflects off the surface of discontinuity.
Proposition 5. The transition kernel of the velocity, $Q'_{\!\!x}$ , derived from $Q_{\textit{BPS}}$ satisfies (9).
This result holds for either implementation of the bouncy particle sampler, i.e. where the distribution of the velocity is uniform on the sphere, or is an isotropic multivariate Gaussian. Examination of the proof of the proposition shows that we only require that p(v) is spherically symmetric.
6.2.2. Limiting behaviour of the coordinate sampler.
In the coordinate sampler, the velocity is always in the direction of one of the coordinates of x, and we will denote the set of 2d possible velocities by $\mathcal{V}$ . The dynamics of the coordinate sampler are similar to those of the bouncy particle sampler except that the transition kernel at an event is different. At an event, the probability of the new velocity being $v^{\prime}\in\mathcal{V}$ is proportional to $\max\{0,\langle v^{\prime},\nabla \log \pi_k(x)\rangle\}$ .
The calculations for the transition kernel of the coordinate sampler for a trajectory that enters the boundary region of $\pi_k$ is similar to that for the bouncy particle sampler, except that, if there is an event, the distribution of the new velocity changes to that for the coordinate sampler.
The resulting probability transition kernel, expressed for a general discontinuity, is
where $K=\sum_{v\in\mathcal{V}_x^+} \langle v,n \rangle$ is a normalizing constant for the distribution of the new velocity if it changes.
That is, a trajectory moving to the higher-probability region is unaffected by the discontinuity. A trajectory moving to a lower-probability region either passes through the discontinuity or bounces. The bounce direction is chosen at random from $v^{\prime} \in \mathcal{V}_x^+$ with probability equal to the component of v ′ in the direction of the normal at the discontinuity, n.
Proposition 6. The transition kernel of the velocity, $Q'_{\!\!x}$ , derived from $Q_{\mathrm{CS}}$ satisfies (9).
6.2.3. Limiting behaviour of the zigzag sampler.
For the zigzag process the velocities are of the form $\{\pm1\}^d$ . Given the positions, events occur independently for each component. That is, if $v_i\in\{\pm1\}$ is the component of the velocity in the ith coordinate axis, then this velocity will flip, i.e. change sign, at a rate $\max\{0,-v_i\partial \log \pi_k(x)/\partial x_i\}$ . For the boundary region of $\pi_k(x)$ we have
where $n_i$ is the ith component of the normal n.
If we consider the dynamics of zigzag once it enters the boundary region of $\pi_k(x)$ , then each velocity component with $v_in_i<0$ will potentially flip. Whilst travelling through the boundary region, the rate at which such a $v_i$ will flip will be $-v_i n_i k$ . If a component flips, then the new velocity satisfies $v^{\prime}_{\!\!i}=-v_i$ , and hence $v^{\prime}_{\!\!i} n_i>0$ and the rate of any future flip while in the boundary region will be 0. Thus, each component will flip at most once whilst the trajectory passes through the boundary region.
The resulting dynamics are somewhat complicated, but can easily be simulated using the following algorithm:
-
(a) For $i=1,\ldots,d$ simulate $\tau_i$ as independent realizations of an exponential random variable with rate $k \max\{-n_iv_i,0\}$ . If $n_iv_i\geq 0$ then set $\tau_i=\infty$ .
-
(b) Calculate the time $t^*$ at which we leave the boundary as the smallest value $t>0$ for which
\begin{equation*} \sum_{i=1}^n v_in_i {(\tau_i-|t-\tau_i|)} = \pm C/k, \end{equation*}or\begin{equation*} \sum_{i=1}^n v_in_i {(\tau_i-|t-\tau_i|)} =0. \end{equation*} -
(c) The new velocity has $v^{\prime}_i=v_i$ if $\tau_i>t^*$ and $v^{\prime}_i=-v_i$ otherwise.
The key idea is that whilst the trajectory remains within the boundary region, each velocity component flips independently with its own rate. Step (a) then simulates the time at which each component of the velocity would switch. Step (b) calculates, for the event times simulated in 1, the time $t^*$ at which the trajectory will leave the boundary. There are two possibilities, the first corresponding to passing through the boundary, the second to bouncing back to the region where we started. For the first of these possibilities we have two options, corresponding to $C/k$ and $-C/k$ , to allow for the two possible directions with which we can enter the boundary region. Then in Step (c) we calculate the velocity once the trajectory exits the boundary region—using the fact that a velocity component will have flipped if and only if $\tau_i<t^*$ .
It is simple to show that the distribution of the new velocity, v ′, simulated in Step (c) is independent of k, as the value of k only introduces a scale factor into the definition of the event times $\tau_i$ and the exit time $t^*$ . Thus for a general discontinuity, we define the probability transition kernel, $Q_{\mathrm{ZZ}}$ , as corresponding to the above algorithm with $k=1$ , with $n=n(x)$ and $C=\log(\pi_{k_2(x)}(x)/\pi_{k_1(x)}(x))$ , the log of the ratio in probability density at x for the two regions.
Proposition 7. The transition kernel of the velocity $Q'_{\!\!x}$ induced by $Q_{\mathrm{ZZ}}$ satisfies (9).
6.3. Bouncy particle sampler with reflection and refraction
The bouncy particle sampler can be implemented using a Gaussian velocity distribution, similarly to Hamiltonian Monte Carlo. In that case, it is possible to use the boundary kernel developed for Hamiltonian Monte Carlo which reflects and refracts on the boundary [Reference Afshar and Domke1]. This kernel is fully deterministic and is as follows. As before, let $k_1$ and $k_2$ denote the regions on either side of the discontinuity, with the density being higher in the region $k_2$ , and let n be the normal that points to $k_2$ . Let $\Delta U = \log\!(\pi_{k_2}(x)) - \log(\pi_{k_1}(x))>0$ be the (positive) difference in log density between the two sides of the boundary. If $\langle v,n\rangle < 0 $ and $\langle v,n \rangle^2 > 2 \Delta U$ , then the process crosses the boundary and the outgoing velocity is $v^{\prime} =v+ \big(\sqrt{\langle v,n \rangle^2 - 2\Delta U} - \langle v,n\rangle\big) n $ (refraction). Otherwise, if $\langle v,n\rangle < 0 $ and $\langle v,n \rangle^2 < 2 \Delta U$ , the boundary is not crossed and the output velocity is $v^{\prime} = v - 2 \langle v,n\rangle n$ (reflection). If $\langle v,n\rangle >0 $ then, as we are moving to the region with higher density, we always refract, and the new velocity is $v^{\prime}= v+\big(\sqrt{\langle v,n \rangle^2 + 2\Delta U} - \langle v,n\rangle \big) n $ .
The idea of the refraction is that only the velocity in the direction of the normal, n, is changed. If the PDMP is moving to a region with higher density, then the velocity in the direction of n is increased; if it is moving fast enough into a region with lower density, it is reduced.
Let $Q_{\text{RR}}$ be the probability kernel defined by the reflection and refraction process. The following result gives the validity of this kernel.
Proposition 8. The transition kernel of the velocity, $Q'_{\!\!x}$ induced by $Q_{\textit{RR}}$ satisfies (9).
We give examples of using this boundary kernel in Figure 3 and Appendix F.
7. Comparison of samplers
We now present some simulation results that aim to illustrate the theory, and show how different samplers and different choices of kernel at the discontinuities behave. We do this by considering a simple model for which it is easy to see the boundary behaviour, with a target density
which is Gaussian inside and outside the hypercube $[\!-\!1,1]^d$ , with a discontinuous boundary on the hypercube.
For algorithms such as the zigzag process and the coordinate sampler, the choice of basis is extremely important. In particular, we expect the zigzag process to perform very well for product measures if the velocity basis is properly chosen. Hence we use a rotated basis where we generate a random rotation matrix R and rotate the canonical basis by R. (For results for zigzag with the canonical basis, see Appendix F.) The random matrix R is obtained by computing the polar decomposition of a random matrix composed of independent and identically distributed standard normal random variables.
Since the goal of the experiments is to highlight the boundary behaviour, and not a general comparison between the bouncy particle sampler, the zigzag process, and the coordinate sampler, we only perform basic tuning of these algorithms, in particular with respect to the refresh rate which is necessary for the bouncy particle sampler to be ergodic (without boundaries). For each sampler we consider a range of transition kernels at the discontinuity. These are the Metropolis–Hastings kernel of Section 6.1; using 1 or 100 iterations of the Metropolis–Hastings kernel; and using the kernel derived from the limiting behaviour in Sections 6.2.1–6.2.3. We have implemented all methods in dimensions $d=2$ , 10, and 100, though we only present results for $d=100$ here in Figure 2, with the full results shown in Appendix F.
An example of the resulting trajectories for $d=100$ , for the case of a Gaussian restricted to the cube, i.e. $\alpha_{\mathrm{out}}=0$ , can be found in Figure 2. Additional trajectory examples can be found in the appendix for other dimensions in Figures 4–7. There are a number of obvious qualitative conclusions that can be drawn. First, using a single Metropolis–Hastings kernel leads to poor exploration for these examples—the trajectories often double back on themselves when they hit the boundary, and the trajectories for all three algorithms explore only a small part of the sample space. Increasing the number of Metropolis–Hastings kernels qualitatively improves exploration noticeably, but does introduce diffusive-like behaviour. For the bouncy particle sampler and the zigzag process, the kernel derived from the limiting behaviour allows for smaller changes in the velocity at the boundary. We see this as the trajectories look qualitatively different from the Metropolis kernels, with the diffusive behaviour being suppressed. Overall the bouncy particle sampler with the limiting kernel appears to mix best—though this may in part be because this sampler is known to mix well for low-dimensional summaries of the target, but less well for global properties [Reference Deligiannidis, Paulin, Bouchard-Côté and Doucet12].
When the density $\pi$ is a product, i.e. the basis used in the zigzag process is not rotated by a random matrix R, the zigzag process does not display diffusive behaviour, as can be seen in Figure 7 in Appendix F.
Finally, while the case $\alpha_{\mathrm{out}} = 0$ allows a clear visualization of the trajectories, we provide simulations for the general case $\alpha_{\mathrm{out}} \neq 0$ in Figure 3, and in Figures 8–10 of Appendix F. Figures 9 and 10 provide example trajectories in dimension 20, while we verify in Figures 3 and 8 that the integral with respect to a test function converges to the true value to show the validity of our boundary kernels, using a random orthogonal basis and the canonical basis, respectively. It is clear the variability is worse for Metropolis compared to limiting behaviours. In dimension 20 with a rotated basis, the zigzag process with the limiting behaviour seems better overall, but the simulation of the boundary kernel is significantly more difficult than for the other processes. To make a fair comparison between kernels, we would need to rescale the time axis, taking into account the complexity of each kernel.
8. Discussion
This paper focuses on PDMP-based MCMC samplers to sample densities which are only piecewise smooth. In particular, we presented a general framework for showing invariance of a given target, and then specialize to the case of the common PDMP samplers, namely the bouncy particle sampler, coordinate sampler, and zigzag sampler, when the target is piecewise smooth. Our general framework avoids the general functional-analytic approach of establishing that a given set of functions is a core [Reference Durmus, Guillin and Monmarché16, Reference Ethier and Kurtz17]. Rather, we make use of specific properties of the PDMP processes which we are interested in.
Since we do not use the functional-analytic and operator-theoretic framework of [Reference Durmus, Guillin and Monmarché16], we lose some of their powerful results, such as the existence of a tractable core for the generator and useful perturbation results [Reference Durmus, Guillin and Monmarché16, Section 10]. However, our conditions are arguably more transparent and our general approach is considerably less technically demanding.
When the target $\pi$ possesses discontinuities, we find that PDMP-based samplers display a surprisingly rich set of behaviours at the boundary, as evidenced by our empirical results, which demonstrate that the choice of jump kernel at the boundary is crucial. We see that the limiting kernels compare favourably to Metropolis–Hastings-based jump kernels.
Ideas from [Reference Michel, Durmus and Sénécal23] for designing the distribution of the change in velocity at a jump event for standard PDMP samplers may possibly be adapted to our boundary kernels to improve their mixing properties, as there is an equivalence between the densities $l_x$ that we introduce in Theorem 3 and the densities that are left-invariant at jumps of a PDMP (see [Reference Michel, Durmus and Sénécal23]).
There remain several avenues for future exploration. We believe it is possible to weaken Assumptions 1 and 2. For example Assumption 1(ii) could be relaxed to allow for a countable union of smooth parts; it should be possible to remove Assumption 1(v) using Sard’s theorem; and Assumption 2(ii) could be relaxed to the following: for all $x\in U_k$ , for all $v\in \mathcal{V}^k$ , $t\rightarrow \pi_k(x+tv)$ is absolutely continuous. Our chosen set of Assumptions 3 and 4 are sufficient to allow an application of integration by parts, but a simpler and more transparent set of sufficient assumptions would also be desirable. Finally, we conjecture that non-local moves into PDMP samplers, for example based on [Reference Wang, Pollock, Roberts and Steinsaltz32] or otherwise, might also be useful in boosting convergence in the presence of significant discontinuities.
Appendix A. Weakened assumptions for Proposition 4
As mentioned in Remark 4, the assumption of Proposition 4 can be weakened to the following: for any compact $K_1$ , there exists a compact $K_2$ such that $K_1 \subset K_2$ and for any $z\in K_2^c \cap \Gamma$ , $Q(K_2,z) = 0$ . In other words, it is impossible to jump from the boundary outside of $K_2$ to the inside of $K_2$ .
For the sake of brevity, we only give an outline of the proof:
-
1. Let $g_0$ be a function with compact support. The boundary condition (5) is not satisfied.
-
2. For any $\alpha >0$ , build a sequence $g^\alpha_{i+1}$ such that $g^\alpha_{i+1}$ is $C^1$ , $g^\alpha_{i+1} = f$ on the domain of f and $g^\alpha_{i+1} = Qg^\alpha_i$ on $\Gamma$ , and $g^\alpha_{i+1}(z) = g^\alpha_i(z)$ if $d(z,\Gamma) > \alpha$ .
-
3. Show that this sequence converges to $g_\alpha$ , and that $g_\alpha$ is $C^1$ with compact support and satisfies the boundary condition (5), and hence is in $\mathcal{F}$ . Furthermore, $g_\alpha \rightarrow f$ pointwise when $\alpha \rightarrow 0$ .
-
4. Use dominated convergence to conclude.
This covers the case of variable selection.
Appendix B. Theorem 2
B.1 Precise assumptions
Assumption 3. For each $k \in K$ we assume the following:
-
(i) We have $\dim_H((\overline{U_k})^{\mathrm{o}}\setminus U_k) \leq d_k - 2$ , where $\dim_H$ is the Hausdorff dimension.
-
(ii) There is a finite collection $\{W_1^k,\dots ,W_l^k\}$ of disjoint open sets, and a second collection of disjoint open sets, $\{\Omega_1^k,\dots ,\Omega_l^k\}$ , such that each $W_i^k, \Omega_i^k \subset \mathbb R^{d_k-1}$ with $W_i^k \subset \overline{ W_i^k} \subset \Omega_i^k$ for each $i=1,\dots l$ .
-
(iii) The boundaries satisfy $\dim \partial W_i^k \le d_k-2$ .
-
(iv) We have $C^1$ (injective) embeddings $\phi_i^k \;:\;\Omega_i^k \to \mathbb R^{d_k}$ , and also have continuous normals $n_i\;:\;\Omega_i^k \to S^{d_k-1}$ .
-
(v) Set $M_i \;:\!=\; \phi(\overline{ W_i^k})$ , for each $i=1,\dots ,l$ . Then we have $\partial U_k = M_1 \cup \dots \cup M_l$ .
-
(vi) The intersections satisfy $\dim M_i \cap M_j\le d-2$ for any $i\neq j$ .
Let
be the set of points of $\partial U_k$ for which the normal $n(x) = n(u)$ is well-defined. Since $\dim_H((\overline{U_k})^{\mathrm{o}}\setminus U_k) \leq d_k - 2$ , for all points for which the normal exists, the boundary separates $U_k$ and $\mathbb{R}^{d_k}\setminus U_k$ , and does not correspond to an ‘internal’ boundary of $U_k$ that is removed. By convention, we assume that n(x) is the outer normal.
Assumption 4. We make the following assumptions: for all $v\in \mathcal{V}$ , there is a refinement $\{W_1^v,...,W_m^v\}$ and $\{\Omega_1^v,...,\Omega_m^v\}$ of the boundary decomposition such that the following hold:
-
(i) This new decomposition satisfies the previous assumptions.
-
(ii) For all $0 < i \leq m$ , there exists $j \in \{1,\dots,l\}$ such that $\phi(W_i^v) \subset \phi(W_j)$ .
-
(iii) For all $x \in W_i^v$ , $y \in W_j$ such that $\phi(x) = \phi(y)$ , we have $n_i(x) = n_i(y)$ .
-
(iv) For each $0 < i \leq m$ , $\dim_H p_v(M_i^v)\leq n-2$ , where $M_i^v=\{\phi_i(x)\;:\;x\in \overline W_i^v \text{ and } \langle v, n_i(x)\rangle =0\}$ and $p_v$ is the orthogonal projection on $H_v=\mathrm{span}(v)^{\perp}$ .
-
(v) For all $ 0 < i \leq m$ and all $x\in\mathbb R^n$ , the sets $M_i^{v+}\cap(x+\mathbb R v)$ and $M_i^{v-}\cap(x+\mathbb R v)$ have at most one element, where $M_i^{v+}=\{x=\phi(y)\in M_i \;:\; \langle n_i(y), v \rangle >0\}$ and $M_i^{v-}=\{x=\phi(y)\in M_i\;:\; \langle n_i(y), v\rangle <0\}$ .
B.2. Proof of Theorem 2
We abuse notation and write $\partial_v f$ for the derivative at $t=0$ of $f(x+tv,v)$ with respect to t for a fixed v, which corresponds to the term $\Xi f$ .
B.2.1. Integrability.
We give two integrability lemmas that will be useful for the following proof.
Lemma 4. Let $f \in \mathcal{D}(A)$ . We know that
-
(i) f is bounded,
-
(ii) Af is bounded, and
-
(iii) and .
Proof. Items (i) and (ii) are immediate since $\mathcal D(A)\subset \mathcal{B}_0 \subset \mathcal B(E)$ , and $A\;:\;\mathcal D(A) \to \mathcal{B}_0\subset \mathcal B(E) $ .
Item (iii) follows from the fact that for $f\in \mathcal D(A)$ there is the Dynkin formula, which exactly implies $C_t^f$ is a true martingale ([11, (14.13)]), hence also a local martingale.
Lemma 5. We have $\lambda(k,x,v)(Q f(k,x,v) - f(k,x,v)) \in L_1(\mu)$ and $\partial_v f(k,x,v)\in L_1(\mu)$ .
Proof. Since Af is bounded, $Af \in L_1(\mu)$ . Since f is bounded and with (i) of Assumption 2, we have $\lambda(k,x,v)(Q f(k,x,v) - f(k,x,v)) \in L_1(\mu)$ . Hence $\partial_v f(k,x,v)\in L_1(\mu)$ .
This means that we can treat each term independently.
B.2.2. Integration of the infinitesimal generator over E.
Proposition 9. Let $f\in \mathcal{D}(A)$ and let $k \in K$ . For all $v\in \mathcal{V}^k$ we have
where $\sigma$ is the Lebesgue measure of the boundary (seen as a Riemannian manifold).
Proof. We would like to use an integration-by-parts result on the integral in question, which is precisely detailed in Appendix C.
So now let $v \in \mathcal{V}^k$ . Assumptions 3 and 4 imply that $U_k$ satisfies Assumption 6 of Appendix C. Furthermore, ; thus for all $x\in U_k$ , $t\rightarrow f(k,x+tv,v)$ is absolutely continuous. Finally, Lemma 5 implies that $\partial_v f(k,x,v) \in L_1(\mu)$ ; hence we can use Proposition 10 of Appendix C on $U_k$ with the function $z\mapsto f(z) \pi(z)$ . In this context, for $x \in \partial U_k$ , if $\langle n(x),v \rangle > 0$ ,
otherwise, if $\langle n(x),v \rangle > 0$ ,
This yields
where we have removed the absolute value around $\langle n(x), v \rangle$ to account for the sign difference of $\pi_k(x^-) f(k,x^-,v) - \pi_k(x^+) f(k,x^+,v)$ . Furthermore, $\partial U_k\setminus C_v \subset \partial U_k \cap N_k$ , and these two sets differ by a set of zero measure. Hence
which concludes the proof.
Lemma 6. We have
Proof. Since $\partial_v f$ is in $L_1(\mu)$ , $\partial_v f(k,x,v) \mu(k,x,v)$ is integrable. Hence by Fubini’s theorem,
Using Proposition 9,
Using (iii) from Assumption 2, $f(k,x,v) (\nabla \pi_k(x) \cdot v) p_k(v)$ is integrable, and $f(k,x,v) \mu(k,x,v)$ is bounded. Hence we can use Fubini a second time on both terms to get the result.
Appendix C. Theorem: integration by parts
Assumption 5. (Informal geometrical assumption.) Let U be an open set in $\mathbb R^n$ such that the following hold:
-
• $\bar{U} = \mathbb R^n$ ;
-
• the boundary $\partial U$ can be decomposed as a finite union of smooth closed sub-manifolds with piecewise $C^1$ boundaries in $\mathbb R^n$ ;
-
• for any $x,v \in \mathbb R^n$ , the intersection $\partial U \cup \{x+\mathbb R v\}$ is finite (not taking into account the points where v is tangent to $\partial U$ );
-
• $\dim_H(N_v) \leq n-2$ , where $N_v$ is the subset of $\partial U$ where the normal is ill-defined; and
-
• $\dim_H p_v(M^v)\leq n-2$ , where $M^v=\{x\in \partial U \textit{ such that } \langle v,n(x) \rangle=0\}$ and $p_v$ is the orthogonal projection on $H_v=v^{\perp}$ ,
with $\dim_H$ being the Hausdorff dimension.
These assumptions are made precise in Assumption 6 of the next section.
Proposition 10. Let U be an open set of $\mathbb{R}^n$ satisfying Assumption 6, and let $\partial U$ be its boundary. Let f and g be measurable functions from U to $\mathbb R$ such that the following hold:
-
(i) f is bounded;
-
(ii) for any sequence $(y_n)\subset U$ with $\|y_n\|\rightarrow\infty$ , $\lim_{n\to\infty}g(y_n)=0$ ;
-
(iii) for each $x,v\in \mathbb R^{n}$ , the functions $t\mapsto f(x+tv)$ and $t\mapsto g(x+tv)$ are absolutely continuous on $U\cap(x+\mathbb R v)$ and $\partial_t f,\partial_t g\in L^1(U)$ .
Fix $v\in \mathbb R^n$ . Then, using the convention
and
we have
where the second term is integrated with respect to the Lebesgue measure of the boundary (seen as a Riemannian manifold) and $N_v$ is the set of points where the normal is ill-defined.
Proof. The proof is a corollary of the next section.
C.1. Integration over open domain
Assumption 6. Let U be an open set in $\mathbb R^n$ such that for each $v\in S^{n-1}$ (the unit sphere in $\mathbb R^{n}$ ), the following hold:
-
1. There exist $W_1\subset\overline W_1\subset\Omega_1,\dots,W_k\subset\overline W_k\subset\Omega_k$ , 2k open sets in $\mathbb R^{n-1}$ , with $\dim_H\partial W_i\leq n-2$ (these open sets may depend on v because of Item 6 of this assumption).
-
2. There exist $\phi_i\;:\;\Omega_i\rightarrow\mathbb R^n$ , $i=1,\dots k$ , which are $C^1$ one-to-one maps such that the differential $D\phi_i(x)$ is one-to-one for all $x\in\Omega_i$ . This implies that there is a continuous normal $n_i\;:\;\Omega_i\rightarrow S^{n-1}$ .
-
3. We have $\partial U=M_1\cup\dots\cup M_k$ where the sets $M_i=\phi_i(\overline W_i)$ are closed.
-
4. We have $\dim_H M_i\cap M_j\leq n-2$ for all $i\neq j$ .
-
5. Let $W_i^0=\{x\in W_i \;:\; v\cdot n_i(x)=0\})$ . For each i, $\dim_H p_v(M_i^0)\leq n-2$ , where $M_i^0=\phi_i(W_i^0)$ and $p_v$ is the orthogonal projection on $H=H_v=v^{\perp}$ .
-
6. Let $W_i^+=\{x\in W_i\in M_i\;:\;n_i(x)\cdot v>0\}$ and $W_i^-=\{x\in W_i\;:\;n_i(x)\cdot v<0\}$ . For all i and all $y\in\mathbb R^n$ , the sets $M_i^{+}\cap(y+\mathbb R v)$ and $M_i^{-}\cap(y+\mathbb R v)$ have at most one element, where $M_i^{+}=\phi_i(W_i^+)$ and $M_i^{-}=\phi_i(W_i^-)$ .
Assumption 7. Let $f\;:\;U\rightarrow\mathbb R$ be a measurable function such that the following hold:
-
(i) For each $y,v\in \mathbb R^{n}$ , the function $t\mapsto f(y+tv)$ is absolutely continuous on every bounded interval I such that $y+I v \subset U$ .
-
(ii) We have $\lim_{\|y\|\rightarrow\infty}f(y)=0$ (that is, for any sequence $(y_n)\subset U$ with $\|y_n\| \to \infty$ ).
-
(iii) If U is not bounded, then for each $v\in \mathbb R^{n}$ , $\partial_v f \in L^1(U)$ .
We extend f to $\mathbb R^{n}\setminus \partial U$ with $f(y)=0$ for every $y\notin \overline U$ , so that we can suppose that $U=\mathbb R^n\setminus \partial U$ .
Theorem 4. Let U be an open set of $\mathbb R^n$ satisfying Assumption 6, for some fixed $v \in \mathbb R^n$ with $\| v\|=1$ . Let
be the set on which normals are ill-defined. Then for any f satisfying Assumption 7, the following hold:
-
1. We have $\dim_H \mathcal N_v\leq 2$ and $\dim_H p_v(\mathcal N_v)\leq n-2$ .
-
2. For each $y\in\partial U\setminus \mathcal N_v$ , the limits
\begin{align*} \lim_{t\downarrow 0}f(y+tv)=f(y^+) \text{ and } \lim_{t\uparrow 0}f(y+tv)=f(y^-) \end{align*}exist. -
3. The normal n(y) is well-defined at each point $y\in\partial U\setminus \mathcal N_v$ , and
\begin{align*} \int_U\partial_v f(y)\;\textrm{d} y=\int_{\partial U\setminus \mathcal N_v}(f(y^-)-f(y^+))\,|n(y)\cdot v|\;\textrm{d}\sigma(y), \end{align*}where $\sigma$ is the Lebesgue measure on $\partial U$ .
We can use the theorem with a product $f=g\pi$ where
-
• $g\;:\;U\rightarrow\mathbb R$ is measurable, bounded, and absolutely continuous on each set $U\cap(y+\mathbb R v)$ , $y\in\mathbb R^n$ , and $\partial_t g\in L^1(U)$ ; and
-
• $\pi\;:\;U\rightarrow\mathbb R$ is in $C^1(U)\cap L^1(U)$ and is bounded with bounded derivatives, $\lim_{\|y\|\rightarrow\infty}\pi(y)=0$ , and the derivative $\partial_v\pi\in L^1(U)$ .
C.2. Proof of Theorem 4
For the first point, let
By Assumptions 6.1, 6.4, and 6.5, $\dim_H\!(p_v(\mathcal N_v))\leq n-2$ ; therefore $p_v(\mathcal N_v)$ has zero H-Lebesgue measure.
For the second point, the fact that $f(y+\!)$ and $f(y-\!)$ exist is a direct consequence of Assumption 7(i).
Finally we consider the third point. For all $z\in H$ , define
By Assumption 6.6, the set E(z) has 2k elements at most for all $z\in V$ . Set
Since $\dim_H(p_v(\mathcal N_v))\leq n-2$ , it follows that $\dim_H \mathcal N\leq n-1$ . Therefore,
By Fubini’s theorem,
By Assumption 7, for almost all $z\in V$ and for each connected component (a, b) of $\mathbb R\setminus E(z)$ ,
Taking into account that $\lim_{t\rightarrow\pm\infty}f(z+tv)=0$ , we obtain
Now we want to see that the latter integral is equal to
Set
By Assumption 6.6, for each $(i,s)\in I$ , the map $p_v\circ \phi_i\;:\;W_i^s\rightarrow V_i^s$ is a bijection, so that we can define the map $F_i^s\;:\;V_i^s\rightarrow M_i^s$ by $F_i^s(z)=\phi_i((p_v\circ \phi_i)^{-1}(z))$ . Using the definition of the set $V=H\setminus\mathcal N$ , we see that for each $z\in V$ and each $t\in E(z)$ , there exists $(i,s)\in I$ unique such that $z+tv=F_i^s(z)\in M_i^s$ . Furthermore, for each $(i,s)\in I$ , $V_i^s\cap V=\cup_{J\ni (i,s)}V_J$ . It follows that
Since the differential of each $\phi_i$ is always one-to-one and since $D\phi_i(x)(u).u$ is never orthogonal to v for $x\in W_i^s $ and $u\neq 0$ , the local inverse function theorem implies that the maps $F_i^s$ are $C^1$ . Furthermore, the image of $F_i^s$ is $M_i^s$ and the normal to $M_i^s$ at $y=F_i^s(z)$ is $n(y)=\pm n_i((p_v\circ \phi)^{-1}(z))$ . Therefore,
Finally, since $\partial U\setminus \mathcal N_v=\cup_{(i,s)\in I}(M_i^s\setminus \mathcal N_v)$ ,
Appendix D. Validity of transition kernels derived from limiting behaviour
D.1. Bouncy particle sampler: proof of Proposition 5
For the specified $Q_{\mathrm{BPS}}$ , we first derive the form of the associated probability kernel on velocities, $Q'_{\!\!x}$ , remembering that $Q'_{\!\!x}$ is obtained by flipping the velocity and then applying the transition defined by $Q_{\mathrm{BPS}}$ . This gives
The transition kernel $Q'_{\!\!x}$ allows for two possible transitions, either $v^{\prime}=- v$ or $v^{\prime}=v - 2 \langle n,v\rangle n$ . These transitions have the following properties:
-
(i) In the first case, if $v \in\mathcal{V}_x^+$ then $v^{\prime} \in\mathcal{V}_x^-$ , and vice versa. In the second case, if $v\in \mathcal{V}_x^+$ then $v^{\prime}\in \mathcal{V}_x^+$ .
-
(ii) For either transition, $\langle v^{\prime},v^{\prime}\rangle=\langle v,v \rangle$ , and $|\langle n,v^{\prime}\rangle|=|\langle n,v\rangle|$ . Furthermore, by the spherical symmetry of p(v) for the bouncy particle sampler, the first of these means that $p(v) = p(v^{\prime})$ .
We need to show that $l_x(dv^{\prime})=\int Q'_{\!\!x}(dv^{\prime}|v)l_x(v) \;\textrm{d} v$ , where
We will show this holds first for $v^{\prime}\in \mathcal{V}_x^+$ and then for $v^{\prime}\in \mathcal{V}_x^-$ .
If $v^{\prime}\in \mathcal{V}_x^+$ then there are two possible transitions, from $v=-v^{\prime} \in \mathcal{V}_x^-$ and from $v\in \mathcal{V}_x^+$ where $v=-v'+2\langle v',n \rangle n$ . Let $v^*=-v'+2\langle v',n \rangle n$ . Since the Jacobian of both transformations is 1,
where the last equality comes from applying Property (ii) of the transition. The last expression simplifies to $l_x(\textrm{d} v^{\prime})$ as required.
For $v^{\prime}\in \mathcal{V}_x^-$ we have only one transition and thus
which, again using Property (ii), is $l_x(v^{\prime})$ .
D.2. Coordinate sampler: proof of Proposition 6
We follow a similar argument to that of the previous section. First we write down the form of $Q'_{\!\!x}$ derived from $Q_{\mathrm{CS}}$ :
where $K=\sum_{v\in\mathcal{V}_x^+} |\langle n,v \rangle|$ .
For the coordinate sampler, $p(v)=1/(2d)$ for each of the possible values for v. Now we need to show $l_x(v^{\prime})=\sum Q'_{\!\!x}(v^{\prime}|v)l_x(v)$ . We will consider the cases $v^{\prime}\in \mathcal{V}_x^+$ and $v^{\prime}\in\mathcal{V}_x^-$ separately. For the latter case, the argument is the same as for the bouncy particle sampler. Thus we just present the case for $v^{\prime}\in \mathcal{V}_x^+$ .
We have
The second equality comes from the fact that p(v) is constant for all $v\in\mathcal{V}$ . The third equality comes from the definition of K.
D.3. Zigzag sampler: proof of Proposition 7
As discussed, the kernel for the velocity does not depend on k. Without loss of generality, we can set $k=1$ for implementing the algorithm that defines $Q_{\mathrm{ZZ}}$ . We need to show that $Q'_{\!\!x}$ keeps $l_x$ invariant. We will prove this by showing that the following stronger detailed balance condition holds:
As $Q'_{\!\!x}(v^{\prime}|v)=Q_{\mathrm{ZZ}}(v^{\prime}|-v)$ , writing the detailed balance condition for pairs $-v$ and v ′, we have that it suffices to show
By a slight abuse of notation, let $k(v)=k_1(x)$ if $v\in\mathcal{V}^-_x$ and $k(v)=k_2(x)$ if $v\in\mathcal{V}^+_x$ . Then we can write $l_x(v)=|\langle n,v \rangle|p(v)\pi_{k(v)}(x)$ . Thus, using the fact that p(v) defines a uniform distribution on $\mathcal{V}$ , we have that the detailed balance condition simplifies to
This can be viewed as matching the probability that we have a velocity v and transition to v ′ with one where we flip the velocities, starting at $-v^{\prime}$ and transitioning to $-v$ .
We show that the detailed balance condition holds separately for different combinations of whether $v\in\mathcal{V}_x^+$ or $v\in\mathcal{V}_x^-$ and whether $v^{\prime}\in\mathcal{V}_x^+$ or $v^{\prime}\in\mathcal{V}_x^-$ .
First assume $v\in\mathcal{V}_x^+$ and $v^{\prime}\in\mathcal{V}_x^-$ . This corresponds to a trajectory that is moving from the lower- to the higher-density region, but that reflects off the boundary and stays in the lower-density region. It is straightforward to see that the events that change the velocity only increase $\langle n,v \rangle$ , the speed at which the trajectory moves through the boundary region to the higher-density region. Thus a transition from $\mathcal{V}_x^+$ to $\mathcal{V}_x^-$ is impossible, and $Q_{\mathrm{ZZ}}(v^{\prime}|v)=0$ . Similarly, $-v^{\prime}\in\mathcal{V}_x^+$ and $-v\in\mathcal{V}_x^-$ so $Q_{\mathrm{ZZ}}(\!-\!v|-v^{\prime})=0$ . Hence the detailed balance conditions trivially hold in this case.
Next assume $v\in\mathcal{V}_x^-$ and $v^{\prime}\in\mathcal{V}_x^+$ . This corresponds to a trajectory that is moving from the higher- to the lower-density region, but that reflects off the boundary and stays in the higher-density region. In this case $k(\!-\!v)=k(v^{\prime})$ and thus the detailed balance condition becomes
To prove that the detailed balance condition holds, we will first obtain an expression for $Q_{\mathrm{ZZ}}(v^{\prime}|v)$ , and then introduce a coupling between a transition for v to v ′ and one from $-v^{\prime}$ to $-v$ to link it to a similar expression for $Q_{\mathrm{ZZ}}(\!-\!v|-v^{\prime})$ .
The randomness in the algorithm that defines $Q_{\mathrm{ZZ}}$ only comes through the randomness of the event times simulated in Step (a) of Section 6.2.3. Remember that $\tau_i$ is the time at which component i of the velocity will switch, if the trajectory is still within the boundary region. Each $\tau_i$ is (conditionally) independent of the others, and has an exponential distribution with rate $\max\{0, -n_iv_i\}$ , where $n_i$ is the component of the ith coordinate of the unit normal n. If $n_iv_i\geq0$ , then $\tau_i=\infty$ .
It is helpful to introduce three sets of components:
-
• Let $\mathcal{S}_1$ be the set of components i such that $v^{\prime}_i=v_i$ and $n_iv_i<0$ .
-
• Let $\mathcal{S}_2$ be the set of components i such that $v^{\prime}_i=-v_i$ .
-
• Let $\mathcal{S}_3$ be the set of components i such that $v^{\prime}_i=v_i$ and $n_iv_i\geq0$ .
So $\mathcal{S}_1$ is the set of components of the velocity v that are moving the particle towards the low-density region, and are unchanged by the transition to v ′; $\mathcal{S}_2$ is the set of components that flip during the transition from v to v ′; and $\mathcal{S}_3$ is the set of components of the velocity v that are moving the particle towards the high-density region, and are unchanged by the transition to v ′.
Only components i of the velocity for which $n_iv_i<0$ can change during the transition from v to v ′. This means that if there exists $i\in\mathcal{S}_2$ such that $n_iv_i \geq 0$ , then the transition from v to v ′ is impossible. By the same argument, the transition from $-v^{\prime}$ to $-v$ is impossible. Thus in this case $Q_{\mathrm{ZZ}}(v^{\prime}|v)=Q_{\mathrm{ZZ}}(\!-\!v|-v^{\prime})=0$ and detailed balance trivially holds. So in the following we will assume that $n_iv_i<0$ for $i \in\mathcal{S}_2$ .
By a similar argument we have that the set $\mathcal{S}_1$ is the set of indices of the velocity that could have changed during the transition from v to v ′, but did not, whereas $\mathcal{S}_3$ is the set of indices of the velocity that could never have changed during the transition.
To ease notation, let $m=|\mathcal{S}_2|$ , the number of indices in the set $\mathcal{S}_2$ , and note that $m\geq 1$ as $v\neq v^{\prime}$ . Without loss of generality we can relabel the coordinates so that $\mathcal{S}_2=\{1,\ldots,m\}$ , and we will use $\tau_{1:m}$ to denote the vector of event times for the coordinates in $\mathcal{S}_2$ .
We now introduce a function of time, t, that depends on $\tau_{1:n}$ . This is
This can be viewed as the net distance travelled by the trajectory up to time t in the direction of the normal n, given that only velocity coordinates in $\mathcal{S}_2$ can change, and these change at times $\tau_{1:m}$ . This function is important as it determines when the trajectory leaves the boundary region, and determines the termination of the simulation algorithm in Step (b). As $v\in\mathcal{V}_x^-$ and $v^{\prime}\in\mathcal{V}_x^+$ , and the changes in velocity in the direction of n are monotone as we flip components, we have that $h(t;\;\tau_{1:n})$ is strictly decreasing at $t=0$ , is strictly increasing for large enough t, and is unimodal. As $h(0;\tau_{1:m})=0$ , this means that there is a unique $t^*(\tau_{1:m})>0$ such that $h(t^*(\tau_{1:m});\tau_{1:m})=0$ . This is the exit time from the boundary region calculated in Step (b) of the algorithm.
We can now define the set $\mathcal{T}$ of values of $\tau_{1:m}$ that are consistent with a transition from v to v ′. The conditions are that all components of the velocity must flip before $t^*$ , and that the trajectory must not pass through the boundary region—see the other stopping criteria in Step (b) of the algorithm. This gives us that
The probability of a transition from v to v ′ is thus the probability that $\tau_{1:m}\in\mathcal{T}$ times the probability that $\tau_i>t^*(\tau_{1:m})$ for $i\in\mathcal{S}_1$ . As each $\tau_i$ , $i \in \mathcal{S}_1$ or $i\in\mathcal{S}_2$ , has an independent exponential distribution with rate $-n_iv_i$ ,
Now consider the reverse transition, from $-v^{\prime}$ to $-v$ . Under our existing definitions of $\mathcal{S}_1$ , $\mathcal{S}_2$ , and $\mathcal{S}_3$ , we have that $\mathcal{S}_2$ is still the set of indices that the flip for the transition from $-v^{\prime}$ to $-v$ , but now $\mathcal{S}_1$ is the set of components of the velocity that could never have flipped, while $\mathcal{S}_3$ is the set of components that could have flipped but did not.
We can define the same quantities for the reverse transition from $-v^{\prime}$ to $-v$ . We will use tildes to denote quantities that relate to this transition. So $\tilde{\tau}_{1:m}$ will be the vector of flip times for components in $i\in\mathcal{S}_2$ . We have
using the fact that $-v^{\prime}_i=v_i$ for $i \in \mathcal{S}_2$ and $-v^{\prime}_i=-v_i$ otherwise. By the same argument as above, there is a unique $\tilde{t}^*(\tilde{\tau}_{1:m})>0$ such that $\tilde{h}(\tilde{t}^*(\tilde{\tau}_{1:m});\tilde{\tau}_{1:m})=0$ . The set of possible values of $\tilde{\tau}_{1:m}$ that are consistent with the transition from $-v^{\prime}$ to v is
Finally we can write down the transition probability as before, remembering that the rate of flipping for components $i\in\mathcal{S}_2$ is $-n_iv_i$ as before, but for $i\in\mathcal{S}_3$ it is $n_iv_i$ . Thus
To relate the two transition probabilities, we introduce a coupling between $\tau_{1:m}$ and $\tilde{\tau}_{1:m}$ , so $\tilde{\tau}_{1:m}=g(\tau_{1:m})$ , where
This coupling is a natural one. If we consider the path through the boundary region given by $\tau_{1:m}$ that transitions from v to v ′, we can reverse that path to get a path that transitions from $-v^{\prime}$ to $-v$ . For the forward path a flip of component i at time $\tau_i$ occurs at a time $t^*(\tau_{1:m})-\tau_i$ prior to the end of the path. Thus for the reverse path the flip would occur at time $t^*(\tau_{1:m})-\tau_i$ .
It is straightforward to show that if $\tilde{\tau}_{1:m}=g(\tau_{1:m})$ then $h(t;\;\tau_{1:m})=\tilde{h}(t^*(\tau_{1:m})-t;\;\tilde{\tau}_{1:m})$ . This result is intuitive; it is saying the distance of the forward trajectory within the boundary region at time t is equal to the distance of the backward trajectory within the boundary region at time $t^*(\tau_{1:m})-t$ . This immediately implies that $t^*(\tau_{1:m})=\tilde{t}^*(\tilde{\tau}_{1:m})$ : the exit times for the forward and backward trajectories are the same. Furthermore, if we consider the second constraint on $\tau_{1:m}$ in the definition of $\mathcal{T}$ , then we have
for $\tilde{\tau}_{1:m}=g(\tau_{1:m})$ . Combining this with the fact that $\tau_i\leq t^*(\tau_{1:m})$ , we have $\tilde{\tau}_{1:m}\leq \tilde{t}^*(\tilde{\tau}_{1:m})$ . We have that the function g maps $\tau_{1:m}\in\mathcal{T}$ to $\tilde{\tau}_{1:m}\in\tilde{\mathcal{T}}$ . Furthermore, the function g is invertible, and by similar arguments we have that $g^{-1}$ maps $\tilde{\tau}_{1:m}\in\tilde{\mathcal{T}}$ to $\tau_{1:m}\in\mathcal{T}$ . Hence g is a bijection from $\mathcal{T}$ to $\tilde{\mathcal{T}}$ .
The function g defines a linear map between $\tau_{1:m}$ and $\tilde{\tau}_{1:m}$ . For $\tau_{1:m}\in\mathcal{T}$ we have that, by definition of $t^*(\tau_{1:m})$ ,
This gives that
Furthermore, using that v ′ is equal to v except that $v_i$ is flipped for $i=1,\ldots,m$ , we have $K=\langle v^{\prime},n\rangle $ .
Let $b_{1:m}$ be the $1\times m$ vector whose ith entry is $b_i=2v_in_i/K$ . If we let $1_m$ denote the $1\times m$ vector of ones, and $I_m$ the $m\times m$ identity matrix, then we have
where the $m \times m$ matrix $A=(b_{1:m}1_{1:m}^\top-I_m)$ . In the following argument we will make the change of variables $\tilde{\tau}_{1:m}=g(\tau_{1:m})=A\tau_{1:m}$ , and we will need the determinant of the Jacobian of this transformation. Using the matrix determinant lemma, this is given by
This simplifies to
So now, taking the definition of $Q_{\mathrm{ZZ}}(v^{\prime}|v)$ and applying the change of variables $\tilde{\tau}_{1:m}=g(\tau_{1:m})$ , we get
where the final equality comes from the definition of $\tilde{t}^*(\tilde{\tau}_{1:m})=t^*(\tau_{1:m})$ , using (13) after substituting in $\tau_i=\tilde{t}^*(\tilde{\tau}_{1:m})-\tilde{\tau}_i$ .
By comparing the final expression with (12), we get that
which satisfies (11) as required.
The final combination involves $v,v^{\prime}\in\mathcal{V}_x^+$ and $-v^{\prime},-v\in\mathcal{V}_x^-$ , or vice versa. The detailed balance condition in this case becomes
We can show this using an argument similar to the one above, with the same coupling of paths from v to v ′ with paths from v ′ to v. The main differences are as follows. First, the definition of $\mathcal{T}$ is simplified to
as, by monotonicity of the changes in velocity, we do not need to check whether the other exit condition in Step (b) holds. Second, the definition of $t^*(\tau_{1:m})$ changes, with it being the value of t for which $h(t;\;\tau_{1:m})=C$ . For $\tau_{1:m}\in\mathcal{T}$ , this becomes
because of the different exit condition in Step (b). We have similar changes to the definitions of $\tilde{\mathcal{T}}$ and $\tilde{t}^*(\tilde{\tau}_{1:m})$ .
However, we can define $Q_{\mathrm{ZZ}}(v^{\prime}|v)$ and $Q_{\mathrm{ZZ}}(\!-\!v|-v^{\prime})$ in a similar way. Furthermore we can use the same linear transformation g, which is still a bijection between $\mathcal{T}$ and $\tilde{\mathcal{T}}$ . Whilst the definition $t^*$ has changed, this only introduces an additive constant into the linear transformation defined by g, and thus the Jacobian of the transformation is unchanged. Following the argument above, we thus get to the same expression for $Q_{\mathrm{ZZ}}(v^{\prime}|v)$ after making the change of variables:
Now substituting in our new definition of $\tilde{t}^*(\tilde{\tau}_{1:m})=t^*(\tau_{1:m})$ we get
where the additional factor of $\exp\{C\}$ is due to the different definition of $t^*$ . As $C=\log(\pi_{k_2(x)}(x)/\pi_{k_1(x)}(x) )$ this rearranges to
as required.
D.4. Bouncy particle sampler with reflect and refract: proof of Proposition 8
We first write down the form of $Q_x'$ derived from $Q_{\text{RR}}$ . To do this it is helpful to define functions that define the new velocity at a reflection or refraction. Remember that $\Delta U$ is the change in log density as we move from the region with lower density, $k_1$ , to the region with higher density, $k_2$ , and is strictly positive. We will define the reflection of a velocity v by $L(v)=v - 2 \langle v,n\rangle n$ . For refractions we need to distinguish between transitions from $k_1$ to $k_2$ and vice versa. The refraction for a transition from $k_1$ to $k_2$ is $R_1(v) = \sqrt{\langle v,n\rangle^2 + 2 \Delta U}n + v - \langle v,n\rangle n$ , and for $k_2$ to $k_1$ it is $R_2(v)= \sqrt{\langle v,n\rangle^2 - 2 \Delta U}n + v - \langle v,n\rangle n$ , with $R_2$ being defined only for v such that $\langle v,n\rangle^2 > 2 \Delta U $ .
Using these functions, we can write the transition kernel $Q_x'$ as
We need to show that $l_x(dv^{\prime}) = \int Q'_{\!\!x}(dv^{\prime}|v) l_x(v) dv$ with
Since, for each v ′, there is only a finite number of v such that $Q'_{\!\!x}(v^{\prime}|v) > 0$ , the former is equivalent to $l_x(v^{\prime})=\sum Q'_{\!\!x}(v^{\prime}|v)l_x(v) J_{v,v^{\prime}}$ , where $J_{v,v^{\prime}}$ is the Jacobian of the deterministic transformation from v to v ′.
If $v^{\prime}\in\mathcal{V}_x^+$ and $\langle v^{\prime},n\rangle^2 \leq 2 \Delta U$ , then the only v for which $Q'_{\!\!x}(v^{\prime}|v)$ is non-zero is $v=L(\!-\!v^{\prime})$ . The result trivially holds as this transition has unit Jacobian, $\|v\|=\|v^{\prime}\|$ , so $p(v)=p(v^{\prime})$ ; $|\langle n,v \rangle|= |\langle n,v^{\prime} \rangle|$ ; $v^{\prime}\in\mathcal{V}_x^+$ and $Q'_{\!\!x}(v^{\prime}|v)=1$ .
If $v^{\prime}\in\mathcal{V}_x^+$ and $\langle v,n\rangle^2 > 2 \Delta U$ , then the only v for which $Q'_{\!\!x}(v^{\prime}|v)$ is non-zero is for v such that $v^{\prime}=R_1(v)$ , which is equivalent to $v=R_2(\!-\!v^{\prime})\in\mathcal{V}_x^-$ . So as $p(v)=C_1 \exp\{-\frac{1}{2}\|v\|^2\}$ for some constant $C_1$ ,
Now
where the third equality comes from n being orthogonal to $v^{\prime} - \langle v^{\prime},n\rangle n$ . This gives
The definition of $\Delta U$ gives that $\exp\{\Delta U\} \pi_{k_1}(x)=\pi_{k_2}(x)$ .
Now $v^{\prime}=\sqrt{\langle v,n\rangle^2 + 2 \Delta U}n - v + \langle v,n\rangle n$ . Up to a sign change, this only changes the velocity in the direction of n. Thus the Jacobian of the transformation is given by
Thus, substituting in these expressions, we have
as required.
An equivalent calculation holds for $v^{\prime}\in\mathcal{V}_x^{-}$ , which is omitted for brevity.
Appendix E. Comments on reference [Reference Bierkens4]
Equation (4) of [Reference Bierkens4] should read $Q_b (x, u, \;\textrm{d} v)\rho(\textrm{d} u) = Q_b (x, -v, -\;\textrm{d} u) \rho(\!-\!\;\textrm{d} v)$ , $x \in \partial O$ , for u in the exit boundary and v in the entrance boundary. The sketch of the proof given in the appendix of [Reference Bierkens4] also requires a correction. The condition (S2) on the domain of the generator has to be modified: it only holds for v in the exit boundary, and not for every v. This mistake, combined with the sign error in Equation (4), is what allows [Reference Bierkens4] to reach the conclusion that $\int \mathcal{L} f \;\textrm{d} \pi \;\textrm{d}\rho = 0$ , missing the fact that the boundary term for the exit velocities cancels out the one for the entrance velocities. A full proof is provided in this manuscript, with proper care for the entrance/exit velocities.
Appendix F. Additional simulation results
For $\alpha_{\mathrm{out}} = 0$ , Figures 4–6 show trajectories for the bouncy particle sampler, the coordinate sampler, and the zigzag process for dimensions $d=2$ , 10, 100 for the sampling from a Gaussian restricted to a cube. Figure 7 shows trajectories for the zigzag process if we use the canonical basis—in this case the distributions of all coordinates are independent, and the zigzag process benefits from this by being able to run independent dynamics for each coordinates.
For $\alpha_\mathrm{out} > 0$ , Figures 8 and 9 show trajectories and Monte Carlo estimates along trajectories in dimension 20, in the case where the canonical basis is not rotated. In this case, the zigzag sampler clearly performs the best. Figure 10 shows trajectories when the basis is rotated.
Acknowledgements
This work was supported by EPSRC grants EP/R018561/1 and EP/R034710/1. A. C. did most of this work while at the University of Lancaster, and A. Q. W. did most of this work while at the University of Bristol.
Funding information
There are no funding bodies to thank in relation to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.