1. Introduction
Drawing random samples from a convex body $\mathcal{K} \subset \mathbb{R}^n$ is an important problem for volume computation and optimization which has generated a large body of research. Usually $\mathcal{K}$ is specified by a membership oracle which certifies whether or not a test point $x \in \mathbb{R}^n$ is contained in $\mathcal{K}$ . Given such an oracle, geometric random walks are used to explore $\mathcal{K}$ in such a way that, after a sufficient number of steps, the walk has ‘mixed’, in the sense that the current point is suitably close, in statistical distance, to a point drawn uniformly from $\mathcal{K}$ . To use such walks, one often assumes that $\mathcal{B}(r) \subset \mathcal{K} \subset \mathcal{B}(R)$ , where $\mathcal{B}(r)$ denotes the Euclidean ball of radius $r > 0$ . One common example of such a geometric walk is the ball walk, which generates the next point by sampling uniformly from a ball of radius $\delta \leqslant r/\sqrt{n}$ centered at the current point, and mixes in ${\widetilde{O}}\big(n\big(R^2/\delta^2\big)\big)$ steps from an M-warm start (i.e., the starting distribution has a density bounded above by a constant M [Reference Kannan, Lovász and Simonovits5]). Here and elsewhere, the ${\widetilde{O}}({\cdot})$ notation suppresses polylogarithmic factors as well as constants depending only on the error parameters. Another example is the hit-and-run walk, where the next point is chosen uniformly at random from a random chord of $\mathcal{K}$ through the current point. Hit-and-run mixes in $O\big(n^3(R/r)^2 \log\!(R/(d\epsilon))\big)$ steps, where the starting point is at distance d from the boundary and $\epsilon$ is the desired distance to stationarity [Reference Lovász and Vempala11]. The mixing times of both walks degrade when the body is poorly rounded, i.e., when $R/r$ is large, and rounding a body by an affine transformation is itself expensive. Affine-invariant walks (i.e., geometric walks whose mixing time is invariant under affine transformations of the body) are another class of random walks, which avoid the problem of rounding.
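As a concrete illustration of the membership-oracle model, one step of the ball walk can be sketched as follows. This is a minimal sketch, not the implementation used in the cited analyses; the oracle name `in_body` and the choice of body are ours.

```python
import numpy as np

def ball_walk_step(x, delta, in_body, rng):
    """Propose a uniform point in the ball of radius delta around x;
    stay put if the proposal falls outside the body."""
    n = x.shape[0]
    # Uniform point in the unit ball: random direction, radius ~ U^(1/n).
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    y = x + delta * rng.uniform() ** (1.0 / n) * u
    return y if in_body(y) else x

# Example: K = [-1, 1]^n, whose inscribed ball has radius r = 1.
rng = np.random.default_rng(0)
n = 10
in_cube = lambda z: bool(np.all(np.abs(z) <= 1.0))
x = np.zeros(n)
for _ in range(1000):
    x = ball_walk_step(x, delta=1.0 / np.sqrt(n), in_body=in_cube, rng=rng)
assert in_cube(x)  # the walk never leaves the body
```

The choice $\delta = r/\sqrt{n}$ matches the step size quoted above; the mixing guarantee additionally requires laziness and a warm start.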
One such random walk is known as the Dikin walk [Reference Kannan and Narayanan6], which uses uniform sampling from Dikin ellipsoids to make steps. Given a polytope with m inequality constraints, the Dikin walk mixes in ${\widetilde{O}}(mn)$ steps from a warm start. This random walk was extended to general convex bodies equipped with a $\nu$ -self-concordant barrier in [Reference Narayanan13], where it mixes in ${\widetilde{O}}\big(n^3\nu^2\big)$ steps from a warm start. For the case of a polytope, this implies that the Dikin walk equipped with the Lee–Sidford (LS) barrier [Reference Lee and Sidford8] mixes in ${\widetilde{O}}\big(n^5\big)$ steps from a warm start, though at each step one must additionally compute the LS barrier, which requires $O\big(\text{nnz}(A) + n^2\big)$ arithmetic operations, where $\text{nnz}(A)$ is the number of non-zero entries in the matrix A which defines the polytope. A significantly improved analysis of the Dikin walk on polytopes was performed by [Reference Chen, Dwivedi, Wainwright and Yu2], whose algorithm reaches a total variation distance of $\epsilon$ from the uniform measure in $O\!\left(n^{2.5}\log^4\!\left(\frac{2m}{n}\right)\log\!\left(\frac{M}{\epsilon}\right)\right)$ steps from an M-warm start. Very recently, it was shown in [Reference Laddha, Lee and Vempala7] that for certain ‘strongly self-concordant’ barriers there is a Dikin walk that mixes in $\tilde{O}(n \tilde{\nu})$ steps, where $\tilde{\nu}$ is related to the self-concordance parameter of the barrier.
In this paper we introduce another affine-invariant random walk akin to the Dikin walk, which uses uniform sampling from John’s ellipsoids of a certain small radius of appropriately symmetrized convex sets to make steps. We show that this walk mixes to within total variation distance $\epsilon$ of the uniform measure in $O\big(n^7\log \epsilon^{-1}\big)$ steps from a warm start. The type of convex body $\mathcal{K}$ is not specified (i.e., it need not be a polytope) in our analysis of the mixing time, but one must have access to the John’s ellipsoid of the current symmetrization of the convex body. While this dependence on the dimension is admittedly steep, a significant feature of this walk is that its mixing time from a warm start, or alternatively from a ‘central point’ such as the center of mass, can be bounded above by a quantity that has absolutely no dependence on any parameter associated with the body apart from its dimension.
Notation: We will denote a large universal constant by C and a small universal constant by c. Let $\pi_0$ be a measure that is absolutely continuous with respect to the uniform measure $\pi$ on $\mathcal{K}$ . Let $\pi_t$ denote the measure after t steps of the Markov chain.
Our main theorems, proved at the end of this paper, are the following.
Theorem 1: Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. Let $\epsilon > 0$ and $M =\sup_{A \in \mathcal{A}} \frac{\pi_0(A)}{\pi(A)}$ . After $t(\epsilon) = C n^7 \log\!({M}/\epsilon) $ steps of John’s walk, we have $d_{TV}\big(\pi_{t(\epsilon)}, \pi\big) \leqslant \epsilon$ .
Theorem 2: Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. For all chords pq of $\mathcal{K}$ containing x, assume $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$ . Then there is a random geometrically distributed time $\tau$ with mean bounded above by C such that for $\epsilon > 0$ , after $t(\epsilon) + \tau = C n^7 \left(n \log\!\left(\sqrt{n}/(r\eta)\right) + \log\!({1}/\epsilon)\right) + \tau$ steps of John’s walk starting at x, we have $d_{TV}\big(\pi_{t(\epsilon)+ \tau}, \pi\big) \leqslant \epsilon$ .
It is known that for the center of mass, $\eta \geqslant \frac{c}{n}.$
2. John’s walk
In this section, we describe John’s maximum-volume ellipsoid for a convex body $\mathcal{K} \subset \mathbb{R}^n$ , and describe a geometric random walk using such ellipsoids. We begin by reviewing John’s theorem and some implications of the theorem.
2.1. John’s theorem
Fritz John showed that any convex body contains a unique ellipsoid of maximal volume, and characterized the ellipsoid [Reference John4, Reference Ball1]. Without loss of generality, we may assume that the ellipsoid of maximal volume is the unit Euclidean ball $\mathcal{B} \subset \mathbb{R}^n$ , since this is the case after an affine transformation. John’s theorem as stated for the unit ball case is as follows.
Theorem 1. (John’s theorem.) Each convex body $\mathcal{K} \subset \mathbb{R}^n$ contains a unique ellipsoid of maximal volume. The ellipsoid is $\mathcal{B}$ if and only if the following conditions are satisfied: $\mathcal{B} \subset \mathcal{K}$ , and for some $m \geqslant n$ there are Euclidean unit vectors $\left\{{u_i}\right\}_{i=1}^m$ on the boundary of $\mathcal{K}$ and positive constants $\left\{{c_i}\right\}_{i=1}^m$ satisfying
$$\sum_{i=1}^{m} c_i u_i = 0 \qquad (2.1)$$
and
$$\sum_{i=1}^{m} c_i u_i u_i^{T} = I_n, \qquad (2.2)$$
where $I_n$ denotes the identity matrix in $\mathbb{R}^{n\times n}$ .
Note that the condition (2.2) is sometimes written equivalently as
$$\langle x, y \rangle = \sum_{i=1}^{m} c_i \langle x, u_i \rangle \langle u_i, y \rangle$$
for all $x, y \in \mathbb{R}^n$ . Using the cyclic invariance of the trace and that the $\left\{{u_i}\right\}$ are unit vectors, (2.2) implies that
$$\sum_{i=1}^{m} c_i = \text{tr}\!\left(\sum_{i=1}^{m} c_i u_i u_i^{T}\right) = \text{tr}(I_n) = n, \qquad (2.3)$$
a property we employ in the subsequent analysis.
We now enumerate some properties from [Reference Ball1] which provide additional insight into the geometric properties of John’s ellipsoids and are useful for the analysis in subsequent sections. Note that the condition (2.1) implies that the contact points do not all lie in one half-space of the unit ball, and this condition is redundant in the symmetric case, since for every contact point $u_i$ , its reflection about the origin $-u_i$ is also a contact point. The condition (2.2) guarantees that such contact points do not lie close to a proper subspace. Furthermore, there are at most $n(n+3)/2$ contact points for general $\mathcal{K}$ , and $n(n+1)/2$ non-redundant contact points if $\mathcal{K}$ is origin-symmetric [Reference Gruber3]. At each $u_i$ , the supporting hyperplane to $\mathcal{K}$ is unique and orthogonal to $u_i$ , since this is the case for the unit ball. Thus, considering the polytope resulting from such supporting hyperplanes, $\mathcal{P} = \left\{{x \in \mathbb{R}^n {\;\big|\;} \langle x, u_i \rangle \leqslant 1, \, i = 1, \ldots, m}\right\}$ , the convex set $\mathcal{K}$ obeys the sandwiching $\mathcal{B} \subset \mathcal{K} \subset \mathcal{P}$ . By Cauchy–Schwarz, for any $x \in \mathcal{P}$ , we have
$$-|x| \leqslant \langle x, u_i \rangle \leqslant 1 \quad \text{for } i = 1, \ldots, m.$$
Since the weights $\left\{{c_i}\right\}$ are positive, it follows from applying the conditions (2.1), (2.2), and (2.3) that
$$|x|^2 = \sum_{i=1}^{m} c_i \langle x, u_i \rangle^2 \leqslant (1 - |x|) \sum_{i=1}^{m} c_i \langle x, u_i \rangle + |x| \sum_{i=1}^{m} c_i = n |x|,$$
where the middle inequality uses $t^2 \leqslant (1 - |x|)\, t + |x|$ for $t \in [-|x|, 1]$ . It follows that $|x| \leqslant n$ . If the convex body is origin-symmetric, then by substituting $-u_i$ for $u_i$ , for any $x \in \mathcal{P}$ , we have
$$|\langle x, u_i \rangle| \leqslant 1 \quad \text{for } i = 1, \ldots, m.$$
It follows that
$$|x|^2 = \sum_{i=1}^{m} c_i \langle x, u_i \rangle^2 \leqslant \sum_{i=1}^{m} c_i = n,$$
so $|x| \leqslant \sqrt{n}$ . Therefore, if $\mathcal{K}$ is origin-symmetric and the unit ball is the John’s ellipsoid, then the containment is
$$\mathcal{B} \subseteq \mathcal{K} \subseteq \sqrt{n}\, \mathcal{B}. \qquad (2.4)$$
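A concrete instance of this sandwiching: for the origin-symmetric cube $[-1,1]^n$ , the John’s ellipsoid is the unit ball, with contact points $\pm e_i$ and weights $c_i = 1$ . The following numerical check (illustrative only) verifies the decomposition of the identity and the $\sqrt{n}$ containment.

```python
import numpy as np

n = 9
# Contact points of the unit ball with the cube [-1, 1]^n are +/- e_i;
# the weights c_i = 1 on {e_1, ..., e_n} give sum c_i u_i u_i^T = I_n
# and sum c_i = n.
U = np.eye(n)
assert np.allclose(sum(np.outer(u, u) for u in U), np.eye(n))
assert sum(1.0 for _ in U) == n

# Every point of the cube lies in sqrt(n) * B, with equality at corners.
corners = np.array(np.meshgrid(*([[-1.0, 1.0]] * n))).reshape(n, -1).T
norms = np.linalg.norm(corners, axis=1)
assert np.all(norms <= np.sqrt(n) + 1e-12)
assert np.allclose(norms, np.sqrt(n))  # the containment is tight
```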
2.2. The John’s walk algorithm
We state the algorithm for a general convex body $\mathcal{K}$ . At a given point $x \in \mathcal{K}$ , let the symmetrization of $\mathcal{K}$ at x be
$$\mathcal{K}_x^s = \mathcal{K} \cap \left(2x - \mathcal{K}\right), \qquad (2.5)$$
and for some symmetric positive definite matrix $E_x$ , let
$$\mathcal{E}_x = \left\{{E_x u + x {\;\big|\;} |u| \leqslant 1}\right\}$$
denote the John’s ellipsoid of $\mathcal{K}_x^s$ . Similarly, let the rescaled John’s ellipsoid be $\mathcal{E}_x(r) = \left\{{r (E_x u) + x {\;\big|\;} |u| \leqslant 1}\right\}$ , where the radius $r > 0$ will be specified in Section 3. Assume $0 = x_0 \in \text{int}(\mathcal{K})$ , and we have computed $\mathcal{E}_{x_0}$ . To generate a sample $x_i$ given $x_{i-1}$ , we use Algorithm 2.1, where $\lambda({\cdot})$ denotes the Lebesgue measure on $\mathbb{R}^n$ .
Algorithm 2.1 is a Metropolis–Hastings geometric random walk which uses the uniform measure $Q_x({\cdot})$ on the dilated John’s ellipsoid $\mathcal{E}_x(r)$ as the proposal distribution. Tossing a fair coin ensures that the transition probability kernel defined by the algorithm is positive semidefinite, which is known as making the walk lazy. Lazy random walks have the same stationary distribution as the original walk, at the cost of a constant-factor increase in mixing time (we will analyze the non-lazy walk, noting that the order of the mixing time as a function of the dimension is unaffected). The rejection of any sample y such that $x \notin \mathcal{E}_y(r)$ is necessary to ensure that the random walk is reversible.
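One step of Algorithm 2.1 can be sketched as below. The sketch assumes access to an oracle `john_ellipsoid(x)` returning the positive definite matrix $E_x$ of the John’s ellipsoid of $\mathcal{K}_x^s$ ; the oracle name and interface are ours, and computing $E_x$ is itself a nontrivial convex program that the sketch does not address.

```python
import numpy as np

def johns_walk_step(x, r, john_ellipsoid, rng):
    """One step of the lazy John's walk of Algorithm 2.1 (sketch)."""
    n = x.shape[0]
    if rng.uniform() < 0.5:               # lazy fair-coin toss: stay put
        return x
    E_x = john_ellipsoid(x)
    u = rng.standard_normal(n)
    u *= rng.uniform() ** (1.0 / n) / np.linalg.norm(u)
    y = x + r * (E_x @ u)                 # uniform proposal from E_x(r)
    E_y = john_ellipsoid(y)
    # Reject unless x lies in E_y(r), i.e. |E_y^{-1}(x - y)| <= r; this
    # rejection is what makes the walk reversible.
    if np.linalg.norm(np.linalg.solve(E_y, x - y)) > r:
        return x
    # Metropolis filter for the uniform target: accept w.p. min(1, vol ratio).
    log_ratio = np.linalg.slogdet(E_x)[1] - np.linalg.slogdet(E_y)[1]
    return y if np.log(rng.uniform()) < min(0.0, log_ratio) else x
```

With the trivial oracle `lambda z: np.eye(n)` every in-range proposal is accepted and each non-lazy step moves at most distance r; a genuine oracle must solve the maximal-volume inscribed ellipsoid problem for the symmetrization at the current point.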
The uniform measure on the John’s ellipsoid $\mathcal{E}_x(r)$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ , and thus the Radon–Nikodym derivative (i.e., density) for the proposal distribution is
$$\frac{dQ_x}{d\lambda}(y) = \frac{1_{\left\{{y \in \mathcal{E}_x(r)}\right\}}}{\lambda(\mathcal{E}_x(r))}. \qquad (2.6)$$
The acceptance probability corresponding to the uniform stationary measure in the Metropolis filter is
$$a(x, y) = \min\!\left(1, \frac{\lambda(\mathcal{E}_x(r))}{\lambda(\mathcal{E}_y(r))}\right) 1_{\left\{{x \in \mathcal{E}_y(r)}\right\}}.$$
By the Lebesgue decomposition, the transition probability measure $P_x({\cdot})$ of the non-lazy version of Algorithm 2.1 is absolutely continuous with respect to the measure
$$\mu({\cdot}) = \lambda({\cdot}) + \delta_x({\cdot}), \qquad (2.7)$$
where $\delta_x({\cdot})$ is the Dirac measure at x corresponding to a rejected move. The transition density is thus
$$\frac{dP_x}{d\mu}(y) = \frac{1_{\left\{{y \in \mathcal{E}_x(r)}\right\}}}{\lambda(\mathcal{E}_x(r))}\, \min\!\left(1, \frac{\lambda(\mathcal{E}_x(r))}{\lambda(\mathcal{E}_y(r))}\right) 1_{\left\{{x \in \mathcal{E}_y(r),\; y \neq x}\right\}} + \rho(x)\, 1_{\left\{{y = x}\right\}},$$
where $1_{\left\{{\cdot}\right\}}$ is the indicator function and the rejection probability is denoted by $\rho(x)$ . Following the observation below, we analyze the mixing time of the walk.
2.2.1. An observation
As a consequence of John’s theorem, $\mathcal{E}_x = \mathcal{B}$ for $x = 0 \in \mathcal{K}$ if and only if the following conditions are satisfied: $\mathcal{B} \subset \mathcal{K}$ , and for some $m \geqslant n$ there are Euclidean unit vectors $\left\{{u_i}\right\}_{i=1}^m$ on the boundary of $\mathcal{K}$ and positive constants $\left\{{c_i}\right\}_{i=1}^m$ satisfying
$$\sum_{i=1}^{m} c_i u_i u_i^{T} = I_n,$$
where $I_n$ denotes the identity matrix in $\mathbb{R}^{n\times n}$ . Note that we have done away with the first condition $\sum_i c_i u_i = 0$ , because not all contact points with $\mathcal{K}_x^s$ are contact points with $\mathcal{K}$ . We are forced to place a zero weight on such vectors, and this is possible, because in such a case, $-u_i$ is a contact point with both $\mathcal{K}$ and $\mathcal{K}_x^s$ .
3. Analysis of mixing time
In what follows we let a discrete-time, homogeneous Markov chain be the triple $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ along with a distribution $P_0$ for the starting point, where the sample space is the convex body $\mathcal{K} \subset \mathbb{R}^n$ , the measurable sets on $\mathcal{K}$ are denoted by $\mathcal{A}$ , and $P_x({\cdot})$ denotes the transition measure for any $x \in \mathcal{K}$ . We say that $\pi$ is a stationary distribution of $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ if for all $A \in \mathcal{A}$ ,
$$\pi(A) = \int_{\mathcal{K}} P_x(A) \, d\pi(x).$$
We say that $\left\{{\mathcal{K}, \mathcal{A}, P_x({\cdot})}\right\}$ is reversible with respect to the stationary distribution $\pi$ if for all $A, A^{\prime} \in \mathcal{A}$ ,
$$\int_{A^{\prime}} P_x(A) \, d\pi(x) = \int_{A} P_x\big(A^{\prime}\big) \, d\pi(x).$$
3.1. Conductance and mixing times
We use the approach from [Reference Lovász and Simonovits10] of lower-bounding the conductance of the chain to prove mixing times. The conductance is defined as follows.
Definition 1. (Conductance.) Let P be a discrete-time homogeneous Markov chain with kernel $P_x({\cdot})$ that is reversible with respect to the stationary measure $\pi({\cdot})$ . Given $A \in \mathcal{A}$ with $0 < \pi(A) < 1$ , the conductance of A is defined as
$$\phi(A) = \frac{\Phi(A)}{\min\!\left(\pi(A),\, \pi(\mathcal{K}{\setminus} A)\right)},$$
where $\Phi(A) \equiv \int_A P_u(\mathcal{K} {\setminus} A) \, d\pi(u)$ . The conductance of the chain is defined as
$$\phi = \inf_{A \in \mathcal{A} \,:\, 0 < \pi(A) < 1} \phi(A).$$
Recall that the total variation distance between two measures $P_1, P_2$ on a measurable space $(\mathcal{K}, \mathcal{A})$ is
$$d_{TV}(P_1, P_2) = \sup_{A \in \mathcal{A}} |P_1(A) - P_2(A)|.$$
Note that $|P_1(A) - P_2(A)| = |P_1(\mathcal{K}{\setminus} A) - P_2(\mathcal{K}{\setminus} A)|$ , so if the supremum is attained on any $A \in \mathcal{A}$ , then it is attained on $\mathcal{K} {\setminus} A \in \mathcal{A}$ as well. If $P_1$ and $P_2$ are both absolutely continuous with respect to a dominating measure $\mu$ and thus have densities $p_1 \equiv \frac{dP_1}{d\mu}$ and $p_2 \equiv \frac{dP_2}{d\mu}$ , respectively, the total variation distance may also be written as
$$d_{TV}(P_1, P_2) = 1 - \int_{S_1} \min\!\left(p_1(x), p_2(x)\right) d\mu(x), \qquad (3.1)$$
where $S_1 = \left\{{x {\;\big|\;} p_1(x) > 0}\right\}$ . Note that (3.1) does not depend on the choice of dominating measure $\mu$ , provided the densities are specified with respect to that measure. Additionally, the supremum above is attained on $\left\{{x {\;\big|\;} p_1(x) \geqslant p_2(x)}\right\}$ , up to sets of $\mu$ -measure zero (or alternatively on its complement). The following relationship between conductance and the total variation distance to the stationary measure was proven in [Reference Lovász and Simonovits10].
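The agreement between the supremum form, the half-$L_1$ form, and the overlap form of the total variation distance is easy to check on a small discrete example (the numbers below are illustrative):

```python
import numpy as np

# Two densities on a five-point space, w.r.t. counting measure.
p1 = np.array([0.5, 0.2, 0.3, 0.0, 0.0])
p2 = np.array([0.1, 0.1, 0.3, 0.3, 0.2])

tv_half_l1 = 0.5 * np.abs(p1 - p2).sum()     # half the L1 distance
tv_overlap = 1.0 - np.minimum(p1, p2).sum()  # one minus the total overlap
tv_witness = (p1 - p2)[p1 >= p2].sum()       # attained on {p1 >= p2}

assert np.isclose(tv_half_l1, tv_overlap)
assert np.isclose(tv_overlap, tv_witness)
assert np.isclose(tv_half_l1, 0.5)
```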
Theorem 2. (Lovász and Simonovits.) Let $\pi_0$ be the initial distribution for a lazy, reversible Markov chain with conductance $\phi$ and stationary measure $\pi$ , and let $\pi_t$ denote the distribution after t steps. Let $\pi_0$ be an M-warm start for $\pi$ ; i.e., we have $M = \sup_{A \in \mathcal{A}} \frac{\pi_0(A)}{\pi(A)}$ . Then
$$d_{TV}(\pi_t, \pi) \leqslant \sqrt{M} \left(1 - \frac{\phi^2}{2}\right)^{t}.$$
As a consequence, we have the following bound on the mixing time.
Corollary 1. Given $\epsilon > 0$ and $M \equiv \sup \frac{\pi_0(A)}{\pi(A)}$ , after $t(\epsilon) \equiv \big\lceil \frac{2}{\phi^2} \log\!\big(\sqrt{M}/\epsilon\big) \big\rceil$ steps of the chain, we have $d_{TV}(\pi_{t(\epsilon)}, \pi) \leqslant \epsilon$ . Thus the Markov chain mixes in ${\widetilde{O}}\big(\phi^{-2}\big)$ steps from a warm start.
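Corollary 1 can be evaluated directly; the helper below (with hypothetical parameter values) shows the inverse-quadratic dependence of the mixing time on the conductance.

```python
import math

def mixing_time(phi, M, eps):
    """t(eps) = ceil((2 / phi^2) * log(sqrt(M) / eps)), as in Corollary 1."""
    return math.ceil(2.0 / phi ** 2 * math.log(math.sqrt(M) / eps))

# Illustrative values: conductance 0.1, warmness M = 100, accuracy 0.01.
assert mixing_time(0.1, 100.0, 0.01) == 1382
# Halving the conductance quadruples the 2 / phi^2 prefactor.
assert mixing_time(0.05, 100.0, 0.01) == 5527
```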
To find mixing times, it then suffices to lower-bound the conductance $\phi$ .
3.2. Isoperimetry
The typical means by which one finds lower bounds on the conductance is via isoperimetric inequalities. We first recall the cross-ratio used in the isoperimetric inequality we will employ.
Definition 2. (Cross-ratio.) Let $x, y \in \mathcal{K}$ , and let p, q be the end points of a chord in $\mathcal{K}$ passing through x, y, such that x lies in between p and y. The cross-ratio is defined to be
$$\sigma(x, y) = \frac{|x - y|\,|p - q|}{|p - x|\,|y - q|},$$
where $|\cdot|$ denotes the Euclidean norm.
Additionally, for any $S_1, S_2 \subset \mathcal{K}$ , let
$$\sigma(S_1, S_2) = \inf\left\{{\sigma(x, y) {\;\big|\;} x \in S_1,\; y \in S_2}\right\}.$$
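The cross-ratio is elementary to compute, and its affine invariance is easy to verify numerically; in the sketch below the points and the affine map are illustrative.

```python
import numpy as np

def cross_ratio(p, x, y, q):
    """Cross-ratio of collinear points p, x, y, q, listed in chord order."""
    d = np.linalg.norm
    return d(x - y) * d(p - q) / (d(p - x) * d(y - q))

# Points p = 0, x = 1/4, y = 1/2, q = 1 on a chord in R^2.
e = np.array([1.0, 0.0])
p, x, y, q = 0.0 * e, 0.25 * e, 0.5 * e, 1.0 * e
sigma = cross_ratio(p, x, y, q)
assert np.isclose(sigma, 2.0)   # (0.25 * 1) / (0.25 * 0.5) = 2

# Invariance under an invertible affine map z -> Az + b: an affine map
# rescales all collinear distances by the same factor, so ratios survive.
A, b = np.array([[2.0, 1.0], [0.0, 3.0]]), np.array([5.0, -1.0])
T = lambda z: A @ z + b
assert np.isclose(cross_ratio(T(p), T(x), T(y), T(q)), sigma)
```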
In [Reference Lovász9], Lovász proved an isoperimetric inequality involving the cross-ratio, from which the conductance $\phi$ may be lower-bounded for the special case of the uniform distribution on a convex body $\mathcal{K} \subset \mathbb{R}^n$ . This was extended to log-concave measures by Lovász and Vempala in [Reference Lovász and Vempala12], of which the uniform measure on a convex body is a special case. We state the latter result as follows.
Theorem 3. (Lovász and Vempala.) For any log-concave probability measure $\pi({\cdot})$ supported on $\mathcal{K}$ and a partition of $\mathcal{K}$ into measurable subsets $S_1$ , $S_2$ , and $S_3 = \mathcal{K}{\setminus}(S_1 \cup S_2)$ , we have
$$\pi(S_3) \geqslant \sigma(S_1, S_2)\, \pi(S_1)\, \pi(S_2).$$
3.3. Mixing of John’s walk
The key step in proving conductance lower bounds is to show that if two points x and y are close in geometric distance, then the total variation distance between the corresponding marginals $P_x$ and $P_y$ is bounded away from 1 by a controlled quantity. Note that given John’s ellipsoid $\mathcal{E}_x = \left\{{E_x u + x {\;\big|\;} |u| \leqslant 1}\right\}$ , where $E_x$ is symmetric and positive definite, a norm is induced via
$$\|v\|_x = \big|E_x^{-1} v\big| \quad \text{for } v \in \mathbb{R}^n.$$
We first relate this norm to the cross-ratio as follows.
Theorem 4. Let $\|{\cdot}\|_x$ denote the norm induced by the John’s ellipsoid of $\mathcal{K}_x^s$ . Then
$$\sigma(x, y) \geqslant \frac{\|y - x\|_x}{2\sqrt{n}}.$$
Proof. Noting that the cross-ratio is invariant to affine transformations, without loss of generality we may assume by a suitable affine transformation that the John’s ellipsoid of $\mathcal{K}_x^s$ is the unit ball, and thus $\|y - x\|_x = |y- x|$ . Let p, x, y, q denote successive points on a chord through $\mathcal{K}_x^s$ . Then
$$\sigma(x, y) = \frac{|x - y|\,|p - q|}{|p - x|\,|y - q|} \geqslant \frac{|x - y|}{\min\!\left(|p - x|, |y - q|\right)} \geqslant \frac{|x - y|}{2\sqrt{n}},$$
where the last inequality follows from the containment
$$\mathcal{K}_x^s \subseteq \sqrt{n}\, \mathcal{B}$$
(see (2.4)) and the observation that either $|x - p|$ or $|y - q|$ is contained inside $\mathcal{K}_x^s.$
Before bounding the statistical distance between $P_x$ and $P_y$ given a bound on the geometric distance between x and y, we first state some useful lemmas regarding the ellipsoids $\mathcal{E}_x$ and $\mathcal{E}_y$ . The next lemma is a generalization of the Cauchy–Schwarz inequality to semidefinite matrices.
Lemma 1. (Semidefinite Cauchy–Schwarz.) Let $\alpha_1, \ldots, \alpha_m \in \mathbb{R}$ and let $A_1, \ldots, A_m \in \mathbb{R}^{r \times n}$ . Then
$$\left(\sum_{i=1}^{m} \alpha_i A_i\right)^{T}\left(\sum_{i=1}^{m} \alpha_i A_i\right) \preceq \left(\sum_{i=1}^{m} \alpha_i^2\right)\left(\sum_{i=1}^{m} A_i^{T} A_i\right),$$
where $A \preceq B$ signifies that $B - A$ is positive semidefinite.
Proof. The proof is as in Lemma 3.11 in [Reference Kannan and Narayanan6]. For all i and j,
$$0 \preceq \left(\alpha_j A_i - \alpha_i A_j\right)^{T}\left(\alpha_j A_i - \alpha_i A_j\right) = \alpha_j^2 A_i^{T} A_i + \alpha_i^2 A_j^{T} A_j - \alpha_i \alpha_j \left(A_i^{T} A_j + A_j^{T} A_i\right).$$
Thus
$$\left(\sum_{i} \alpha_i A_i\right)^{T}\left(\sum_{i} \alpha_i A_i\right) = \sum_{i,j} \alpha_i \alpha_j A_i^{T} A_j \preceq \frac{1}{2}\sum_{i,j}\left(\alpha_j^2 A_i^{T} A_i + \alpha_i^2 A_j^{T} A_j\right) = \left(\sum_{i} \alpha_i^2\right)\left(\sum_{i} A_i^{T} A_i\right).$$
Now we study how the volume and aspect ratio of the John’s ellipsoid change in a move from x to y. If the John’s ellipsoid centered at $x = 0$ is the unit ball (which we can assume after an affine transformation) and we make a move to y, the matrix $E_y$ such that
$$\mathcal{E}_y = \left\{{E_y u + y {\;\big|\;} |u| \leqslant 1}\right\}$$
is the unique (from John’s theorem) $\arg\max$ over positive definite matrices of $ \log \det E$ under the constraint that
$$\left\{{E u + y {\;\big|\;} |u| \leqslant 1}\right\} \subseteq \mathcal{K}_y^s.$$
Noting that the John’s ellipsoid of $\mathcal{K}_x^s$ is a unit ball at the origin, we see that the matrix $E_x$ can be taken to be the identity matrix. Let us translate $\mathcal{K}$ by $x-y$ (where $x = 0$ ). All the contact points with $\mathcal{K} $ of $\mathcal{E}_x $ must lie on the boundary of $\mathcal{E}_y $ or outside $\mathcal{E}_y$ . Note that if $u_i$ is a contact point of $\mathcal{E}_x$ with $\mathcal{K}$ , then by definition, $u_i^T u_i = 1.$ This implies that for any $\|u\| = 1$ ,
$$E_y u + y \in \mathcal{K}_y^s \subseteq \mathcal{K} \subseteq \left\{{z {\;\big|\;} \langle z, u_i \rangle \leqslant 1}\right\}.$$
The last containment above uses the fact that the John’s ellipsoid $\mathcal{E}_x$ to $\mathcal{K}^s_x$ at $x=0$ is the unit ball and that $u_i$ is a contact point of $\mathcal{E}_x$ with $\mathcal{K}$ . This in turn implies that
$$u^{T} E_y u_i \leqslant 1 - \langle y, u_i \rangle.$$
Since u may be an arbitrary unit vector, and $E_y$ is symmetric, this condition can be rewritten as
$$|E_y u_i| \leqslant 1 - \langle u_i, y \rangle. \qquad (3.6)$$
Note that we do not claim that $E_y$ is the matrix with largest determinant that satisfies the above constraints. There may be other constraints corresponding to contact points for $\mathcal{E}_y$ that are not contact points of $\mathcal{E}_x$ .
Using Theorem 1 and Lemma 1, we deduce an upper bound on $\det E_y$ as follows.
Recall that we will denote a universal positive constant that is large by C and a universal positive constant that is small by c.
Lemma 2. Let $r = cn^{-5/2}$ , and assume y is chosen from a ball of radius r such that $\|y - x\|_x = |y - x| \leqslant r$ . Then
$$\det E_y \leqslant 1 + \frac{2}{n^2}.$$
Proof. Note that by (3.6), $E_y$ satisfies the constraints $u_i^TE_y u_i \leqslant 1 - u_i^T y.$ Since the weights corresponding to the John’s ellipsoid $\mathcal{K}_x^s$ are positive, the constraint implies that
$$\sum_{i} c_i\, u_i^{T} E_y u_i \leqslant \sum_{i} c_i \left(1 - u_i^{T} y\right) = n - \left\langle \sum_{i} c_i u_i,\, y \right\rangle.$$
In the above equation, $c_i$ is chosen so that it is non-zero for a contact point of the John’s ellipsoid with $\mathcal{K}_x^s$ only if it is also a contact point for $\mathcal{K}$ itself. Such a choice of $c_i$ can always be made, because, as explained in Subsection 2.2.1, if $u_i$ is a contact point with $\mathcal{K}_x^s$ that is not a contact point with $\mathcal{K}$ , then $-u_i$ must be a contact point with both $\mathcal{K}_x^s$ and $\mathcal{K}$ , and so no weight need fall on $u_i$ ; we can compensate with a weight on $-u_i$ . By (2.2) and (2.3), and using the linearity and cyclic invariance of the trace, we have
$$\text{tr}(E_y) = \text{tr}\!\left(E_y \sum_{i} c_i u_i u_i^{T}\right) = \sum_{i} c_i\, u_i^{T} E_y u_i \leqslant n - \left\langle \sum_{i} c_i u_i,\, y \right\rangle.$$
Considering $\big|\sum_i c_i u_i\big|$ to bound the inner-product term, we may employ Lemma 1. Letting $\alpha_i = \sqrt{c_i}$ and $A_i = \sqrt{c_i}\, u_i^{T}$ , we have
$$\left(\sum_{i} c_i u_i\right)\left(\sum_{i} c_i u_i\right)^{T} \preceq \left(\sum_{i} c_i\right)\left(\sum_{i} c_i u_i u_i^{T}\right).$$
Noting that the right side is equal to $n I_n$ , and comparing the largest eigenvalue of the matrices on the left and the right side, it follows that
$$\left|\sum_{i} c_i u_i\right|^2 \leqslant n.$$
Therefore, if y is chosen from a ball of radius $cn^{-5/2}$ , by Cauchy–Schwarz we conclude that
$$\text{tr}(E_y) \leqslant n + \left|\sum_{i} c_i u_i\right| |y| \leqslant n + \frac{c}{n^2}. \qquad (3.7)$$
Now, letting the eigenvalues of $E_y$ be denoted by $d_i > 0 $ , we have by the arithmetic–geometric mean inequality
$$\det E_y = \prod_{i=1}^{n} d_i \leqslant \left(\frac{\text{tr}(E_y)}{n}\right)^{n}.$$
Thus,
$$\det E_y \leqslant \left(1 + \frac{c}{n^3}\right)^{n} \leqslant \exp\!\left(\frac{c}{n^2}\right) \leqslant 1 + \frac{2}{n^2}.$$
The last inequality assumes that c is a sufficiently small positive constant.
We deduce a lower bound on $\det E_y$ by considering a positive definite matrix of the form $E = \beta\big(I_n - \alpha y y^T\big)$ such that the corresponding ellipsoid $y + \mathcal{E}$ is contained in the unit ball $\mathcal{E}_x$ . Note that such a matrix has eigenvalue $\beta\big(1 - \alpha |y|^2\big)$ of multiplicity 1 corresponding to the unit eigenvector $y/|y|$ , and eigenvalues $\beta$ of multiplicity $n - 1$ corresponding to any unit vector z which is orthogonal to y. We choose $\beta = 1 -\frac{|y|}{\sqrt{n}}$ , and $\alpha = 2\sqrt{n} |y|^{-1}.$
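The spectrum claimed for E is easily verified numerically; in the sketch below the dimension and the magnitude $|y| = 0.01$ are illustrative choices small enough that $2\sqrt{n}\,|y| < 1$ , so E is positive definite.

```python
import numpy as np

n = 6
rng = np.random.default_rng(1)
y = rng.standard_normal(n)
y *= 0.01 / np.linalg.norm(y)                # |y| = 0.01
beta = 1.0 - np.linalg.norm(y) / np.sqrt(n)
alpha = 2.0 * np.sqrt(n) / np.linalg.norm(y)
E = beta * (np.eye(n) - alpha * np.outer(y, y))

eig = np.sort(np.linalg.eigvalsh(E))
# Eigenvalue beta * (1 - alpha |y|^2) along y/|y|; beta with multiplicity n-1.
small = beta * (1.0 - alpha * np.linalg.norm(y) ** 2)
assert np.isclose(eig[0], small)
assert np.allclose(eig[1:], beta)
assert eig[0] > 0.0                          # E is positive definite
```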
Lemma 3. We have $y + \mathcal{E} \subseteq \mathcal{E}_x $ , and hence $\text{vol}(\mathcal{E}_y) \geqslant \text{vol}(\mathcal{E}).$
Proof. Assume after rescaling that $\mathcal{E}_x$ is the unit ball $\mathcal{B}$ centered at the origin. We divide the points u on the boundary of $\mathcal{B}$ into two sets: $A = \left\{{u{\;\big|\;} \left \langle {u}, \, {y} \right \rangle \leqslant \frac{|y|}{\sqrt{n}}}\right\}$ and $B = \left\{{u {\;\big|\;} \left \langle {u}, \, {y} \right \rangle > \frac{|y|}{\sqrt{n}}}\right\}$ .
If $u\in A$ , we have
$$\left \langle {u}, \, {y} \right \rangle \leqslant \frac{|y|}{\sqrt{n}} = 1 - \beta,$$
and noting that $E \succ 0$ and $0 \prec \big(I_n- \alpha y y^T\big) \prec I_n$ ,
$$|Eu| \leqslant \beta\, |u| = \beta, \quad \text{so that} \quad |Eu| + \left \langle {u}, \, {y} \right \rangle \leqslant \beta + (1 - \beta) = 1.$$
If $u \in B$ , we have
$$|Eu|^2 = \beta^2\left(1 - \alpha \left \langle {u}, \, {y} \right \rangle^2 \left(2 - \alpha |y|^2\right)\right) \leqslant \beta^2\left(1 - 2\left \langle {u}, \, {y} \right \rangle\right) \leqslant \left(1 - \left \langle {u}, \, {y} \right \rangle\right)^2,$$
using $\alpha \left \langle {u}, \, {y} \right \rangle^2 > 2 \left \langle {u}, \, {y} \right \rangle$ for $u \in B$ and $\alpha |y|^2 = 2\sqrt{n}\, |y| \leqslant 1$ .
Therefore, for any $u \in A \cup B$ , we see that $|Eu| + \left \langle {u}, \, {y} \right \rangle \leqslant 1$ . From this, it follows that for any $u, v \in \partial\mathcal{B},$
$$\langle Ev + y, u \rangle = \langle v, Eu \rangle + \langle y, u \rangle \leqslant |Eu| + \left \langle {u}, \, {y} \right \rangle \leqslant 1.$$
Therefore, for all $v \in \partial\mathcal{B}$ , $|Ev + y| \leqslant 1$ .
Thus, $\mathcal{E} + y \subseteq \mathcal{E}_x \subseteq \mathcal{K}$ , and so $\mathcal{E} +y \subseteq \mathcal{K}_y^s.$ Hence the volume of the John’s ellipsoid of $\mathcal{K}_y^s$ , namely $\mathcal{E}_y$ , is at least the volume of $\mathcal{E}$ .
Lemma 4. Let $r = cn^{-5/2}$ , and assume y is chosen from a ball of radius r such that $\|y - x\|_x = |y - x| \leqslant r$ . Then
$$\det E_y \geqslant 1 - \frac{3}{n^2}.$$
Proof. Considering the matrix E as provided by Lemma 3, $E_y$ satisfies
$$\det E_y \geqslant \det E = \beta^{n-1} \cdot \beta\left(1 - \alpha |y|^2\right) = \left(1 - \frac{|y|}{\sqrt{n}}\right)^{n}\left(1 - 2\sqrt{n}\, |y|\right).$$
Thus
$$\det E_y \geqslant \left(1 - \sqrt{n}\, |y|\right)\left(1 - 2\sqrt{n}\, |y|\right) \geqslant 1 - 3\sqrt{n}\, |y| \geqslant 1 - \frac{3}{n^2}.$$
Lemmas 2 and 4 establish that for some universal constant $c > 0$ and $|y - x| \leqslant cn^{-5/2}$ , the volume ratio of $\mathcal{E}_y$ and $\mathcal{B}_x$ satisfies
$$1 - \frac{3}{n^2} \leqslant \frac{\text{vol}(\mathcal{E}_y)}{\text{vol}(\mathcal{B}_x)} = \det E_y \leqslant 1 + \frac{2}{n^2}. \qquad (3.9)$$
This does not necessarily indicate that the shapes of $\mathcal{E}_x$ and $\mathcal{E}_y$ are close, a property we require so that rejection does not occur too frequently. The following lemma guarantees that this is indeed the case.
Lemma 5. Let $d_1 \geqslant d_2 \geqslant \ldots \geqslant d_n > 0$ denote the eigenvalues of $E_y$ . Then the minimum eigenvalue $d_n$ satisfies
$$d_n \geqslant 1 - \frac{4}{n}$$
for large enough n. Similarly, the maximum eigenvalue $d_1$ satisfies
$$d_1 \leqslant 1 + \frac{C}{n}$$
for large enough n.
Proof. Assume the eigenvalues are ordered so that $d_1 \geqslant \ldots \geqslant d_n$ . If $d_n \geqslant 1$ , then of course $d_n \geqslant 1 - 4 n^{-1}$ , so we may assume that $\zeta \,:\!=\, 1 - d_n > 0$ . By Lemma 4 we see that
$$(1 - \zeta) \prod_{i=1}^{n-1} d_i = \det E_y \geqslant 1 - \frac{3}{n^2}.$$
By the fact that $\text{tr}(E_y) \leqslant n + c/n^2$ (see (3.7)), making r smaller if necessary, we have that $\text{tr}(E_y) \leqslant n + n^{-2}.$ By the arithmetic–geometric mean inequality, we thus have
$$\prod_{i=1}^{n-1} d_i \leqslant \left(\frac{\text{tr}(E_y) - d_n}{n-1}\right)^{n-1} \leqslant \left(1 + \frac{\zeta + n^{-2}}{n-1}\right)^{n-1} \leqslant \exp\!\left(\zeta + n^{-2}\right).$$
The last inequality holds because $(1 + z/m)^m \leqslant \exp\!(z)$ for $z \geqslant 0$ and integer $m \geqslant 1$ . Thus, to summarize the above chain of inequalities,
$$1 - \frac{3}{n^2} \leqslant (1 - \zeta) \exp\!\left(\zeta + n^{-2}\right). \qquad (3.10)$$
Then, (3.10) may be rephrased as
$$\exp\!(\zeta)(1 - \zeta) \geqslant \left(1 - \frac{3}{n^2}\right) \exp\!\left(-\frac{1}{n^2}\right).$$
This implies that
$$\exp\!(\zeta)(1 - \zeta) \geqslant 1 - \frac{4}{n^2}. \qquad (3.11)$$
We note that $\exp\!(\zeta)(1 - \zeta)$ is a monotonically decreasing function of $\zeta$ on (0, 1), because its derivative is
$$\frac{d}{d\zeta}\left[\exp\!(\zeta)(1 - \zeta)\right] = -\zeta \exp\!(\zeta) < 0.$$
Recall that we say that a function $\tau$ of n satisfies $\tau = \Omega(1)$ if $\liminf\limits_{n \rightarrow \infty} \tau(n) > 0.$ If $\zeta = \Omega(1)$ , then $\exp\!(\zeta)(1- \zeta) \leqslant 1 - \Omega(1),$ which would contradict (3.11). Therefore, we assume that $\zeta = o(1)$ . But then
$$1 - \frac{4}{n^2} \leqslant \exp\!(\zeta)(1 - \zeta) \leqslant \exp\!\left(-\frac{\zeta^2}{2}\right) \leqslant 1 - \frac{\zeta^2}{4}$$
gives us $\zeta \leqslant \frac{4}{n}$ , which implies the lower bound on $d_n$ .
To obtain the desired upper bound on $d_1,$ we first note that since there is a uniform lower bound of $1- \frac{4}{n}$ on all the $d_i$ , and as the product lies within $\Big(1 - \frac{3}{n^2}, 1 + \frac{2}{n^2}\Big)$ , we obtain a uniform upper bound of $2 e^4$ on the $d_i$ and hence an upper bound of $2 e^4$ on $d_1$ . Therefore, if $\|y - x\|_x \leqslant r$ then $\|x - y\|_y \leqslant 2 e^4 r$ . We may now reverse the roles of x and y, and obtain the following upper bound on $d_1$ :
$$d_1 \leqslant \left(1 - \frac{4}{n}\right)^{-1} \leqslant 1 + \frac{C}{n}.$$
Now, to derive a lower bound on the conductance for John’s walk, we first must bound the statistical distance between two points given a bound on their geometric distance with respect to the local norm. Again without loss of generality, in what follows we may assume that $x = 0$ and that the John’s ellipsoid centered at x is the unit ball $\mathcal{B}_x$ (otherwise, perform an affine transformation so that this is the case). Let $x, y \in \mathcal{K}$ represent any two points in the body such that $\|y - x\|_x = |y - x| \leqslant r$ , where $r \in (0, 1)$ is a constant to be specified in terms of the dimension n. Let $P_x$ and $P_y$ denote the one-step transition probability measures defined at x and y, respectively. Let the uniform probability measures defined by the rescaled John’s ellipsoids $\mathcal{E}_x(r)$ and $\mathcal{E}_y(r)$ be denoted by $Q_x$ and $Q_y$ , respectively. We seek to bound
$$d_{TV}(P_x, P_y) \leqslant d_{TV}(P_x, Q_x) + d_{TV}(Q_x, Q_y) + d_{TV}(Q_y, P_y) \qquad (3.12)$$
by choosing r such that the right side of (3.12) is $1-\Omega(1)$ .
To bound $d_{TV}(Q_x, Q_y)$ in (3.12), letting $\widetilde{Q}_y$ denote the probability measure corresponding to the uniform distribution on a ball of radius r centered at y, we may alternatively bound
$$d_{TV}(Q_x, Q_y) \leqslant d_{TV}\big(Q_x, \widetilde{Q}_y\big) + d_{TV}\big(\widetilde{Q}_y, Q_y\big). \qquad (3.13)$$
We bound each term in (3.13) separately. To bound $d_{TV}\big(Q_x, \widetilde{Q}_y\big)$ , note that by our assumption that $\mathcal{E}_x = \mathcal{B}_x$ (the unit ball at x), the corresponding densities with respect to the dominating Lebesgue measure $\lambda$ are
$$\frac{dQ_x}{d\lambda}(z) = \frac{1_{\left\{{z \in \mathcal{B}_x(r)}\right\}}}{\lambda(\mathcal{B}_x(r))}$$
and
$$\frac{d\widetilde{Q}_y}{d\lambda}(z) = \frac{1_{\left\{{z \in \mathcal{B}_y(r)}\right\}}}{\lambda\big(\mathcal{B}_y(r)\big)}.$$
Thus, using (3.1) and noting that $\lambda(\mathcal{B}_x(r)) = \lambda\big(\mathcal{B}_y(r)\big)$ , we have
$$d_{TV}\big(Q_x, \widetilde{Q}_y\big) = 1 - \frac{\lambda\big(\mathcal{B}_x(r) \cap \mathcal{B}_y(r)\big)}{\lambda(\mathcal{B}_x(r))}. \qquad (3.14)$$
The Lebesgue measure of $\mathcal{B}_x \cap \mathcal{B}_y$ is equal to twice the volume of a spherical cap. The following lemma regarding the volume of a spherical cap (Lemma 0.1 from [Reference Lovász and Simonovits10]) is useful.
Lemma 6. Let $\mathcal{B}_x \subset \mathbb{R}^n$ be the Euclidean ball of unit radius centered at x. Let $\mathcal{H} \subset \mathbb{R}^n$ define a half-space at a distance of at least t from x (so x is not contained in the half-space). Then for $t \leqslant \frac{1}{\sqrt{n}}$ , we have
$$\lambda(\mathcal{B}_x \cap \mathcal{H}) \geqslant \frac{1}{2}\left(1 - t\sqrt{n}\right) \lambda(\mathcal{B}_x).$$
The following lemma results trivially from Lemma 6 and (3.14).
Lemma 7. Let $t \leqslant 1$ . If $\|y - x\|_x = |y-x| \leqslant \frac{rt}{\sqrt{n}}$ , then
$$d_{TV}\big(Q_x, \widetilde{Q}_y\big) \leqslant t.$$
To bound $d_{TV}\big(\widetilde{Q}_y, Q_y\big)$ , note that we are bounding the total variation distance between a density supported on a ball and a density supported on an ellipsoid with the same center. The following lemma provides the bound.
Lemma 8. If $\|y-x\|_x \leqslant r = cn^{-5/2}$ , the total variation distance between $\widetilde{Q}_y$ and $Q_y$ satisfies
$$d_{TV}\big(\widetilde{Q}_y, Q_y\big) \leqslant \frac{1}{4}$$
for a sufficiently small universal constant $c > 0$ .
Proof. Note that by (3.1), we have
where B denotes the event in which $Z \in \big(1 - \frac{C}{n}\big)\cdot\mathcal{B}_y(r)$ . By Lemma 5, it follows that $\mathbb{P}_{\widetilde{Q}_y}(Z \in \mathcal{E}_y(r)|B) = 1$ , since the smallest eigenvalue of $E_y$ is at least $1 - Cn^{-1}$ . Additionally, by (3.9),
Now, noting that $\big(1-\frac{x}{2}\big) \geqslant e^{-x}$ for $x \in [0, 1]$ , we have $\mathbb{P}_{\widetilde{Q}_y}(Z \in B) = \left(1-\frac{C}{n}\right)^n \geqslant e^{-2c}$ , and
To bound $d_{TV}(P_x, Q_x)$ , we provide the following lemma.
Lemma 9. If $\|y-x\|_x \leqslant r = cn^{-5/2}$ , the total variation distance between $P_x$ and $Q_x$ satisfies
$$d_{TV}(P_x, Q_x) \leqslant \frac{1}{4}$$
for a sufficiently small universal constant $c > 0$ .
Proof. With some abuse of notation with regard to (2.6), temporarily let the density of $Q_x$ with respect to the dominating measure $\mu$ as defined by (2.7) be
Then, since $q_x(x) = 0$ and $p_x(y) \leqslant q_x(y)$ for $y \neq x$ , by (3.1) we have
where we let A denote the ‘accept’ event in which $x \in \mathcal{E}_Y(r)$ . Since $\mathcal{E}_x = \mathcal{B}_x$ , as in the proof of Lemma 8, we have for all $Y \in A$
Therefore,
where B is the event in which $Y \in \big(1-\frac{C}{n}\big)\cdot\mathcal{E}_x(r) = r\big(1-\frac{C}{n}\big)\cdot\mathcal{B}_x$ . Again by Lemma 5, $\mathbb{P}_{Q_x}(Y \in A| Y \in B) = 1$ . The remainder of the proof is as in Lemma 8.
Note that by a similar argument, $d_{TV}(Q_y, P_y) \leqslant 1/4$ for some universal $c > 0$ as well. Combining this with Lemmas 7, 8, and 9, we obtain the following theorem.
Theorem 5. If $\|y-x\|_x \leqslant \frac{rt}{\sqrt{n}} = ctn^{-3}$ for some universal constant $c > 0$ and some $t \leqslant 1$ , the total variation distance between $P_x$ and $P_y$ satisfies
$$d_{TV}(P_x, P_y) \leqslant \frac{3}{4} + t = 1 - \epsilon, \qquad \text{where } \epsilon \,:\!=\, \frac{1}{4} - t.$$
In particular, we may choose $t = 1/8$ , so that $\epsilon = 1/8$ .
We finally arrive at a lower bound on the conductance for John’s walk using Theorems 3, 4, and 5. The proof of the next result is similar to that of Corollary 10 and Theorem 11 in [Reference Lovász9].
Theorem 6. (Conductance lower bound.) Consider the partition $\mathcal{K} = S_1 \cup S_2$ where $S_1, S_2 \in \mathcal{A}$ , and let $\pi$ be the uniform measure on $\mathcal{K}$ , i.e.,
$$\pi(A) = \frac{\lambda(A)}{\lambda(\mathcal{K})} \quad \text{for all } A \in \mathcal{A}.$$
Then for large enough n and $t = 1/8$ , we have
$$\int_{S_1} P_x(S_2) \, d\pi(x) \geqslant c\, n^{-7/2} \min\!\left(\pi(S_1), \pi(S_2)\right),$$
so $\phi = \Omega\big(n^{-7/2}\big)$ .
Proof. Note that the Radon–Nikodym derivative of $P_x$ with respect to the Lebesgue measure $\lambda$ is well-defined for all $y \in \mathcal{K}{\setminus}\left\{{x}\right\}$ , and is given as
$$\frac{dP_x}{d\lambda}(y) = \frac{1_{\left\{{y \in \mathcal{E}_x(r)}\right\}}}{\lambda(\mathcal{E}_x(r))} \min\!\left(1, \frac{\lambda(\mathcal{E}_x(r))}{\lambda(\mathcal{E}_y(r))}\right) 1_{\left\{{x \in \mathcal{E}_y(r)}\right\}}.$$
Let
$$p(z) = \frac{1_{\left\{{z \in \mathcal{K}}\right\}}}{\lambda(\mathcal{K})}$$
be the density for $\pi$ . Then for any $x, y \in \mathcal{K}$ such that $y \neq x$ , we have
$$p(x)\, \frac{dP_x}{d\lambda}(y) = \frac{1}{\lambda(\mathcal{K})} \min\!\left(\frac{1}{\lambda(\mathcal{E}_x(r))}, \frac{1}{\lambda(\mathcal{E}_y(r))}\right) 1_{\left\{{y \in \mathcal{E}_x(r),\; x \in \mathcal{E}_y(r)}\right\}} = p(y)\, \frac{dP_y}{d\lambda}(x),$$
from which it follows that $\pi$ is the stationary measure for the chain.
Now consider points far inside $S_1$ that are unlikely to cross over to $S_2$ . Letting $t = 1/8$ so $\epsilon = 1/8$ as in Theorem 5, we define
$$S^{\prime}_1 = \left\{{x \in S_1 {\;\big|\;} P_x(S_2) < \frac{\epsilon}{2}}\right\}.$$
Similarly, let
$$S^{\prime}_2 = \left\{{y \in S_2 {\;\big|\;} P_y(S_1) < \frac{\epsilon}{2}}\right\}.$$
Since $p(x)P_x(S_2) \geqslant \epsilon/(2\lambda(\mathcal{K}))$ for $x \in S_1{\setminus} S^{\prime}_1$ , where $p \equiv 1/\lambda(\mathcal{K})$ is the density of $\pi$ on $\mathcal{K}$ , we have
$$\int_{S_1 {\setminus} S^{\prime}_1} P_x(S_2) \, d\pi(x) \geqslant \frac{\epsilon}{2}\, \pi\big(S_1 {\setminus} S^{\prime}_1\big).$$
Similarly, for $y \in S_2{\setminus} S^{\prime}_2$ we have
$$\int_{S_2 {\setminus} S^{\prime}_2} P_y(S_1) \, d\pi(y) \geqslant \frac{\epsilon}{2}\, \pi\big(S_2 {\setminus} S^{\prime}_2\big).$$
By the reversibility of the chain, we have
$$\int_{S_1} P_x(S_2) \, d\pi(x) = \int_{S_2} P_y(S_1) \, d\pi(y),$$
so it follows that
$$\int_{S_1} P_x(S_2) \, d\pi(x) \geqslant \frac{\epsilon}{4} \left( \pi\big(S_1{\setminus}S^{\prime}_1\big) + \pi\big(S_2{\setminus}S^{\prime}_2\big) \right). \qquad (3.15)$$
Now let $\delta = ctn^{-3}$ . Assuming that $\pi\big(S^{\prime}_1\big) \leqslant (1-\delta)\pi(S_1)$ , we have $\pi\big(S_1{\setminus}S^{\prime}_1\big) = \pi(S_1) - \pi\big(S^{\prime}_1\big) \geqslant \delta\pi(S_1)$ , and thus
$$\int_{S_1} P_x(S_2) \, d\pi(x) \geqslant \frac{\epsilon \delta}{4}\, \pi(S_1) \geqslant \frac{\epsilon \delta}{4} \min\!\left(\pi(S_1), \pi(S_2)\right),$$
which proves the claim. Similarly, if $\pi\big(S^{\prime}_2\big) \leqslant (1-\delta)\pi(S_2)$ , the claim is proved again using (3.15). Thus, assume that $\pi\big(S^{\prime}_1\big) > (1-\delta)\pi(S_1)$ and $\pi\big(S^{\prime}_2\big) > (1-\delta)\pi(S_2)$ . By Theorem 3, we have
$$\pi\big(\mathcal{K} {\setminus} \big(S^{\prime}_1 \cup S^{\prime}_2\big)\big) \geqslant \sigma\big(S^{\prime}_1, S^{\prime}_2\big)\, \pi\big(S^{\prime}_1\big)\, \pi\big(S^{\prime}_2\big).$$
Now, given $x \in S^{\prime}_1$ and $y \in S^{\prime}_2$ , the total variation between $P_x$ and $P_y$ satisfies
$$d_{TV}(P_x, P_y) \geqslant P_x(S_1) - P_y(S_1) \geqslant \left(1 - \frac{\epsilon}{2}\right) - \frac{\epsilon}{2} = 1 - \epsilon.$$
By Theorem 5, it follows that $\|y-x\|_x > \delta$ . Then by Theorem 4 it follows that
$$\sigma\big(S^{\prime}_1, S^{\prime}_2\big) \geqslant \frac{\delta}{2\sqrt{n}}.$$
Finally, we deduce that
$$\int_{S_1} P_x(S_2) \, d\pi(x) \geqslant \frac{\epsilon}{4}\, \pi\big(\mathcal{K}{\setminus}\big(S^{\prime}_1 \cup S^{\prime}_2\big)\big) \geqslant \frac{\epsilon \delta}{8\sqrt{n}}\, \pi\big(S^{\prime}_1\big)\, \pi\big(S^{\prime}_2\big) \geqslant \frac{\epsilon \delta (1-\delta)^2}{16\sqrt{n}} \min\!\left(\pi(S_1), \pi(S_2)\right).$$
The claim follows by absorbing terms into the constant for large enough n.
Now applying Corollary 1, we obtain our main theorem.
Theorem 7. For $\epsilon > 0$ and $M \geqslant \sup \frac{\pi_0(A)}{\pi(A)}$ , after $t(\epsilon) = C n^7 \log\!\big(\sqrt{M}/\epsilon\big) $ steps of John’s walk, we have $d_{TV}(\pi_{t(\epsilon)}, \pi) \leqslant \epsilon$ .
In fact, this theorem allows us to find mixing time bounds starting from any point that is not on the boundary of $\mathcal{K}$ . Suppose we know that x belongs to the interior of $\mathcal{K}$ and satisfies the following chord condition. For all chords pq of $\mathcal{K}$ containing x, assume $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$ . Then we see that
$$\mathcal{K}_x^s \supseteq x + \eta\, (\mathcal{K} - x), \quad \text{and hence} \quad \lambda(\mathcal{K}) \leqslant \eta^{-n} \lambda\big(\mathcal{K}_x^s\big) \leqslant \left(\frac{\sqrt{n}}{\eta}\right)^{n} \lambda(\mathcal{E}_x).$$
After a random geometrically distributed time $\tau$ with mean bounded above by an absolute constant, the first nontrivial move occurs. Then the distribution of $x_\tau$ has a density bounded above by
$$\frac{1}{\lambda(\mathcal{E}_x(r))} \leqslant \left(\frac{\sqrt{n}}{r\eta}\right)^{n} \frac{1}{\lambda(\mathcal{K})},$$
so that $x_\tau$ provides an M-warm start with $M \leqslant \left(\sqrt{n}/(r\eta)\right)^{n}$ .
We thus have the following theorem.
Theorem 8. Let $r = c n^{-\frac{5}{2}}$ be the radius of the John’s ellipsoids in the walk. For all chords pq of $\mathcal{K}$ containing x, assume $\frac{|p-x|}{|q-x|} \in \big(\eta, \eta^{-1}\big)$ for some parameter $0 < \eta < 1$ that measures the centrality of x in $\mathcal{K}$ . Then there is a random geometrically distributed time $\tau$ with mean bounded above by C such that for $\epsilon > 0$ , after $t(\epsilon) + \tau = C n^7 \left(n \log\!\big(\sqrt{n}/(r\eta)\big) + \log\!({1}/\epsilon)\right) + \tau$ steps of John’s walk, we have $d_{TV}(\pi_{t(\epsilon) + \tau}, \pi) \leqslant \epsilon$ .
4. Concluding remarks
We introduced an affine-invariant random walk akin to the Dikin walk which uses uniform sampling from John’s ellipsoids of a certain small radius of appropriately symmetrized convex sets to make steps, and showed that this walk mixes to within a total variation distance $\epsilon$ in $O\big(n^7\log \epsilon^{-1}\big)$ steps from a warm start. The type of convex body $\mathcal{K}$ is not specified (i.e., it need not be a polytope) in our analysis of the mixing time, but one must have access to the John’s ellipsoid of the current symmetrization of the convex body. A significant feature of this walk is that its mixing time from a warm start, or alternatively a ‘central point’ such as the center of mass, can be bounded above by a quantity that has absolutely no dependence on any parameter associated with the body apart from its dimension.
One potential avenue for obtaining an improved mixing time is to increase r and control the relative change of shape of the John’s ellipsoids $\mathcal{E}_x$ and $\mathcal{E}_y$ when $\|x - y\|_x < r/\sqrt{n}$ in a more direct manner. We suspect that the method of controlling the change in shape using the arithmetic–geometric mean inequality and the ratio of the volumes of $\mathcal{E}_x$ and $\mathcal{E}_y$ that appears in Lemma 3.3 might be suboptimal.
Acknowledgements
We are deeply grateful to the anonymous reviewers for their role in improving this paper by pointing out numerous places where the arguments needed greater clarity.
Funding information
H. N. was partially supported by a Ramanujan Fellowship and a Swarna Jayanti Fellowship instituted by the Government of India. H. N. also acknowledges the support of DAE project no. 12-R&D-TFR-5.01-0500.
Competing interests
The authors have no competing interests to declare that arose during the preparation or publication of this article.