1. Introduction
In this article, we study concentration of measures related to the graphon particle system and its finite particle approximations. This work is a continuation of the earlier papers [Reference Bayraktar, Chakraborty and Wu2–Reference Bayraktar and Wu4]. A graphon particle system consists of uncountably many heterogeneous particles $X_u$ for $u \in [0, 1]$ whose interactions are characterized by a graphon. More precisely, for a fixed $T > 0$ and $d \in \mathbb{N}$ , we consider the following system:
where $\{B_u\}_{u \in [0, 1]}$ is a family of independent and identically distributed d-dimensional Brownian motions, and $\{X_u(0)\}_{u \in [0, 1]}$ is a collection of independent (but not necessarily identically distributed) $\mathbb{R}^d$ -valued random variables with law $\mu_u(0)$ , independent of $\{B_u\}_{u \in [0, 1]}$ for each $u \in [0, 1]$ , defined on a filtered probability space $(\Omega, \mathscr{F}, \{\mathscr{F}_t\}, \mathbb{P})$ . Two functions $\phi \;:\; \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d$ and $\psi \;:\; \mathbb{R}^d \rightarrow \mathbb{R}^d$ represent pairwise interactions between the particles and single-particle drift, respectively. The quantity $\sigma \in \mathbb{R}^{d \times d}$ is a constant, and $G \;:\; [0, 1] \times [0, 1] \rightarrow [0, 1]$ is a graphon, that is, a symmetric measurable function.
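For orientation, the drift of each particle couples it to the continuum of laws through the graphon. A sketch of the form such a system takes, following [Reference Bayraktar, Chakraborty and Wu2] (see that reference for the precise statement of (1.1); here $\mu_{v, s}$ denotes the law of $X_v(s)$):

```latex
% Sketch of a graphon particle system (cf. (1.1)); mu_{v,s} = law of X_v(s).
X_u(t) = X_u(0) + \int_0^t \bigg( \int_0^1 G(u, v)\,
    \big\langle \mu_{v, s}, \, \phi(X_u(s), \cdot\,) \big\rangle \, dv
    + \psi\big(X_u(s)\big) \bigg)\, ds + \sigma B_u(t),
\qquad u \in [0, 1], \ t \in [0, T].
```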
Along with the graphon particle system, we introduce two finite particle systems with heterogeneous interactions, which approximate (1.1). For a fixed, arbitrary $n \in \mathbb{N}$ and each $i \in [n] \;:\!=\; \{ 1, \cdots, n\}$ , we first consider the ‘not-so-dense’ analogue of (1.1) introduced in Section 4 of [Reference Bayraktar, Chakraborty and Wu2]:
where $\{p(n)\}_{n \in \mathbb{N}} \subset (0, 1]$ is a sequence of numbers and $\{\xi^n_{ij}\}_{1 \le i, j \le n}$ are independent Bernoulli random variables satisfying
independent of $\{B_{i/n}, X_{i/n}(0) \;:\; i \in [n]\}$ . Here, p(n) represents the global sparsity parameter, and the strength of interaction between the particles in (1.2) is scaled by np(n), the order of the number of neighbors, as in mean-field systems on Erdős–Rényi random graphs [Reference Bhamidi, Budhiraja and Wu7, Reference Delarue15, Reference Oliveira and Reis27]. The convergence $p(n) \rightarrow 0$ as $n \rightarrow \infty$ means that the graph is sparse; we shall nevertheless consider the case $np(n) \rightarrow \infty$ , i.e., the random graph is ‘not so dense’, in the sense that the average degree of the graph diverges.
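As an illustration of the sampling scheme behind (1.2), the following sketch draws the Bernoulli interaction graph for a hypothetical product graphon and sparsity sequence (both illustrative choices, not taken from the paper) and checks that the average degree is of order $np(n)$:

```python
import numpy as np

def sample_interaction_graph(n, p, G, seed=0):
    """Draw xi_ij ~ Bernoulli(G(i/n, j/n) * p) independently over all
    ordered pairs (i, j), as in the 'not-so-dense' system (1.2)."""
    rng = np.random.default_rng(seed)
    u = np.arange(1, n + 1) / n
    probs = p * G(u[:, None], u[None, :])  # n x n success probabilities
    return (rng.random((n, n)) < probs).astype(int)

# Hypothetical choices: product graphon G(u, v) = u*v and sparsity
# p(n) = n**(-1/2), for which n*p(n) -> infinity (Assumption 2.4).
n = 2000
p = n ** (-0.5)
xi = sample_interaction_graph(n, p, lambda u, v: u * v)

# The average degree should be close to n * p(n) * int int G = n * p(n) / 4.
avg_degree = xi.sum() / n
expected = n * p * 0.25
```

Although $p(n) \rightarrow 0$ here, the average degree still diverges with n, which is exactly the ‘not-so-dense’ regime.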
The other finite particle approximation system is given by
Since this system has a nonrandom coefficient for the interaction term (but still models heterogeneous interaction via the graphon), it is easier to analyze than the other finite particle system (1.2). We note that the three systems (1.1)–(1.3) are coupled in the sense that they share initial particle locations $X_{i/n}(0)$ and Brownian motions $B_{i/n}$ for $i \in [n]$ .
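For comparison, a sketch of the dynamics of the two finite systems, again following [Reference Bayraktar, Chakraborty and Wu2] (the constants and indices are as in the precise statements of (1.2)–(1.3) in that reference):

```latex
% Sparse system (cf. (1.2)): interaction through the sampled graph xi^n.
dX^n_i(t) = \Big( \frac{1}{n p(n)} \sum_{j=1}^n \xi^n_{ij}\,
    \phi\big(X^n_i(t), X^n_j(t)\big) + \psi\big(X^n_i(t)\big) \Big)\, dt
    + \sigma\, dB_{i/n}(t),
\\
% Averaged system (cf. (1.3)): nonrandom graphon weights.
d\bar{X}^n_i(t) = \Big( \frac{1}{n} \sum_{j=1}^n
    G\big(\tfrac{i}{n}, \tfrac{j}{n}\big)\,
    \phi\big(\bar{X}^n_i(t), \bar{X}^n_j(t)\big)
    + \psi\big(\bar{X}^n_i(t)\big) \Big)\, dt + \sigma\, dB_{i/n}(t).
```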
Law-of-large-numbers-type results on the convergence of the systems (1.2) and (1.3) to the graphon particle system (1.1) under suitable conditions are studied in [Reference Bayraktar, Chakraborty and Wu2]. Results on the exponential ergodicity of the two systems (1.1) and (1.2), as well as the uniform-in-time convergence of (1.2) to (1.1) under a certain dissipativity condition, are presented in [Reference Bayraktar and Wu3]. There are numerous recent studies of graphon particle systems [Reference Bayraktar and Wu4, Reference Coppini14] and works on associated heterogeneously interacting finite particle models [Reference Bet, Coppini and Nardi6, Reference Coppini13, Reference Delattre, Giacomin and Luçon17, Reference Lacker and Soret25–Reference Oliveira and Reis27]. Interest in these systems is driven in part by the wide application of graphons in mean-field game theory, in both the static and dynamic cases; see e.g. [Reference Aurell, Carmona and Laurière1, Reference Bayraktar, Wu and Zhang5, Reference Caines and Huang10–Reference Carmona, Cooney, Graves and Laurière12, Reference Gao, Caines and Huang20, Reference Gao, Tchuendom and Caines21, Reference Parise and Ozdaglar28, Reference Tchuendom, Caines and Huang30, Reference Tchuendom, Caines and Huang31, Reference Vasal, Mishra and Vishwanath33] and the references therein.
Among these studies, our work is particularly linked to [Reference Bayraktar and Wu4]. With $W_1$ denoting the 1-Wasserstein distance, with the empirical measures of the three particle systems at time $t \in [0, T]$ defined by
and with the averaged law $\widetilde{\mu}_t \;:\!=\; \int_0^1 \mu_{u, t} \, du$ of the graphon system (1.1), [Reference Bayraktar and Wu4] computes concentration bounds of the types $\mathbb{P} \big[ \sup_{0 \le t \le T} W_1(\bar{L}_{n, t}, \widetilde{\mu}_t) > \epsilon \big]$ and $\sup_{t \ge 0} \mathbb{P} \big[ W_1(\bar{L}_{n, t}, \widetilde{\mu}_t) > \epsilon \big]$ for $\epsilon > 0$ under certain conditions. In particular, uniform-in-time concentration bounds of the latter type are studied in an infinite-time-horizon setting under an extra dissipativity condition on $\psi$ . These results are established by computing certain sub-Gaussian estimates rather directly with the moment generating function of the standard normal random vector (Lemmas 3.7–3.10 of [Reference Bayraktar and Wu4]).
In contrast, the present work focuses on the case of a finite time horizon and deals with a more general sparsity sequence $\{p(n)\}_{n \in \mathbb{N}} \subset (0, 1]$ for (1.2), whereas the results of [Reference Bayraktar and Wu4] cover only dense graphs, i.e., $p(n) \equiv 1$ . For our argument we adopt the method of [Reference Delarue, Lacker and Ramanan16], as follows. We first bound the probability that Lipschitz functions of the finite particles $\bar{X}^n = (\bar{X}^n_1, \cdots, \bar{X}^n_n)$ from the system (1.3) on the space $\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n$ deviate from their means (Theorem 3.1). The derivation of this ‘concentration bound around their means’ relies on transportation inequalities from [Reference Djellout, Guillin and Wu18]. Combining this bound with the fact, presented in Section 3.1, that the expectations of the $W_2$ -distances between the empirical measures in (1.4) converge to zero as the number of particles goes to infinity, we show exponential concentration bounds on the probabilities
for $p = 1$ (Theorem 3.3).
The advantage of using transportation inequalities to find ‘concentration around means’, compared to the approach of [Reference Bayraktar and Wu4], is that we can derive the same exponential bound (1.5) even in the case $p = 2$ (Theorem 3.5), at the cost of assuming that the initial particles are independent and satisfy (3.3).
for every Lipschitz function F and $a > 0$ (see (3.4))—i.e., the right-hand side does not depend on n—from the quadratic transportation inequality (3.3) on the initial particles. This bound can be derived thanks to the remarkable result on the dimension-free $W_2$ -tensorization of transportation inequalities from [Reference Gozlan and Léonard22] (Lemma 2.7(ii)). Then, again by combining the dimension-free concentration bound around means with the results in Section 3.1, we ultimately arrive at the exponential bound (1.5) for $p = 2$ . The authors of [Reference Delarue, Lacker and Ramanan16] apply this method to compute similar concentration bounds when $\bar{X}^n$ represents the state of the so-called Nash equilibrium of a symmetric n-player stochastic differential game and $\widetilde{\mu}$ is the measure flow of the unique equilibrium of the corresponding mean-field game (we refer to Section 2 of [Reference Delarue, Lacker and Ramanan16] for a more detailed description of these terms).
Moreover, inspired by the argument using Bernstein’s inequality in [Reference Oliveira and Reis27], we compare the particles $X^n_i$ and $\bar{X}^n_i$ to improve the exponential bound in [Reference Bayraktar and Wu4] for
when $p = 1$ , at the cost of assuming that the interaction function $\phi$ is a member of the $L^1$ -Fourier class. Without this condition on $\phi$ , we have a similar exponential bound, but in terms of the bounded Lipschitz metric ( $d_{BL}$ metric), which is weaker than $W_1$ . When $p=2$ , we also present a similar exponential bound for the system (1.2) on dense graphs ( $p(n) \equiv 1$ ); this requires a refinement of the Bernstein-type inequality for the cut norm in [Reference Oliveira and Reis27]; see Lemma 3.1.
This paper is organized as follows. In Section 2 we introduce the notation, state the assumptions, and recall some of the relevant existing results concerning the particle systems (1.1)–(1.3), as well as other preliminary results. Section 3 provides our main results, and Section 4 gives the proofs of these results.
2. Preliminaries
In this section, we first introduce the notation which will be used throughout this paper. We then state several assumptions and some of the basic results on the particle systems (1.1)–(1.3) from [Reference Bayraktar, Chakraborty and Wu2, Reference Bayraktar and Wu4], and provide several well-known results regarding transportation cost inequalities without proof. Finally, we introduce Bernstein’s inequality and the concept of the $L^1$ -Fourier class.
2.1. Notation
Given a metric space (S, d) and a function $f \;:\; S \rightarrow \mathbb{R}$ , we define
and we say that f is Lipschitz (respectively, bounded Lipschitz) if $\vert\vert f \vert\vert_{Lip} < \infty$ (respectively, $\vert\vert f \vert\vert_{BL} < \infty$ ). In particular, f is called a-Lipschitz if $\Vert f \Vert_{Lip} = a$ .
Denote by $\mathcal{P}(S)$ the space of Borel probability measures on S. We shall use the standard notation $\langle \mu, \varphi \rangle \;:\!=\; \int_S \varphi \, d\mu$ for integrable functions $\varphi$ and measures $\mu$ on S. When $(S, \vert\vert \cdot \vert\vert)$ is a normed space, we write $\mathcal{P}^p(S, \vert\vert\cdot\vert\vert)$ for the set of $\mu \in \mathcal{P}(S)$ satisfying $\langle \mu, \vert\vert\cdot\vert\vert^p \rangle < \infty$ for a given $p \in [1, \infty)$ . We denote by $Lip(S, \vert\vert\cdot\vert\vert)$ the set of 1-Lipschitz functions, i.e., $f \;:\; S \rightarrow \mathbb{R}$ satisfying $\vert f(x) - f(y) \vert \le \vert\vert x-y \vert\vert$ for every $x, y \in S$ .
For a separable Banach space $(S, \vert\vert\cdot\vert\vert)$ , we endow $\mathcal{P}^p(S, \vert\vert\cdot\vert\vert)$ with the p-Wasserstein metric
where the infimum is taken over all probability measures $\pi$ on $S \times S$ with first and second marginals $\mu$ and $\nu$ . We also consider the product space $S^n \;:\!=\; S \times \cdots \times S$ , equipped with the $\ell^p$ norm (for any $p \ge 1$ )
for $x = (x_1, \cdots, x_n) \in S^n$ . When the space S or the norm $\vert\vert\cdot\vert\vert$ is understood, we sometimes omit it from the above notation.
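On the real line, the p-Wasserstein distance between two empirical measures with the same number of atoms has a closed form via order statistics, since the optimal coupling matches sorted samples. A minimal sketch, illustrative only and not part of the paper's argument:

```python
import numpy as np

def wasserstein_p_empirical(x, y, p=1):
    """W_p distance between the empirical measures (1/n) sum delta_{x_i}
    and (1/n) sum delta_{y_i} on R: the optimal coupling matches order
    statistics, so W_p^p = (1/n) sum_i |x_(i) - y_(i)|^p."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

# Two three-point empirical measures: the monotone matching pairs
# (0,1), (1,2), (5,4), so W_1 = (1 + 1 + 1)/3 = 1.
w1 = wasserstein_p_empirical([0.0, 1.0, 5.0], [4.0, 1.0, 2.0], p=1)
```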
Denote by $C([0, T] \;:\; S)$ the space of continuous functions from [0, T] to S. For $x \in C([0, T] \;:\; \mathbb{R}^d)$ and $t \in [0, T]$ , we set $\vert\vert x \vert\vert_{\star, t} \;:\!=\; \sup_{0 \le s \le t}\vert x_s \vert$ , where $\vert \cdot \vert$ is the usual Euclidean norm on $\mathbb{R}^d$ . We write $\mathcal{L}(X)$ for the probability law of a random variable X and $[n] \;:\!=\; \{1, \cdots, n\}$ for any $n \in \mathbb{N}$ . We use K to denote various positive constants throughout the paper; its value may change from line to line.
For a Polish space (S, d) with Borel $\sigma$ -field $\mathcal{S}$ , we also consider the space of probability measures over $(S, \mathcal{S})$ endowed with the topology of weak convergence, which is metrized by the BL metric, defined for $\mu, \nu \in \mathcal{P}(S)$ by
Note the dual representation of the 1-Wasserstein metric
along with the relationship $d_{BL} \le W_1$ . We shall also use the following notation: for given $\mu, \nu \in \mathcal{P}(C([0, T] \;:\; \mathbb{R}^d))$ ,
where the infimum is taken over all probability measures $\pi$ with marginals $\mu$ and $\nu$ .
Let us define three $n \times n$ random matrices $P^{(n)}$ , $\bar{P}^{(n)}$ , and $D^{(n)}$ , related to the systems (1.2) and (1.3), for every $n \in \mathbb{N}$ , with entries
For these matrices, we define the $\ell_{\infty} \rightarrow \ell_1$ norm of an $n \times n$ matrix A by
This norm is known to be equivalent to the so-called cut norm (see (3.3) of [Reference Guëdon and Vershynin23]).
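A brute-force sketch of this norm, assuming the standard vertex characterization $\Vert A \Vert_{\infty \rightarrow 1} = \max_{x \in \{-1, 1\}^n} \Vert A x \Vert_1$ (the supremum over the cube $\Vert x \Vert_{\infty} \le 1$ is attained at a vertex by linearity); feasible only for small n:

```python
import itertools
import numpy as np

def norm_inf_to_1(A):
    """Brute-force ||A||_{infty->1} = max over sign vectors x in {-1,1}^n
    of ||A x||_1 (exponential in n; for illustration only)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    return max(float(np.abs(A @ np.array(x)).sum())
               for x in itertools.product((-1.0, 1.0), repeat=n))

# For A = [[1, -1], [-1, 1]], the sign vector x = (1, -1) gives
# A x = (2, -2), so ||A||_{infty->1} = 4.
val = norm_inf_to_1([[1.0, -1.0], [-1.0, 1.0]])
```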
We denote the empirical measures of the approximation systems for each $n \in \mathbb{N}$ by
all of which are random elements of $\mathcal{P}(C([0, T] \;:\; \mathbb{R}^d))$ .
We conclude this subsection by recalling the relative entropy of two probability measures $\mu, \nu$ over the same measurable space:
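In standard form (cf. (2.8)), the relative entropy is

```latex
H(\nu \,\vert\, \mu) \;=\;
\begin{cases}
  \displaystyle\int_S \log\frac{d\nu}{d\mu}\, d\nu,
    & \text{if } \nu \ll \mu, \\[6pt]
  +\infty, & \text{otherwise}.
\end{cases}
```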
2.2. Existence and uniqueness of the solutions
We state the existence and uniqueness of strong solutions to the systems (1.1)–(1.3).
Assumption 2.1
(i) The function $\phi$ is bounded; furthermore, $\phi$ and $\psi$ are Lipschitz, i.e., there exists a constant $K > 0$ such that
\begin{equation*} \big\vert \phi(x_1, y_1) - \phi(x_2, y_2) \big\vert + \big\vert \psi(x_1) - \psi(x_2) \big\vert \le K \big( \vert x_1-x_2 \vert + \vert y_1 - y_2 \vert \big) \end{equation*}
holds. Moreover, the initial particles have finite second moments, i.e.,
(2.9) \begin{equation} \sup_{u \in [0, 1]} \mathbb{E} \big\vert X_u(0) \big\vert^2 < \infty. \end{equation}
(ii) The map $[0, 1] \ni u \mapsto \mu_u(0) = \mathcal{L}(X_u(0)) \in \mathcal{P}(\mathbb{R}^d)$ is measurable.
Lemma 2.1. (Existence and uniqueness of the particle systems)
The proof of Lemma 2.1(i) is classical (see e.g. Theorem 5.2.9 of [Reference Karatzas and Shreve24]). Part (ii) follows from Proposition 2.1 of [Reference Bayraktar, Chakraborty and Wu2]. As pointed out in Remark 2.2 of [Reference Bayraktar, Chakraborty and Wu2], the boundedness condition on $\phi$ in Assumption 2.1(i) can be removed throughout this paper, at the cost of strengthening (2.9) to the condition $\sup_{u \in [0, 1]} \mathbb{E} \vert X_u(0) \vert^{2 + \epsilon} < \infty$ for some $\epsilon > 0$ . We occasionally need an even stronger condition on the initial particles, as in the following.
Assumption 2.2. The initial particles $\{X_{u}(0)\}_{u \in [0, 1]}$ are independent, with law $\mu_{u, 0} \in \mathcal{P}(\mathbb{R}^d)$ satisfying
Under this stronger assumption, the solution to (1.1) has, in particular, finite fourth moments. The proof is standard and hence omitted (see e.g. [Reference Sznitman29] or Proposition 2.1 of [Reference Bayraktar and Wu4]).
2.3. Continuity of the graphon system
The following result, which states the continuity of the graphon system (1.1), is from Theorem 2.1 of [Reference Bayraktar, Chakraborty and Wu2].
Assumption 2.3. There exists a finite collection of subintervals $\{I_i \;:\; i \in [N]\}$ , for some $N \in \mathbb{N}$ , satisfying $\cup_{i=1}^N I_i = [0, 1]$ . For each $i, j \in [N]$ , the following hold:
(i) The map $I_i \ni u \mapsto \mu_u(0) \in \mathcal{P}(\mathbb{R}^d)$ is continuous with respect to the $W_2$ metric.
(ii) For each $u \in I_i$ , there exists a Lebesgue-null set $N_u \subset [0, 1]$ such that G(u, v) is continuous at $(u, v) \in [0, 1] \times [0, 1]$ for each $v \in [0, 1] \setminus N_u$ .
(iii) There exists $K > 0$ such that
\begin{align*} W_2(\mu_{u_1}(0), \mu_{u_2}(0)) &\le K \vert u_1 - u_2 \vert, \qquad \qquad \qquad \qquad u_1, u_2 \in [0, 1], \\[5pt] \big\vert G(u_1, v_1) - G(u_2, v_2) \big\vert &\le K \big( \vert u_1 - u_2 \vert + \vert v_1 - v_2 \vert \big), \qquad (u_1, v_1), \, (u_2, v_2) \in I_i \times I_j. \end{align*}
Lemma 2.3. Suppose that Assumption 2.1 holds.
(i) (Continuity.) Under Assumption 2.3(i)–(ii), the map $I_i \ni u \mapsto \mu_u \in \mathcal{P}\big(C([0, T] \;:\; \mathbb{R}^d)\big)$ is continuous with respect to the $W_{2, T}$ metric for every $i \in [N]$ .
(ii) (Lipschitz continuity.) Under Assumption 2.3(iii), there exists $\kappa > 0$ , which depends on T, such that $W_{2, T}(\mu_u, \mu_v) \le \kappa \vert u-v \vert$ whenever $u, v \in I_i$ for some $i \in [N]$ .
In Lemma 2.3(ii), note that we have, in particular,
2.4. A law of large numbers for the mean-field particle system
Besides the assumptions already introduced in this section, we will need the following assumption on the sparsity parameter for the system (1.2), as briefly mentioned in Section 1.
Assumption 2.4. The sequence $\{p(n)\}_{n \in \mathbb{N}}$ in (1.2) satisfies $np(n) \rightarrow \infty$ as $n \rightarrow \infty$ .
We introduce the following law-of-large-numbers result for the mean-field particle system (1.2), which is Theorem 4.1 of [Reference Bayraktar, Chakraborty and Wu2]. We write $\mu_u$ for the law of $X_u$ in the graphon particle system (1.1) for each $u \in [0, 1]$ , and define
Lemma 2.4. Under Assumptions 2.1, 2.3, and 2.4,
Moreover, we have
2.5. Transportation inequalities
In this subsection, we present some preliminary results regarding transportation inequalities. The first result, from Theorem 9.1 of [Reference Üstünel32], illustrates the transportation inequality with the uniform norm for the laws of diffusion processes.
Lemma 2.5. For a fixed $T>0$ and $k \in \mathbb{N}$ , suppose that $X^x = \{X^x_t\}_{t \in [0, T]}$ is the unique strong solution of the stochastic differential equation (SDE)
on a probability space $C([0, T] \;:\; \mathbb{R}^k)$ supporting a k-dimensional Brownian motion W. Here, $b \;:\; [0, T] \times C([0, T] \;:\; \mathbb{R}^k) \rightarrow \mathbb{R}^k$ satisfies, for any $\xi, \eta \in C([0, T] \;:\; \mathbb{R}^k)$ ,
for some constants $L > 0$ and $\Sigma \in \mathbb{R}^{k \times k}$ . Let $P^x \in \mathcal{P}(C([0, T] \;:\; \mathbb{R}^k))$ be the law of $X^x$ for any $x \in \mathbb{R}^k$ . Then for any $Q \in \mathcal{P}(C([0, T] \;:\; \mathbb{R}^k))$ , there exist positive constants $\kappa_1, \kappa_2$ , depending only on T, such that the inequalities
hold, where $H(Q\vert P)$ is the relative entropy of Q with respect to P, defined in (2.8).
The following result (Theorem 5.1 of [Reference Delarue, Lacker and Ramanan16]) characterizes the concentration of a probability measure with a transportation cost inequality and Gaussian integrability property. The equivalence between (2.14) and (2.15) is originally from Theorem 3.1 of [Reference Bobkov and Götze8], and the equivalence between (2.14) and (2.17) is due to Theorem 2.3 of [Reference Djellout, Guillin and Wu18].
Lemma 2.6. For a probability measure $\mu \in \mathcal{P}^1(S)$ on a separable Banach space $(S, \vert\vert\cdot\vert\vert)$ , the following statements are equivalent up to a universal change in the positive constant c:
(i) The transportation cost inequality
(2.14) \begin{equation} W_{1, S}(\mu, \nu) \le \sqrt{2c H(\nu \vert \mu)} \end{equation}
holds for every $\nu \in \mathcal{P}(S)$ .
(ii) For every 1-Lipschitz function f on S and $\lambda \in \mathbb{R}$ ,
(2.15) \begin{equation} \int_S e^{\lambda(f - \langle \mu, f \rangle )}d\mu \le \exp\!\Big(\frac{c\lambda^2}{2}\Big) \end{equation}
holds.
(iii) For every 1-Lipschitz function f on S and $a > 0$ ,
(2.16) \begin{equation} \mu \big(f - \langle \mu, f \rangle > a\big) \le \exp\!\Big(\!-\! \frac{a^2}{2c}\Big). \end{equation}
(iv) The probability measure $\mu$ is sub-Gaussian, i.e.,
(2.17) \begin{equation} \int_S e^{c\vert\vert x \vert\vert^2} \mu(dx) < \infty. \end{equation}
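The implication (ii) $\Rightarrow$ (iii) is the classical Chernoff argument, which we record for the reader's convenience: for any $\lambda > 0$ , Markov's inequality and (2.15) give

```latex
\mu\big(f - \langle \mu, f \rangle > a\big)
\;\le\; e^{-\lambda a} \int_S e^{\lambda (f - \langle \mu, f \rangle)} \, d\mu
\;\le\; \exp\!\Big( \frac{c \lambda^2}{2} - \lambda a \Big),
```

and choosing the minimizer $\lambda = a/c$ yields (2.16).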
The next result is a well-known tensorization of transportation cost inequalities from Corollary 5 of [Reference Gozlan and Léonard22]. The major difference between (i) and (ii) is that the inequality (2.18) is dimension-free, i.e., the right-hand side does not depend on n.
Lemma 2.7. For each $n \in \mathbb{N}$ , consider a set of probability measures $\{\mu_i\}_{i \in [n]} \subset \mathcal{P}(S)$ on a separable Banach space $(S, \Vert\cdot\Vert)$ .
(i) If the inequality $W_{1, S}(\mu_i, \nu) \le \sqrt{2c H(\nu \vert \mu_i)}$ holds for every $i \in [n]$ and $\nu \in \mathcal{P}^1(S)$ , then
\begin{equation*} W_{1, (S^n, \, \vert\vert\cdot\vert\vert_{n, 1})}(\mu_1 \otimes \cdots \otimes \mu_n, \, \rho) \le \sqrt{2nc H(\rho \vert \mu_1 \otimes \cdots \otimes \mu_n)} \end{equation*}
holds for every $\rho \in \mathcal{P}^1(S^n)$ .
(ii) If the inequality $W_{2, S}(\mu_i, \nu) \le \sqrt{2c H(\nu \vert \mu_i)}$ holds for every $i \in [n]$ and $\nu \in \mathcal{P}^2(S)$ , then
(2.18) \begin{equation} W_{2, (S^n, \, \vert\vert\cdot\vert\vert_{n, 2})}(\mu_1 \otimes \cdots \otimes \mu_n, \, \rho) \le \sqrt{2c H(\rho \vert \mu_1 \otimes \cdots \otimes \mu_n)} \end{equation}
holds for every $\rho \in \mathcal{P}^2(S^n)$ .
We finally mention the following result on the Wasserstein distance of the empirical measures of independent but not necessarily identically distributed random variables. This is Lemma A.1 of [Reference Bayraktar and Wu4], a generalization of Theorem 1 of [Reference Fournier and Guillin19], where independent and identically distributed random variables are considered. This result will be used in proving Proposition 3.2.
Lemma 2.8. Let $\{Y_i\}_{i \in \mathbb{N}}$ be independent $\mathbb{R}^d$ -valued random variables, and define
For a fixed $p > 0$ , assume that $\sup_{i \in \mathbb{N}} \mathbb{E} \vert Y_i \vert^q < \infty$ holds for some $q > p$ . Then there exists a constant $K > 0$ depending only on p, q, and d such that for every $n \ge 1$ ,
where
2.6. Bernstein’s inequality and the $\boldsymbol{L}^1$ -Fourier class
When comparing the two approximation systems (1.2) and (1.3), it is essential to control the matrix $D^{(n)}$ of (2.5). Thus, we introduce the following concentration of $D^{(n)}$ in terms of the $\Vert \cdot \Vert_{\infty \rightarrow 1}$ norm, which is from Lemma 2 of [Reference Oliveira and Reis27]. Its proof is a straightforward application of Bernstein’s inequality (Lemma 2.10; alternatively, Bennett’s inequality) to the $n^2$ independent entries of the matrix $D^{(n)}$ . We will use Bernstein’s inequality again in Section 3.3 to prove Lemma 3.1, an elaboration of Lemma 2.9.
Lemma 2.9. For any $0 < \eta \le n$ , we have
In particular, under Assumption 2.4, we have for every $\eta > 0$
Lemma 2.10. (Bernstein’s inequality, Theorem 2.9 of [Reference Boucheron, Lugosi and Massart9].) Let $X_1, \cdots, X_k$ be independent random variables with finite variance such that $X_i \le b$ almost surely for some $b > 0$ and each $i \in [k]$ . Let $v = \sum_{i=1}^k \mathbb{E}[X_i^2]$ ; then we have
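In the form used here (a standard restatement; see the cited reference for the precise constants), the conclusion reads, for every $t > 0$ :

```latex
\mathbb{P}\Big[ \sum_{i=1}^k \big( X_i - \mathbb{E}[X_i] \big) \ge t \Big]
\;\le\; \exp\!\Big( - \frac{t^2}{2\,( v + b t / 3 )} \Big).
```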
When the interaction function $\phi$ belongs to a special class of functions, we shall see in the proof of Theorem 3.4 that the distance $W_1(L_n, \bar{L}_n)$ can easily be expressed in terms of the quantity $\vert\vert D^{(n)} \vert\vert_{\infty \rightarrow 1}$ . This observation is inspired by the work of [Reference Oliveira and Reis27]. To state it more precisely, we introduce the notion of the $L^1$ -Fourier class of functions.
Definition 2.1. Identifying $\mathbb{R}^{2d}$ with $\mathbb{R}^d \times \mathbb{R}^d$ , we say that a function $f \;:\; \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}$ belongs to the $L^1$ -Fourier class if there exists a finite complex measure $m_{f}$ over $\mathbb{R}^{2d}$ such that for every $(x, y) \in \mathbb{R}^d \times \mathbb{R}^d$ ,
We recall that a finite complex measure m over $\mathbb{R}^{2d}$ is a set function $m \;:\; \mathcal{B}(\mathbb{R}^{2d}) \rightarrow \mathbb{C}$ of the form $m = m_r^+ - m_r^- + \sqrt{-1}(m_i^+ - m_i^-)$ , where each of $m_r^+, m_r^-, m_i^+, m_i^-$ is a finite, $\sigma$ -additive (nonnegative) measure over $\mathbb{R}^{2d}$ . We define the total mass of m by
If a function f is the inverse Fourier transform of a function in $L^1(\mathbb{R}^{2d})$ , then f belongs to the $L^1$ -Fourier class. In particular, any Schwartz function belongs to the $L^1$ -Fourier class. An example of such a function is the Kuramoto interaction: if $d=1$ and $\phi(x, y) = K \sin(y-x)$ for some constant K, then the corresponding complex measure is equal to
The finite system (1.2) of ‘oscillators’ with the Kuramoto interaction function is studied in [Reference Coppini13].
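The membership of the Kuramoto interaction in the $L^1$ -Fourier class can be checked numerically: $K \sin(y - x)$ is reproduced by a complex measure with two atoms, namely $m = \frac{K}{2i}\big(\delta_{(-1, 1)} - \delta_{(1, -1)}\big)$ in our reading of the representation above (the coupling constant $K = 0.7$ and the evaluation points below are illustrative):

```python
import cmath
import math

K = 0.7  # illustrative coupling constant

def kuramoto_via_fourier(x, y):
    """Evaluate K*sin(y - x) through the two-atom complex measure
    m = (K/2i)(delta_{(-1,1)} - delta_{(1,-1)}): f(x, y) is the
    integral of exp(i*(xi*x + zeta*y)) against m."""
    atoms = [((-1.0, 1.0), K / 2j), ((1.0, -1.0), -K / 2j)]
    return sum(w * cmath.exp(1j * (xi * x + zeta * y))
               for (xi, zeta), w in atoms).real

x, y = 0.3, 1.1
direct = K * math.sin(y - x)
via_measure = kuramoto_via_fourier(x, y)
```

Note that the total mass of this measure is $\Vert m \Vert = K$ , which is the quantity that enters the cut-norm comparison arguments.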
3. Main results
This section consists of three subsections. The first shows that the expectations of the $W_2$ -distances between two empirical measures on $\mathbb{R}^d$ related to the systems (1.1)–(1.3) converge to zero as the number of particles goes to infinity. The second gives exponential bounds on the probabilities that Lipschitz function values of the particles $\bar{X}^n$ of the system (1.3) on $\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n$ deviate from their means; the stronger the norm we use on $\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n$ , the stronger the assumption we need on the initial distribution of the particles. The results of the first two subsections are used to prove those of the third, in which we derive several results on the concentration of the finite particle systems (1.2) and (1.3) toward the graphon particle system (1.1), under different metrics.
3.1. Concentration in mean of the $\boldsymbol{W}_2$ -distance
Let us recall the law $\mu_{u, t}$ of (1.1), the empirical measures (1.4) of the three systems, and the averaged law $\widetilde{\mu}_t \;:\!=\; \int_0^1 \mu_{u, t} \, du$ for every $t \in [0, T]$ . The following two propositions show that certain expectations converge to zero as $n \rightarrow \infty$ . The proofs are provided in Section 4.1.
Proposition 3.1. Under Assumptions 2.1 and 2.4,
as $n\rightarrow \infty$ .
Proposition 3.2. Under Assumptions 2.1, 2.2, and 2.3(iii),
as $n\rightarrow \infty$ .
By virtue of Lemma 2.4, we have
Combining the last convergence with Propositions 3.1 and 3.2, we immediately obtain further convergences in expectation.
Corollary 3.1. Under the assumptions of Propositions 3.1 and 3.2,
as $n\rightarrow \infty$ .
3.2. Concentration around the mean
We present in this subsection the concentration of a 1-Lipschitz function of the particles $\bar{X}^n$ around its mean, under two different norms $\ell^1$ and $\ell^2$ . The proofs of the results rely on the transportation inequalities presented in Section 2.5, and they will be given in Section 4.2.
From Lemma 2.6, we note that the condition (2.10) of Assumption 2.2 in Theorem 3.1 below is equivalent to the condition
Theorem 3.1. Under Assumptions 2.1 and 2.2, there exists a constant $\delta > 0$ , independent of n, such that for every $F \in Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \vert\vert\cdot\vert\vert_{n, 1}\Big)$ and every $a > 0$ ,
holds.
We have the following result, analogous to Theorem 3.1, when the condition (3.1) is replaced by (3.3). For example, if for every $u \in [0, 1]$ the initial law takes the form $\mu_{u, 0}(dx) = e^{-U(x)} dx$ for some $U \in C^2(\mathbb{R}^d)$ whose Hessian is bounded below, in the semidefinite order, by cI for some $c>0$ , then $\mu_{u, 0}$ satisfies the condition (3.3) with $\kappa = 1/c$ . In particular, if $\mu_{u, 0}$ is the standard normal distribution on $\mathbb{R}^d$ , then (3.3) holds with $\kappa = 1$ (note that the initial law need not be the same for every $u \in [0, 1]$ in the statement of Theorem 3.2).
We note here that the positive constant $\delta$ which appears in Theorems 3.1–3.6 of this section depends only on the constants c of Lemmas 2.6 and 2.7, $\kappa$ of (3.1) and (3.3), the functions $\phi, \psi$ , and T, but not on n, the number of particles. We also emphasize that the concentration inequality (3.4) is dimension-free; the bound on the right-hand side does not depend on n. This property will play an essential role in deriving the exponential concentration of the empirical measures in terms of $W_2$ -distance.
Theorem 3.2. Suppose that the initial particles $\{X_u(0)\}_{u \in [0, 1]}$ are independent, with law $\mu_{u, 0} \in \mathcal{P}(\mathbb{R}^d)$ satisfying, for some $\kappa > 0$ ,
Under Assumption 2.1, there exists a constant $\delta > 0$ , independent of n, such that for every $F \in Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \vert\vert\cdot\vert\vert_{n, 2}\Big)$ and every $a > 0$ ,
holds.
3.3. Concentration toward the graphon system
Recalling the notation in (1.4), we now provide the concentration, in terms of the (1- and 2-) Wasserstein distance, of the empirical measures of the finite particle systems toward the averaged measure $\widetilde{\mu}_t$ of the graphon system. Proofs will be given in Section 4.3.
First, we have the following result on the concentration of $\bar{L}_{n, t}$ toward $\widetilde{\mu}_t$ in terms of the $W_1$ -distance, due to Theorem 3.1.
Theorem 3.3. Under Assumptions 2.1, 2.2, and 2.3(iii), there exist a constant $\delta > 0$ , independent of n, and $N \in \mathbb{N}$ such that
holds for every $a > 0$ and every $n \ge N$ .
Remark 3.1. Theorem 3.3 gives the same exponential bound as in Theorem 2.1 of [Reference Bayraktar and Wu4]. The proof in [Reference Bayraktar and Wu4] mainly focuses on computing certain sub-Gaussian estimates, whereas our argument relies on the concentration property (3.2) of the system (1.3). Applying the same argument, we can even deduce the exponential bound in terms of the $W_2$ metric (in Theorem 3.5 below).
In Section 2.6, we introduced the concept of the $L^1$ -Fourier class, along with Bernstein’s inequality, to express $W_1(L_n, \bar{L}_n)$ in terms of $\Vert D^{(n)} \Vert_{\infty \rightarrow 1}$ . This gives rise to the following result on the concentration of the particle system (1.2) toward the graphon system.
Theorem 3.4. Suppose that the components of the interaction function $\phi$ belong to the $L^1$ -Fourier class (Definition 2.1). Under Assumptions 2.1, 2.2, 2.3(iii), and 2.4, there exist a constant $\delta > 0$ , independent of n, and $N \in \mathbb{N}$ such that
holds for every $a > 0$ and every $n \ge N$ . For general interaction functions $\phi$ (which do not necessarily belong to the $L^1$ -Fourier class), we have instead
The following result gives the concentration of $\bar{L}_{n, t}$ toward $\widetilde{\mu}_t$ as in Theorem 3.3, but in terms of the $W_2$ metric. Its proof is similar to that of Theorem 3.3, but Theorem 3.2 is used in place of Theorem 3.1.
Theorem 3.5. Under Assumptions 2.1, 2.2, and 2.3, together with the condition (3.3), there exist constants $\delta > 0$ , independent of n, and $N \in \mathbb{N}$ such that
holds for every $a > 0$ and every $n \ge N$ .
Since we have the exponential bound in (3.8) in the $W_2$ metric, one naturally expects to obtain a bound similar to (3.6) in the $W_2$ metric as well. In order to do this, we need to find the exponential bound for the probability $\mathbb{P} \big[ \sup_{0 \le t \le T} W_2 (L_{n, t}, \bar{L}_{n, t}) > a \big]$ , which requires us to handle the quantity $\Vert (D^{(n)})^\top D^{(n)} \Vert_{\infty \rightarrow 1}$ , instead of $\Vert D^{(n)} \Vert_{\infty \rightarrow 1}$ as in the proof of Theorem 3.4. The control of this quantity is achieved in Lemma 3.1 under an extra condition on the sparsity parameter p(n), a more restrictive condition than the one in Assumption 2.4.
Assumption 3.1. The sparsity parameter sequence $\{p(n)\}_{n \in \mathbb{N}} \subset (0, 1]$ of the system (1.2) satisfies one of the following:
(i) $p(n) \rightarrow 0$ and $np(n)^2 \rightarrow \infty$ as $n \rightarrow \infty$ , or
(ii) $p(n) \equiv 1$ for every $n \in \mathbb{N}$ .
Recalling the notation of (2.5), we state the following lemma, which is needed in proving Theorem 3.6. Its proof, given in Section 4.3, is similar to that of Lemma 2.9, but requires more involved applications of Bernstein’s inequality.
Lemma 3.1. Under Assumption 3.1, there exists $N \in \mathbb{N}$ such that
holds for every $n \ge N$ and $\eta > 0$ .
Theorem 3.6. Suppose that the components of the interaction function $\phi$ belong to the $L^1$ -Fourier class. Under Assumptions 2.1, 2.2, 2.3, and 3.1(i), together with the condition (3.3), there exist a constant $K > 0$ , independent of n, and $N \in \mathbb{N}$ such that
holds for every $a > 0$ and every $n \ge N$ .
Furthermore, if Assumption 3.1(i) is replaced by Assumption 3.1(ii), we have the exponential bound in n: there exist a constant $\delta > 0$ , independent of n, and $N \in \mathbb{N}$ such that
holds for every $a > 0$ and every $n \ge N$ .
4. Proofs
In this section we provide the proofs of the results stated in Section 3.
4.1. Proofs of results in Section 3.1
4.1.1. Proof of Proposition 3.1
Let us recall the identity (4.7), along with the notation in (2.5). By Hölder’s inequality there exists $K > 0$ , depending on $\phi$ and $\psi$ , such that for every $t \in [0, T]$ ,
Taking the expectation of the first term, and using the independence of $\{D^{(n)}_{i, j}\}_{j \in [n]}$ and the boundedness of $\phi$ , we have
For the second term, Hölder’s inequality and the Lipschitz continuity of $\phi$ give
Combining the above inequalities and averaging over $i \in [n]$ , we obtain
Grönwall’s inequality yields
and thus
4.1.2. Proof of Proposition 3.2
We divide the interval [0, T] into $M \;:\!=\; \lceil \frac{T}{\Delta} \rceil$ subintervals of length $\Delta > 0$ :
where $\Delta_h \;:\!=\; [(h-1)\Delta, \, h\Delta]$ for $h = 1, \cdots, M-1$ and $\Delta_M = [(M-1)\Delta, \, T]$ . (We choose the value of $\Delta$ later.) With the notation
the triangle inequality gives
For the first term, $E_1$ , we note that there exists $K>0$ , depending on the bounds of $\phi$ , $\psi$ , and $\sigma$ , such that
holds for every $0 \le u \le s \le T$ , and thus we have
Applying Hölder’s inequality twice, we find that the last expectation is bounded above by
The second-to-last inequality uses the properties of the increments of Brownian motion and the Burkholder–Davis–Gundy inequality with the positive constant $C_4$ . Therefore, we have the bound
For the second expectation, $E_2$ , a series of applications of Hölder’s inequality and Lemma 2.8 give
as $n \rightarrow \infty$ , where the last inequality follows from Lemma 2.2.
For the third term, $E_3$ , the convexity of $W^2_2(\, \cdot \, , \, \cdot \, )$ and Lemma 2.3(ii) show that there exists $K > 0$ satisfying
Finally, for the last term, $E_4$ , we note from a straightforward computation that there exists $K > 0$ satisfying
for every $u \in [0, 1]$ and $s, t \in [0, T]$ satisfying $\vert t-s \vert \le 1$ . Thus, we have
Let us combine all the bounds from $E_1$ to $E_4$ . For any given $\epsilon > 0$ , we can choose $\Delta$ small enough so that $E_1 + E_4 < \epsilon/2$ . Then we can choose $N \in \mathbb{N}$ large enough so that $E_2 + E_3 < \epsilon/2$ for every $n \ge N$ , which implies $\mathbb{E} \big[ \sup_{0 \le t \le T} W_{2} (\widetilde{L}_{n, t}, \, \widetilde{\mu}_t) \big] < \epsilon$ for every $n \ge N$ .
4.2. Proofs of results in Section 3.2
4.2.1. Proof of Theorem 3.1
Let us fix an arbitrary $n \in \mathbb{N}$ . We shall naturally identify elements of $(\mathbb{R}^d)^n$ with those of $\mathbb{R}^{dn}$ , and elements of $\big( C([0, T] \;:\; \mathbb{R}^d)\big)^n$ with those of $C\big([0, T] \;:\; (\mathbb{R}^d)^n\big)$ ; we shall specify which norm we use for each space. We can express the SDE (1.3) in the form of (2.12) with $k = dn$ , by making the following definitions:
(i) $(\mathbb{R}^d)^n \ni x = (x_i)_{i \in [n]}$ , where $x_i = X_{i/n}(0)$ ;
(ii) $C\big([0, T] \;:\; (\mathbb{R}^d)^n\big) \ni X^x = \left(\bar{X}^n_i\right)_{i \in [n]}$ , where $\bar{X}^n_i = \left(\bar{X}^n_{i, k}\right)_{k \in [d]}$ ;
(iii) $b \;:\; [0, T] \times C\big([0, T] \;:\; (\mathbb{R}^d)^n\big) \rightarrow (\mathbb{R}^d)^n$ is such that $b = (b_i)_{i \in [n]}$ , $b_i = (b_{i, k})_{k \in [d]}$ , where
$$b_{i, k}(t, X^x) = \frac{1}{n} \sum_{j=1}^n \phi_k\big(\bar{X}^n_i(t), \bar{X}^n_j(t)\big)G\bigg(\frac{i}{n}, \frac{j}{n}\bigg) + \psi_k\big(\bar{X}^n_i(t)\big);$$
(iv) $W = (W_i)_{i \in [n]}$ is a (dn)-dimensional Brownian motion, where $W_i \equiv B_{i/n}$ ;
(v) $\Sigma$ is a block-diagonal $(dn) \times (dn)$ matrix whose diagonal blocks all equal $\sigma$ .
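The block-diagonal matrix $\Sigma$ in (v) is the Kronecker product $I_n \otimes \sigma$ ; a minimal sketch (the dimensions and the entries of $\sigma$ below are hypothetical):

```python
import numpy as np

d, n = 2, 3
sigma = np.array([[1.0, 0.5],
                  [0.0, 2.0]])         # hypothetical d x d diffusion matrix
Sigma = np.kron(np.eye(n), sigma)      # (dn) x (dn): sigma repeated on the diagonal

assert Sigma.shape == (d * n, d * n)
for i in range(n):
    block = Sigma[d*i:d*(i+1), d*i:d*(i+1)]
    assert np.array_equal(block, sigma)  # each diagonal block equals sigma
# Everything off the block diagonal vanishes.
assert np.count_nonzero(Sigma) == n * np.count_nonzero(sigma)
```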
In order to apply Lemma 2.5, it suffices to check the condition (2.13): for any $X, Y \in C([0, T] \;:\; \mathbb{R}^{dn})$ , Hölder’s inequality and the Lipschitz continuity of $\phi$ , $\psi$ indeed yield, for every $t \in [0, T]$ ,
Let $P^x \in \mathcal{P}\big(C([0, T] \;:\; \mathbb{R}^{dn})\big)$ be the law of the solution of (1.3) in the notation of (i)–(v) above; then, from Lemma 2.5, for any $Q \in \mathcal{P}\big(C([0, T] \;:\; \mathbb{R}^{dn})\big)$ we have
for some $c_1 > 0$ .
For an arbitrary $F \in Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \vert\vert\cdot\vert\vert_{n, 1}\Big)$ , Hölder’s inequality shows that F is a $\sqrt{n}$ -Lipschitz function on the space $\big(C([0, T] \;:\; \mathbb{R}^{dn}), \, \Vert \cdot \Vert_{dn, 2}\big)$ ; indeed, for $X, Y \in \Big(\big(C([0, T] : \mathbb{R}^d)\big)^n, \Vert \cdot \Vert_{n, 1}\Big)$ we obtain
Thus, Lemma 2.6 implies
for any $a > 0$ .
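The $\sqrt{n}$ factor in the Lipschitz comparison above stems from the Cauchy–Schwarz inequality $\Vert x \Vert_1 \le \sqrt{n}\, \Vert x \Vert_2$ on $\mathbb{R}^n$ ; a numerical sketch (normalization conventions for $\Vert \cdot \Vert_{n,1}$ only shift the constant, not the mechanism):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
for _ in range(200):
    x = rng.normal(size=n)
    # ||x||_1 <= sqrt(n) * ||x||_2, by Cauchy-Schwarz applied to (|x_i|) and (1).
    assert np.abs(x).sum() <= np.sqrt(n) * np.linalg.norm(x) + 1e-9

# Equality holds exactly when all coordinates share the same modulus.
x = np.ones(n)
assert abs(np.abs(x).sum() - np.sqrt(n) * np.linalg.norm(x)) < 1e-9
```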
We now claim that there exists a positive constant $c_2$ , which does not depend on n, such that the map $x \mapsto \langle P^x, F \rangle$ is $c_2$ -Lipschitz on $(\mathbb{R}^d)^n$ with respect to the $\ell^p$ norm for any $F \in Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \vert\vert\cdot\vert\vert_{n, p}\Big)$ and for $p = 1, 2$ . Given any $x, y \in (\mathbb{R}^d)^n$ , we couple $P^x$ and $P^y$ by solving the system (1.3) from the two initial states x, y with the same Brownian motion, and denote the coupling by $\pi_{x, y}$ . We deduce that for $\mathcal{L}(X) = P^x$ and $\mathcal{L}(Y) = P^y$ ,
When $p = 2$ , we use a standard argument (the trivial inequality $(a+b)^2 \le 2(a^2+b^2)$ , the Lipschitz continuity from Assumption 2.1(i), and a series of applications of Hölder’s inequality) to derive
Grönwall’s inequality yields that the last integrand in (4.2) for $p=2$ is bounded by
for some constant $c_2 > 0$ , which depends on $\phi$ , $\psi$ , and T, but not on n. When $p=1$ , proving $\vert\vert X-Y \vert\vert_{n, 1} \le c_2 \vert\vert x-y \vert\vert_{n, 1}$ is easier, and the claim follows.
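Grönwall's inequality, used here and again in the proofs of Theorems 3.4 and 3.6, bounds any $f$ satisfying $f(t) \le a + K \int_0^t f(s)\, \mathrm{d}s$ by $a e^{Kt}$ . A discrete sanity check on a grid (step size and constants are arbitrary illustrative values):

```python
import math

# Discrete Gronwall sketch: if f_k <= a + K*h*sum_{j<k} f_j on a grid of step h,
# then f_k = a*(1 + K*h)^k in the worst case, which is at most a*exp(K*k*h).
a, K, h, steps = 1.0, 2.0, 1e-3, 1000
f = [a]
for _ in range(steps):
    f.append(a + K * h * sum(f))      # take equality in the integral inequality

for k, val in enumerate(f):
    assert val <= a * math.exp(K * k * h) + 1e-9
```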
On the other hand, we apply Lemmas 2.7 and 2.6 to the assumption (3.1) to obtain, for every $f \in Lip\big( (\mathbb{R}^d)^n, \, \vert\vert\cdot\vert\vert_{n, 1} \big)$ and for any $a > 0$ ,
We conclude from (4.1), the above claim, and (4.3) that
The assertion (3.4) follows from choosing $1 / \delta = 8 \max\!(c_1, \kappa c_2^2)$ .
4.2.2. Proof of Theorem 3.2
We follow the proof of Theorem 3.1. Identifying the elements of $\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n$ with those of $C([0, T] \;:\; \mathbb{R}^{dn})$ , expressing the SDE (1.3) in the form of (2.12), and applying Lemma 2.5, we have that there exists a constant $c_1 > 0$ such that
holds for any $Q \in \mathcal{P}\big(C([0, T] \;:\; \mathbb{R}^{dn})\big)$ . Here, $P^x$ is the law of the solution of (1.3). Moreover, Lemma 2.6 implies
for any $a > 0$ and every $F \in Lip\big(C([0, T] \;:\; \mathbb{R}^{dn}), \Vert\cdot\Vert_{dn, 2}\big)$ . It is easy to check that every function in $Lip\big(C([0, T] \;:\; \mathbb{R}^{dn}), \Vert\cdot\Vert_{dn, 2}\big)$ also belongs to $Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \Vert\cdot\Vert_{n, 2}\Big)$ ; thus the inequality (4.4) also holds for every $F \in Lip\Big(\big(C([0, T] \;:\; \mathbb{R}^d)\big)^n, \Vert\cdot\Vert_{n, 2}\Big)$ .
We now apply Lemmas 2.7(ii) and 2.6 to the assumption (3.3) to deduce
for every $f \in Lip\big( (\mathbb{R}^d)^n, \, \vert\vert\cdot\vert\vert_{n, 2} \big)$ and for any $a > 0$ .
From (4.4), (4.5), and the claim in the proof of Theorem 3.1, we conclude that
The result (3.4) follows from choosing $1 / \delta = 8 \max\!(c_1, \kappa c_2^2)$ .
4.3. Proofs of results in Section 3.3
4.3.1. Proof of Theorem 3.3
First, we claim that
is $(1/n)$ -Lipschitz from $\Big(\big( C ([0, T] \;:\; \mathbb{R}^d) \big)^n, \, \Vert\cdot\Vert_{n, 1} \Big)$ to $\mathbb{R}$ . For $Y, Z \in \big( C ([0, T] \;:\; \mathbb{R}^d) \big)^n$ , applying the triangle inequality for the Wasserstein metric and taking the supremum over $t \in [0, T]$ gives the inequality
From the definitions (2.1) and (2.2), the last expression is less than or equal to $\frac{1}{n} \sum_{i=1}^n \Vert Y_i - Z_i \Vert = \frac{1}{n}\Vert Y-Z \Vert_{n, 1}$ , and the claim follows.
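The $(1/n)$ -Lipschitz estimate can be checked numerically in the simplest setting of real-valued samples at a fixed time, where $W_1$ between two $n$ -point empirical measures is computed exactly by matching order statistics; the identity coupling then gives the claimed bound (sample sizes and distributions below are illustrative):

```python
import numpy as np

def w1_empirical(y: np.ndarray, z: np.ndarray) -> float:
    """Exact W1 distance on R between two empirical measures with n atoms each:
    the optimal coupling matches order statistics."""
    return float(np.abs(np.sort(y) - np.sort(z)).mean())

rng = np.random.default_rng(2)
n = 500
for _ in range(50):
    y = rng.normal(size=n)
    z = y + rng.normal(scale=0.1, size=n)
    # The identity coupling of the empirical measures is admissible, so
    # W1(L_Y, L_Z) <= (1/n) * sum_i |Y_i - Z_i|, mirroring the Lipschitz claim.
    assert w1_empirical(y, z) <= np.abs(y - z).mean() + 1e-12
```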
Then, for any $a > 0$ ,
The first term is bounded by the right-hand side of (3.5) from Theorem 3.1.
Let us consider the auxiliary particle system (1.2) satisfying Assumption 2.4. Corollary 3.1 shows that the last probability vanishes for all but finitely many n, and the result follows.
4.3.2. Proof of Theorem 3.4
We first prove (3.6). From the triangle inequality, we obtain
In what follows, we compute the bound for the last probability on the right-hand side. For fixed $t \in [0, T]$ and $i \in [n]$ , we use the notation (2.5) to obtain
We define $\triangle(t) \;:\!=\; \frac{1}{n} \sum_{i=1}^n \vert \vert X^n_i - \bar{X}^n_i \vert \vert_{\star, t}$ and then deduce from the continuity of $X^n_i(\!\cdot\!)-\bar{X}^n_i(\!\cdot\!)$ that for each $i \in [n]$ there exists $t_i \in [0, t]$ satisfying
Since each component $\phi_k$ of $\phi$ belongs to the $L^1$ -Fourier class, there exists a finite complex measure $m_{\phi_k}$ such that we can write, for every $k \in [d]$ ,
for some complex functions $a^k_i, b^k_j$ of the form
Using the representation (4.11) with the elementary inequality
we find that the integral of (4.8) is bounded above by
where we define the complex vectors
Since the $\ell^{\infty}$ norms of these vectors are bounded by 1, decomposing them into real and imaginary parts gives, for each $k \in [d]$ ,
Thus, the right-hand side of (4.12) is bounded above by
For the integrals of (4.9) and (4.10), we use the Lipschitz continuity of $\phi$ and $\psi$ ; thus, there exists a constant $K > 0$ such that
Grönwall’s inequality yields
where K is now a positive constant depending on the time horizon T. Recalling the notation $\triangle(t)$ , we obtain
and finally Lemma 2.9 gives the bound for the last probability of (4.6),
For the first probability on the right-hand side of (4.6), Theorem 3.3 yields, for every $n \ge N$ ,
Thanks to Assumption 2.4, by choosing a larger value for $N \in \mathbb{N}$ than the one in Theorem 3.3, we can ensure that
for every $n \ge N$ , and the assertion (3.6) follows.
For the result (3.7), we can approximate a general $\phi$ by those in the $L^1$ -Fourier class, using the approximation method in Section 5.1.3 of [Reference Oliveira and Reis27], to find the exponential bound for the probability $\mathbb{P} [\!\sup_{0 \le t \le T} d_{BL} (L_{n, t}, \bar{L}_{n, t}) > a/2 ]$ (similar to (4.16)). By recalling the fact $d_{BL} \le W_1$ and replacing all the $W_1$ metrics with the $d_{BL}$ metrics in (4.6), we arrive at (3.7).
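The decoupling supplied by the $L^1$ -Fourier representation (4.11) can be seen in a toy case: for a kernel such as $\phi(x, y) = \cos\!(x - y)$ , the pairwise interaction separates into products of single-variable functions, so the double sum over particle pairs collapses into squares of single sums. A sketch (the kernel is a hypothetical example, not the $\phi$ of the text):

```python
import numpy as np

# Toy decoupling: phi(x, y) = cos(x - y) = cos(x)cos(y) + sin(x)sin(y), so the
# O(n^2) double average over particle pairs equals a combination of O(n) sums.
rng = np.random.default_rng(3)
x = rng.uniform(-np.pi, np.pi, size=400)

direct = np.mean(np.cos(x[:, None] - x[None, :]))              # O(n^2) evaluation
separated = np.mean(np.cos(x)) ** 2 + np.mean(np.sin(x)) ** 2  # O(n) evaluation
assert abs(direct - separated) < 1e-9
```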
4.3.3. Proof of Theorem 3.5
As in the proof of Theorem 3.3, for $Y, Z \in \big( C ([0, T] \;:\; \mathbb{R}^d) \big)^n$ we derive
again from (2.1) and (2.2). This verifies the $\Big(\frac{1}{\sqrt{n}}\Big)$ -Lipschitz continuity of the map
from $\Big(\big( C ([0, T] \;:\; \mathbb{R}^d) \big)^n, \, \Vert\cdot\Vert_{n, 2} \Big)$ to $\mathbb{R}$ . Then, for any $a > 0$ ,
The first term is bounded by the right-hand side of (3.4) from Theorem 3.2. The last probability vanishes for all but finitely many n from Corollary 3.1.
4.3.4. Proof of Lemma 3.1
We note from (2.5) that $\Big\{D^{(n)}_{i, j}\Big\}_{1 \le i, j \le n}$ are independent zero-mean random variables, and for every $i, j \in [n]$ ,
In particular, since $p(n) \le 1$ , we have $0 \le p(n)G(\frac{i}{n}, \frac{j}{n}) \le 1$ , and thus $\mathbb{E}\big[(D^{(n)}_{i, j})^2\big] \le 1/(4n^2p(n)^2)$ .
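The second-moment bound can be confirmed by direct computation, $\mathbb{E}\big[(D^{(n)}_{i,j})^2\big] = q(1-q)/(np(n))^2$ with $q = p(n) G(i/n, j/n)$ , and by simulation (the values of $n$ , $p$ , and $G$ below are illustrative):

```python
import numpy as np

# Simulation check of E[(D_{ij})^2] = q(1-q)/(np)^2 <= 1/(4 n^2 p^2), where
# D = (xi - q)/(n p), xi ~ Bernoulli(q), q = p * G(i/n, j/n).
rng = np.random.default_rng(4)
n, p, G_val = 50, 0.2, 0.7
q = p * G_val
xi = rng.binomial(1, q, size=2_000_000).astype(float)
D = (xi - q) / (n * p)

exact = q * (1.0 - q) / (n * p) ** 2
assert abs((D**2).mean() - exact) < 0.05 * exact   # Monte Carlo matches the formula
assert exact <= 1.0 / (4.0 * n**2 * p**2)          # the bound stated in the text
```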
Let us fix any $n \in \mathbb{N}$ . For arbitrary n-dimensional vectors $\mathbf{x}, \, \mathbf{y} \in [\!-\!1, 1]^n$ , we have
Thus, for fixed arbitrary $\eta > 0$ we have
From Assumption 3.1, there exists $N \in \mathbb{N}$ such that $P_3$ vanishes for every $n \ge N$ . In the following, we find the bounds for $P_1$ and $P_2$ . Using the distribution
for each $i, j \in [n]$ , we derive for $P_1$
The summands $D^{(n)}_{i, k} x_j y_k$ in the last two probabilities are independent zero-mean random variables bounded above by $1/(np(n))$ , bounded below by $-1/n$ , and satisfying
from (4.17). From Bernstein’s inequality (Lemma 2.10), we have
thus
We now compute the bound for $P_2$ . We have
and the summands $\big( D^{(n)}_{i, j} \big)^2 x_j y_j - \mathbb{E}\big[ \big(D^{(n)}_{i, j}\big)^2 x_j y_j \big]$ in the probability are independent zero-mean random variables bounded above by $5/(2np(n))^2$ . Moreover, we easily obtain the bound $\mathbb{E}\big[ (D^{(n)}_{i, j})^4\big] \le 1/(np(n))^4$ , and thus the sum of the variances of the summands is
Applying Bernstein’s inequality (Lemma 2.10) to each probability in (4.20) yields
Comparing the bounds of (4.19) and (4.21), modifying the value of N if necessary, and plugging these into (4.18), we obtain the result.
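Bernstein's inequality (Lemma 2.10), applied twice above, can be sanity-checked empirically in its standard form: for independent zero-mean $Z_i$ with $\vert Z_i \vert \le M$ and total variance $V$ , one has $\mathbb{P}\big[\sum_i Z_i \ge t\big] \le \exp\!\big({-}t^2 / (2(V + Mt/3))\big)$ . A Monte Carlo sketch with centered Bernoulli summands (all parameters illustrative):

```python
import numpy as np

# Empirical tails of a sum of independent centered Bernoulli(p) variables
# versus the Bernstein bound, with |Z_i| <= M = 1 and V = n*p*(1-p).
rng = np.random.default_rng(5)
n, p, trials = 200, 0.3, 20_000
M, V = 1.0, n * p * (1 - p)
S = (rng.binomial(1, p, size=(trials, n)) - p).sum(axis=1)

for t in [5.0, 10.0, 15.0]:
    bernstein = np.exp(-t**2 / (2.0 * (V + M * t / 3.0)))
    assert (S >= t).mean() <= bernstein   # empirical tail sits below the bound
```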
4.3.5. Proof of Theorem 3.6
The argument is similar to the proof of Theorem 3.4. The triangle inequality gives
Recalling the identity (4.7), applying Hölder’s inequality several times, and using the Lipschitz property, we obtain
For a fixed $t \in [0, T]$ , by the continuity of $X^n_i(\!\cdot\!)-\bar{X}^n_i(\!\cdot\!)$ , there exists $t_i \in [0, t]$ for each $i \in [n]$ satisfying $\Box(t) \;:\!=\; \frac{1}{n} \sum_{i=1}^n \Vert X^n_i - \bar{X}^n_i \Vert^2_{\star, t} = \frac{1}{n} \sum_{i=1}^n \big\vert X^n_i(t_i) - \bar{X}^n_i(t_i) \big\vert^2$ . Combining this with the last inequality, we have
We recall the representations (4.11) and (4.13) and use Hölder’s inequality to derive for the first integral on the right-hand side
In the last two inequalities, we used the fact that the $\ell^{\infty}$ norms of the two vectors $\mathbf{a}^k(z, s)$ and $\mathbf{b}^k(z, s)$ are bounded by 1. Thus, from (4.23), there exists a constant $K>0$ such that
holds for every $t \in [0, T]$ , and applying Grönwall’s inequality gives
where the constant K now depends on T.
Since we have
Lemma 3.1 shows that there exists $N \in \mathbb{N}$ such that the last probability in (4.22) has the bound
for every $n \ge N$ .
On the other hand, Theorem 3.5 gives the bound for the other probability in (4.22). By comparing the two bounds under Assumption 3.1(i), we obtain the assertion (3.9). The result (3.10) is now clear under Assumption 3.1(ii), if we set $p(n) \equiv 1$ and redefine the constants $\delta > 0$ and $N \in \mathbb{N}$ appropriately.
Funding information
E. Bayraktar is supported in part by the National Science Foundation under the grant DMS-2106556 and by the Susan M. Smith Professorship.
Competing interests
The authors declare no competing interests that arose during the preparation or publication of this article.