1. Introduction
Since McKean’s seminal paper [Reference McKean51], mean-field theory has been widely used to study large stochastic interacting particle systems arising in various domains such as statistical physics [Reference Dawson21, Reference Gärtner37, Reference McKean51, Reference McKean52], biological systems [Reference Dawson23, Reference Méléard and Bansaye53], communication networks [Reference Benaïm and Le Boudec8, Reference Graham39, Reference Graham and Méléard41, Reference Graham and Méléard42], mathematical finance [Reference Giesecke, Spiliopoulos, Sowers and Sirignano38, Reference Kley, Klüppelberg and Reichel47], etc. This theory, first initiated in connection with the mathematical foundation of the Boltzmann equation, aims for a mathematically rigorous treatment of the time evolution of stochastic systems with weak long-range interaction where the interaction between the particles is realized via the empirical measure of the particle configuration. For such scenarios, it is then natural to investigate the behavior of the empirical process instead of considering the particle configuration itself. In particular, one is interested in investigating laws of large numbers, limit theorems, and large deviations for the empirical process in the limit as the number of particles tends to infinity. Another concept that plays an important role is chaos propagation, first introduced by Kac as part of kinetic theory [Reference Kac45], and then widely studied in the literature since; see, e.g., [Reference Gärtner37] and [Reference Sznitman61] for detailed developments on the subject.
Classically, the systems studied are homogeneous with complete interaction graphs; that is, the particles are exchangeable and each particle interacts with all the others. In such a setting, the big picture is well understood and various asymptotic results have been established for a variety of models. One can consult [Reference Dawson22], [Reference Gärtner37], or [Reference Sznitman61] for an overview. However, though such assumptions are reasonable in statistical physics and accurately describe a variety of phenomena, this may no longer be the case when considering other applications. Researchers have therefore studied many new interacting particle systems where the homogeneity or complete interaction assumptions are not tenable; this is the direction in which this paper proceeds.
Systems in which particles carry intrinsic distinguishing features lead naturally to heterogeneous models. Thus, one cannot presume the particles to be identically distributed. Instead, one relies on additional conditions to establish limiting results. For instance, in [Reference Finnoff34, Reference Finnoff35, Reference Giesecke, Spiliopoulos, Sowers and Sirignano38], models for the activities of heterogeneous economic agents were proposed and laws of large numbers were proved under some regularity conditions. In [Reference Chong and Klüppelberg18], the authors investigated systems of interacting stochastic differential equations with two kinds of heterogeneity: one originating from different weights of the linkages, and the other concerning their asymptotic relevance when the system becomes large. The authors then introduced a partial mean-field system by averaging only over the particles with weak interactions and proved a law of large numbers together with a large deviations principle.
A particular instance of heterogeneity, with which our current work is in line, is the multi-population paradigm, also known as multi-types or multi-species. Here, the particles of the system are divided into groups, within which they are homogeneous or partially homogeneous. The interest in these structures is motivated by their ubiquity in different fields (see references below). We thus propose in this paper to take a step forward in the understanding of their behavior by studying the large-scale asymptotics of interacting particle systems with jumps on block-structured networks. Specifically, we will set up a model for block-structured networks with dynamically changing multi-color nodes. The evolution of node colors is described by a sequence of finite-state pure-jump processes interacting through local empirical measures describing the neighborhood of each node. The nodes of the network are divided into a finite number of blocks. In addition, the nodes within each block are divided into two subgroups: central and peripheral nodes. The central nodes are those connected only to nodes from the same block, whereas the peripheral nodes interact with both particles from the same block and some particles from other blocks. Thus, our model describes two levels of heterogeneity: between blocks and within blocks.
Block-structured networks are ubiquitous in various interacting particle systems composed of different communities, where a given community consists of a group of agents densely connected to each other but sparsely connected to the other dense groups of the population. For instance, a community in a social network might refer to a circle of friends, a community on the World Wide Web might include a group of pages on closely related topics, and a community in a cellular or genetic network might be related to a functional module. The study of the grouping patterns of communities, together with their detection, is an active field of research among physicists and applied mathematicians, and the study of what has become known as community structure is now one of the most prominent areas of network science. The reader can consult [Reference Porter, Onnela and Mucha59] and the references therein for an overview of the subject.
Our idea builds on the results in several existing works. The authors of [Reference Collet19, Reference Collet, Formentin and Tovazzi20] studied a bi-populated Curie–Weiss model and established, via a large deviations approach, the propagation of chaos and the asymptotic dynamics of the pair of group magnetizations in the infinite-volume limit. Laws of large numbers and a central limit theorem were proved in [Reference Kirsch and Toth46] for an extension of this model to the case of heterogeneous coupling within and between groups. A related paper [Reference Löwe and Schubert50] studied high-temperature fluctuations for a block spin Ising model and established a central limit theorem. A variant of this model was analyzed in [Reference Knöpfel, Löwe, Schubert and Sinulis48], where the vertices are divided into a finite number of blocks and pair interactions are given according to their blocks. The authors proved a large deviations principle and a central limit theorem. In the same spirit, limiting results were established in [Reference Nagasawat and Tanaka56] for a system of reflected diffusions segregated into two groups of blue and red particles and subject to a reflection condition. These results were extended in [Reference Nagasawat and Tanaka57] to the case with drift coefficients not of average form. Other recent related works are [Reference Meylahn54], which studied the two-community noisy Kuramoto model, and [Reference Aleandri and Minelli4], which studied opinion dynamics in a model with Lotka–Volterra-type interactions. Among other instances of the multi-population paradigm, we mention in particular the works [Reference Buckdahn, Li and Peng14, Reference Carmona and Zhu17], which considered mean-field game models with a single major player and statistically identical minor players. Propagation of chaos was proved for the minor players conditioned on the major player.
Another closely related model was proposed in [Reference Bayraktar and Wu7], where systems of weakly interacting jump processes on time-varying random graphs with dynamically changing multi-color edges were studied. In [Reference Bayraktar and Wu7], the dynamics of a node depend on the joint empirical distribution of all other nodes and the edges to which it connects. In contrast, the dynamics of an edge depend only on the corresponding nodes to which it connects. The paper [Reference Bayraktar and Wu7] established the law of large numbers, propagation of chaos, and central limit theorems for these systems. Despite certain similarities, the class of models which we are considering in the current paper differs in several aspects from the models contemplated in [Reference Bayraktar and Wu7]. First, the interacting particle systems that we study are on static block-structured graphs, whereas the ones considered in [Reference Bayraktar and Wu7] are on time-varying random graphs with edge-structure dynamics. Moreover, in the current work, we consider a multi-population setting where the interaction between the nodes is local, i.e. each node interacts only with its neighbors, whereas in [Reference Bayraktar and Wu7] the interaction between nodes is global, since the dynamic of a given node depends on the empirical distribution of all the other nodes. Finally, the analysis carried out and the results obtained in our current work are established on the vector of local empirical measures adapted to the multi-population context, and thus on product spaces, which allows us to overcome the heterogeneity due to the block structure of the graph. Furthermore, note that the current paper addresses the topic of interacting particle systems on large (random) networks, which has attracted increasing attention in recent years; see, e.g., [Reference Bayraktar, Chakraborty and Wu6, Reference Bayraktar and Wu7, Reference Bhamidi, Budhiraja and Wu9] and the references therein.
Alongside the papers listed above, the multi-population framework has also been considered for systems of interacting diffusions. We mention for instance [Reference Kley, Klüppelberg and Reichel47] for an analysis of a system of interacting Ornstein–Uhlenbeck processes on a heterogeneous network of credit-interlinked agents, [Reference Bossy, Faugeras and Talay13, Reference Budhiraja and Wu16, Reference Touboul62] and the references therein for studies of neuronal networks composed of separate populations, or [Reference Nguyen, Nguyen and Du58] and the references therein for mean-field multi-class interacting diffusions models in a general setting. (Note that some erroneous results that were originally stated in [Reference Touboul62] were corrected in [Reference Touboul63].)
The goal of the current work is to develop limiting results for interacting finite-state pure-jump processes on a class of block-structured networks. Our first main result, Theorem 5.1, and its consequence Corollary 5.1 give propagation of chaos and a law of large numbers under some regularity conditions on the degrees of the nodes. We show that in the mean-field limit, the asymptotic behavior of the node colors can be represented by the solution of a McKean–Vlasov system. Because of the lack of symmetry, we make use of the extension of the notion of chaoticity and Sznitman coupling methods to multi-class systems developed in [Reference Graham40, Reference Graham and Robert43]. The existence and uniqueness results for the limiting system are established in Theorem 4.1. The regularity conditions which we impose (cf. Condition 4.1) can be compared to the uniform degree property introduced in [Reference Delattre, Giacomin and Luçon28] for a model of interacting diffusions on random graphs and to the one introduced in [Reference Budhiraja, Mukherjee and Wu15] for a model of interacting pure-jump processes on sparse graphs.
Another aspect which we are interested in is the large deviations properties of the system. For this purpose, with the aim of simplicity, we will restrict ourselves to the case where the blocks are cliques and the peripheral subgraph is complete, that is, the case where all peripheral nodes of the system are connected and all the central nodes within the same block are connected. We then state our next main results in Theorem 6.1, which establishes the large deviations principle for the empirical measure vector over finite time duration, followed by Theorem 6.2, which gives the large deviations principle for the empirical process vector. These results generalize those of [Reference Borkar and Sundaresan12, Reference Léonard49] to the multi-population context. Also, unlike [Reference Léonard49] and similarly to [Reference Borkar and Sundaresan12], we do not impose chaotic initial conditions, but only converging initial conditions. The proofs of the large deviations principles, which provide tools for handling the technicalities arising from the multi-population context, generalize the classical approach developed in [Reference Dawson and Gärtner24] and its adaptation to the context of jump processes in [Reference Léonard49].
In summary, the current work is a contribution to the multi-population paradigm and a move towards heterogeneity for mean-field models and their large deviations behavior. The rest of this paper is organized as follows. The detailed model for interacting finite-state pure-jump processes on block-structured graphs is introduced in Section 2. Section 3 provides some practical examples of applications of the class of models studied in this paper. In Section 4, we introduce the McKean–Vlasov limiting system, and we prove the existence and uniqueness of its solution under specific regularity conditions introduced in Condition 4.1. Then, under the same conditions, in Section 5 we prove propagation of chaos (Theorem 5.1) and the law of large numbers (Corollary 5.1). Next, in Section 6, we present the large deviations principles for the empirical measure vector (Theorem 6.1) and for the empirical process vector (Theorem 6.2).
2. Formulation of the model
This section introduces the model and related notation.
2.1. The setting
A block-structured network:
Consider an undirected block-structured graph $\mathcal{G}=(\mathcal{V},\Xi)$ , where $\mathcal{V}$ is the set of nodes and $\Xi$ is the set of edges. The set $\mathcal{V}$ is partitioned into r (finite) blocks $C_1,\ldots,C_r$ of sizes $N_1,\ldots,N_r$ , respectively. Denote by $|\mathcal{V}|\,:\!=\,N_1+\cdots+N_r=N$ the cardinality of the set $\mathcal{V}$ , which corresponds to the total number of nodes in the network.
The nodes of each block $C_j$ are divided into two categories:
Central nodes $C^c_j$ are connected to some nodes from the same block but not to any nodes from any other blocks. We set $|C_j^c|=N_j^c$ .
Peripheral nodes $C^p_j$ are connected to some nodes from the same block and some nodes from other blocks. We set $|C^p_j|=N^p_j$ .
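To make the central/peripheral structure concrete, the following Python sketch builds a small block-structured graph and checks that no central node has a neighbor outside its own block. It is only an illustration: the function name `build_block_graph`, the block sizes, and the choice of connecting all nodes within a block and all peripheral nodes to each other are assumptions made here, not part of the general model.

```python
from itertools import combinations


def build_block_graph(central_sizes, peripheral_sizes):
    """Build a toy block-structured graph (illustrative assumption:
    every block is a clique and all peripheral nodes are connected).

    Returns (block, kind, edges) where block[n] is the block index of
    node n, kind[n] is 'c' or 'p', and edges is a set of frozensets.
    """
    block, kind = [], []
    for j, (nc, np_) in enumerate(zip(central_sizes, peripheral_sizes)):
        block += [j] * (nc + np_)
        kind += ["c"] * nc + ["p"] * np_
    edges = set()
    for n, m in combinations(range(len(block)), 2):
        same_block = block[n] == block[m]
        both_peripheral = kind[n] == "p" and kind[m] == "p"
        if same_block or both_peripheral:
            edges.add(frozenset((n, m)))
    return block, kind, edges


# Two blocks: N_1^c = 3, N_1^p = 1 and N_2^c = 2, N_2^p = 2.
block, kind, edges = build_block_graph([3, 2], [1, 2])

# Central nodes must only be connected to nodes of their own block.
for e in edges:
    n, m = tuple(e)
    if kind[n] == "c" or kind[m] == "c":
        assert block[n] == block[m]
```

Sparser variants can be obtained by filtering `edges`, as long as edges between different blocks only join peripheral nodes.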
Multi-color nodes:
Let $\mathcal{Z}\,:\!=\,\{1,2,\ldots,K\}\subset\mathbb{N}$ be a set of K colors. Suppose that each node of the graph $\mathcal{G}=(\mathcal{V},\Xi)$ is colored by one of these colors at each time. One can associate each node with a particle whose state space is $\mathcal{Z}$ . Thus, we will use the terms ‘node’ and ‘particle’ interchangeably. Denote by $(\mathcal{Z},\mathcal{E})$ the directed graph where $\mathcal{E}\subset\mathcal{Z}\times\mathcal{Z}\backslash \{(z,z)| z \in\mathcal{Z}\}$ describes the set of admissible jumps for each particle. Moreover, whenever $(z,z')\in\mathcal{E}$ , a particle colored by z is allowed to move from z to $z'$ at a rate that depends on the current state of the node and the state of its neighbors (adjacent nodes).
For each $1\leq j\leq r$ and $n\in C_j^c$ (resp. $n\in C_j^p$ ), let us denote by $(X^c_{n,j}(t),t\geq 0)$ (resp. $(X^p_{n,j}(t),t\geq 0)$ ) the stochastic process that describes the state (color) of the central (resp. peripheral) node n at time t. In addition, we denote by $\mu^N_j(t)$ the local empirical measure describing the state of the jth block at time t, which is given by
(1) \begin{align} \mu^N_j(t)=\frac{N_j^c}{N_j}\,\mu_j^{c,N}(t)+\frac{N_j^p}{N_j}\,\mu_j^{p,N}(t),\end{align}
where $\mu_j^{c,N}(t)=\frac{1}{N_j^c}\sum_{n\in C^c_j}\delta_{X^c_{n,j}(t)}$ $\Big($ resp. $\mu_j^{p,N}(t)=\frac{1}{N_j^p}\sum_{n\in C^p_j}\delta_{X^p_{n,j}(t)}\Big)$ is the empirical measure describing the state of the central (resp. peripheral) nodes of the jth block at time t. The fractions $\frac{N_j^c}{N_j}$ (resp. $\frac{N_j^p}{N_j}$ ) thus represent the proportion of central (resp. peripheral) nodes in the block j. Denote by $\mathcal{M}_1(\mathcal{Z})$ the set of all probability measures over $\mathcal{Z}$ , endowed with the topology of weak convergence.
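As a small numerical illustration, the sketch below computes the central and peripheral empirical measures of one block and combines them with the weights $\frac{N_j^c}{N_j}$ and $\frac{N_j^p}{N_j}$ . The function names and the sample colors are hypothetical, and reading the block measure as this weighted combination is our interpretation of the proportions described above.

```python
from collections import Counter


def empirical_measure(colors, K):
    """Empirical measure of a list of colors in {1, ..., K}, as a list
    of K probabilities (entry z-1 is the mass at color z)."""
    counts = Counter(colors)
    return [counts.get(z, 0) / len(colors) for z in range(1, K + 1)]


def block_measure(central_colors, peripheral_colors, K):
    """Combination of the central and peripheral empirical measures,
    weighted by the proportions N_j^c/N_j and N_j^p/N_j."""
    nc, np_ = len(central_colors), len(peripheral_colors)
    mu_c = empirical_measure(central_colors, K)
    mu_p = empirical_measure(peripheral_colors, K)
    return [(nc * a + np_ * b) / (nc + np_) for a, b in zip(mu_c, mu_p)]


# Hypothetical block with K = 3 colors, 4 central and 2 peripheral nodes.
central = [1, 1, 2, 3]
peripheral = [2, 2]
mu_j = block_measure(central, peripheral, K=3)
```

With these weights, `mu_j` coincides with the empirical measure of all nodes of the block taken together.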
The random dynamics:
The process $X(t)=\Big(X^c_{n,j}(t), X^p_{m,j}(t),n\in C_j^c, m\in C_j^p, 1\leq j\leq r\Big)$ describing the evolution of the entire system is a continuous-time Markov chain with state space $\mathcal{Z}^N$ . The transition rate of each node depends on its current state and the state of its neighbors, together with the block to which it belongs. To characterize these neighborhoods, we introduce a set of local empirical measures describing the state of the star-shaped subgraph centered at each node n and composed of the nodes connected to it. To lighten the formulas and for ease of reading, we introduce the following shorthand notation: for any two nodes $n,m\in\mathcal{V}$ , $m\sim n$ means that $\{m,n\}\in\Xi$ . Moreover, for any block $1\leq j\leq r$ and $\iota\in\{c,p\}$ , we denote by $\mathfrak{N}_j^{\iota}(n)\,:\!=\,\{n'\in C_j^{\iota}\,:\,n\sim n'\}$ the set of nodes in $C_j^{\iota}$ that are connected to n. Let deg(n) denote the degree of the node n, and let $M_i^{\iota}(n)\,:\!=\,|\mathfrak{N}_i^{\iota}(n)|$ for $\iota\in\{c,p\}$ and $1\leq i\leq r$ be the cardinality of the set $\mathfrak{N}_i^{\iota}(n)$ . Thus, one notices that for $n\in C_j^c$ , $deg(n)=M_{j}^{c}(n)+M_{j}^{p}(n)$ , and for $n\in C_j^p$ , $deg(n)=M_{j}^{c}(n)+\sum_{k=1}^r M_k^{p}(n)$ .
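The degree decomposition can be checked on a toy example. The Python sketch below counts, for each node, its neighbors in every set $C_i^{\iota}$ and verifies the two identities for $deg(n)$ stated above. The adjacency lists, block labels, and the helper `neighbor_counts` are hypothetical and chosen only for illustration.

```python
def neighbor_counts(n, block, kind, adjacency, r):
    """Return M[(i, iota)] = number of neighbors of n lying in C_i^iota."""
    M = {(i, iota): 0 for i in range(r) for iota in ("c", "p")}
    for m in adjacency[n]:
        M[(block[m], kind[m])] += 1
    return M


# Hypothetical toy network with r = 2 blocks (indexed 0 and 1).
block = [0, 0, 0, 1, 1, 1]
kind = ["c", "c", "p", "c", "p", "p"]
adjacency = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 4},
    3: {4, 5},
    4: {2, 3, 5},
    5: {3, 4},
}
r = 2

for n in adjacency:
    M = neighbor_counts(n, block, kind, adjacency, r)
    j = block[n]
    if kind[n] == "c":
        # Central node: neighbors only inside its own block.
        expected = M[(j, "c")] + M[(j, "p")]
    else:
        # Peripheral node: central neighbors of its own block plus
        # peripheral neighbors in any block.
        expected = M[(j, "c")] + sum(M[(k, "p")] for k in range(r))
    assert len(adjacency[n]) == expected
```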
Now, for any $n\in C_j^c$ and $1\leq j\leq r$ , let us define
and
and finally
Equivalently, for any $n\in C_j^p$ , $1\leq j\leq r$ , and $j'\neq j$ , define
and
and finally
Therefore, the random dynamic in each block $1\leq j\leq r$ is described as follows:
The central nodes dynamic. For each central node $n\in C^c_j$ , its color $X^c_{n,j}(t)$ goes from z to $z'$ , for $(z,z')\in\mathcal{E}$ , at rate
(8) \begin{align} \lambda_{j,z,z'}^{c}\bigg(\aleph^c_{n,j}(t),\aleph^p_{n,j}(t),\varrho^c_{n,j},\varrho^p_{n,j}\bigg),\end{align}
which depends on its current state and on the states of its neighbors through the functions $ \lambda_{j,z,z'}^{c}\,:\,\mathcal{M}_1(\mathcal{Z})\times\mathcal{M}_1(\mathcal{Z})\times [0,1]\times [0,1]\rightarrow\mathbb{R}_+$ .
The peripheral nodes dynamic. For each peripheral node $n\in C^p_j$ , its color $X^p_{n,j}(t)$ goes from z to $z'$ , for $(z,z')\in\mathcal{E}$ , at rate
(9) \begin{equation}\begin{split}\lambda^p_{j,z,z'}\bigg(\beth^c_{n,j}(t),\beth^p_{n,j,1}(t),\ldots,\beth^p_{n,j,r}(t),\varsigma^c_{n,j},\varsigma^p_{n,j,1},\ldots,\varsigma^p_{n,j,r}\bigg),\end{split}\end{equation}
which also depends on its state and the states of its neighbors through the functions $ \lambda_{j,z,z'}^{p}\,:\,\big(\mathcal{M}_1(\mathcal{Z})\big)^{r+1}\times [0,1]^{r+1}\rightarrow\mathbb{R}_+$ .
The explicit forms of the rate functions will be introduced in Condition 4.1. To avoid cluttering our notation, let us introduce the following vectors:
Thus, we will write $ \lambda_{j,z,z^{\prime}}^{c}\Big(\upsilon_{n,j}^{c,N}(t)\Big)$ instead of (8) and $\lambda^p_{j,z,z'}\Big(\upsilon_{n,j}^{p,N}(t)\Big)$ instead of (9).
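Since the whole system is a continuous-time Markov chain on a finite state space, its sample paths can be simulated with the standard Gillespie (stochastic simulation) algorithm. The sketch below is one possible way to do so and is not taken from the paper: the function `gillespie_step` and the toy rate function `toy_rate` are hypothetical stand-ins for the rate functions $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ , whose explicit forms are not used here.

```python
import random


def gillespie_step(states, rate_fn, rng):
    """One jump of a finite-state continuous-time Markov chain.

    states: list of current node colors.
    rate_fn(n, states): dict {new_color: rate} for node n (hypothetical
        stand-in for the rate functions of the model).
    Returns (holding_time, new_states), or (None, states) if no jump
    is possible.
    """
    moves = []  # entries are (node, new_color, rate)
    for n in range(len(states)):
        for z_new, rate in rate_fn(n, states).items():
            if rate > 0:
                moves.append((n, z_new, rate))
    total = sum(rate for _, _, rate in moves)
    if total == 0:
        return None, states
    # Exponential holding time with parameter equal to the total rate.
    holding_time = rng.expovariate(total)
    # Choose one move with probability proportional to its rate.
    n, z_new, _ = rng.choices(moves, weights=[m[2] for m in moves])[0]
    new_states = list(states)
    new_states[n] = z_new
    return holding_time, new_states


def toy_rate(n, states):
    """Assumed toy dynamic with colors {1, 2}: a node switches color at
    a rate that increases with the fraction of other nodes already
    holding the target color."""
    others = [s for m, s in enumerate(states) if m != n]
    target = 2 if states[n] == 1 else 1
    return {target: 0.1 + others.count(target) / len(others)}


rng = random.Random(0)
t, states = 0.0, [1, 1, 2, 2, 1]
for _ in range(10):
    dt, states = gillespie_step(states, toy_rate, rng)
    t += dt
```

In an actual experiment, `rate_fn` would evaluate the local empirical measures of the neighbors of node n rather than a global fraction.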
Remark 2.1. One can see the model under investigation as a multi-species system where each block $C_j$ represents a separate species. In particular, the rate functions $\lambda_{j,z,z'}^{c}$ and $\lambda_{j,z,z'}^{p}$ being block-dependent, the dynamic of each particle depends on its species, i.e., the block to which it belongs. This idea has been extensively used in the literature on multi-type systems; see, e.g., [Reference Agliari, Migliozzi and Tantari1, Reference Alberici, Camilli, Contucci and Mingione3, Reference Barra, Contucci, Mingione and Tantari5, Reference Budhiraja and Wu16] and the references therein. The specificity here is the existence of heterogeneity even across particles of the same species/block. Indeed, the central/peripheral paradigm creates two sub-types of particles within the same species whose rate functions differ. This construction appears to be natural in certain multi-group systems where only a few particles from the different groups interact; detailed examples are given below. Also, the interaction structure differs even for the central (resp. peripheral) particles of the same species/block, given that the rate functions depend on the node-centered local empirical measures, which differ even within the same block.
2.2. The infinitesimal generator
For any $T\in (0,+\infty)$ , the processes $X^c_{n,j}\,:\,[0,T]\rightarrow\mathcal{Z}$ for $n\in C_j^c$ and $X^p_{m,j}\,:\, [0,T] \rightarrow\mathcal{Z}$ for $m\in C_j^p$ , which respectively describe the evolution of the central and the peripheral particles over the time interval [0,T], are càdlàg paths, and thus are elements of the Skorokhod space $\mathcal{D}([0,T],\mathcal{Z})$ equipped with the Skorokhod topology. Let $X^N=\big(X^c_{n,j},X^p_{m,j},n\in C_j^c,m\in C_j^p, 1\leq j\leq r\big)\in\mathcal{D}([0,T],\mathcal{Z}^N)$ denote the full path description of all N particles. Thus the process $X^N$ is a Markov process with càdlàg paths, with state space $\mathcal{Z}^N$ , and with the infinitesimal generator $\mathcal{L}^N$ acting on the bounded measurable functions $\phi$ on $\mathcal{Z}^N$ according to
where $x^N=\big(x_{n,j},x_{m,j},n\in C_j^c,m\in C_j^p, 1\leq j\leq r\big)\in\mathcal{Z}^N$ and $x^N_{n,z,z'}$ denotes the configuration of the system obtained when the state of the nth node changes from z to $z'$ .
2.3. Stochastic differential equation representation
Recall that, for each central node $n\in C^c_j$ (resp. peripheral node $n\in C^p_j$ ) from a given block $1\leq j\leq r$ , the evolution of its color is described by the continuous-time stochastic process $(X_{n,j}^{c}(t),t\geq 0)$ $\big($ resp. $\big(X_{n,j}^{p}(t),t\geq 0\big)\big)$ that takes values in the finite state space $\mathcal{Z}$ , and whose dynamic is given by the time-dependent transition rate matrix $\left(\lambda_{j,z,z'}^{c}(\upsilon_{n,j}^{c,N}(t))\right)_{(z,z')\in\mathcal{E}}$ (resp. $\left(\lambda^p_{j,z,z'}(\upsilon_{n,j}^{p,N}(t))\right)_{(z,z')\in\mathcal{E}}$ ). Therefore, using a classical approach (see e.g. [Reference Skorokhod60, p. 104]), the processes $X_{n,j}^{c}$ and $X_{n,j}^{p}$ can be represented, at least weakly, by the following system of stochastic differential equations:
where $\{\mathcal{N}_{n,j}^c, n\in C_j^c, 1\leq j\leq r\}$ and $\{\mathcal{N}_{n,j}^p, n\in C_j^p, 1\leq j\leq r\}$ are collections of Poisson random measures on $\mathbb{R}^2$ whose intensity measures are Lebesgue on $\mathbb{R}^2_+$ . We will use the representation (12) in the analysis of the asymptotic behavior of the system when the total number of nodes N goes to infinity.
3. Examples
As mentioned in the introduction, mean-field block models have been proposed to investigate various phenomena arising in fields such as physics, engineering, and biology. This section presents some examples of applications of the model analyzed in the current paper, with the goal of illustrating its usefulness and its flexibility in capturing various phenomena. Of course, it remains a toy model that should be appropriately adapted to different applications, but we believe that the insights from the current study are of interest for both theoretical and practical purposes.
3.1. Load-balancing networks
Load-balancing protocols are often used in queueing networks to improve system performance by shortening the queue length, reducing the waiting time, and increasing the system throughput. In this regard, the mean-field approach has proven useful; see, e.g., [Reference Mitzenmather55, Reference Vvedenskaya, Dobrushin and Karpelevich65, Reference Vvedenskaya and Suhov66]. In particular, interesting work in this direction was proposed in [Reference Dawson, Tang and Zhao26], where the authors considered a queueing network with N nodes in which queue lengths are balanced through mean-field interactions using an interaction function. Here we summarize their model and then describe how our current model can be used to generalize the ideas in [Reference Dawson, Tang and Zhao26].
Consider a system consisting of N queues with a mean-field interaction. At $t = 0$ , for $1 \leq n \leq N$ , the arrival rate at the nth queue is $\zeta_{X_n(0)}$ , and the service rate at queue n is $\vartheta_{X_n(0)}$ . Let $h\,:\,\mathbb{R}_+\times\mathbb{R}_+\rightarrow \mathbb{R}$ be a continuous nondecreasing interaction function satisfying certain regularity conditions (see [Reference Dawson, Tang and Zhao26, p. 339]). This function captures the mean-field interaction between queues as follows: for each queue $n= 1, 2,\ldots,N$ , the arrival rate at time t is given by $\zeta_{X_n(t)}- h(X_n (t),\langle \mu^N (t)(dx),x\rangle)$ , where $\mu^N (t)\,:\!=\,\frac{1}{N}\sum_{j=1}^N\delta_{X_j(t)}$ is the empirical measure corresponding to the N queues at time t, and $\langle \mu^N (t)(dx),x\rangle=\frac{1}{N}\sum_{j=1}^NX_j(t)$ is the mean queue length of the N queues at time t. Roughly speaking, the arrival rate at each queue depends on the current size of the queue and on the mean size of its neighbors (which form the entire set of queues in this case). The authors of [Reference Dawson, Tang and Zhao26] studied the performance of this system when the number of queues N goes to infinity.
The model proposed in the current paper can be seen as a generalization of the model in [Reference Dawson, Tang and Zhao26] to heterogeneous queueing networks, namely, to block-structured networks. To see this, let us consider the graph $\mathcal{G}=(\mathcal{V},\Xi)$ as a queueing network where the particles (nodes) are finite-buffer server queues of maximum size K (arbitrarily large), and the corresponding states $\Big(X^c_{n,j}(t), X^p_{m,j}(t),n\in C_j^c,m\in C_j^p,1\leq j\leq r, t\geq 0\Big)$ represent the number of customers waiting in each queue at each time t. Again, at $t = 0$ , for $1 \leq n \leq N$ , the arrival rate at the nth queue is $\zeta_{X^{\iota}_{n,j}(0)}$ , and the service rate at queue n is $\vartheta_{X^{\iota}_{n,j}(0)}$ , for $\iota\in\{c,p\}$ . Since the network is now heterogeneous, the mean-field interaction is local. Thus, the arrival rate at a central node queue $n\in C_j^c$ at time t is given by $\zeta^c_{X^c_{n,j}(t)}-h^c\Big(X^c_{n,j}(t),\Big\langle \mu^{c,N}_{n,j} (t)(dx),x\Big\rangle\Big)$ , whereas the arrival rate at a peripheral node queue $n\in C_j^p$ at time t is given by $\zeta^p_{X^p_{n,j}(t)}-h^p\Big(X^p_{n,j}(t),\Big\langle \mu_{n,j}^{p,N} (t)(dx),x\Big\rangle\Big)$ , where $\mu^{c,N}_{n,j} (t)$ and $\mu_{n,j}^{p,N} (t)$ are the local empirical measures respectively given by (4) and (7). The service rates $\vartheta^c_{X^c_{n,j}(t)}$ and $\vartheta^p_{X^p_{n,j}(t)}$ depend only on the queue sizes $X^c_{n,j}(t)$ and $X^p_{n,j}(t)$ at time t. Hence, the transition rates $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ are specified as follows:
The size $X^c_{n,j}(t)$ of each central queue $n\in C^c_j$ at time t goes from z to $z'$ at rate
\begin{align*}\lambda_{j,z,z'}^{c}\,:\!=\,\left\{\begin{array}{l@{\quad}l} \zeta^c_{X^c_{n,j}(t)}-h^c\bigg(X^c_{n,j}(t),\frac{1}{deg(n)}\sum\limits_{\iota\in\{c,p\}}\sum\limits_{\substack{m\in \mathfrak{N}^{\iota}_j(n)}} X^{\iota}_{m,j}(t)\bigg) &{\text{if}\ z'=z+1\ \text{and}\ z'\leq K },\\[3pt] \vartheta^c_{X^c_{n,j}(t)} & {\text{if}\ z'=z-1\ \text{and}\ X^c_{n,j}(t)\geq 1 },\\[11pt] -\sum\limits_{y\neq z}\lambda^c_{j,z,y} & {\text{if}\ z'=z},\\[3pt] 0&\text{otherwise.} \end{array}\right.\end{align*}
The size $X^p_{n,j}(t)$ of each peripheral queue $n\in C^p_j$ at time t goes from z to $z'$ at rate
\begin{align*}\lambda^p_{j,z,z'}\,:\!=\,\left\{\begin{array}{l@{\quad}l} \zeta^p_{X^p_{n,j}(t)}-h^p\Bigg(X^p_{n,j}(t),\frac{1}{deg(n)}\sum\limits_{\substack{1\leq k\leq r\\ \iota\in\{c,p\}}}\sum\limits_{\substack{m\in\mathfrak{N}_k^{\iota}(n)}}(X^{\iota}_{m,k}(t))\Bigg) &{\text{if}\ z'=z+1\ \text{and}\ z'\leq K },\\ \vartheta^p_{X^p_{n,j}(t)} & {\text{if}\ z'=z-1\ \text{and}\ X^p_{n,j}(t)\geq 1 },\\[12pt] -\sum\limits_{y\neq z}\lambda^p_{j,z,y} & {\text{if}\ z'=z},\\ 0& \text{otherwise}. \end{array}\right.\end{align*}
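The structure of these rates can be illustrated numerically. The Python sketch below evaluates the off-diagonal rates of a central queue from the lengths of its neighbors, following the case distinction displayed above. The function `central_queue_rates`, the constant arrival and service rates, and the particular interaction function `h` are hypothetical choices made for illustration; in particular, `h` is not claimed to satisfy the regularity conditions of [Reference Dawson, Tang and Zhao26].

```python
def central_queue_rates(z, neighbor_lengths, K, zeta, theta, h):
    """Off-diagonal transition rates of a central queue of length z.

    neighbor_lengths: queue lengths of all neighbors of the node
        (central and peripheral nodes of the same block).
    zeta, theta: callables giving arrival and service rates by state.
    h: interaction function h(own_length, mean_neighbor_length).
    Returns a dict {z_next: rate}; the diagonal entry is minus the sum
    of the returned rates.
    """
    # Mean queue length over the neighbors, i.e. the sum divided by deg(n).
    mean_neighbors = sum(neighbor_lengths) / len(neighbor_lengths)
    rates = {}
    if z + 1 <= K:  # an arrival is only possible below the buffer size
        rates[z + 1] = zeta(z) - h(z, mean_neighbors)
    if z >= 1:  # a departure needs at least one customer
        rates[z - 1] = theta(z)
    return rates


# Assumed parameters: buffer size, constant rates, linear interaction.
K = 10
zeta = lambda z: 1.0
theta = lambda z: 1.2
h = lambda x, m: 0.5 * (x - m) / K

# A queue longer than its neighbors on average receives fewer arrivals.
rates = central_queue_rates(4, [2, 2, 2, 2], K, zeta, theta, h)
```

The peripheral case is analogous, with `neighbor_lengths` also containing the peripheral neighbors from other blocks.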
It is worth mentioning that sparse graph topologies have been considered in applications in response to some issues encountered when implementing load-balancing protocols. In particular, many service systems are geographically constrained; therefore, when a task arrives at any specific server, it may be impossible to collect instantaneous state information from all the servers. In addition, executing a task commonly involves the use of some data, and storing such data for all possible tasks on all servers requires an excessive amount of storage capacity. The use of sparser graph topologies is then considered, such that tasks that arrive at a specific server can only be forwarded, following a specific load-balancing scheme, to the servers that possess the data required to process the tasks. In other words, a specific server can only interact with its neighbors in a suitable sparse topology; see, e.g., [Reference Budhiraja, Mukherjee and Wu15] and the references therein for more insights about the subject. The block-structured topology with the central/peripheral paradigm can for instance be considered to overcome the geographic constraint by allowing central nodes to rely only on the information collected locally on nodes from the same block, which may represent nodes within the same geographic area, while the peripheral nodes are those relying on information from both within and outside the block. To increase system efficiency, one could restrict the number of peripheral nodes allowed. The results obtained in the current work allow us to understand the behavior of such systems when the number N of servers of the network is very large. In particular, the multi-chaotic property established in Theorem 5.1 tells us that the queue lengths at any finite collection of tagged servers are asymptotically independent, and the queue-length process for each server converges in distribution to the corresponding McKean–Vlasov process given by (14).
Also, Condition 4.1 and Remark 4.1 tell us that the multi-chaoticity result holds even when the peripheral subgraph is not complete, which means that one can achieve similar asymptotic performance even with far fewer connections between the peripheral nodes than when all the peripheral nodes are connected and all the central nodes of the same block are connected.
3.2. Multi-population SIS epidemics
The susceptible–infected–susceptible (SIS) model, originally used in epidemiology, is also convenient to model the spread of information in networks, since the two phenomena are similar. The SIS model can be summarized as follows: consider a piece of information, or an infectious disease, that propagates across a population. A member that has a copy of the information/disease is said to be infected, and a member that does not have a copy of the information/disease is said to be susceptible. When an infected member comes into contact with a susceptible one, the former transmits a copy of the information (disease) to the latter, which gets infected. Moreover, an infected member may spontaneously get rid of the information/disease, a phenomenon called curing, and become susceptible again.
In both epidemiology and network information diffusion, the population often consists of relatively isolated subgroups such that members of the same subgroup interact a great deal, but only a few pairs of members from different subgroups are connected. One might think, for instance, of countries as isolated communities connected by travelers across the globe, or of interactions in social media, which often happen in almost closed communities, with only a few influential members interacting across groups. Our model allows one to study the spreading dynamics of information or of a disease among the members of a population structured as separate communities.
Consider a population consisting of r isolated communities and a ‘mobile’ community. The members of each isolated community interact only among themselves and with members of the mobile community. Thus, there is no direct interaction between members of different isolated communities. However, indirect inter-community interactions happen via the set of mobile members. This idea was used in [Reference Akhil, Altman and Sundaresan2], where the authors considered an optimal control problem to find the optimal resource allocation strategy that maximizes information spread over a multi-community population. Their objective was to obtain a good tradeoff between the information spread in the network and the use of system resources.
Now, let $\mathcal{Z}\,:\!=\,\{0,1\}$ be the state space that indicates whether the particle is susceptible $(=0)$ or infected $(=1)$ . Recalling the model description introduced in Section 2, one might think of the central nodes of each block as an isolated community that interacts with other communities only through the members of the mobile community represented by the peripheral nodes. Note that in contrast to [Reference Akhil, Altman and Sundaresan2], the central nodes of a given block interact only with peripheral (mobile) nodes from the same block, and not with all the peripheral/mobile nodes, as stipulated in [Reference Akhil, Altman and Sundaresan2]. Also, the interaction graph for the peripheral members is not complete; thus, not all the peripheral nodes interact with each other. Nevertheless, the fact that the multi-chaotic property holds under Condition 4.1 (cf. Theorem 5.1) tells us that systems with full connections among the peripheral components and among the components of each block are asymptotically close to those with fewer connections, as specified by Condition 4.1 and Remark 4.1. This is of interest, for example, in resource allocation problems where a cost is attributed to each connection; however, such considerations are beyond the scope of the present paper.
Denote by $X^c_{n,j}(t)$ , for $n\in C_j^c$ , and $X^p_{m,j}(t)$ , for $m\in C_j^p$ , the state (‘susceptible’ or ‘infected’) of the nth central particle and the mth peripheral particle, respectively, in the jth community. Two connected central members of the same community j come in contact with each other at rate $\gamma_j$ . Connected peripheral and central nodes from the same community interact with each other at rate $\nu_j$ . Two connected peripheral nodes come in contact with each other at rate $\eta$ . Finally, an infected node in the jth community spontaneously gets rid of the infection at rate $\zeta_j$ . Therefore, the transition rates $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ , which sum up the dynamics that we are interested in, are specified as follows:
The state $X^c_{n,j}(t)$ of each central member $n\in C^c_j$ at time t goes from z to z′ at rate
\begin{align*}\lambda_{j,z,z'}^{c}\,:\!=\,\left\{\begin{array}{l@{\quad}l} \sum\limits_{\substack{m\in \mathfrak{N}_j^c(n) }}X^c_{m,j}(t)\gamma_j+\sum\limits_{\substack{m\in \mathfrak{N}_j^p(n) }}X^p_{m,j}(t)\nu_j & \text{if}\ \text{z}=0\ \text{and}\ \text{z}'=1 ,\\ \zeta_j & \text{if}\ \text{z}=1\ \text{and}\ \text{z}'=0 ,\\ -\sum\limits_{y\neq z}\lambda^c_{j,z,y} & \text{if}\ \text{z}'=\text{z},\\ 0&\mbox{otherwise.} \end{array}\right.\end{align*}
The state $X^p_{n,j}(t)$ of each peripheral (mobile) member $n\in C^p_j$ at time t goes from z to z′ at rate
\begin{align*}\lambda^p_{j,z,z'}\,:\!=\,\left\{\begin{array}{l@{\quad}l} \sum\limits_{\substack{m\in \mathfrak{N}_j^c(n) }}X^c_{m,j}(t)\nu_j+\sum\limits_{k=1}^r\sum\limits_{\substack{m\in \mathfrak{N}_k^p(n) }}X^p_{m,k}(t)\eta & \text{if}\ \text{z}=0\ \text{and}\ \text{z}'=1 ,\\ \zeta_j & \text{if}\ \text{z}=1\ \text{and}\ \text{z}'=0 ,\\ -\sum\limits_{y\neq z}\lambda^p_{j,z,y} & \text{if}\ \text{z}'=\text{z},\\ 0&\text{otherwise.} \end{array}\right.\end{align*}
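To make the case distinction in the two rate functions concrete, the following minimal Python sketch evaluates them on a small hand-built graph. The node encoding `(kind, block, index)`, the helper names `infection_rate` and `transition_rate`, the zero-based block labels, and all numerical values are illustrative assumptions and not part of the model specification. The sketch also assumes that the neighbor sets respect the block structure of Section 2, so that a central node has neighbors only in its own block.

```python
from typing import Dict, List, Tuple

# A node is (kind, block, index) with kind "c" (central) or "p" (peripheral).
# Blocks are labeled from 0 here, whereas the text labels them from 1.
Node = Tuple[str, int, int]


def infection_rate(node: Node, neighbors: Dict[Node, List[Node]],
                   state: Dict[Node, int], gamma: List[float],
                   nu: List[float], eta: float) -> float:
    """Rate of the transition 0 -> 1 for `node` (first case of each display)."""
    kind, j, _ = node
    rate = 0.0
    for m in neighbors[node]:
        if state[m] == 0:
            continue  # susceptible neighbors do not contribute
        m_kind = m[0]
        if kind == "c" and m_kind == "c":
            rate += gamma[j]   # central-central contact inside block j
        elif kind == "p" and m_kind == "p":
            rate += eta        # peripheral-peripheral contact, any block
        else:
            rate += nu[j]      # central-peripheral contact inside block j
    return rate


def transition_rate(node, z, z_next, neighbors, state, gamma, nu, eta, zeta):
    """Entry (z, z_next) of the rate matrix of `node` on Z = {0, 1}."""
    infect = infection_rate(node, neighbors, state, gamma, nu, eta)
    cure = zeta[node[1]]
    if (z, z_next) == (0, 1):
        return infect
    if (z, z_next) == (1, 0):
        return cure
    # Diagonal entry: minus the total rate of leaving z.
    return -infect if z == 0 else -cure


# Toy example with r = 2 blocks (all values are hypothetical).
c00, c01, p00 = ("c", 0, 0), ("c", 0, 1), ("p", 0, 0)
c10, p10 = ("c", 1, 0), ("p", 1, 0)
neighbors = {
    c00: [c01, p00], c01: [c00, p00], p00: [c00, c01, p10],
    c10: [p10], p10: [c10, p00],
}
state = {c00: 0, c01: 1, p00: 1, c10: 0, p10: 1}
gamma, nu, eta, zeta = [0.5, 0.7], [0.2, 0.3], 0.1, [1.0, 1.5]

rate_c00 = transition_rate(c00, 0, 1, neighbors, state, gamma, nu, eta, zeta)
rate_p00 = transition_rate(p00, 1, 0, neighbors, state, gamma, nu, eta, zeta)
```

In this toy configuration the susceptible central node `c00` has one infected central neighbor and one infected peripheral neighbor, so its infection rate is `gamma[0] + nu[0]`, while the infected peripheral node `p00` recovers at rate `zeta[0]`.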
Note that the large deviations properties established in Section 6 constitute a step towards the study of the large-time behavior of such systems. Indeed, the large deviations of the empirical measures established in Theorem 6.2 can be used to investigate the large deviations of the invariant measure, from which one can study the large-time behavior of the system and related phenomena such as metastability and convergence to the invariant measure. The interested reader can consult, e.g., [Reference Dawson, Sid-Ali and Zhao25, Reference Freidlin and Wentzell36, Reference Hwang and Sheu44, Reference Yasodharan and Sundaresan67] and the references therein for further insight.
4. Existence and uniqueness of the limiting system
This section introduces the limiting equation that describes the behavior of the interacting particle system detailed in Section 2 as the total number of particles N tends to infinity, and proves that it is well posed. This equation is of McKean–Vlasov type, as explained below. The main result of this section is Theorem 4.1, which establishes the existence and uniqueness of the solution of the limiting McKean–Vlasov equation. The convergence of the system towards this equation is then investigated in Section 5.
4.1. Notation and conventions
Let $(\mathbb{S}, d)$ be a Polish space. For any $y\in\mathbb{S}^d$ , for some $d\in\mathbb{N}$ , one writes $\|y\|\,:\!=\,\max (y_1, \ldots, y_d)$ . For any $x\in\mathcal{D}([0,T],\mathbb{S}^d)$ , $\|x\|_T$ denotes $\sup_{0\leq t\leq T}\|x(t)\|$ . Let $\mathcal{M}(\mathbb{S})$ be the set of all measures on $\mathbb{S}$ . Given $\mu,\nu\in\mathcal{M}(\mathbb{S})$ , the bounded-Lipschitz metric $d_{BL}(\cdot,\cdot)$ is defined by
where
Recall that the bounded-Lipschitz metric metrizes the weak convergence of probability measures on $\mathbb{S}$ with respect to bounded continuous test functions $C_b(\mathbb{S})$ . For $p\geq 1$ , let $\mathcal{P}_p(\mathbb{S})$ be the collection of all probability measures on $\mathbb{S}$ with finite pth moment. Then, for any $\mu$ and $\nu$ in $\mathcal{P}_p(\mathbb{S})$ , the pth Wasserstein distance between $\mu$ and $\nu$ is defined as
\begin{align*}\mathcal{W}_p(\mu,\nu)\,:\!=\,\left(\inf_{\gamma\in\Gamma(\mu,\nu)}\int_{\mathbb{S}\times\mathbb{S}}d(x,y)^p\,\gamma(dx,dy)\right)^{1/p},\end{align*}
where $\Gamma (\mu ,\nu )$ denotes the collection of all probability measures on $\mathbb{S}\times \mathbb{S}$ with marginals $\mu$ and $\nu$ . Moreover, for $M_1,M_2$ in $\mathcal{P}_p\big(\mathcal{D}([0,T],\mathbb{S})\times\cdots\times\mathcal{D}([0,T],\mathbb{S})\big)$ , the pth Wasserstein distance between $M_1$ and $M_2$ is given, for any $t\in [0,T]$ , by
4.2. The limiting system
We use in the sequel the convention that N goes to infinity when both $\min_{1\leq j\leq r}N_j^c$ and $\min_{1\leq j\leq r}N_j^p$ go to infinity. Given the multi-population setting, one describes the state of the system at each time t using the following empirical measure vector:
where we recall that for each $1\leq j\leq r$ , $\mu_j^{c, N}(t)$ (resp. $\mu_j^{p, N}(t)$ ) is the empirical measure describing the states of the central (resp. peripheral) nodes of the jth block at time t. Under some regularity conditions (cf. Condition 4.1), we will prove in Section 5 the convergence, as N tends to infinity, of the empirical measure vector $\mu^{N}$ towards the distribution $\mu\in\mathcal{M}_1(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big))$ , the solution of an appropriate limiting system. Namely, the empirical vector $\mu^N$ should converge weakly to $\mu$ with
where $ \mu_j^{c}\,:\!=\,\mathcal{L}\big( \bar{X}^c_{n,j}\big)$ for $n\in C_j^c$ and $\mu_j^{p}\,:\!=\,\mathcal{L}( \bar{X}^p_{m,j})$ for $m\in C_j^p$ , with $\Big(\bar{X}^c_{n,j}, \bar{X}^p_{m,j},n\in C_j^c,m\in C_j^p, 1\leq j\leq r\Big)$ being the solution of the following system of stochastic differential equations:
The vectors $\upsilon_{j}^c(t)$ and $\upsilon_{j}^p(t)$ are defined by
where $p_j^c,p_j^p,\alpha_j^c, q_{j1},\ldots,q_{jr}\in (0,1)$ are parameters satisfying
these parameters will later be chosen appropriately (cf. Condition 4.1). The link between the initial conditions of the systems (12) and (14) will be introduced in the sequel. Observe that the dynamics in (14) depend not only on the current state of the solution but also on the distribution of the process itself. Thus, the system (14) is of McKean–Vlasov type.
4.3. Regularity assumptions
We introduce and discuss here the regularity conditions under which the existence and uniqueness of the limiting system (14), together with the propagation of chaos and laws of large numbers investigated in Section 5, hold.
Condition 4.1.
1. For all $1\leq j\leq r$ and $(z,z')\in\mathcal{E}$ , there exist measurable functions $\gamma^{j,c}_{z,z'}\,:\,\mathcal{Z}\rightarrow\mathbb{R}^+$ and $\gamma^{j,p}_{z,z'}\,:\,\mathcal{Z}\rightarrow\mathbb{R}^+$ such that the following hold:
For any probability measures $\nu,\mu\in\mathcal{M}_1(\mathcal{Z})$ and any real numbers $a_1,a_2$ satisfying $0<a_1,a_2<1$ and $a_1+a_2=1$ , we have
(16) \begin{equation}\lambda_{j,z,z'}^{c}(\nu,\mu,a_1,a_2)=a_1\int_{\mathcal{Z}}\gamma^{j,c}_{z,z'}(x)\nu (dx)+ a_2\int_{\mathcal{Z}}\gamma^{j,p}_{z,z'}(x)\mu (dx). \end{equation}
For any $\nu,\mu_1,\ldots,\mu_r\in\mathcal{M}_1(\mathcal{Z})$ and any real numbers $a,b_1,\ldots,b_r$ satisfying $0<a,b_1,\ldots,b_r<1$ and $a+b_1+\cdots+b_r=1$ , we have
(17) \begin{align}\lambda^p_{j,z,z'}(\nu,\mu_1,\ldots,\mu_r,a,b_1,&\dots,b_r)= a\int_{\mathcal{Z}}\gamma^{j,c}_{z,z'}(x)\nu (dx) \nonumber\\&+b_1\int_{\mathcal{Z}}\gamma^{1,p}_{z,z'}(x)\mu_1 (dx)+\cdots+b_r\int_{\mathcal{Z}}\gamma^{r,p}_{z,z'}(x)\mu_r (dx).\end{align}
2. For each block $1\leq j\leq r$ , there exist $p_j^c,p_j^p\in(0,1)$ such that, as $N\rightarrow\infty$ ,
(18) \begin{align}\frac{N_j^p}{N_j}\rightarrow p_j^p,\quad \frac{N_j^c}{N_j}\rightarrow p_j^c,\quad\text{and}\quad p^p_j+p^c_j=1.\end{align}
3. For each block $1\leq j\leq r$ , as $N\rightarrow\infty$ ,
(19) \begin{align}\sup_{n\in C_j^c}\left|\varrho_{n,j}^c-p^c_j\right|\rightarrow 0 \quad\text{and}\quad \sup_{n\in C_j^c}\left|\varrho_{n,j}^p-p^p_j\right|\rightarrow 0.\end{align}
4. For each block $1\leq j\leq r$ , there exist $\alpha_j^c,q_{j1},\ldots,q_{jr}\in (0,1)$ with $\alpha_j^c+q_{j1}+\cdots+q_{jr}=1$ such that the following conditions hold for each block $1\leq i\leq r$ , as $N\rightarrow\infty$ :
(20) \begin{align}\sup_{n \in C_j^p}\bigg|\varsigma_{n,j}^c-\alpha_{j}^c\bigg|\rightarrow 0\quad\text{ and }\quad\sup_{n\in C_j^p}\bigg|\varsigma_{n,j,i}^p-q_{ji}\bigg|\rightarrow 0.\end{align}
5. For all nodes $n\in\mathcal{V}$ , $deg(n)\rightarrow\infty$ as $N\rightarrow\infty$ .
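Since $\mathcal{Z}$ is finite, the integrals in (16) and (17) reduce to finite weighted sums, so the rates are linear in each measure argument. The following Python sketch evaluates both formulas for probability vectors on $\mathcal{Z}=\{0,\ldots,K-1\}$ . The function names, the representation of measures as lists, and the SIS-like kernels `gamma_c` and `gamma_p` are assumptions made for illustration only.

```python
from typing import Callable, Sequence

Measure = Sequence[float]           # probability vector on Z = {0, ..., K-1}
Kernel = Callable[[int], float]     # a function gamma: Z -> R_+


def integrate(f: Kernel, measure: Measure) -> float:
    """Integral of f against a probability vector on the finite set Z."""
    return sum(f(x) * w for x, w in enumerate(measure))


def rate_central(gamma_c, gamma_p, nu, mu, a1, a2):
    """Right-hand side of (16)."""
    return a1 * integrate(gamma_c, nu) + a2 * integrate(gamma_p, mu)


def rate_peripheral(gamma_c, gammas_p, nu, mus, a, bs):
    """Right-hand side of (17); gammas_p[k], mus[k], bs[k] belong to block k."""
    total = a * integrate(gamma_c, nu)
    for b, g, m in zip(bs, gammas_p, mus):
        total += b * integrate(g, m)
    return total


# Hypothetical SIS-like kernels on Z = {0, 1}: only infected states contribute.
def gamma_c(x):
    return 2.0 * x


def gamma_p(x):
    return 1.0 * x


nu = [0.7, 0.3]   # 30% of central nodes infected
mu = [0.4, 0.6]   # 60% of peripheral nodes infected
lam_c = rate_central(gamma_c, gamma_p, nu, mu, a1=0.25, a2=0.75)
lam_p = rate_peripheral(gamma_c, [gamma_p, gamma_p], nu,
                        [mu, [1.0, 0.0]], a=0.5, bs=[0.25, 0.25])
```

With these hypothetical values, `lam_c` equals $0.25\cdot 0.6+0.75\cdot 0.6$ and `lam_p` equals $0.5\cdot 0.6+0.25\cdot 0.6+0.25\cdot 0$ , up to floating-point rounding.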
Remark 4.1.
1. Since $\mathcal{Z}$ is a finite state space, the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ are bounded on $\mathcal{Z}$ . Moreover, since $\mathcal{Z}\subset \mathbb{N}$ and since every bounded function on $\mathbb{N}$ is automatically Lipschitz, $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ are also Lipschitz. Denote by $\bar{\gamma}>0$ the maximum bound and by $L_{\gamma}$ the maximum Lipschitz coefficient of the sequences of functions $\Big\{\gamma^{j,c}_{z,z'},1\leq j \leq r,(z,z')\in\mathcal{E}\Big\}$ and $\Big\{\gamma^{j,p}_{z,z'},1\leq j \leq r,(z,z')\in\mathcal{E}\Big\}$ .
2. The conditions in (19) and (20) are satisfied if, for instance, for all $1\leq j\leq r$ , the following hold:
For each $n\in C_j^c$ , we have $M_{j}^{c}(n)/N_j^c\rightarrow 1$ and $M_{j}^{p}(n)/N_j^p\rightarrow 1$ as $N\rightarrow\infty$ .
For each $n\in C_j^p$ , we have $M_{j}^{c}(n)/N_j^c\rightarrow 1$ and $M_i^{p}(n)/N_i^p\rightarrow 1$ as $N\rightarrow\infty$ for all $1\leq i\leq r$ .
Indeed, under this assumption one can define
(21) \begin{align}\alpha_j^c&=\lim_{N\rightarrow\infty}\frac{N_j^c}{N_j^c+N_1^p+\cdots+N_r^p}\qquad\forall 1\leq j\leq r,\end{align}
(22) \begin{align}q_{ji}&=\lim_{N\rightarrow\infty}\frac{N_{i}^p}{N_j^c+N_1^p+\cdots+N_r^p}\qquad\forall 1\leq j,i\leq r, \end{align}
and thus, one can easily verify that, as $N\rightarrow\infty$ , the following hold:
For all $n\in C_j^c$ ,
(23) \begin{align}\frac{1+M_{j}^{c}(n)}{1+deg(n)}\rightarrow p^c_j\mbox{ and } \frac{M_{j}^{p}(n)}{1+deg(n)}\rightarrow p^p_j.\end{align}
For all $n\in C_j^p$ and $i\neq j$ ,
(24) \begin{align}\frac{M_i^{p}(n)}{1+deg(n)}\rightarrow q_{ji},\quad \frac{1+M_j^{p}(n)}{1+deg(n)}\rightarrow q_{jj},\quad\text{and}\quad\frac{M_{j}^{c}(n)}{1+deg(n)}\rightarrow\alpha_{j}^c.\end{align}
3. A special case where the conditions (19) and (20) are satisfied is when the blocks are cliques and the peripheral subgraph is complete—that is, when all peripheral nodes are connected (see Figure 1) and all the nodes in the same block are connected. In such a case, the central (resp. peripheral) nodes in the same block are exchangeable.
4. Even though the conditions (19) and (20) are somewhat restrictive, the construction of the model allows one to have very different degrees in each block. One might further compare these conditions with existing conditions in the literature. Consider for example the condition imposed in [Reference Budhiraja, Mukherjee and Wu15] for a supermarket model on sparse graphs to asymptotically behave as on cliques. The condition in [Reference Budhiraja, Mukherjee and Wu15] relies on the local properties of the graph by requiring direct neighbors of any node to have asymptotically similar degrees; see [Reference Budhiraja, Mukherjee and Wu15, Condition 1(ii)]. This condition is violated in our model. Indeed, the conditions (19) and (20) allow central and peripheral nodes from the same block to have very different degrees, even if they are neighbors, which goes beyond [Reference Budhiraja, Mukherjee and Wu15, Condition 1(ii)]. In addition, under our condition, $deg_{\max} (G)/deg_{\min}(G)$ need not go to 1 as $N\rightarrow\infty$ , nor need $\max_j \left|\left(deg_{\min} (C_j)/deg_{\max}(C_j)\right)-1\right|$ go to zero as proposed in [Reference Budhiraja, Mukherjee and Wu15, Remark 1] (here $deg_{\min}(C_j)$ and $deg_{\max}(C_j)$ refer to the minimum and maximum degrees of nodes within the same block j). In this sense, the family of graphs which we are considering in the present work is sparser than the ones covered by [Reference Budhiraja, Mukherjee and Wu15, Condition 1(ii)]. Another condition with which to compare ours is the one proposed in [Reference Delattre, Giacomin and Luçon28], under which an n-dimensional diffusion system converges to a limiting Fokker–Planck equation; see [Reference Delattre, Giacomin and Luçon28, Equations (1.1) and (1.3)]. 
Note that [Reference Delattre, Giacomin and Luçon28, Equations (1.5) and (1.7)] impose global regularity conditions in the sense that the degrees of all the nodes should converge to the same limit; such conditions are not imposed here.
5. While the current paper considers deterministic graphs, one can investigate the case where the underlying graph topology is random. For example, it is of interest for some applications to have a scenario where the connections between the peripheral nodes are random. One can then search for adequate conditions to impose on the edge dynamics for the propagation-of-chaos property to hold. This, however, goes beyond the scope of the current paper.
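The sufficient condition in item 2 of Remark 4.1 can be checked numerically on neighbor counts alone. The Python sketch below uses hypothetical counts in which every node misses about $\sqrt{N}$ of the possible neighbors of each type, so that $M/N\rightarrow 1$ without the graph being block-complete, and compares the ratios in (23) and (24) with the limits defined in (21) and (22). The block sizes, the $\sqrt{N}$ deficit, and the function names are assumptions chosen for illustration; the sketch does not construct an actual graph with these counts.

```python
import math


def central_fractions(n_c, n_p):
    """Ratios in (23) for a central node of a block with sizes (n_c, n_p)."""
    m_c = n_c - 1 - math.isqrt(n_c)     # central neighbors (node itself excluded)
    m_p = n_p - math.isqrt(n_p)         # peripheral neighbors in the same block
    deg = m_c + m_p
    return (1 + m_c) / (1 + deg), m_p / (1 + deg)


def peripheral_fractions(j, n_c, n_p_all):
    """Ratios in (24) for a peripheral node of block j."""
    m_c = n_c - math.isqrt(n_c)
    m_p = [n - math.isqrt(n) - (1 if i == j else 0)
           for i, n in enumerate(n_p_all)]
    deg = m_c + sum(m_p)
    alpha = m_c / (1 + deg)
    q = [((1 if i == j else 0) + m) / (1 + deg) for i, m in enumerate(m_p)]
    return alpha, q


def errors(k):
    """Distance to the limits for block 0 when sizes scale linearly with k."""
    n_c, n_p = [3 * k, 2 * k], [k, 2 * k]   # hypothetical block sizes, r = 2
    frac_c, frac_p = central_fractions(n_c[0], n_p[0])
    alpha, q = peripheral_fractions(0, n_c[0], n_p)
    # Limits for block 0: p^c = 3/4, p^p = 1/4, alpha^c = 1/2, q_00 = 1/6, q_01 = 1/3.
    return (abs(frac_c - 0.75), abs(frac_p - 0.25), abs(alpha - 0.5),
            abs(q[0] - 1 / 6), abs(q[1] - 1 / 3))


for k in (10**2, 10**4, 10**6):
    print(k, max(errors(k)))
```

The printed maximal error shrinks as k grows, which is consistent with the convergence claimed in (23) and (24) under these hypothetical counts.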
4.4. Existence and uniqueness
We now prove the existence and uniqueness of the solution of the limiting McKean–Vlasov system introduced in (14).
Theorem 4.1. Suppose that Condition 4.1 holds. Then, for a given initial condition $\Big(\Big(\bar{X}_n^{c}(0),\bar{X}_m^{p}(0)\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ , the McKean–Vlasov system (14) has a unique solution over any finite time interval [0,T]. In addition, this solution depends continuously on the initial condition in the following sense: if $(\bar{X}^1(t),t\in[0, T])$ and $(\bar{X}^2(t),t\in[0, T])$ are two solutions of (14) with two different initial conditions $(\bar{X}^1(0))$ and $(\bar{X}^2(0))$ , respectively, then there exists a constant $A_T$ , depending on the time horizon T, such that
Proof. For $1\leq j\leq r$ , with a slight abuse of notation, let
and
be the cth and the pth component, respectively, of the jth projection. Moreover, for $t\leq T$ , let $p_t\,:\, f\in \mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\rightarrow f(t)\in\mathcal{Z}^{2r}$ .
For $M\in\mathcal{M}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ , let $M (t) \,:\!=\, M \circ p_t^{-1}$ . Consider the system starting at $\bar{X}_0=(\bar{X}_0^{1,c},\bar{X}_0^{1,p},\ldots,\bar{X}_0^{r,c},\bar{X}_0^{r,p})$ and given, at each $t\in (0,T]$ , by
for $1\leq j\leq r$ , where $\mu_j^c(t)=M(t)\circ e_{j,c}^{-1}$ and $\mu_j^p(t)=M(t)\circ e_{j,p}^{-1}$ , the vectors $\upsilon_{j}^c(t)$ and $\upsilon_{j}^p(t)$ are given by (15), and $\big\{\mathcal{N}_{j}^c, 1\leq j\leq r\big\}$ and $\big\{\mathcal{N}_{j}^p, 1\leq j\leq r\big\}$ are collections of Poisson random measures on $\mathbb{R}^2$ whose intensity measures are Lebesgue on $\mathbb{R}^2_+$ . Denote by $\psi$ and $\phi$ the mappings that associate to M the solution of this system and its corresponding law. Thus, $\psi (M)=\big(\bar{X}^c_{j}, \bar{X}^p_{j},1\leq j\leq r\big)$ and $\phi (M)=\mathcal{L}\big(\bar{X}^c_{j}, \bar{X}^p_{j},1\leq j\leq r\big)$ . Observe that if $\bar{X}$ is a solution of (14), then its law is a fixed point of $\phi$ . Conversely, if M is a fixed point of $\phi$ for the system (26), then the corresponding solution $\psi (M)$ defines a solution of the limiting system (14). The idea is then to prove the existence of a fixed point of $\phi$ .
Take $M_1,M_2\in\mathcal{M}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ . Set $\bar{X}_1\,:\!=\,\big(\bar{X}^{1,c}_1,\bar{X}^{1,p}_1,\ldots,\bar{X}^{1,c}_r,\bar{X}^{1,p}_r\big)=\psi (M_1)$ and $\bar{X}_2\,:\!=\,\big(\bar{X}^{2,c}_1,\bar{X}^{2,p}_1,\ldots,\bar{X}^{2,c}_r,\bar{X}^{2,p}_r\big)=\psi (M_2)$ . Thus, $\mathcal{L}(\bar{X}_1)=\phi (M_1)$ and $\mathcal{L}(\bar{X}_2)=\phi (M_2)$ . Moreover, for all $t\in [0,T]$ , define $\mu_1(t)\,:\!=\,(\mu_1^{1,c}(t),\mu_1^{1,p}(t),\ldots,\mu_r^{1,c}(t),\mu_r^{1,p}(t))$ and $\mu_2(t)\,:\!=\,(\mu_1^{2,c}(t),\mu_1^{2,p}(t),\ldots,\mu_r^{2,c}(t),\mu_r^{2,p}(t))$ with $\mu_j^{1,c}(t)\,:\!=\,M_1(t)\circ e_{j,c}^{-1}$ , $\mu_j^{1,p}(t)\,:\!=\,M_1(t)\circ e_{j,p}^{-1}$ , $\mu_j^{2,c}(t)\,:\!=\,M_2(t)\circ e_{j,c}^{-1}$ and $\mu_j^{2,p}(t)\,:\!=\,M_2(t)\circ e_{j,p}^{-1}$ for $1\leq j\leq r$ . According to (15), we introduce the following notation:
We first prove that $\phi$ is a contraction mapping on $\mathcal{M}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ ; that is, for any $t\in[0,T]$ ,
To this end, for ease of reading, let us introduce the following notation:
Indeed, for any $1\leq j\leq r$ we have that
Using a martingale argument (see (63)) and taking the expectation, by adding and subtracting terms (see (65)) one gets, for any $t\in [0, T]$ ,
Recall the definition of the functions $\lambda_{j,z,z'}^{c}$ in (16). Given that $\mu_j^c(t)$ and $\mu_j^p(t)$ are probability measures and using the boundedness of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ , one easily gets that
and
Therefore one obtains
Using (17) and the same steps as previously, one finds, for any $1\leq j\leq r$ ,
On the one hand, from the Kantorovich–Rubinstein theorem, one has that for $1\leq j\leq r$ and $\alpha\in\{c,p\}$ ,
where the supremum is taken over the functions g with Lipschitz constant 1. Therefore,
On the other hand, one can easily verify that
Thus, using (37) and (38) and taking the supremum over $1\leq j\leq r$ in (34) and (35), one obtains
and
Adding the two last inequalities side by side and applying Grönwall’s lemma leads to
Hence,
with $C(t)=K|\mathcal{E}|\bar{\gamma}e^{K|\mathcal{E}|\bar{\gamma}t}$ . From the definition of the Wasserstein distance, it is easy to observe that
from which one deduces (28).
Consider now the following recursive scheme:
$M_0\in\mathcal{M}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ ;
$M_{k+1}=\phi (M_k),\quad k\geq 0$ .
By iterating the formula in (28) and using the fact that $\mathcal{W}_{1,t}(M_1,M_0)$ is increasing in t, one finds that for all $k\geq 0$ ,
Moreover, it is easy to verify that $\mathbb{E}[\|\bar{X}\|_T]<\infty$ , where $\bar{X}=\psi(M)$ for any $M\in\mathcal{M}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ , from which we deduce that $\mathcal{W}_{1,t}(M_1,M_2)<\infty$ and thus the sequence $\{M_k\}_{k\geq 0}$ is a Cauchy sequence. Note that the space $\mathcal{P}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ endowed with the Wasserstein distance $\mathcal{W}_{1,T}$ is complete (see [Reference Bolley11]). Hence the sequence $\{M_k\}_{k\geq 0}$ converges to some measure M in $\mathcal{P}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ which is a fixed point of $\phi$ on $\mathcal{P}_1\big(\mathcal{D}\big([0,T],\mathcal{Z}^{2r}\big)\big)$ . This proves the existence of the solution of the equation in (26) and thus that of the equation in (14). Uniqueness follows from again using (28) and Grönwall’s lemma.
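The recursive scheme $M_{k+1}=\phi(M_k)$ can be illustrated on a deliberately simplified toy problem. The Python sketch below is not the system (26): it assumes a single homogeneous SIS population, tracks only the time marginal $p(t)$ (the probability of being infected), and discretizes time with an explicit Euler step. Freezing an input flow $m(t)$ gives a linear equation $\dot{p}=\beta m(t)(1-p)-\zeta p$ , and the map sending $m$ to its solution plays the role of $\phi$ . The parameter names `beta` and `zeta` and all numerical values are hypothetical.

```python
def solve_linear(flow, p0, beta, zeta, h):
    """Euler solution of dp/dt = beta*m(t)*(1-p) - zeta*p for a frozen flow m."""
    p = [p0]
    for m in flow[:-1]:
        p.append(p[-1] + h * (beta * m * (1.0 - p[-1]) - zeta * p[-1]))
    return p


def picard(p0, beta, zeta, T, steps, iterations):
    """Iterate flow -> solve_linear(flow), starting from the constant flow p0."""
    h = T / steps
    flow = [p0] * (steps + 1)
    gaps = []  # sup-distance between consecutive iterates
    for _ in range(iterations):
        new_flow = solve_linear(flow, p0, beta, zeta, h)
        gaps.append(max(abs(a - b) for a, b in zip(new_flow, flow)))
        flow = new_flow
    return flow, gaps


def solve_nonlinear(p0, beta, zeta, T, steps):
    """Euler solution of the self-consistent equation dp/dt = beta*p*(1-p) - zeta*p."""
    h = T / steps
    p = [p0]
    for _ in range(steps):
        p.append(p[-1] + h * (beta * p[-1] * (1.0 - p[-1]) - zeta * p[-1]))
    return p


flow, gaps = picard(p0=0.1, beta=2.0, zeta=1.0, T=2.0, steps=200, iterations=40)
direct = solve_nonlinear(p0=0.1, beta=2.0, zeta=1.0, T=2.0, steps=200)
fixed_point_error = max(abs(a - b) for a, b in zip(flow, direct))
```

In this toy setting the gap between consecutive iterates is controlled by a bound of the form $(\beta T)^k/k!$ , mirroring the estimate obtained by iterating (28), and the limit of the iteration coincides with the Euler solution of the self-consistent equation.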
Let $\big(\bar{X}^1(t)\big)\,:\!=\,\Big(\bar{X}^{1,c}_{n,j}(t),\bar{X}^{1,p}_{m,j}(t),n\in C_j^c,m\in C_j^p, 1\leq j\leq r\Big)$ and $\big(\bar{X}^2(t)\big)\,:\!=\,\Big(\bar{X}^{2,c}_{n,j}(t),\bar{X}^{2,p}_{m,j}(t),n\in C_j^c,m\in C_j^p, 1\leq j\leq r\Big)$ be two solutions of (14) with respective initial conditions $(\bar{X}^1(0))$ and $(\bar{X}^2(0))$ . Denote by $\mu_j^{1,c}(t)\,:\!=\,\mathcal{L}\Big(\bar{X}^{1,c}_{n,j}(t)\Big)$ and $\mu_j^{1,p}(t)\,:\!=\,\mathcal{L}\Big(\bar{X}^{1,p}_{m,j}(t)\Big)$ , for $1\leq j \leq r$ , the probability measures corresponding to the first solution. Similarly, denote by $\mu_j^{2,c}(t)\,:\!=\,\mathcal{L}\Big(\bar{X}^{2,c}_{n,j}(t)\Big)$ and $\mu_j^{2,p}(t)\,:\!=\,\mathcal{L}\Big(\bar{X}^{2,p}_{m,j}(t)\Big)$ the probability measures corresponding to the second solution. Again, for ease of reading, let us introduce the following notation:
Using this together with the notation in (27), one finds that for any $n\in C_j^c$ ,
Using a martingale argument (see (63)), taking the conditional expectation $E^0$ given $(\bar{X}^1(0),\bar{X}^2(0))$ , and finally adding and subtracting terms (see (65)), one gets that for $t\in [0,T]$ ,
Since $\mu_{j}^{1,c}(t)$ and $\mu_{j}^{1,p}(t)$ are probability measures, it is easy to see using (16) and the boundedness of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ that
Additionally, the Lipschitz property of the functions $\gamma^{j,c}_{z,z'}$ leads to
Therefore one obtains
Taking the expectation on both sides of the last inequality, and recalling that $p_j^c+p_j^p=1$ for all $1\leq j \leq r$ , one gets
Thus,
Taking the maximum over $n\in C_j^c$ and $1\leq j\leq r$ gives
Using similar arguments, one finds that for any $1\leq j\leq r$ and $n\in C_j^p$ ,
By (17) and the Lipschitz and boundedness properties of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ one finds that
Taking the expectation on both sides of the last inequality gives
Recall that, for all $1\leq j \leq r$ , $\alpha_j^c+q_{j1}+\cdots+q_{jr}=1$ . Therefore
Taking the maximum over $n\in C_j^p$ and $1\leq j\leq r$ gives
Now (51) and (56) together lead to
Then Grönwall’s lemma gives
Defining $A_t=8K\bar{\gamma}|\mathcal{E}|(2+r)t$ leads to (25). The theorem is proved.
5. Laws of large numbers and propagation of chaos
This section investigates the weak convergence of the finite particle system represented by the stochastic differential equation in (12) towards the limiting McKean–Vlasov system (14) as the total number of particles N tends to infinity. In particular, as the main results of this section, we establish propagation of chaos in Theorem 5.1 and laws of large numbers in Corollary 5.1.
Let us start this section by recalling the notions of multi-exchangeability and multi-chaoticity introduced in [Reference Graham40].
Definition 5.1. A sequence of random variables $(X_{n,k}, 1 \leq n \leq N_k, 1 \leq k \leq K)$ indexed by $(N_k, 1\leq k\leq K)\in \mathbb{N}^K$ is said to be multi-exchangeable if its law is invariant under permutation of the indexes within the classes; that is, for any permutations $\sigma_k$ of $\{1,\ldots, N_k\}$ for $1 \leq k \leq K$ , the following equality holds in distribution:
A sequence of random variables $(X_{n,k}, 1\leq n\leq N_k, 1\leq k\leq K)$ indexed by $(N_k,1\leq k\leq K)\in \mathbb{N}^K$ is $P_1\otimes\cdots\otimes P_K$ -multi-chaotic if, for any $m\geq 1$ , the convergence in distribution
holds for the topology of uniform convergence on compact sets, where $P_k$ , for $1\leq k \leq K$ , is a probability distribution on $\mathbb{R}_+$ , and with the convention that N goes to infinity when $\min N_k$ goes to infinity.
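As a small sanity check of the notion of multi-exchangeability, the Python sketch below builds the joint law of three independent binary variables, two in a first class with a common Bernoulli parameter and one in a second class with a different parameter, and tests invariance of the law under index permutations by exhaustive enumeration. The two-class setup, the parameter values, and the function names are assumptions made for illustration.

```python
from itertools import product


def joint_pmf(p1, p2):
    """Law of (X_{1,1}, X_{2,1}, X_{1,2}): two Bernoulli(p1) variables in
    class 1 and one Bernoulli(p2) variable in class 2, all independent."""
    def bern(p, x):
        return p if x == 1 else 1.0 - p
    return {(a, b, c): bern(p1, a) * bern(p1, b) * bern(p2, c)
            for a, b, c in product((0, 1), repeat=3)}


def is_invariant(pmf, perm):
    """True if the law is unchanged when coordinates are permuted by perm."""
    return all(abs(pmf[x] - pmf[tuple(x[i] for i in perm)]) < 1e-12
               for x in pmf)


pmf = joint_pmf(0.2, 0.7)
within_class = is_invariant(pmf, (1, 0, 2))    # swap the two class-1 indices
across_classes = is_invariant(pmf, (2, 1, 0))  # swap a class-1 and the class-2 index
```

Here `within_class` is true while `across_classes` is false, which is exactly the distinction between multi-exchangeability and full exchangeability when the classes have different laws.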
5.1. Propagation of chaos
The following result establishes the weak convergence of the pre-limiting system (12) towards the McKean–Vlasov system (14) as the number of particles N goes to infinity, and thus its multi-chaoticity.
Theorem 5.1. Suppose that Condition 4.1 holds true. Moreover, suppose that the initial conditions $\Big(X_{n,j}^{c}(0),X_{m,j}^{p}(0),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ are multi-exchangeable and $\nu^{1,c}\otimes\nu^{1,p}\otimes\cdots \nu^{r,c}\otimes\nu^{r,p}$ -multi-chaotic. Then, for any $t\in [0,T]$ , as $N\rightarrow\infty$ ,
and the sequence of processes $\Big(\Big(X_n^{c,N}(t),X_m^{p,N}(t),t\geq 0\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ , the solutions of the stochastic differential equation (12) with initial conditions $\Big(X_n^{c,N}(0),X_m^{p,N}(0),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ , is $P_{\bar{X}}$ -multi-chaotic, where $P_{\bar{X}}=\mu_{1}^c\otimes\mu_{1}^p\otimes\cdots\mu^c_{r}\otimes \mu_{r}^p$ is the distribution of the process $\Big(\Big(\bar{X}_n^{c}(t),\bar{X}_m^{p}(t),t\geq 0\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ , the solution of the limiting stochastic differential equation (14) with initial distribution $\nu^{1,c}\otimes\nu^{1,p}\otimes\cdots \nu^{r,c}\otimes\nu^{r,p}$ .
Before proceeding to the proof, we recall, without proof, an elementary result on (conditionally) independent and identically distributed (i.i.d.) random variables.
Lemma 5.1. Let $\{S_i\,:\,i=1,\ldots,n\}$ be a collection of $\mathbb{S}$ -valued random variables defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ , where $\mathbb{S}$ is a Polish space. Suppose that $S_1,\ldots,S_n$ are conditionally i.i.d. given some $\sigma$ -algebra $\mathcal{G}\subset\mathcal{F}$ . Then, for any $k\in\mathbb{N}$ , there exists a positive and finite constant $0<a_k<\infty$ such that
Proof of Theorem 5.1. We use a coupling method. Let $X(t)=\Big(\Big(X_{n,j}^{c}(t),X_{m,j}^{p}(t)\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ be the solution of the stochastic differential equation (12) with initial conditions $X(0)=\Big(\Big(X_{n,j}^{c}(0),X_{m,j}^{p}(0)\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ . Moreover, let $Y(t)=\Big(\Big(Y_{n,j}^{c}(t),Y_{m,j}^{p}(t)\Big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ be the solution of the limiting stochastic differential equation (14) with the same initial conditions as X(t), i.e., $Y(0)=X(0)$ . Also, let the processes Y(t) and X(t) be defined on the same probability space by taking the same sequences of Poisson random measures $\big\{\mathcal{N}^{c}_{n,j}\big\}$ and $\big\{\mathcal{N}^{p}_{n,j}\big\}$ in both cases. We first prove that these two processes are asymptotically close, that is, for any $t\in [0, T]$ ,
To this end, we treat the central and peripheral nodes in two separate steps. For convenience, define
Step 1. Fix $1\leq j\leq r$ . For each central node $n\in C_j^c$ and any $t\in [0,T]$ ,
Denote by $\mathcal{F}_t$ the filtration generated by the Poisson random measures and defined by
Then $X^{c}_{n,j}(t)$ and $Y^{c}_{n,j}(t)$ are adapted to the filtration $\mathcal{F}_t$ . Therefore, the two processes
are $\mathcal{F}_t$ -martingales. Furthermore, (62) reduces to
Recall that $K=|\mathcal{Z}|$ is the cardinality of the set $\mathcal{Z}$ . By adding and subtracting terms we obtain
The goal now is to bound the right-hand side of (65). Let us start with the second term. Again, by adding and subtracting terms we get
Denote by $\mathcal{J}_1$ and $\mathcal{J}_2$ respectively the first and the second expectation in the right-hand side of (66). Then, from the Lipschitz property of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ , $\mathcal{J}_1$ is bounded as follows:
where $L_{\gamma}$ is the maximum Lipschitz constant of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ for all $(z,z')\in\mathcal{Z}^2$ . Moreover, by adding and subtracting terms and using the fact that both $\big\{Y_{n,j}^{c}(s)\big\}$ and $\big\{Y_{n,j}^p(s)\big\}$ are sequences of i.i.d. random variables, $\mathcal{J}_2$ can be bounded as follows:
Note that, by the exchangeability of $\big\{Y_{n,j}^{c}(s), n\in C_j^c\big\}$ and the boundedness of the functions $\gamma^{j,c}_{z,z'}$ , one obtains
In the same manner, the fourth term on the right-hand side of (68) is also bounded as follows:
Furthermore, using (60), the first and third expectations in (68) are bounded by $\frac{\kappa_1 p_j^c}{\sqrt{M_{j}^{c}(n)+1}}$ and $\frac{\kappa_2 p_j^p}{\sqrt{M_{j}^{p}(n)}}$ , respectively, where $\kappa_1$ and $\kappa_2$ are positive constants.
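The $1/\sqrt{M}$ scaling of these bounds is the familiar rate for averages of independent variables. As an illustration under a simplifying assumption (i.i.d. Bernoulli variables rather than the conditionally i.i.d. setting of Lemma 5.1), the Python sketch below computes $\mathbb{E}|S_n/n-p|$ exactly for $S_n\sim\mathrm{Binomial}(n,p)$ and compares it with the Cauchy–Schwarz bound $\sqrt{p(1-p)/n}$ . The function name and the value of p are hypothetical choices.

```python
import math


def mean_abs_deviation(n, p):
    """Exact E|S_n/n - p| for S_n ~ Binomial(n, p), with 0 < p < 1.

    The probability mass function is evaluated in log space to avoid
    overflow and underflow for large n.
    """
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf) * abs(k / n - p)
    return total


p = 0.3
for n in (10, 100, 1000):
    # The rescaled deviation stays below sqrt(p * (1 - p)) for every n.
    print(n, math.sqrt(n) * mean_abs_deviation(n, p), math.sqrt(p * (1.0 - p)))
```

The rescaled quantity $\sqrt{n}\,\mathbb{E}|S_n/n-p|$ remains bounded in n, which is the behavior encoded by the constants $\kappa_1$ and $\kappa_2$ above.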
Now let us take a look at the first term of the right-hand side of (65). Since $X^c_{n,j}$ and $Y^c_{n,j}$ are $\mathcal{Z}$ -valued, and $\mathcal{Z}$ is a subset of $\mathbb{N}$ , one easily sees that
where $\bar{\gamma}$ is the upper bound of the functions $\gamma^{j,c}_{z,z'}$ and $\gamma^{j,p}_{z,z'}$ for all $(z,z')\in\mathcal{Z}^2$ . Finally, by combining (65), (66), (67), (68), (69), (70), and (71), one obtains
where $|\mathcal{E}|$ stands for the cardinality of the set of edges $\mathcal{E}$ of the graph $(\mathcal{Z},\mathcal{E})$ . Recall that $deg(n)=M_{j}^{c}(n)+M_{j}^{p}(n)$ for any $n\in C_j^c$ . Using this and then taking the maximum over $n\in C_j^c$ in (72) and over $1\leq j \leq r$ , one finally obtains
Step 2. Fix a block $1\leq j\leq r$ and a peripheral node $n\in C_j^p$ . For any $t\in [0,T]$ , one has
where the last inequality is obtained by following the same steps as in (63) and (65). Again, given that $X_{n,j}^p$ and $Y_{n,j}^p$ are $\mathcal{Z}$ -valued and that $\mathcal{Z}\subset\mathbb{N}$ , the first expectation in the right-hand side of (74) can be bounded as follows:
It remains to bound the second term in the right-hand side of (74). Using Condition 4.1 one gets
By rearranging the terms and using the triangle inequality one obtains
Note that $\int_{\mathcal{Z}}\gamma^{j,p}_{z,z'}(x)\mu_i^p(s)(dx)= \mathbb{E}\Big[\gamma^{j,p}_{z,z'}(Y_{m,i}^{p}(s))\Big]$ for $m\in C_i^p$ and $\int_{\mathcal{Z}}\gamma^{j,c}_{z,z'}(x)\mu_j^c(s)(dx)=\mathbb{E}\Big[\gamma^{j,c}_{z,z'}\big(Y_{n,j}^{c}(s)\big)\Big]$ for $n\in C_j^c$ . Then, by using the exchangeability of $Y_{n,j}^{c}(t)$ for $n\in C_j^c$ , one finds
Observe that there are $r+1$ terms on the right-hand side of the last inequality. Let us start with the first expectation. By adding and subtracting terms one gets
Note that, by the Lipschitz property of the functions $\gamma^{j,c}_{z,z'}$ , one finds that
Moreover, using (60) together with the exchangeability of $\big\{Y_{n,j}^{c}(s),n\in C_j^c\big\}$ leads to
Now let us examine the remaining r terms on the right-hand side of (78). For simplicity of notation, denote by $\mathcal{I}$ the left-hand side of (78). Then, by adding and subtracting terms, one gets
Let $\mathcal{I}_1$ and $\mathcal{I}_2$ denote respectively the first and the second expectation in the right-hand side of the inequality (82). Then, by using the Lipschitz property of the functions $\gamma^{j,p}_{z,z'}$ , and recalling that, for $1\leq i\leq r$ , $M_i^{p}(n)$ represents the number of peripheral nodes of the ith block connecting to node n, the first expectation $\mathcal{I}_1$ is straightforwardly bounded as follows:
Moreover, by adding and subtracting terms in $\mathcal{I}_2$ one obtains
where
and
First, from the triangle inequality one gets
Using (20), the exchangeability of the variables $\{Y^p_{n,j}(s), n\in C_j^p\}$ , and the boundedness of the functions $\gamma^{j,p}_{z,z'}$ , we easily show that the right-hand side of (87) vanishes as $N\rightarrow\infty$ . Indeed, the jth term satisfies
and thus goes to zero by (20). Using the same steps, one obtains for all $1\leq i\leq r$ with $i\neq j$ that
which also vanishes as $N\rightarrow\infty$ by (20); thus, so does $\mathcal{I}_4$ . In order to bound $\mathcal{I}_3$ , we use again the moment inequality (60), which straightforwardly gives us
where $\theta_1,\cdots,\theta_r$ are positive constants. Now, by (75), (80), (81), (83), (88), (89), and (90), one obtains
Recall that $\deg(n)=M_{j}^{c}(n)+\sum_{k=1}^r M_k^{p}(n)$ for any $n\in C_j^p$ . Using this and taking the maximum over $n\in C_j^p$ and over $1\leq j\leq r$ in (91), one gets
Adding side by side the two inequalities in (73) and (92) leads to
where, with a slight abuse of notation, the constants $C_1,C_2$ and the function $C_3(N)$ are defined by
and
Therefore, applying Grönwall’s lemma to (93) gives
with $C_4=C_1+C_2$ . Finally, Condition 4.1 ensures that $C_3(N)\rightarrow 0$ as $N\rightarrow\infty$ , which proves (61).
We are now ready to conclude the proof. First, Theorem 4.1 ensures the uniqueness of the solution of the limiting stochastic differential equation (14). In addition, the relation in (25) shows that the solution is continuous with respect to the initial condition. Therefore, the process $Y(t)=\big(\big(Y_n^{c}(t),Y_m^{p}(t)\big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\big)$ is $\mu^c_{1}\otimes\mu_{1}^p\otimes\cdots \mu_{r}^c\otimes\mu_{r}^p$ -multi-chaotic since the initial condition $Y(0)=X(0)$ is multi-exchangeable and $\nu^{1,c}\otimes\nu^{1,p}\otimes\cdots \nu^{r,c}\otimes\nu^{r,p}$ -multi-chaotic. Then, by the relation in (61), we conclude that the convergence in (59) holds and the sequence of processes $\big(\big(X_n^{c,N}(t),X_m^{p,N}(t),t\geq 0\big),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\big)$ is also $P_{\bar{X}}=\mu^c_{1}\otimes\mu_{1}^p\otimes\cdots \mu_{r}^c\otimes\mu_{r}^p$ -multi-chaotic, which concludes the proof.
5.2. Laws of large numbers
The following laws of large numbers are immediate consequences of Theorem 5.1.
Corollary 5.1. Suppose that the conditions of Theorem 5.1 hold. Define $\mu_j^{c}\,:\!=\,\mathcal{L}\big( \bar{X}^c_{n,j}\big)$ , $\mu_j^{p}\,:\!=\,\mathcal{L}\big( \bar{X}^p_{m,j}\big)$ for $1\leq j\leq r$ , where $\big(\big(\bar{X}^c_{n,j}(t), \bar{X}^p_{m,j}(t),t\geq 0\big),1\leq j\leq r\big)$ is the solution of the McKean–Vlasov limiting system in (14) with initial distribution $\nu^{1,c}\otimes\nu^{1,p}\cdots \nu^{r,c}\otimes\nu^{r,p}$ . Then, for each $1\leq j\leq r$ , as $N\rightarrow\infty$ ,
and
for the weak topology on $\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))$ with $\mathcal{D}([0,T],\mathcal{Z})$ endowed with the Skorokhod topology.
Proof. We prove (96); the proof of (95) is similar. Let
and recall that the bounded-Lipschitz metric $d_{BL}$ metrizes the weak convergence on $\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))$ . Therefore, to prove the convergence in (96), it suffices to prove that $d_{BL}\Big(\mu_j^{p,N},\bar{\mu}_j^{p,N}\Big)\Rightarrow 0$ and that $\bar{\mu}_j^{p,N}\Rightarrow \mu_j^p$ . First note that for all $1\leq j\leq r$ ,
which goes to zero according to (59). Thus, $d_{BL}\Big(\mu_j^{p,N},\bar{\mu}_j^{p,N}\Big)\Rightarrow 0$ as $N\rightarrow\infty$ . It remains to show that $\bar{\mu}_j^{p,N}\Rightarrow \mu_j^p$ as $N\rightarrow\infty$ . Since the stochastic processes $\Big\{\bar{X}_{n,j}^p, n\in C_j^p\Big\}$ are i.i.d., for any continuous and bounded function $g\in C_b(\mathcal{Z})$ one finds that
which goes to zero given the boundedness of g. Therefore, $\bar{\mu}_j^{p,N}$ converges weakly to $\mu_j^p$ as $N\rightarrow\infty$ . Thus, combining the two convergence results, we conclude that $\mu_j^{p,N}$ converges weakly to $\mu_j^{p}$ as $N\rightarrow\infty$ . The corollary is proved.
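The second half of the argument above, namely the convergence of the empirical measure of i.i.d. copies towards their common law, can be illustrated numerically. The following is a minimal sketch and not part of the proof: it replaces path space by a three-point state space, and it uses the total variation distance, which bounds the bounded-Lipschitz distance up to a constant factor on a finite set. The law `mu` and all function names are illustrative assumptions.

```python
import random
from collections import Counter


def empirical_measure(samples):
    """Empirical measure of a finite sample, as a dict state -> mass."""
    n = len(samples)
    return {z: c / n for z, c in Counter(samples).items()}


def total_variation(p, q):
    """Total variation distance between two measures on a finite set."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)


def tv_to_truth(mu, n, rng):
    """Distance between mu and the empirical measure of n i.i.d. draws from mu."""
    states = list(mu)
    samples = rng.choices(states, weights=[mu[z] for z in states], k=n)
    return total_variation(empirical_measure(samples), mu)


if __name__ == "__main__":
    rng = random.Random(1)
    mu = {0: 0.5, 1: 0.3, 2: 0.2}  # hypothetical limiting law
    for n in (100, 10_000, 1_000_000):
        print(n, tv_to_truth(mu, n, rng))
```

The printed distances are expected to shrink roughly like $N^{-1/2}$, which is the behavior used for $\bar{\mu}_j^{p,N}$ in the proof.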
6. Large deviations
We investigate here the large deviations principles of the interacting particle system introduced in Section 2 over finite time durations. For the sake of simplicity, we restrict ourselves to the case of block-structured graphs where the blocks are cliques, i.e., complete subgraphs, and the peripheral subgraph is complete, that is, all peripheral nodes in the system are connected; see, e.g., Figure 1. The first main result of this section is Theorem 6.1, which states the large deviations principle of the vector of empirical measures. The second main result is Theorem 6.2, which states the large deviations principle of the corresponding vector of empirical processes. The approach we take to establish these results is based on a generalization to the multi-class setting of the classical approach developed in [Reference Dawson and Gärtner24] and adapted in [Reference Léonard49] to the context of jump processes. One might also consult [Reference Feng32, Reference Feng33] for an alternative approach.
Let us first introduce the assumptions under which the results of this section hold.
Assumption 6.1.
1. The peripheral subgraph is complete; that is, for any two peripheral nodes $n,m\in\bigcup\limits_{1\leq j\leq r} C_{j}^p$ , there exists an edge $(n,m)\in\Xi$ connecting n and m.
2. The r blocks of the graphs are cliques; that is, for any two nodes $n,m\in C_{j}^p$ of the same block $1\leq j\leq r$ , there exists an edge $(n,m)\in\Xi$ connecting n and m.
3. The mappings $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ introduced in (16) and (17) are uniformly bounded away from zero; that is, there exists $c > 0$ such that, for all $\nu,\mu_1,\ldots,\mu_r\in\mathcal{M}_1(\mathcal{Z})$ and all $(z, z')\in\mathcal{E}$ , we have $\lambda_{j,z,z'}^{c}(\nu,\mu_j)\geq c$ and $\lambda^p_{j,z,z'}(\nu,\mu_1,\ldots,\mu_r)\geq c$ .
4. As $N\rightarrow\infty$ ,
\begin{align}\frac{N_j}{N}\rightarrow \alpha_j \tag{99}\end{align}
for some $\alpha_j\in (0,1)$ , where we recall that $N_j$ is the number of nodes in the jth block and $N_j^c$ (resp. $N_j^p$) is the number of central (resp. peripheral) nodes in the jth block.
Remark 6.1.
1. From (16) and (17), together with Remark 4.1, the functions $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ are Lipschitz.
2. Since $\mathcal{M}_1(\mathcal{Z})$ is compact and the rate functions $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ are continuous and Lipschitz, the rates are uniformly bounded from above; that is, there exists a constant $C <\infty$ such that for all $\nu,\mu_1,\ldots,\mu_r\in\mathcal{M}_1 (\mathcal{Z})$ , and all $(z,z')\in\mathcal{E}$ , we have $\lambda_{j,z,z'}^{c}(\nu,\mu_j)\leq C$ and $\lambda^p_{j,z,z'}(\nu,\mu_1,\ldots,\mu_r)\leq C$ .
3. For ease of reading, we have omitted subscripts indicating the dependence of the rate functions $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ on the proportions $a_1,b_1,a,b_1,\ldots,b_r$ .
4. We use again throughout this section the convention that N goes to infinity when both $\min_{1\leq j\leq r}N_j^c$ and $\min_{1\leq j\leq r}N_j^p$ go to infinity.
5. We emphasize that for simplicity, the results obtained in this section hold under Assumption 6.1, which describes a special case of the class of models given by Condition 4.1; that is, we suppose here that each block is a clique and the peripheral subgraph is complete.
Let $\mathbb{M}^N\in(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z})))^{2r}$ denote the vector of empirical measures defined by
where $X^N=\Big(X^c_{n,j},X^p_{m,j};\,n\in C_j^c, m\in C_j^p;\, 1\leq j\leq r\Big)\in\mathcal{D}([0,T],\mathcal{Z}^N)$ denotes the full description of the N particles and $\mathbb{M}_j^{c,N}$ (resp. $\mathbb{M}_j^{p,N}$) is the empirical measure of the central (resp. peripheral) nodes of the jth block, for $1\leq j\leq r$ . With a slight abuse of notation, denote by $G_N\,:\,\mathcal{D}\big([0,T],\mathcal{Z}^N\big)\rightarrow (\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z})))^{2r}$ the mapping that takes the full description $X^N$ to the vector of empirical measures $\mathbb{M}^N$ , that is,
Thus, $\mathbb{M}^N=G_N\big(X^N\big)$ . Denote by $\mathbb{P}_{z^N}^N$ the law of $X^N$ with initial condition $z^N=\Big(z^c_{n,j},z^p_{m,j};\,n\in C_j^c, m\in C_j^p;\, 1\leq j\leq r\Big)$ . Note that the distribution of the empirical vector $\mathbb{M}^N$ depends on the initial condition only through its empirical vector, defined by
Moreover, denote by $P_{\nu_N}^N$ the distribution of $\mathbb{M}^N$ , which is the pushforward of $\mathbb{P}_{z^N}^N$ under the mapping $G_N$ ; that is, $P_{\nu_N}^N=\mathbb{P}_{z^N}^N\circ G_N^{-1}$ .
Let us now introduce the $(\mathcal{M}_1(\mathcal{Z}))^{2r}$ -valued vector of empirical processes
and denote by $\gamma_N$ the corresponding mapping that takes a full description $X^N \in\mathcal{D}([0,T], \mathcal{Z}^N)$ of the N particles of the system to the empirical process vector $\mu^N$ , that is,
Observe that $\mu^N(0)=\nu_N$ and that $\mu^N(t)$ is the projection $\pi_t\big(\mathbb{M}^N\big)$ at time t, that is,
where the notation $\pi$ denotes, again with a slight abuse of notation, both the vector projection
and the component projection
Finally, denote by $p_{\nu_N}^N$ the distribution of $\mu^N$ , which is the pushforward $p_{\nu_N}^N=\mathbb{P}_{z^N}^N\circ\gamma_N^{-1}$ . Note that, since $\mu^N=\pi\big(\mathbb{M}^N\big)$ , we can also write $p^N_{\nu_N}$ as the pushforward $p_{\nu_N}^N=P_{\nu_N}^N\circ\pi^{-1}$ .
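To make the relation between the path-space empirical measure and its time marginals concrete, here is a small sketch of the projection at a fixed time t for one class of nodes. It assumes that each càdlàg path is stored as a list of `(jump_time, state)` pairs starting at time 0; this representation, the class labels, and the function names are illustrative choices and do not come from the paper.

```python
from bisect import bisect_right
from collections import Counter


def state_at(path, t):
    """Value at time t of a piecewise-constant, right-continuous path.

    `path` is [(jump_time, state), ...] sorted by time, with path[0][0] == 0.
    """
    times = [s for s, _ in path]
    # Index of the last jump time <= t, which gives the right-continuous value.
    return path[bisect_right(times, t) - 1][1]


def empirical_marginal(paths, t):
    """Time-t marginal of the empirical measure of a family of paths."""
    counts = Counter(state_at(p, t) for p in paths)
    return {z: c / len(paths) for z, c in counts.items()}


def empirical_process_vector(classes, t):
    """One marginal per class, e.g. keys ('c', j) and ('p', j) for block j."""
    return {label: empirical_marginal(paths, t) for label, paths in classes.items()}
```

For instance, with four paths in a single class, `empirical_marginal(paths, 0.0)` gives the initial empirical measure of that class, and evaluating at later times traces out the corresponding component of the empirical process.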
The goal of this section is to study the large deviations principles for the sequences of probability measures $\big(P_{\nu_N}^N,N\geq 1\big)$ and $\big(p_{\nu_N}^N,N\geq 1\big)$ . The two main results are Theorem 6.1 and Theorem 6.2.
6.1. Large deviations principle for the empirical measure vector
We start by investigating the large deviations principle of the sequence $\big(P_{\nu_N}^N, N\geq 1\big)$ . To this end, we first establish the result in the non-interacting case. Then, using the Radon–Nikodym derivative with respect to the non-interacting law, we apply the Laplace–Varadhan principle to deduce the interacting case.
Let us first describe the hypothetical non-interacting system. Suppose that all the nodes are independent of each other and that the color of each node changes with a constant rate equal to 1 for all allowed transitions $(z, z')\in\mathcal{E}$ , while all other transition rates are zero. Denote by $P_{z_0}$ the marginal law on $\mathcal{D} ([0,T],\mathcal{Z})$ of this process with initial condition $z_0$ . Thus, $P_{z_0}$ is the unique solution to the martingale problem in $\mathcal{D}([0,T],\mathcal{Z})$ associated with the generator $\mathcal{L}^0$ operating on bounded measurable functions $\phi$ on $\mathcal{Z}$ according to
and the initial condition $z_0$ . Given that the transition rates are upper-bounded and that
for some constant $\digamma$ , there exists a unique solution to the martingale problem for $(\mathcal{L}^0,z_0)$ (cf. [Reference Ethier and Kurtz31, Problem 4.11.15]).
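The non-interacting reference dynamics described above are simple to simulate, which may help in visualizing the law $P_{z_0}$. The sketch below assumes that the set of allowed transitions is given as a list of directed pairs; a node in state z then waits an exponential time with rate equal to the number of allowed transitions out of z and moves to one of the successors chosen uniformly. The function names and the path representation are illustrative assumptions.

```python
import random


def simulate_reference_path(z0, edges, T, rng):
    """Simulate on [0, T] a jump process with rate 1 on every allowed edge.

    Returns the path as [(jump_time, state), ...], starting with (0.0, z0).
    """
    successors = {}
    for z, z_next in edges:
        successors.setdefault(z, []).append(z_next)

    t, z = 0.0, z0
    path = [(0.0, z0)]
    while True:
        out = successors.get(z, [])
        if not out:
            break  # no allowed transition: the state is absorbing
        # Total jump rate is the out-degree, since each edge has rate 1.
        t += rng.expovariate(len(out))
        if t > T:
            break
        z = rng.choice(out)
        path.append((t, z))
    return path


def number_of_jumps(path):
    """Number of jumps of the path on [0, T]."""
    return len(path) - 1
```

On a two-state graph with edges `(0, 1)` and `(1, 0)` the total jump rate is always 1, so the number of jumps on $[0,T]$ is Poisson with mean T; averaging `number_of_jumps` over many runs gives a quick sanity check.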
For any $\eta,\rho_1,\ldots, \rho_r$ in $\mathcal{D}([0,T], \mathcal{M}_1 (\mathcal{Z}))$ , let $R_{z_0}^c(\eta,\rho_j)$ be the unique solution to the martingale problem in $\mathcal{D}([0,T],\mathcal{Z})$ associated with the time-varying generator
and the initial condition $z_0$ . Similarly, let $R_{z_0}^p(\eta,\rho_1,\ldots,\rho_r)$ be the unique solution to the martingale problem in $\mathcal{D}([0,T],\mathcal{Z})$ associated with the time-varying generator
and the initial condition $z_0$ . Again by the upper-boundedness of $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ , the uniqueness of $R_{z_0}^c(\eta,\rho_j)$ and $R_{z_0}^p(\eta,\rho_1,\ldots,\rho_r)$ follows (see again [Reference Ethier and Kurtz31, Problem 4.11.15]). Therefore, the density of $R_{z_0}^c(\eta,\rho_j)$ and $R_{z_0}^p(\eta,\rho_1,\ldots,\rho_r)$ with respect to $P_{z_0}$ can be written as follows (see [Reference Léonard49, Equation (2.4)]):
where
and
Consider now a system of N non-interacting particles where the law of the nth particle is $P_{z_n}$ with initial condition $z_n$ . The law of such a system is the product distribution $\mathbb{P}_{z^N}^{0,N}=\otimes_{n=1}^NP_{z_n}$ . Moreover, the distribution of the corresponding empirical vector is given by $P_{\nu_N}^{0, N}=\mathbb{P}_{z^N}^{0, N}\circ G_N^{-1}$ where $\nu_N$ is the initial empirical vector (101). Therefore, by applying an analogue of the Cameron–Martin–Girsanov formula for stochastic integrals with respect to point processes (see, e.g., [Reference Dawson and Zheng27, Lemma 3.7] or [Reference Léonard49, Equation (2.8)]), one can compute the Radon–Nikodym derivative $dP_{\nu_N}^{N}/dP_{\nu_N}^{0, N}$ at any $\textbf{Q}=(Q_{1}^c, Q_{1}^p,\cdots, Q_{r}^c, Q_{r}^p)\in (\mathcal{M}_1(\mathcal{D}([0, T],\mathcal{Z})))^{2r}$ as follows:
with
being the vector containing the component projections $\pi\big(Q_j^{\iota}\big)\in \mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))$ for $1\leq j\leq r$ and $\iota\in\{c,p\}$ , and where
Note that under Assumption 6.1, the sequence of functions $\{h^N\}_{N\geq 1}$ converges, as $N\rightarrow\infty$ , towards the function h given by
We now introduce the necessary spaces and topologies following the notation of [Reference Borkar and Sundaresan12, Reference Léonard49]. Consider the Polish space $(\mathcal{X},d)$ where
and the metric d is defined by
with $\varphi(x)\,:\!=\,\sum_{0\leq t\leq T}{\unicode{x1D7D9}}_{x_t\neq x_{t-}}$ denoting the number of jumps and $d_{Sko}$ standing for the Skorokhod complete metric (see [Reference Billingsley10, Section 12]). For this topology, the function $\varphi$ is continuous, and two paths are close to each other if they have the same number of jumps and if they are Skorokhod-close [Reference Léonard49, p. 299]. For any function $f\,:\,\mathcal{X}\rightarrow\mathbb{R}$ , define
and write
We endow the set $\mathcal{M}_{1,\varphi}(\mathcal{X})$ with the weak- $^*$ topology $\sigma(\mathcal{M}_{1,\varphi}(\mathcal{X}), C_{\varphi}(\mathcal{X}))$ , the weakest topology under which $Q_N\rightarrow Q$ as $N\rightarrow +\infty$ if and only if
For a measure $\nu=\big(\nu^{1,c},\nu^{1,p},\ldots,\nu^{r,c},\nu^{r,p}\big)\in\big(\mathcal{M}_1(\mathcal{Z})\big)^{2r}$ we define, for all $1\leq j\leq r$ and $\iota\in\{c,p\}$ , the mixture
Moreover, let $R^c(\eta,\rho_j)$ and $R^p(\eta,\rho_1,\ldots,\rho_r)$ be the mixtures given by
Finally, let us introduce the relative entropy $H\,:\, \mathcal{M}_{1,\varphi} (\mathcal{X})\rightarrow [0,+\infty]$ of Q with respect to P as follows:
We are now ready to state the large deviations principle for the sequence $(P_{\nu_N}^N,N\geq 1)$ .
Theorem 6.1. Let the space $\mathcal{M}_{1,\varphi}(\mathcal{X})$ be equipped with the weak- $^*$ topology $\sigma(\mathcal{M}_{1,\varphi}(\mathcal{X}), C_{\varphi}(\mathcal{X}))$ . Moreover, suppose that the initial condition $\nu_N\rightarrow\nu$ weakly as $N\rightarrow\infty$ . Then the sequence $(P_{\nu_N}^N,N\geq 1)$ satisfies the large deviations principle in the space $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , endowed with the product topology, with speed N and the good rate function $I(\textbf{Q}) = L(\textbf{Q})-h(\textbf{Q})$ , where the function $h(\textbf{Q})$ is given by (110) and $L\,:\,(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}\rightarrow [0,\infty]$ is defined as
with, for each $1\leq j\leq r$ , $\iota\in\{c,p\}$ , and $Q\in\mathcal{M}_{1,\varphi}(\mathcal{X})$ ,
and $\alpha_j,p_j^c,p_j^p$ being given in (99) and (18). Furthermore, for each $\textbf{Q}\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , the rate function $I(\textbf{Q})$ admits the representation
Remark 6.2. This is a generalization of [Reference Léonard49, Theorem 2.1] to our multi-population setting. Also, while [Reference Léonard49] studied the case where $z_n=z_0$ for some fixed $z_0$ , so that $\nu_N=\delta_{z_0}$ , we consider, as in [Reference Borkar and Sundaresan12, Theorem 3.1], more general initial conditions, provided that the initial empirical vector $\nu_N$ converges weakly towards $\nu=\left(\nu^{1,c},\nu^{1,p},\cdots,\nu^{r,c},\nu^{r,p}\right)$ . Moreover, similarly to [Reference Borkar and Sundaresan12], we consider here the case where not all transitions are allowed, but only those in $\mathcal{E}$ , the set of directed edges in the graph $(\mathcal{Z},\mathcal{E})$ .
Note that, from Definition 5.1, the weak convergence of the initial empirical vector $\nu_N$ towards $\nu$ amounts to the assertion that the initial conditions $\Big(X_{n,j}^{c}(0),X_{m,j}^{p}(0),n\in C_j^c,m\in C_j^p;\,1\leq j\leq r\Big)$ are $\nu^{1,c}\otimes\nu^{1,p}\cdots \nu^{r,c}\otimes\nu^{r,p}$ -multi-chaotic (cf. [Reference Sznitman61]).
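As a reminder of how the relative entropy H introduced before Theorem 6.1 behaves, the following sketch evaluates it on a finite set, where it reduces to $\sum_z Q(z)\log\big(Q(z)/P(z)\big)$ when Q is absolutely continuous with respect to P and equals $+\infty$ otherwise. This finite-space reduction is the standard one; the dictionary representation and the function name are illustrative assumptions and not the paper's notation.

```python
import math


def relative_entropy(q, p):
    """Relative entropy H(q | p) of two probability vectors on a finite set.

    Both arguments are dicts state -> mass. Returns math.inf when q puts
    mass on a state to which p assigns zero mass.
    """
    h = 0.0
    for z, qz in q.items():
        if qz == 0.0:
            continue  # convention 0 * log 0 = 0
        pz = p.get(z, 0.0)
        if pz == 0.0:
            return math.inf  # q is not absolutely continuous w.r.t. p
        h += qz * math.log(qz / pz)
    return h
```

The function vanishes when the two arguments coincide and is infinite as soon as absolute continuity fails, which is the mechanism behind statements of the form "finite rate implies a constraint on Q" used in the lemmas of this section.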
Proof of Theorem 6.1. The proof of Theorem 6.1 is based on the generalization of Sanov’s theorem for empirical measures on Polish spaces due to Dawson and Gärtner [Reference Dawson and Gärtner24], the Girsanov transformation, and the Laplace–Varadhan principle [Reference Varadhan64]. We proceed through several lemmas. We follow [Reference Léonard49, Theorem 2.1] and [Reference Borkar and Sundaresan12, Theorem 3.1].
Large deviations principle for the non-interacting case. We first establish a large deviations principle in the non-interacting case.
Lemma 6.1. Suppose that the initial condition $\nu_N$ converges towards $\nu$ weakly as $N\rightarrow\infty$ . Let $\mathcal{M}_{1,\varphi}(\mathcal{X})$ be endowed with the weak- $^*$ topology $\sigma(\mathcal{M}_{1,\varphi} (\mathcal{X}), C_{\varphi} (\mathcal{X}))$ . Then the sequence $\big(P_{\nu_N}^{0,N}, N\geq 1\big)$ satisfies a large deviations principle in $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , endowed with the product topology, with speed N and the action functional $L\,:\,(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}\rightarrow [0,\infty]$ given by (117).
Proof. Fix a given block $1\leq j\leq r$ . Denote by
the sequences of probability distributions of the local empirical measures $\mathbb{M}_j^{c,N}$ and $\mathbb{M}_j^{p,N}$ of the central and peripheral nodes, respectively, of the jth block. Note that in the non-interacting case, the transition rate from any state to any other state is bounded by 1. Therefore, the family of probability measures $\{P_z\,:\,z\in\mathcal{Z}\}$ is a subset of $\mathcal{M}_{1,\varphi}(\mathcal{X})$ . Moreover, for any continuous function $F\in C_{\varphi}(\mathcal{X})$ , the integral $\int F(y)P_{z_0}(dy)$ depends continuously upon $z_0$ , so that $\{P_{z_0}\,:\,z_0\in\mathcal{Z}\}$ is a Feller continuous family of probability measures on $\mathcal{X}$ . Now, since $\nu^{\,j,c}_N\rightarrow\nu^{\,j,c}$ and $\nu^{\,j,p}_N\rightarrow\nu^{\,j,p}$ as $N\rightarrow\infty$ , by applying the generalization of Sanov’s theorem [Reference Dawson and Gärtner24, Theorem 3.5], we find that both the sequences $\left(P_{\nu_{N}^{j,c}}^{0,N_j^c}, N_j^c\geq 1\right)$ and $\left(P_{\nu_{N}^{j,p}}^{0,{N_j^p}}, N_j^p\geq 1\right)$ satisfy the large deviations principle in $\mathcal{M}_{1,\varphi} (\mathcal{X})$ endowed with the weak- $^*$ topology $\sigma(\mathcal{M}_{1,\varphi} (\mathcal{X}), C_{\varphi} (\mathcal{X}))$ , with speeds $N_j^c$ and $N_j^p$ , respectively, and good rate functions $J^{j,c}(Q)$ and $J^{j,p}(Q)$ defined by (118). Let $\mathcal{K}^c_1,\mathcal{K}^p_1,\ldots,\mathcal{K}_r^c,\mathcal{K}_r^p\in\mathcal{B}(\mathcal{M}_{1,\varphi}(\mathcal{X}))$ be closed Borel sets. By independence, one has
Therefore, by Assumption 6.1 we get
Similar arguments allow us to prove the lower bound for the large deviations principle, which concludes the proof.
The next result gives a characterization of the space containing the probability measures satisfying $L(\textbf{Q})<\infty$ .
Lemma 6.2. If, for a given $\textbf{Q}=\big(Q_{1}^c,Q_{1}^p,\ldots,Q_{r}^c,Q_{r}^p\big)\in(\mathcal{M}_{1}(\mathcal{D}([0,T],\mathcal{Z})))^{2r}$ , the action functional $L(\textbf{Q})<\infty$ , then the following hold:
1. $\textbf{Q}\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ .
2. $ \textbf{Q}\circ\pi_0^{-1}=\nu$ ; that is,
$$\Big(Q_{1}^c\circ\pi_0^{-1},Q_{1}^p\circ\pi_0^{-1},\ldots,Q_{r}^c\circ\pi_0^{-1},Q_{r}^p\circ\pi_0^{-1}\Big)=\Big(\nu^{1,c},\nu^{1,p},\cdots,\nu^{r,c},\nu^{r,p}\Big).$$
Proof. This is a generalization of [Reference Borkar and Sundaresan12, Lemma 5.2]. Recall that the function $\varphi (x)=\sum_{0\leq t\leq T}{\unicode{x1D7D9}}_{x(t{-})\neq x(t)}$ denotes the number of jumps of x in the interval [0,T]. From (112) we have that $\|\varphi\|_{\varphi}\leq 1$ . Moreover, $\varphi$ is continuous in the topology induced by the metric d defined in (111). Hence $\varphi\in C_{\varphi}(\mathcal{X})$ . Furthermore, $L(\textbf{Q})<\infty$ implies that, for all $1\leq j\leq r$ ,
and
Now, note that under the non-interacting distribution $P_{z_0}$ , the transition rates are bounded by 1. Since the number of allowed transitions from any state is at most equal to $K-1$ , $\varphi$ is stochastically dominated by a Poisson random variable with mean $(K-1)T$ . Therefore, for any initial condition $z_0\in\mathcal{Z}$ , we have $1\leq\int_{\mathcal{X}}e^{\varphi} dP_{z_0}<\infty$ . It follows from (122) and (123) that $\int_{\mathcal{X}}\varphi dQ_{j}^c<\infty$ and $\int_{\mathcal{X}}\varphi dQ_{j}^p<\infty$ for each $1\leq j\leq r$ , and so $\textbf{Q}=(Q_{1}^c,Q_{1}^p,\cdots,Q_{r}^c,Q_{r}^p)\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , which proves the first claim. In order to prove the second point, we proceed by contradiction. Suppose that for a given measure $\textbf{Q}$ , $L(\textbf{Q})<\infty$ and $\textbf{Q}\circ\pi_0^{-1}=\nu_{\textbf{Q}}\neq\nu$ . Consider the bounded continuous functions $f_1^c(x),f_1^p(x),\ldots,f_r^c(x),f_r^p(x)$ defined on $\mathcal{X}$ and depending on x only through the initial condition; that is, there exist functions $g_1^c,g_1^p,\ldots,g_r^c,g_r^p$ such that, for all $1\leq j\leq r$ ,
Since $\nu_{\textbf{Q}}\neq \nu$ , the above functions satisfy the following claim: either
or
for at least one $1\leq j\leq r$ . Therefore, one can always find, for at least one j, an arbitrarily large $a_j^c>0$ (or $a_j^p>0$ ) such that $\sum_zg_j^c(z)\nu_{\textbf{Q}}^{j,c}(z)-\sum_zg_j^c(z)\nu^{\,j,c}(z)=a_j^c$ $\big($ or $\sum_zg_j^p(z)\nu_{\textbf{Q}}^{j,p}(z)-\sum_zg_j^p(z)\nu^{\,j,p}(z)=a_j^p\big)$ . Indeed, this can be done by flipping the sign of $f_j^c$ (or $f_j^p$ ) if necessary and scaling the functions. Note that, by the assumption, $f_j^c,f_j^p\in C_{\varphi}(\mathcal{X})$ since they are bounded and continuous. Suppose without loss of generality that, for a given j, (124) is satisfied; then by direct calculation we obtain
Hence, since $a_j^c>0$ may be arbitrarily large, one gets that $J^{j,c}\big(Q_{j}^c\big)=\infty$ and then $L(\textbf{Q})=\infty$ , which contradicts the assumption of the lemma and thus proves the second claim.
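The finiteness of $\int_{\mathcal{X}}e^{\varphi}dP_{z_0}$ used in the proof above rests on the exponential moment of a Poisson variable: if $\varphi$ is stochastically dominated by a Poisson variable with parameter $\lambda=(K-1)T$, then $\int e^{\varphi}dP_{z_0}\leq \exp\{\lambda(e-1)\}$. The sketch below checks this closed form against a truncated series; the values of K and T and the function name are illustrative assumptions.

```python
import math


def poisson_exp_moment(lam, terms=200):
    """Truncated series for E[exp(N)] with N ~ Poisson(lam), lam > 0.

    Each term e^k * e^{-lam} * lam^k / k! is evaluated in log space to
    avoid overflow in the factorial.
    """
    total = 0.0
    for k in range(terms):
        total += math.exp(k - lam + k * math.log(lam) - math.lgamma(k + 1))
    return total


if __name__ == "__main__":
    K, T = 3, 2.0            # hypothetical number of states and time horizon
    lam = (K - 1) * T
    print(poisson_exp_moment(lam), math.exp(lam * (math.e - 1.0)))
```

The two printed numbers are expected to agree to high precision, since the series for $\sum_k (e\lambda)^k/k!$ converges very quickly for moderate $\lambda$.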
Conditions for application of the Laplace–Varadhan lemma. Lemma 6.1 establishes the large deviations principle for the sequence $\big(P_{\nu_N}^{0,N},N\geq 1\big)$ in the topological space $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ . Moreover, recall that the Radon–Nikodym derivative is given by
where the function $h^N(\textbf{Q})$ is given by (109). Therefore, to find the large deviations principle for $\big(P_{\nu_N}^{N},N\geq 1\big)$ , one can apply the Laplace–Varadhan principle (cf. [Reference Varadhan64, Theorem 3.4]) to the sequence $\big(P_{\nu_N}^{0,N},N\geq 1\big)$ . The Laplace–Varadhan principle holds under the following conditions:
1. The sequence of functions $\big\{h^N(\textbf{Q})\big\}$ satisfies
\begin{align}\lim_{A\rightarrow\infty} \limsup_{N\rightarrow\infty} \frac{1}{N} \log\int_{h^N(\textbf{Q})\geq A}\exp\big\{Nh^N(\textbf{Q})\big\}dP_{\nu_N}^{0,N}=-\infty. \tag{127}\end{align}
2. For every $\textbf{Q}$ in $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ such that $L(\textbf{Q})<\infty$ and $ \textbf{Q}^N\rightarrow \textbf{Q}$ ,
\begin{align}\limsup_N h^N\big(\textbf{Q}^N\big)\leq h(\textbf{Q}). \tag{128}\end{align}
3. For every $\textbf{Q}$ in $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}_0$ and $ \textbf{Q}^N\rightarrow \textbf{Q}$ ,
\begin{align}\liminf_N h^N\big(\textbf{Q}^N\big)\geq h(\textbf{Q}), \tag{129}\end{align}
where $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}_0$ is the set of points $\textbf{Q}^*$ in $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ for which, given any $\varepsilon > 0$ , there exists a neighborhood V of $\textbf{Q}^*$ such that, for any $\textbf{Q}\in V$ and N large enough,
\begin{align}h^N(\textbf{Q})\geq h(\textbf{Q}^*)-\varepsilon. \tag{130}\end{align}
4. We have
\begin{align}\sup_{\textbf{Q}\in \big(\mathcal{M}_{1,\varphi}(\mathcal{X})\big)^{2r}}[h(\textbf{Q})-L(\textbf{Q})]=\sup_{\textbf{Q}\in \big(\mathcal{M}_{1,\varphi}(\mathcal{X})\big)^{2r}_0}[h(\textbf{Q})-L(\textbf{Q})]. \tag{131}\end{align}
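To see what the Laplace–Varadhan principle asserts in the simplest possible setting, the following toy computation may be useful. It is not the setting of this paper: it takes i.i.d. Bernoulli(1/2) variables, whose empirical mean satisfies a large deviations principle with the Cramér rate function $I(x)=x\log(2x)+(1-x)\log(2(1-x))$, and compares $\frac{1}{N}\log\mathbb{E}\big[\exp\{N h(S_N/N)\}\big]$ with $\sup_x[h(x)-I(x)]$ for a bounded continuous h. All names are illustrative assumptions.

```python
import math


def log_binom_pmf(n, k, p=0.5):
    """Logarithm of the Binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def scaled_log_laplace(h, n):
    """(1/n) log E[exp(n h(S_n / n))], computed exactly with log-sum-exp."""
    terms = [n * h(k / n) + log_binom_pmf(n, k) for k in range(n + 1)]
    m = max(terms)
    return (m + math.log(sum(math.exp(t - m) for t in terms))) / n


def rate(x):
    """Cramér rate function of the Bernoulli(1/2) empirical mean on [0, 1]."""
    def xlog(u, v):
        return 0.0 if u == 0.0 else u * math.log(u / v)
    return xlog(x, 0.5) + xlog(1.0 - x, 0.5)


def varadhan_sup(h, grid=10001):
    """Grid approximation of sup over [0, 1] of h(x) - I(x)."""
    return max(h(k / (grid - 1)) - rate(k / (grid - 1)) for k in range(grid))
```

For a linear h the left-hand side can be computed in closed form, and for a nonlinear h the two quantities are expected to agree up to an error of order $1/N$. In the proof of Theorem 6.1 the same mechanism is applied with $h^N$ in place of h and $L$ in place of I.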
Note that, on the one hand, the condition in (127) is satisfied if, for any $\alpha>0$ ,
Indeed, take $\alpha>1$ ; then
Therefore,
where the right-hand side of the last inequality goes to $-\infty$ as $A\rightarrow\infty$ if (132) is true. On the other hand, it is easy to see that the conditions (128), (129), and (131) hold if the functions $h^N$ defined in (109) are continuous and the sequence $\big\{h^N\big\}$ converges uniformly on $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ towards the function h given in (110). The next lemmas are thus dedicated to the verification of these conditions. First, we establish a regularity property for all the probability measures $\textbf{Q}$ satisfying $L(\textbf{Q})<\infty$ .
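For the reader's convenience, the tail estimate behind the reduction of (127) to (132) can be written out as follows. This is a sketch under the assumption that (132) is the usual exponential-moment bound, namely that $\limsup_{N}\frac{1}{N}\log\int\exp\{\alpha N h^N(\textbf{Q})\}dP_{\nu_N}^{0,N}<\infty$ for every $\alpha>0$; the argument only uses $\alpha>1$.

```latex
% Assumption: (132) is read as a finite limsup of the scaled log exponential moment.
% On the event {h^N >= A} and for alpha > 1 we have
%   exp{-(alpha-1) N h^N} <= exp{-(alpha-1) N A}.
\begin{align*}
\int_{\{h^N\geq A\}} e^{N h^N(\textbf{Q})}\,dP_{\nu_N}^{0,N}
  &= \int_{\{h^N\geq A\}} e^{\alpha N h^N(\textbf{Q})}\,
     e^{-(\alpha-1) N h^N(\textbf{Q})}\,dP_{\nu_N}^{0,N} \\
  &\leq e^{-(\alpha-1) N A}\int e^{\alpha N h^N(\textbf{Q})}\,dP_{\nu_N}^{0,N},
\end{align*}
% Taking (1/N) log on both sides:
\begin{align*}
\frac{1}{N}\log\int_{\{h^N\geq A\}} e^{N h^N(\textbf{Q})}\,dP_{\nu_N}^{0,N}
  \leq -(\alpha-1)A
  + \frac{1}{N}\log\int e^{\alpha N h^N(\textbf{Q})}\,dP_{\nu_N}^{0,N}.
\end{align*}
```

Taking first the limit superior in N and then letting $A\rightarrow\infty$ sends the right-hand side to $-\infty$ as soon as the exponential moment term stays bounded, which is the content of the reduction.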
Lemma 6.3. Let $\textbf{Q}=\big(Q_{1}^c,Q_{1}^p,\cdots,Q_{r}^c,Q_{r}^p \big)\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ be such that $L(\textbf{Q})<\infty$ . Moreover, suppose that the random vector $\textbf{X}=\big(X_1^c,X_1^p,\cdots,X_r^c,X_r^p \big)$ is distributed according to $\textbf{Q}$ . Then
Proof. This is a generalization of [Reference Borkar and Sundaresan12, Lemma 5.7]. Note that $\textbf{X}(u)\neq\textbf{X}(u{-})$ if $X_j^c(u)\neq X_j^c(u{-})$ or $X_j^p(u)\neq X_j^p(u{-})$ for at least one $1\leq j\leq r$ . Therefore, for each $t\in [0,T]$ one obtains
Moreover, since $L(\textbf{Q})<\infty$ , one gets that $J^{j,c}\big(Q_{j}^c\big)<\infty$ and $J^{j,p}\big(Q_{j}^p\big)<\infty$ for all $1\leq j\leq r$ . Hence, applying [Reference Borkar and Sundaresan12, Lemma 5.7] to each of the $X_j^c$ and $X_j^p$ with respective marginal distributions $Q_{j}^c$ and $Q_{j}^p$ gives us that, for each $1\leq j\leq r$ ,
and
The next lemma establishes the continuity of the projection $\pi$ , which is needed for the continuity of the function $h(\textbf{Q})$ .
Lemma 6.4. Let $\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))$ be equipped with its usual weak topology and let $\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))$ be equipped with the metric
where $\rho_0(\cdot,\cdot)$ is a metric on $\mathcal{M}_1(\mathcal{Z})$ which generates the weak topology $\sigma(\mathcal{M}_1(\mathcal{Z}),C_b(\mathcal{Z}))$ . Moreover, let $(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z})))^{2r}$ be endowed with the product topology induced by the product metric. Equivalently, let $(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z})))^{2r}$ be equipped with the product topology obtained from the product metric $\rho^{2r}_T=\max\{\rho_{T},\ldots,\rho_{T}\}$ . Then the projection
is continuous at each $\textbf{Q}\in\left(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))\right)^{2r}$ where $L(\textbf{Q})<\infty$ .
Proof. The statement of our lemma resembles the statement of [Reference Léonard49, Lemma 2.8]. The difference here is that our spaces of interest are the product spaces $(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z})))^{2r}$ and $(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z})))^{2r}$ endowed with the product metrics. Moreover, the rate J(Q) in [Reference Léonard49, Lemma 2.8] is here replaced by $L(\textbf{Q})$ . Therefore, if we replace the norm $|\cdot|$ by the product norm $\|\cdot\|$ adapted to the context of our product spaces, then the proof of our lemma follows verbatim the proof of [Reference Léonard49, Lemma 2.8], provided that we can prove [Reference Léonard49, Equation (2.14)]. This is done in Lemma 6.3. Thus, the proof is complete.
We next prove the continuity of the functions $h^N$ .
Lemma 6.5. The functions $h^N\,:\,(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}\rightarrow\mathbb{R}$ defined at (109) are continuous at any $\textbf{Q}$ such that $L(\textbf{Q})<\infty$ .
Proof. This is a generalization of [Reference Léonard49, Lemma 2.9]. For any $\textbf{Q}\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , define
Note that the function $h^N$ given by (109) can be rewritten using the functions $\theta_{\textbf{Q}}^{j,c}(x)$ , $\theta_{\textbf{Q}}^{j,p}(x)$ , $\gamma_{\textbf{Q}}^{j,c}(x)$ , and $\gamma_{\textbf{Q}}^{j,p}(x)$ as follows:
Therefore, to show the continuity of $h^N(\textbf{Q})$ , we show that for any $1\leq j\leq r$ , the functions
are continuous at any $\textbf{Q}$ where $L(\textbf{Q})<\infty$ . First, from Assumption 6.1, there exists a positive constant $C>0$ such that, for each $1\leq j\leq r$ ,
and
Similarly, by Assumption 6.1 we have that, for each $1\leq j\leq r$ ,
and
Take $\textbf{Q}'\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ in the neighborhood of $\textbf{Q}$ . Note that
In addition, for each $1\leq j\leq r$ , the following inequalities hold:
The idea now is to control the right-hand sides of the last four inequalities. We show this for the inequality in (148). Similar arguments can be used to treat the other three inequalities. First, notice that the function $ \theta_{\textbf{Q}}^{j,c}$ is continuous. Indeed, the topology of $\mathcal{X}$ is built so that the function $x\rightarrow\sum_{0\leq t\leq T}{\unicode{x1D7D9}}_{x_t\neq x_{t-}}$ is continuous. Moreover, from Assumption 6.1, the functions $\lambda_{j,z,z'}^{c}$ are continuous. Furthermore, from Lemma 6.4, the component projection $Q_j^c\rightarrow\pi(Q_j^c)=(Q_{j}^c(t))_{0\leq t\leq T}$ is continuous since $\pi(\textbf{Q})=(\textbf{Q}(t))_{0\leq t\leq T}$ is continuous. Finally, the $\log$ function being continuous gives that $ \theta_{\textbf{Q}}^{j,c}$ is continuous provided that $L(\textbf{Q})<\infty$ . In addition, from (143) we have that $\theta_{\textbf{Q}}^{j,c}\leq C(1+\varphi)$ ; thus $\theta_{\textbf{Q}}^{j,c}\in C_{\varphi}(\mathcal{X})$ provided that $L(\textbf{Q})<\infty$ . Therefore, the term $ \Big|\langle \theta_{\textbf{Q}}^{j,c}, Q_{j}^c-Q_{j}'^c \rangle\Big|$ can be made as small as desired by taking $\textbf{Q}'$ close enough to $\textbf{Q}$ (and thus $Q_{j}'^c$ close enough to $Q_{j}^c$ ). The second term in the right-hand side of (148) is bounded as follows:
Therefore, using again Assumption 6.1, Lemma 6.4, and the continuity of the $\log$ function, the right-hand side of (152) is controlled for any $\textbf{Q}'$ in the neighborhood of $\textbf{Q}$ in $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ provided that $L(\textbf{Q})<\infty$ . Thus, the integral $\textbf{Q}\rightarrow\int\theta_{\textbf{Q}}^{j,c} dQ_{j}^c$ is continuous. The same steps allow us to show that
are also continuous at any $\textbf{Q}$ where $L(\textbf{Q})<\infty$ . Hence, the function $h^N$ is a linear combination of continuous functions and thus is continuous, which concludes the proof.
The following lemma states the uniform convergence of $\big\{h^N,N\geq 1\big\}$ towards h.
Lemma 6.6. Suppose Assumption 6.1 holds. Then the sequence of functions $\big\{h^N,N\geq 1\big\}$ introduced in (109) converges uniformly towards the function h given by (110).
Proof. From (110) and (142) one has
First, observe that
Using (143) one obtains
which is finite since $Q_j^c$ is a probability measure and the second integral is finite for $Q_j^c\in\mathcal{M}_{1,\varphi}(\mathcal{X})$ . Moreover, by (145) one has
again since $Q_j^c$ is a probability measure. Therefore, by Assumption 6.1, one deduces that
Similarly, one obtains
Thus $h^N$ converges uniformly towards h.
The final step before applying the Laplace–Varadhan principle is to verify that (132) is satisfied.
Lemma 6.7. For any $\alpha>0$ ,
Proof. First note that, using the bounds (143), (144), (145), and (146), we find that for all $\textbf{Q}\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ ,
Therefore, to show (132), it is enough to show that, for any $\alpha>0$ ,
Recall that $P_{\nu^N}^{0, N}=\mathbb{P}_{z^N}^{0, N}\circ G_N^{-1}$ , where $\mathbb{P}_{z^N}^{0, N}=\otimes_{n=1}^NP_{z_n}$ and $P_{z_n}$ is the law of the nth particle in the case of non-interaction, with the initial condition being $z_n$ . Hence, by independence, the integral term in the left-hand side of (155) is equivalent to
Now, using [Reference Léonard49, Lemma 2.10], we find that for all $1\leq j\leq r$ ,
and
Since $N_j^c<N$ and $N_j^p<N$ for all $1\leq j\leq r$ , (157), (158), and (156) lead to (155), which concludes the proof.
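To make the Laplace–Varadhan principle used in the next step more concrete, the following toy computation may help. It is only a sketch under simplifying assumptions that are not part of the paper: the particles are replaced by i.i.d. Bernoulli($p$) variables, the empirical measure by the empirical mean, and the functions $h^N$ by a single fixed continuous function $h$ on $[0,1]$. All names (`rate_bernoulli`, `scaled_log_moment`, `varadhan_limit`) are hypothetical. The code compares $\frac{1}{N}\log\mathbb{E}\big[e^{Nh(S_N/N)}\big]$, computed exactly from the binomial distribution, with $\sup_x\{h(x)-I(x)\}$, where $I$ is the relative entropy rate function given by Sanov's theorem in this toy setting.

```python
import math


def rate_bernoulli(x: float, p: float) -> float:
    """Relative entropy of Bernoulli(x) w.r.t. Bernoulli(p), with 0*log(0) = 0."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / p)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - p))
    return total


def scaled_log_moment(h, N: int, p: float) -> float:
    """(1/N) log E[exp(N h(S_N / N))] for S_N ~ Binomial(N, p).

    The expectation is an exact finite sum, evaluated with a
    log-sum-exp trick to avoid overflow for large N.
    """
    logs = []
    for k in range(N + 1):
        log_pmf = (
            math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
            + k * math.log(p) + (N - k) * math.log(1.0 - p)
        )
        logs.append(log_pmf + N * h(k / N))
    m = max(logs)
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / N


def varadhan_limit(h, p: float, grid: int = 100_000) -> float:
    """Grid approximation of sup over x in [0, 1] of h(x) - I(x)."""
    return max(h(k / grid) - rate_bernoulli(k / grid, p) for k in range(grid + 1))


if __name__ == "__main__":
    h = lambda x: x * x  # illustrative choice of a bounded continuous function
    for N in (100, 1000, 4000):
        print(N, scaled_log_moment(h, N, 0.3), varadhan_limit(h, 0.3))
```

In this toy case the gap between the two quantities shrinks as $N$ grows, which is the finite-dimensional analogue of the limit obtained below; the paper's setting additionally requires the continuity, uniform convergence, and tail conditions verified in Lemmas 6.5 to 6.7.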
The interacting case. We are now ready to apply the Laplace–Varadhan principle to the sequence of probability measures $\big\{P^{0,N}_{\nu_N},N\geq 1\big\}$. By Lemma 6.1, the sequence $\big\{P^{0,N}_{\nu_N},N\geq 1\big\}$ obeys a large deviations principle in the topological space $(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ with rate function $L(\textbf{Q})$ defined by (117), and with speed N. Moreover, by Lemma 6.5, the functions $h^N$ defined in (109) are continuous at any $\textbf{Q}$ such that $L(\textbf{Q})<\infty$. Furthermore, by Lemma 6.2, the functions $h^N$ are continuous on the set $\big\{\textbf{Q}\in\big(\mathcal{M}_{1}(\mathcal{D}([0,T],\mathcal{Z}))\big)^{2r}\,:\,L(\textbf{Q})<\infty\big\}$. Therefore, the conditions in (128), (129), and (131) hold true. Finally, we have seen in Lemma 6.7 that (132) is satisfied. Hence, a straightforward application of [Reference Varadhan64, Theorem 3.4] gives
as $N\rightarrow\infty$ , and the sequence
obeys a large deviations principle with speed N and rate function
Now, from (108) we have
Since $P_{\nu^N}^{N}$ is a probability measure we obtain
Thus, the left-hand side of (159) is always zero and so
which gives that
We then conclude that the sequence $\big\{P_{\nu^N}^{N},N\geq 1\big\}$ obeys a large deviations principle in the topological space $(\mathcal{M}_{1,\varphi} (\mathcal{X}))^{2r}$ , with speed N and rate function
In order to obtain the representation in (119), we proceed as follows. First, from (154) we have that, for $\textbf{Q}\in(\mathcal{M}_{1,\varphi}(\mathcal{X}))^{2r}$ , $h(\textbf{Q})<\infty$ . Moreover, from [Reference Borkar and Sundaresan12, Lemma 5.6], the functions $J^{j,\iota}(Q)$ defined by (118) have the following representation:
where $H\big(Q\big|P_{j,\iota}\big)$ is the relative entropy defined by (116). Therefore, if either $Q\circ\pi_0^{-1}\neq\nu^{\,j,\iota}$ or Q is not absolutely continuous with respect to $P_{j,\iota}$ , then one can immediately observe that $J^{j,\iota}(Q)=\infty$ ; thus $L(\textbf{Q})=\infty$ , and finally $I(\textbf{Q})=\infty$ . Now, assume that for all $1\leq j\leq r$ we have $Q_{j}^c\circ\pi_0^{-1}=\nu^{\,j,c}$ , $Q_{j}^c\ll P$ , and $Q_{j}^p\circ\pi_0^{-1}=\nu^{\,j,p}$ , $Q_{j}^p\ll P$ ; then
Furthermore, one can observe from (105) that the densities $\exp\big\{h_1(x,\eta,\rho_j)\big\}$ and $\exp\big\{h_2(x,\eta,\rho_1,\ldots,\rho_r)\big\}$ do not depend on the initial condition $z_0$. Therefore, for each $1\leq j\leq r$, the densities of $R^c\big(\pi\big(Q_{j}^c\big),\pi\big(Q_{j}^p\big)\big)$ and $R^p\big(\pi\big(Q_{j}^c\big),\pi\big(Q_{1}^p\big),\ldots,\pi\big(Q_{r}^p\big)\big)$ with respect to the mixtures $P_{j,c}$ and $P_{j,p}$ are given by, respectively,
and
Replacing $h_1(\cdot)$ and $h_2(\cdot)$ in (109) by the two representations above, we find
Finally, using (99) we find that, as $N\rightarrow\infty$ ,
This concludes the proof.
6.2. Large deviations principle for the empirical process
We now investigate the large deviations of the sequence $\big(p_{\nu_N}^N,N\geq 1\big)$ where, for any $N\geq 1$ , $p_{\nu_N}^N=\mathbb{P}_{z^N}^N\circ\gamma_N^{-1}=\pi(\mathbb{M}^N)$ is the distribution of the $(\mathcal{M}_1(\mathcal{Z}))^{2r}$ -valued empirical process defined by
The flow $\mu^N$ takes values in the product space $\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ . Again let $\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))$ be equipped with the metric
where $\rho_0(\alpha,\beta)$ , $\alpha,\beta\in\mathcal{M}_1(\mathcal{Z})$ , is a metric on $\mathcal{M}_1(\mathcal{Z})$ that generates the weak topology on $\mathcal{M}_1(\mathcal{Z})$ . Moreover, let the product space $\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ be equipped with the product topology induced by the product metric $\rho^{2r}_T=\max\{\rho_{T},\ldots,\rho_{T}\}$ .
For any $\xi=\big(\xi_1^c,\xi_1^p,\ldots,\xi_r^c,\xi_r^p\big)\in\big(\mathcal{M}_1(\mathcal{Z})\big)^{2r}$ , define the rate matrices
where
and
From the laws of large numbers given in Corollary 5.1, one can deduce that, as $N\rightarrow\infty$ , the sequence $\big(\mu^N, N\geq 1\big)$ converges weakly, for converging initial conditions, towards the solution $\mu$ of the following McKean–Vlasov system:
where $A^*$ is the adjoint/transpose of the matrix A and $\dot{\mu}(t)= \frac{\partial}{\partial t} \mu(t)$. Note that the Lipschitz property of the functions $\lambda_{j,z,z'}^{c}$ and $\lambda^p_{j,z,z'}$ ensures that (172) is well-posed. Also, one can notice that the representation (172) is consistent with the infinitesimal generators $\mathcal{L}^c_{\xi,\eta_j}$ and $\mathcal{L}_{\xi,\eta_1,\dots,\eta_r}^p$ introduced in (103) and (104). Indeed, if we consider $\phi$, $\mathcal{L}^c_{\xi,\eta_j}\phi$, and $\mathcal{L}_{\xi,\eta_1,\dots,\eta_r}^p\phi$ as column vectors, then the right-hand sides of (103) and (104) are the results of right-multiplying the rate matrices $A^{j,c}$ and $A^{j,p}$, respectively, by the vector $\phi$.
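The relation between the generator (a rate matrix acting on column vectors $\phi$) and the forward equation (its transpose acting on measures) can be sketched numerically. The example below is a toy illustration only, not the system (172): it assumes a single population on the two-point state space $\{0,1\}$, with hypothetical Lipschitz rates $\lambda_{0,1}(\mu)=a+b\,\mu(1)$ and $\lambda_{1,0}=c$, and integrates $\dot{\mu}(t)=A^*_{\mu(t)}\mu(t)$ with an explicit Euler scheme. The function names and parameter values are assumptions made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rate_matrix(mu: Vector, a: float = 0.5, b: float = 2.0, c: float = 1.0) -> Matrix:
    """Hypothetical mean-field rate matrix on Z = {0, 1}.

    Off-diagonal entries are jump rates (Lipschitz in mu); each row sums to zero.
    """
    up = a + b * mu[1]  # rate of the jump 0 -> 1, depends on the current measure
    down = c            # rate of the jump 1 -> 0
    return [[-up, up], [down, -down]]


def apply_generator(A: Matrix, phi: Vector) -> Vector:
    """Backward action on functions: (A phi)(z) = sum_w A[z][w] * phi[w]."""
    return [sum(A[z][w] * phi[w] for w in range(len(phi))) for z in range(len(A))]


def apply_adjoint(A: Matrix, mu: Vector) -> Vector:
    """Forward action on measures: (A^* mu)(w) = sum_z mu[z] * A[z][w]."""
    return [sum(mu[z] * A[z][w] for z in range(len(mu))) for w in range(len(A[0]))]


def solve_mckean_vlasov(mu0: Vector, T: float = 10.0, steps: int = 10_000) -> Vector:
    """Explicit Euler scheme for d/dt mu(t) = A_{mu(t)}^* mu(t); returns mu(T)."""
    mu = list(mu0)
    dt = T / steps
    for _ in range(steps):
        drift = apply_adjoint(rate_matrix(mu), mu)
        mu = [m + dt * d for m, d in zip(mu, drift)]
    return mu


if __name__ == "__main__":
    print(solve_mckean_vlasov([1.0, 0.0]))
```

Because every row of the rate matrix sums to zero, the scheme preserves total mass, and the duality $\langle\mu,A\phi\rangle=\langle A^*\mu,\phi\rangle$ holds exactly at the matrix level, which is the consistency between generators and the forward system noted above.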
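Denote by $\tau$ the log-Laplace transform of the centered Poisson distribution with parameter 1, given by $\tau(u)=e^u-u-1$, and let $\tau^*$ be its Legendre transform, defined by $\tau^*(v)=\sup_{u\in\mathbb{R}}\{uv-\tau(u)\}$; explicitly, $\tau^*(v)=(v+1)\log(v+1)-v$ for $v>-1$, $\tau^*(-1)=1$, and $\tau^*(v)=+\infty$ for $v<-1$.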
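As a quick sanity check on the Legendre transform of $\tau$, the sketch below compares a brute-force grid evaluation of $\sup_u\{uv-\tau(u)\}$ with the standard closed form of $\tau^*$. The function names, the grid bounds, and the number of grid points are arbitrary choices made for illustration and are not taken from the paper.

```python
import math


def tau(u: float) -> float:
    """Log-Laplace transform of the centered Poisson(1) law: e^u - u - 1."""
    return math.exp(u) - u - 1.0


def tau_star_closed(v: float) -> float:
    """Standard closed form of the Legendre transform of tau.

    (v + 1) log(v + 1) - v for v > -1, equal to 1 at v = -1, +inf for v < -1.
    """
    if v < -1.0:
        return math.inf
    if v == -1.0:
        return 1.0
    return (v + 1.0) * math.log(v + 1.0) - v


def tau_star_numeric(v: float, lo: float = -30.0, hi: float = 10.0,
                     steps: int = 100_000) -> float:
    """Brute-force approximation of sup over u in [lo, hi] of u*v - tau(u)."""
    best = -math.inf
    for k in range(steps + 1):
        u = lo + (hi - lo) * k / steps
        best = max(best, u * v - tau(u))
    return best


if __name__ == "__main__":
    for v in (-1.0, -0.5, 0.0, 1.0, 3.0):
        print(v, tau_star_numeric(v), tau_star_closed(v))
```

The grid search is only meaningful for $v\geq -1$; for $v<-1$ the supremum is infinite and a bounded grid can merely show the objective growing as the lower bound `lo` decreases.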
Let us recall now the notion of absolute continuity introduced in [Reference Dawson and Gärtner24, Definition 4.1]. Denote by $\mathcal{S}$ the Schwartz space of test functions $\mathbb{R}^d\rightarrow\mathbb{R}$ having compact support and possessing continuous derivatives of all orders. We endow $\mathcal{S}$ with the usual inductive topology. Let $\mathcal{S}'$ be the corresponding space of real distributions. For each compact set $K\subset\mathbb{R}^d$ , $\mathcal{S}_K$ will denote the subspace of $\mathcal{S}$ consisting of all test functions whose support is contained in K. Finally, let $\langle\nu,f\rangle$ denote the application of the test function f to the distribution $\nu$ .
Definition 6.1. Let I be an interval of the real line. A map $\nu (\cdot)\,:\, I\rightarrow\mathcal{S}'$ is called absolutely continuous if, for each compact set $K\subset\mathbb{R}^d$, there exists a neighborhood $U_K$ of 0 in $\mathcal{S}_K$ and an absolutely continuous function $H_K\,:\, I\rightarrow\mathbb{R}$ such that
for all $u,v\in I$ and $f\in U_K$ .
Finally, for any $\theta\in\mathcal{M}(\mathcal{Z})$ , define
Also let us introduce, for each $\nu\in(\mathcal{M}_1(\mathcal{Z}))^{2r}$, and according to [Reference Dawson and Gärtner24, Equation (4.9)], the functional $S_{[0,T]}(\mu|\nu)$ defined from $(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z})))^{2r}$ to $[0,\infty]$ by setting
if $\mu(0)=\nu$ and $\mu_j^c,\mu_j^p$ are absolutely continuous in the sense of Definition 6.1 for all $1\leq j\leq r$ , and $S_{[0,T]}(\mu|\nu)=+\infty$ otherwise.
We are now ready to state our large deviations principle for the sequence $\big(p^N_{\nu_N}, N\geq 1\big)$ .
Theorem 6.2. Suppose that $\nu_N\rightarrow\nu$ weakly. The sequence of probability measures $\big(p^N_{\nu_N},N\geq 1\big)$ obeys a large deviations principle in the space $\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ , with speed N, and rate function $S_{[0,T]}(\mu|\nu)$ given by (173).
Moreover, if a path $\mu\in\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ satisfies $S_{[0,T]}(\mu|\nu)<\infty$ , then $\mu_j^c$ and $\mu_j^p$ are absolutely continuous and there exist families of rate matrices $L_{j,c}(t)=\Big(l_{z,z'}^{j,c}(t), (z, z')\in\mathcal{E}\Big)$ and $L_{j,p}(t)=\Big(l_{z,z'}^{j,p}(t), (z, z')\in\mathcal{E}\Big)$ such that, for all $1\leq j\leq r$ and $t \in [0, T ]$ ,
Furthermore, in this case, the good rate function $S_{[0,T]}(\mu|\nu)$ is given by
Proof. We first use a contraction argument to derive a large deviations principle for the sequence $\big(p_{\nu_N}^N,N\geq 1\big)$ . From Theorem 6.1, the sequence $\big(P_{\nu_N}^N,N\geq 1\big)$ obeys a large deviations principle with speed N and rate function $I(\textbf{Q})$ given by
Moreover, from Lemma 6.4, the projection
is continuous at each $\textbf{Q}\in\big(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))\big)^{2r}$ where $L(\textbf{Q})<\infty$ , and thus at any $\textbf{Q}$ such that $I(\textbf{Q})<\infty$ . The latter corresponds to the effective domain $\mathcal{D}_I=\{\textbf{Q}\,:\,I(\textbf{Q})<\infty\}$ of the rate function I (see [Reference Dembo and Zeitouni29, p. 4]). Therefore, by applying the contraction principle to the large deviations principle of $\big(P_{\nu_N}^N,N\geq 1\big)$ (see [Reference Dembo and Zeitouni29, Theorem 4.2.1, Remark (c)]) with rate I, we deduce that the family of probability measures $\big(P_{\nu_N}^N\circ\pi^{-1},N\geq 1\big)$ obeys a large deviations principle in $\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ with the rate function defined, for any $\mu\in\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$ , by
We now derive another representation for the rate function V following [Reference Dawson and Gärtner24, Reference Léonard49]. Fix $\mu=\big(\mu_j^c,\mu_j^p,1\leq j\leq r\big)\in\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$. Note that writing $\pi(\textbf{Q})=\mu$, with $\textbf{Q}\in\big(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))\big)^{2r}$, is equivalent to $\pi\big(Q_{j}^c\big)=\mu_j^c$ and $\pi\big(Q_{j}^p\big)=\mu_j^p$ for all $1\leq j\leq r$. Therefore V can be rewritten as
Fix $1\leq j\leq r$ . Let $\left(X^{(i)}_{j,c}\right)_{i\geq 1}$ and $\left(X^{(i)}_{j,p}\right)_{i\geq 1}$ be sequences of i.i.d. processes with common laws $R^c\big(\mu_j^c,\mu_j^p\big)$ and $R^p\big(\mu_j^c,\mu_1^p,\ldots, \mu_r^p\big)$ , respectively. By Sanov’s theorem, the empirical measures
obey large deviations principles as $N_j^c\rightarrow\infty$ and $N_j^p\rightarrow\infty$ , with speeds $N_j^c$ and $N_j^p$ , respectively, and rate functions given by
and
respectively. Using the same arguments as in the proof of Lemma 6.4, one can show that the projection $\pi$ is continuous at any $\textbf{Q}\in\big(\mathcal{M}_1(\mathcal{D}([0,T],\mathcal{Z}))\big)^{2r}$ such that
Thus, the component projections $\pi\big(Q_{j}^c\big)$ and $\pi\big(Q_{j}^p\big)$ are also continuous. Hence, using the contraction principle ([Reference Dembo and Zeitouni29, Theorem 4.2.1]), the sequences
and
obey large deviations principles with speeds $N_j^c$ and $N_j^p$ , respectively, and rate functions
respectively. Note that, by using an independence argument and following the same steps as in the proof of Lemma 6.1, one can show that the sequence
obeys a large deviations principle with speed N and rate function
In addition, the vector
obeys a large deviations principle with rate $I(\textbf{Q})$. Therefore, by a contraction argument and using again the continuity of the projection, we find that
obeys a large deviations principle with rate $V(\mu)$ . Hence, by the uniqueness of the rate function (cf. [Reference Deuschel and Stroock30, Lemma 2.1.1]), we find
We next derive another representation for $S_{\mu}(\nu)$ . For any $1\leq j\leq r$ and $\nu\in\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))$ , we have from [Reference Léonard49, p. 319] that
where, for all $\tilde{\nu}\in\mathcal{M}_1([0,T[\times\mathcal{Z})$ , $U^{j,c}_{\mu}(\tilde{\nu})$ and $U^{j,p}_{\mu}(\tilde{\nu})$ are given by the following (see [Reference Léonard49, Equation (3.14)]):
where $\mathcal{C}_1^c$ stands for the set of all continuous functions with compact support on $[0,T[ \times\mathcal{Z}$ which are t-differentiable. Using (177), (178), and (179) together with [Reference Léonard49, Lemma 3.2], we obtain
Finally, using (176), we deduce that $\big(p^N_{\nu_N},N\geq 1\big)$ obeys a large deviations principle with speed N and good rate function (173). The representation (174) follows immediately from [Reference Léonard49, Lemma 3.2], and the statement about absolute continuity follows from [Reference Léonard49, Theorem 3.1]. The theorem is proved.
The following result shows that the large deviations principle for $\big(p_{\nu}^{N},N\geq 1\big)$ holds uniformly in the initial condition.
Corollary 6.1. For any compact set $K\subset\big(\mathcal{M}_1(\mathcal{Z})\big)^{2r}$, any closed set $F\subset\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$, and any open set $G\subset\big(\mathcal{D}([0,T],\mathcal{M}_1(\mathcal{Z}))\big)^{2r}$, we have
Proof. This follows immediately from [Reference Dembo and Zeitouni29, Corollary 5.6.15] and Theorem 6.2.
Acknowledgements
We would like to thank the anonymous referees and the associate editor for having read the paper with great care and made several very important comments that improved the exposition.
Funding information
This research was supported by the Natural Sciences and Engineering Research Council of Canada Discovery Grants and by Carleton University.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.