
On affine multicolor urns grown under multiple drawing

Published online by Cambridge University Press:  24 December 2024

Joshua Sparks*
Affiliation:
The George Washington University
Markus Kuba*
Affiliation:
FH Technikum Wien
Srinivasan Balaji*
Affiliation:
The George Washington University
Hosam Mahmoud*
Affiliation:
The George Washington University

Abstract

Early investigation of Pólya urns considered drawing balls one at a time. In the last two decades, several authors have considered multiple drawing in each step, but mostly for schemes involving two colors. In this manuscript, we consider multiple drawing from urns of balls of multiple colors, formulating asymptotic theory for specific urn classes and addressing more applications. The class we consider is affine and tenable, built around a ‘core’ square matrix. We examine cases where the urn is irreducible and demonstrate its relationship to matrix irreducibility for its core matrix, with examples provided. An index for the drawing schema is derived from the eigenvalues of the core. We identify three regimes: small, critical, and large index. In the small-index regime, we find an asymptotic Gaussian law. In the critical-index regime, we also find an asymptotic Gaussian law, albeit with a difference in the scale factor, which involves logarithmic terms. In both of these regimes, we have explicit forms for the structure of the mean and the covariance matrix of the composition vector (both exact and asymptotic). In all three regimes we have strong laws.

Type
Original Article
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Pólya urns are popular modeling tools, as they capture the dynamics of many real-world applications involving change over time. Numerous applications in informatics and biosciences are listed in [Reference Mahmoud24]; see also [Reference Johnson and Kotz15, Reference Kotz and Balakrishnan19] for a wider perspective.

In the early models, single balls are drawn one at a time. In the last two decades, several authors have generalized the models to schemes evolving on multiple drawing; see [Reference Bose, Dasgupta and Maulik3, Reference Chen and Kuba5–Reference Crimaldi, Louis and Minelli7, Reference Kuba and Mahmoud21–Reference Lasmar, Mailler and Selmi23, Reference Mahmoud25]. However, most of these investigations deal only with urn schemes involving two colors. So far, explicit expressions and asymptotic expansions for the expected value, the variance and covariances, and higher moments have proven to be elusive in the multicolor case.

In this manuscript, we present a more comprehensive approach in which there are multiple colors in the urn, and we sample several balls in each drawing. Depending on the composition of the sample, action is taken to place balls in the urn. We ought to mention that this is a new trend in investigating multicolor urns; little exists in the literature beyond the study of urn schemes with an infinite number of colors, where the authors imposed restrictions on the number of colors to add after each drawing [Reference Bandyopadhyay and Thacker2, Reference Janson13].

As a modeling tool, multicolor multiple-drawing urns have their niche in applications to capture the dynamics in systems that internally have multiple interacting parts. For instance, the Maki–Thompson rumor spreading model [Reference Maki and Thompson26] can be modeled by an urn scheme on three colors (representing ignorants, spreaders, and stiflers of the rumor) experiencing pair interactions, such as social visits and phone calls (represented as draws of pairs of balls at a time). Another application is to produce a profile of the nodes in random hypergraphs; see [Reference Sparks29, Chapter 6], and its prelude [Reference Sparks, Balaji and Mahmoud30].

2. The mechanics of a multicolor multiple-drawing urn scheme

We first describe a multicolor multiple-drawing urn model qualitatively. We consider an urn growing on k colors, which we label $1,\ldots, k$ . Initially, the urn contains a positive total number of balls of the colors $i\in [k]$ (some of the colors may initially be absent), where [k] stands for the set $\{1, 2, \ldots, k\}$ . At each discrete time step, a sample of balls is drawn out of the urn (the sample is taken without replacement). Note that we speak of the entire sample as one draw. After the sample is collected, we put it back in the urn. Depending on what appears in the sample, we add balls of various colors. Such models will be called multicolor multiple-drawing schemes.

Many such schemes can emerge upon specifying a particular growth algorithm (how many balls to draw, what to add, etc.). We can also develop a model based on sampling with replacement. The proof techniques and results for such a sampling scheme are very similar to those that appear in a sampling scheme without replacement. We devote Section 7 to outlining results under sampling with replacement. We refer readers interested in more details to [Reference Sparks29].

A sample of size s is a partition of the number s into nonnegative integers. Each part of the partition corresponds to the number of balls in the sample from a particular color after drawing the sample. In other words, we view a sample as a row vector $\textbf{{s}}$ with nonnegative components $s_1, \ldots, s_k$ . The components of $\textbf{{s}}$ are the solutions of the Diophantine equation $s_1 + \cdots + s_k = s$ in nonnegative integers. There are $\binom{s +k-1}{s}$ such solutions.

In a very general model, when the sample $\textbf{{s}}$ is drawn, we add $m_{\textbf{{s}},j}$ balls of color j, for $j=1, \ldots, k$ , where $m_{\textbf{{s}},j}$ can be a random variable. We only consider the case of fixed additions, $m_{\textbf{{s}}, j}\in \mathbb Z$ , and restrict attention to the class of urn schemes where the sample size (in each drawing) is a positive fixed number s (not changing over time) and the total number of balls added after each drawing is also a fixed number b. Such an urn is said to be balanced, and b is called the balance factor. Further specification will follow while crafting the conditions for a linear recurrence.

We canonically rank the various samples through an application of [Reference Knuth17, Algorithm L], called reverse lexicographic order [Reference Konzem and Mahmoud18]. In this scenario, the sample $(x_1, \ldots, x_k)$ precedes the sample $(y_1, \ldots, y_k)$ if, for some $r \in [k]$ , we have $x_i = y_i$ for $i=1, \ldots, r-1$ , and $x_r > y_r$ . For example, drawing samples of size $s=3$ from an urn with $k=3$ colors, in this modification of the canonical ranking the sample (3,0,0) precedes the sample (2,1,0), which in turn precedes the sample (2, 0, 1).
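
As an illustrative aside (not part of the formal development), this canonical ranking is easy to generate mechanically. The short Python sketch below (our own; it uses only the standard library) lists all samples for $k=s=3$ in this reverse lexicographic order and confirms that there are $\binom{s+k-1}{s}$ of them; the output reproduces the row labels used in Example 1 below.

```python
from itertools import product
from math import comb

def samples(k, s):
    """All samples (s_1, ..., s_k) of nonnegative parts summing to s,
    listed in the canonical (reverse lexicographic) order of Section 2."""
    sols = [c for c in product(range(s + 1), repeat=k) if sum(c) == s]
    return sorted(sols, reverse=True)   # descending lexicographic order

if __name__ == "__main__":
    k, s = 3, 3
    delta = samples(k, s)
    assert len(delta) == comb(s + k - 1, s)   # binom(s+k-1, s) solutions
    for sv in delta:
        print(sv)   # (3,0,0), (2,1,0), (2,0,1), (1,2,0), (1,1,1), ...
```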

Our interest is in urns that can endure indefinite drawing. That is, no matter which stochastic path is followed, we can draw ad infinitum and are able to execute the addition rules after each drawing.

We call such a scheme tenable. For instance, if all the numbers $m_{\textbf{{s}}, j}$ , for all feasible $\textbf{{s}}$ and $j=1, \ldots, k$ , are positive, the scheme is tenable.

3. The (k, s, b)-urn

Consider a k-color multiple-drawing tenable balanced urn scheme grown by taking samples of a fixed size $s\ge 1$ , with balance factor $b \ge 1$ . We call such a class (k, s, b)-urns. Let $X_{n,i}$ be the number of balls of color $i\in [k]$ in the urn after $n\ge 0$ draws, and let $\textbf{X}_n$ be the row vector with these components. We call $\textbf{X}_n$ the composition vector. Thus, initially, there are $X_{0,i}\ge 0$ balls of color i, for $i=1,\ldots, k$ . Since the scheme is tenable, we must enable a first draw by having $\sum_{i=1}^k X_{0,i}\ge s$ . Let $\tau_n$ be the total number of balls after n draws. For a balanced urn, we have $\tau_n = \tau_{n-1} + b = bn + \tau_0$ . To study (k, s, b)-urns, we first need to set up some notation.

3.1. Notation

We use the notation $1_{\mathcal E}$ for the indicator of event $\mathcal E$ . For integer $r\ge 0$ and $z\in \mathbb C$ , the falling factorial $z(z-1) \cdots(z-r+1)$ is denoted by $(z)_r$ , and $(z)_0$ is to be interpreted as 1. For a row vector $\textbf{{s}} = (s_1, \ldots, s_k)$ , with $s_i\ge 0$ (for $i\in [k]$ ) and $\sum_{i=1}^k s_i = s$ , the multinomial coefficient $\binom{s}{s_1, \ldots, s_k}$ may be written as $\binom{s}{\textbf{{s}}}$ , and the concatenation $(y_1)_{s_1}(y_2)_{s_2}\cdots(y_k)_{s_k}$ of falling factorials may be succinctly written as $(\textbf{y})_\textbf{s}$ for economy, where $\textbf{y} = (y_1, \ldots, y_k)$ .

There is a multinomial theorem for falling factorials, which expands a falling factorial of a sum in terms of the falling factorials of the individual summands. Namely, for $\textbf{z}=(z_1, \ldots, z_k)\in \mathbb{ C}^k$ , the theorem states that

(1) \begin{equation}(z_1 +\cdots + z_k)_s = \sum_{\textbf{{s}}} \binom{s}{\textbf{{s}}} (\textbf{z})_\textbf{s}.\end{equation}

This expansion is known as the (multivariate) Chu–Vandermonde identity; the reader can find it in classic books like [Reference Bailey1, Reference Graham, Knuth and Patashnik8].
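
For instance, with $k=2$ and $s=2$ , identity (1) reads

\begin{equation*} (z_1+z_2)_2 = \binom{2}{2,0}(z_1)_2 + \binom{2}{1,1}(z_1)_1(z_2)_1 + \binom{2}{0,2}(z_2)_2 = z_1(z_1-1) + 2z_1z_2 + z_2(z_2-1), \end{equation*}

which is readily checked by expanding $(z_1+z_2)(z_1+z_2-1)$ on the left-hand side.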

For probabilistic convergence modes we use ${\buildrel \textrm{D} \over \longrightarrow}$ , $\buildrel \mathbb{P} \over \longrightarrow$ , and $\buildrel \textrm{a.s.} \over \longrightarrow$ to respectively denote convergence in distribution, convergence in probability, and convergence almost surely.

We denote a diagonal matrix with the provided entries $a_1, a_2, \ldots, a_k$ as

\begin{align*}\textbf{diag}(a_1, a_2, \ldots, a_k) =\small\begin{pmatrix} a_1\;\;\;\;\; & 0\;\;\;\;\;&\ldots\;\;\;\;\;&0\;\;\;\;\; \cr 0\;\;\;\;\; &a_2\;\;\;\;\;&\ldots\;\;\;\;\;&0 \cr \vdots\;\;\;\;\;& \vdots\;\;\;\;\;&\ddots\;\;\;\;\;&\vdots\;\;\;\;\; \cr 0\;\;\;\;\;& 0\;\;\;\;\;&\ldots\;\;\;\;\;&a_k\cr\end{pmatrix},\end{align*}

with $\textbf{diag}(\textbf{v})$ representing the diagonal matrix with diagonal entries equal to the components of a vector $\textbf{v}$ .

We employ the standard asymptotic notation o and O. We also need them in a matricial sense, which will be indicated in bold, so $\textbf{o}(n)$ and $\textbf{O}(n)$ indicate a matrix of appropriate dimensions with all its entries being o(n) or O(n).

In the ensuing text, unless otherwise stated, all vectors are row vectors of k components and all square matrices are $k\times k$ . The identity matrix is denoted by $\textbf{I}$ , the row vector of zeros is denoted by $\textbf{0}$ , and the zero matrix is denoted by $\textbf{0}$ . Likewise, the vector of ones is denoted by $\textbf{1}$ .

We define the discrete simplex $\Delta_{k,s}$ to be the collection of all possible samples $\textbf{s}$ or, more precisely,

\begin{equation*} \Delta_{k,s} = \Bigg\{\textbf{{s}} = (s_1, \ldots, s_k) \mid s_i\ge 0, \ i\in[k]; \ \sum_{j=1}^k s_j = s\Bigg\}.\end{equation*}

We arrange the ball addition numbers in a $\binom{s+k-1}{s} \times k$ replacement matrix $\textbf{M} = [m_{\textbf{{s}},j}]_{\textbf{{s}}\in \Delta_{k,s},j\in [k]}$ , where the entry $m_{\textbf{{s}},j}$ is the number of balls of color j that we add to the urn upon drawing the sample $\textbf{{s}}$ . We take the entire row corresponding to $\textbf{{s}}$ as a row vector denoted by $\textbf{{m}}_\textbf{{s}} =(m_{\textbf{{s}}, 1},m_{\textbf{{s}},2}, \ldots ,m_{\textbf{{s}},k})$ .

When we deal with the eigenvalues $\lambda_1, \ldots, \lambda_k$ of a $k\times k$ matrix, we arrange them in decreasing order of their real parts. In other words, we consider them indexed in the fashion $\textrm{Re}\;\lambda_1\ge\textrm{Re}\;\lambda_2\ge\cdots\ge\textrm{Re}\;\lambda_k$ . In the case that $\lambda_1$ is of multiplicity 1, we call it the principal eigenvalue. The associated eigenvector $\textbf{v}_1$ is dubbed the principal left eigenvector.

The replacement matrix $\textbf{M} = [\textbf{{m}}_\textbf{{s}}]_{\textbf{{s}}\in \Delta_{k,s}}$ is obtained by stacking up the row vectors $\textbf{{m}}_\textbf{{s}}$ for $\textbf{{s}}\in\Delta_{k,s}$ from canonically smallest to largest (that is, the smallest canonical sample labels the bottom row of $\textbf{M}$ and the largest canonical sample labels the top row of $\textbf{M}$ ).

Example 1. Consider a (3,3,9)-urn scheme. The replacement matrix shown below is $10\times 3$ , filled with some numbers consistent with the chosen parameters – the sum across each row is the balance factor $b=9$ . The rows are labeled to the left with the partitions of $s=3$ into $k=3$ components listed (from top to bottom) in decreasing canonical order. For example, for $\textbf{{s}}=(2,0,1)$ , the entry $m_{\textbf{{s}},3}$ is 5.

\begin{equation*} \textbf{M} = \begin{matrix} 300 \cr 210 \cr 201 \cr 120 \cr 111 \cr 102 \cr 030 \cr 021 \cr 012 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \cr 4\;\;\;\;\; & 2\;\;\;\;\; & 3 \cr 2\;\;\;\;\; & 2\;\;\;\;\; & 5 \cr 5\;\;\;\;\; & 1\;\;\;\;\; & 3 \cr 3\;\;\;\;\; & 1\;\;\;\;\; & 5 \cr 1\;\;\;\;\; & 1\;\;\;\;\; & 7 \cr 6\;\;\;\;\; & 0\;\;\;\;\; & 3 \cr 4\;\;\;\;\; & 0\;\;\;\;\; & 5 \cr 2\;\;\;\;\; & 0\;\;\;\;\; & 7 \cr 0\;\;\;\;\; & 0\;\;\;\;\; & 9 \end{pmatrix}{.} \end{equation*}

3.2. Tools from linear algebra

For a square matrix $\textbf{A}$ , we let $\sigma(\textbf{A})$ be the spectrum of $\textbf{A}$ (its set of eigenvalues). We use the Jordan decomposition of $\textbf{A}$ to produce a direct sum $\bigoplus_\lambda\textbf{E}_\lambda$ of general eigenspaces $\textbf{E}_\lambda$ that decomposes $\mathbb{C}^k$ , where $\textbf{A}-\lambda\textbf{I}$ is a nilpotent operator on $\textbf{E}_\lambda$ for all $\lambda\in\sigma(\textbf{A})$ .

Using [Reference Nomizu27, Theorem 7.6], this result implies that, for each $\lambda\in\sigma(\textbf{A})$ , there exists a projection matrix $\textbf{P}_\lambda$ that commutes with $\textbf{A}$ and satisfies the conditions

\begin{equation*} \sum_{\lambda\in\sigma(\textbf{A})}\textbf{P}_\lambda = \textbf{I}, \qquad \textbf{A}\textbf{P}_\lambda = \textbf{P}_\lambda\textbf{A} = \lambda\textbf{P}_\lambda + \textbf{N}_\lambda,\end{equation*}

where $\textbf{N}_\lambda = \textbf{P}_\lambda \textbf{N}_\lambda=\textbf{N}_\lambda \textbf{P}_\lambda$ is nilpotent. Furthermore, these projection matrices satisfy $\textbf{P}_{\lambda_i} \textbf{P}_{\lambda_j} = \textbf{0}$ , when $\lambda_i \ne \lambda_j$ . We define $\nu_\lambda \ge 0$ to be the integer such that $\textbf{N}_\lambda^{\nu_\lambda} \ne \textbf{0}_{k \times k}$ but $\textbf{N}_\lambda^{\nu_\lambda+1} = \textbf{0}$ .

Remark 1. In the Jordan normal form of $\textbf{A}$ , the largest Jordan block with $\lambda$ has size $(\nu_\lambda+1)$ and thus $(\nu_\lambda+1)$ is at most equal to the multiplicity of $\lambda$ . If the algebraic multiplicity of $\lambda$ is $\mathcal{M}_\lambda$ , we can determine the size of the Jordan blocks for $\lambda$ by analyzing the dimension of the nullity of $(\textbf{A}-\lambda\textbf{I})^i$ for $i=1,\ldots,\mathcal{M}_\lambda$ . If $\nu_\lambda=0$ for all $\lambda\in\sigma(\textbf{A})$ , then $\textbf{A}$ is diagonalizable.

3.3. The affine subclass

Suppose $\mathbb{F}_n$ is the sigma field generated by the first n samples. For a general scheme, the conditional expectation $\mathbb{E}[\textbf{X}_n\mid\mathbb{F}_{n-1}]$ may contain nonlinear combinations of the counts in $\textbf{X}_{n-1}$ , such as the entries of $\textbf{X}_{n-1}^\top\textbf{X}_{n-1}$ , the matrix of all mixed second-order products.

We call an urn scheme affine if, for each $n\ge 0$ , the conditional expectation of $\textbf{X}_n$ takes the form $\mathbb{E}[\textbf{X}_n\mid\mathbb{F}_{n-1}] = \textbf{X}_{n-1}\textbf{C}_n$ for some real-valued $k\times k$ matrix $\textbf{C}_n$ , thus keeping all unconditional expectations linear. This condition aids the mathematical tractability of the model.

To guarantee affinity, certain conditions must be met. The following theorem specifies necessary and sufficient conditions for affinity. These conditions are expressed in terms of the vectors $\textbf{e}_1,\ldots,\textbf{e}_k$ , which are the basis of a k-dimensional vector space where, for $i\in [k]$ , we have

$$\textbf{e}_i = (\underbrace{0,0,\ldots,0}_{i-1\ \text{times}},1,\underbrace{0,0,\ldots,0}_{k-i\ \text{times}}).$$

The affinity will be built around the $k\times k$ submatrix of $\textbf{M}$ corresponding to the rows labeled $s\textbf{e}_i$ , for $i \in [k]$ . This submatrix is filled with a set of integers that are balanced (that is, the sum across any row in the submatrix adds up to b). We call this square submatrix the core, and denote it by $\textbf{A}$ ; we refer to the ith row in $\textbf{A}$ as $\textbf{a}_{s\textbf{e}_i}$ (which is also $\textbf{{m}}_{s\textbf{e}_i}$ ). To ensure tenability, we assume all the entries of $\textbf{A}$ are nonnegative, except possibly the diagonal elements – if one such element is negative, we assume it must be greater than or equal to $-s$ . The initial number $\tau_{0}$ must be at least s, too.

Remark 2. This class may be expanded to cases where a diagonal element is less than $-s$ , given specific starting conditions and column configurations for $\textbf{A}$ are met.

A version of the following theorem appears in [Reference Kuba20].

Theorem 1. A (k,s,b)-urn with core $\textbf{A}$ is affine if and only if

$$\textbf{{m}}_{\textbf{{s}}} = \sum_{i=1}^k\frac{s_i}{s}\textbf{a}_{s\textbf{e}_i} = \frac{1}{s}\textbf{{s}}\textbf{A}$$

for all $\textbf{{s}}\in \Delta_{k,s}$ .

Proof. Recall that for a basis vector $\textbf{e}_i = (e_1,\ldots,e_k)$ , all the components are 0 except for $e_i$ , which is 1. By construction, in the core we have the trivial relationship

\begin{equation*} \textbf{{m}}_{s\textbf{e}_i} = \textbf{a}_{s\textbf{e}_i} = \sum_{j=1}^k e_j\textbf{a}_{s\textbf{e}_j} = \textbf{e}_i\textbf{A} \qquad \text{for } i\in [k]. \end{equation*}

Let the sample in the nth draw be recorded in $\textbf{Q}_n$ . The ball addition rules translate into a stochastic recurrence:

(2) \begin{equation} \textbf{X}_n = \textbf{X}_{n-1} + \textbf{{m}}_{\textbf{Q}_n}, \end{equation}

giving rise to the conditional expectation

\begin{align*} \mathbb{E}[\textbf{X}_n \mid \mathbb{F}_{n-1}] & = \textbf{X}_{n-1} + \sum_{\textbf{{s}}\in\Delta_{k,s}}\textbf{{m}}_\textbf{{s}} \mathbb{P}(\textbf{Q}_n=\textbf{{s}} \mid \mathbb{F}_{n-1}) \\[5pt] & = \textbf{X}_{n-1} + \sum_{\substack{{0\le s_\ell,\, \ell\in[k]}\\ {s_1+\cdots+s_k=s}}} \textbf{{m}}_\textbf{{s}}\frac{\binom{X_{n-1,1}}{s_1}\cdots\binom{X_{n-1,k}}{s_k}}{\binom{\tau_{n-1}}{s}}. \end{align*}

Assume the scheme is affine. For affinity to hold, $\mathbb{E}[\textbf{X}_n\mid\mathbb{F}_{n-1}]$ should be equated to $\textbf{X}_{n-1}\textbf{C}_n$ for some $k\times k$ matrix $\textbf{C}_n$ , for each $n\ge1$ . This is the same as

\begin{equation*} \textbf{X}_{n-1}(\textbf{C}_n - \textbf{I})(\tau_{n-1})_s = \sum_{\substack{0\le s_\ell,\,\ell\in[k]\\ s_1+\cdots+s_{k}=s}}\textbf{{m}}_\textbf{{s}}\binom{s}{\textbf{{s}}} (X_{n-1,1})_{s_1}\cdots(X_{n-1,k})_{s_k}. \end{equation*}

Using the fact that $\tau_{n-1} = \sum_{i={1}}^k X_{n-1, i}$ , write the latter relation in the form

(3) \begin{equation} (X_{n-1,1},\ldots,X_{n-1,k})(\textbf{C}_n - \textbf{I})\Bigg(\sum_{i=1}^k X_{n-1,i}\Bigg)_s = \sum_{\substack{0\le s_\ell,\,\ell\in[k]\\ s_1+\cdots+s_k=s}}\textbf{{m}}_\textbf{{s}}\binom{s}{\textbf{{s}}} (\textbf{X}_{n-1})_\textbf{{s}}. \end{equation}

On the left-hand side, the highest power of $X_{n-1,i}$ for $i\in [k]$ is $s+1$ , while that on the right-hand side is only s, unless $\textbf{C}_n - \textbf{I}$ rids the left-hand side of these powers. So, $\textbf{C}_n-\textbf{I}$ must be of the form $({1}/{\tau_{n-1}})\textbf{B}_n$ for some matrix $\textbf{B}_n$ that does not depend on the history of composition vectors. Write $\textbf{B}_n = \big[\textbf{b}_{n,1}^\top,\ldots,\textbf{b}_{n,k}^\top\big]$ , where $\textbf{b}_{n,j}^\top$ is the jth column vector of $\textbf{B}_n$ , for $j\in [k]$ . So, we write (3) as

\begin{equation*} \textbf{X}_{n-1}{(\tau_{n-1}-1})_{s-1}\big[\textbf{b}_{n,1}^\top,\ldots,\textbf{b}_{n,k}^\top\big] = \sum_{\substack{0\le s_\ell,\,\ell\in[k]\\ s_1+\cdots+s_{k}=s}}\textbf{{m}}_\textbf{{s}}\binom{s}{\textbf{{s}}} (\textbf{X}_{n-1})_\textbf{{s}}. \end{equation*}

We utilize the multinomial theorem stated in (1). For any $i\in [k]$ , expand the falling factorial ${(\tau_{n-1}-1)_{s-1}=\big(\big(\sum_{j=1}^k X_{n-1,j} \big) -1\big)_{s-1}}$ as

\begin{align*} & (X_{n-1,1} +\cdots + X_{n-1,i-1} + (X_{n-1,i} -1) + X_{n-1,i+1} + \cdots +X_{n-1,k})_{s-1} \\[5pt] & \qquad\qquad = \sum_{\substack{0 \le r_\ell,\, \ell\in [k]\\ r_1+ \cdots +r_k= s-1}} \binom{s-1}{\textbf{r}} (X_{n-1, i} - 1)_{r_i} \prod_{\substack{j=1\\ j\not = i}}^k (X_{n-1, j})_{r_j}. \end{align*}

This being valid for any $i\in [k]$ , we get

\begin{equation*} \textbf{X}_{n-1}\!\!\!\! \sum_{\substack{0 \le r_\ell,\, \ell\in [k]\\ r_1+ \cdots +r_k= s-1}}\!\! \binom{s-1}{\textbf{r}} (X_{n-1, i} - 1)_{r_i} \prod_{\substack{j=1\\ j\not = i}}^k (X_{n-1, j})_{r_j} \big[\textbf{b}_{n,1}^\top, \ldots, \textbf{b}_{n,k}^\top\big] = \sum_{\substack{0 \le s_\ell,\, \ell\in [k]\\ s_1+ \cdots +s_k = s}} \!\!\!\! \textbf{{m}}_\textbf{{s}} \binom{s}{\textbf{{s}}}(\textbf{X}_{n-1})_\textbf{{s}}. \end{equation*}

To match coefficients of similar falling factorials, execute the vectorial product on the left:

$$\sum_{\substack{0 \le r_\ell,\, \ell\in [k]\\ r_1+ \cdots +r_k= s-1}} \binom{s-1}{\textbf{r}} \sum_{i=1}^k(\textbf{X}_{n-1})_{\textbf{r} + \textbf{e}_i}\textbf{b}_{n,i}^\top = \sum_{\substack{0 \le s_\ell,\, \ell\in [k]\\ s_1+ \cdots +s_k = s}} \textbf{{m}}_\textbf{{s}} \binom{s}{\textbf{{s}}} (\textbf{X}_{n-1})_\textbf{{s}}.$$

Extracting the coefficient of $(\textbf{X}_{n-1})_\textbf{{s}}$ on both sides, we get

$$\sum_{i=1}^k \binom{s-1}{\textbf{{s}} - \textbf{e}_i} \textbf{b}_{n,i}^\top = \textbf{{m}}_\textbf{{s}} \binom{s}{\textbf{{s}}}.$$

Applying this to a row in the core, where $\textbf{{s}}= s\textbf{e}_j$ for some $j\in [k]$ , we get $\textbf{b}_{n,j}^\top = \textbf{a}_{s\textbf{e}_j}$ (as all the multinomial coefficients include negative lower indices, except the jth). It follows that

$$ \textbf{{m}}_\textbf{{s}} = \frac {s_1! \, s_2!\, \cdots s_k!} {s!}\sum_{i=1}^k \binom{s-1}{s_1, \ldots s_{i-1} , s_i-1, s_{i+1}, \ldots, s_k} \textbf{a}_{s\textbf{e}_i} = \sum_{i=1}^k \frac {s_i} s \textbf{a}_{s\textbf{e}_i} .$$

For the converse, assume the affinity condition in the statement of the theorem. Recall that $\textbf{Q}_n$ is the random sample at time n, which has a conditional multi-hypergeometric distribution (given $\textbf{X}_{n-1}$ ) with parameters $\tau_{n-1}$ , $\textbf{X}_{n-1}$ , and s. Now, we write the stochastic recurrence (2) in the form

(4) \begin{equation} \textbf{X}_n = \textbf{X}_{n-1} + \frac 1 s\textbf{Q}_n\textbf{A}. \end{equation}

The mean of the multi-hypergeometric distribution is well known [Reference Kendall, Stuart and Ord16]. In vectorial form, the conditional mean (given $\textbf{X}_{n-1}$ ) is $(s/{\tau_{n-1}})\textbf{X}_{n-1}$ . We thus have the conditional expectation

(5) \begin{equation} \mathbb{E}[\textbf{X}_n \mid \mathbb{F}_{n-1}] = \textbf{X}_{n-1} + \frac 1 {\tau_{n-1}} \textbf{X}_{n-1}\textbf{A} = \textbf{X}_{n-1}\bigg(\textbf{I} + \frac 1 {\tau_{n-1}} \textbf{A}\bigg) , \end{equation}

which is an affine form.

Remark 3. It is worth describing (k, s, b)-urn schemes from a qualitative viewpoint. There is a saying that states ‘the whole is equal to the sum of its parts’. Here, each color plays an independent role in how the urn changes within a given step. That is, a ball of color 1 in the sample creates one set of changes, a ball of color 2 in the sample produces another set of changes, and so on. Thus, $\textbf{m}_\textbf{s}$ is merely the sum of the separate impacts of each ball color, with no additional effect produced by a specific sample combination. Furthermore, this class simplifies to the traditional single-draw urn model when $s=1$ .

Corollary 1.

\begin{equation*} \boldsymbol{\mu}_n := \mathbb{E}[\textbf{X}_n] = \textbf{X}_0\prod_{i=0}^{n-1}\bigg(\textbf{I} + \frac{1}{\tau_i} \textbf{A}\bigg). \end{equation*}

Proof. The corollary follows from direct iteration of (5).

Example 2. The matrix $\textbf{M}$ in Example 1 is affine. It is built from the core

\begin{equation*} \textbf{A}=\small\begin{matrix} 300 \cr 030 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \cr 6\;\;\;\;\; & 0\;\;\;\;\; & 3 \cr 0\;\;\;\;\; & 0\;\;\;\;\; & 9 \end{pmatrix} = \begin{matrix} 3\textbf{e}_1 \cr 3\textbf{e}_2 \cr 3\textbf{e}_3 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \cr 6\;\;\;\;\; & 0\;\;\;\;\; & 3 \cr 0\;\;\;\;\; & 0\;\;\;\;\; & 9 \end{pmatrix}. \end{equation*}

As an instantiation of the computation of a row of $\textbf{M}$ , as given in Theorem 1, take the sample $\textbf{{s}} =(2,0,1)$ ; the entry $\textbf{{m}}_\textbf{{s}}$ in $\textbf{M}$ is obtained as

$$\textbf{{m}}_{(2,0,1)} =\frac 2 3 \textbf{a}_{3\textbf{e}_1} + \frac 0 3 \textbf{a}_{3\textbf{e}_2} + \frac 1 3 \textbf{a}_{3\textbf{e}_3} = \frac 2 3 (3, 3, 3) + \frac 1 3(0, 0, 9) = (2,2,5).$$
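
As an illustrative aside (not in the original text), the affine construction of Theorem 1 is easy to mechanize. The Python sketch below (the function name and the use of NumPy are our own choices) rebuilds every row $\textbf{{m}}_\textbf{{s}} = ({1}/{s})\textbf{{s}}\textbf{A}$ from the core of Example 1, listing the rows in the canonical order; in particular, the row for the sample (2, 0, 1) comes out as (2, 2, 5), as computed above.

```python
import numpy as np
from itertools import product

def affine_replacement_matrix(A, s):
    """Rows m_s = (1/s) s A of Theorem 1, stacked in the canonical
    (reverse lexicographic) order of the samples."""
    k = A.shape[0]
    delta = sorted((c for c in product(range(s + 1), repeat=k) if sum(c) == s),
                   reverse=True)
    rows = [np.asarray(sv) @ A / s for sv in delta]
    return delta, np.array(rows)

if __name__ == "__main__":
    # Core of Examples 1 and 2 (a (3,3,9)-urn).
    A = np.array([[3, 3, 3],
                  [6, 0, 3],
                  [0, 0, 9]])
    delta, M = affine_replacement_matrix(A, s=3)
    for sv, row in zip(delta, M):
        print(sv, row)   # e.g. (2, 0, 1) -> [2. 2. 5.], matching Example 1
```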

4. Classification of cores

There is a large variety of $k\times k$ matrices that can be used as cores to build replacement matrices for urn schemes. As already mentioned, our focus in this paper is on (k, s, b)-urn schemes. Results are more easily obtained when the core of the replacement matrix is irreducible. Loosely speaking, this means that all the colors are essential (or communicating). More precisely, this is defined in terms of dominant colors. Color $i\in [k]$ is said to be dominant when we start with only balls of color i and almost surely balls of color $j\in [k]$ appear at some future point in time, for every $j\not = i$ . An urn scheme is then called irreducible if every color is dominant.

For affine (k, s, b)-urn schemes, irreducibility in the above sense implies that the core matrix $\textbf{A}$ is irreducible in the matrix sense. The following lemma and example demonstrate this.

Lemma 1. If an affine (k,s,b)-urn scheme is irreducible, then its core matrix $\bf A$ is irreducible in the matrix sense.

Proof. Consider an affine (k, s, b)-urn scheme that is irreducible. Let $\mathcal{U}$ be the set of all partitions labeling the drawing schema of the urn. On the contrary, assume that its core matrix, denoted by $\bf A$ , is reducible in the matrix sense. Then, there exist a permutation matrix $\bf P$ and positive integer $r <k$ such that

\begin{equation*} \textbf{P}^\top \textbf{A}\textbf{P} = \begin{pmatrix} \textbf{A}_{11}\;\;\;\;\; & \textbf{0}_{r,(k-r)} \cr \textbf{A}_{21}\;\;\;\;\; & \textbf{A}_{22} \end{pmatrix}, \end{equation*}

where $\textbf{A}_{11}$ and $\textbf{A}_{22}$ are square matrices of orders r and $(k-r)$ , respectively. Let $W=\{W_1, W_2, \ldots, W_r\} \subsetneq [k]$ be the colors represented by $\textbf{A}_{11}$ . Then, monocolor samples restricted to colors in W only add colors in W. By the affine nature of our urn scheme, samples which come from the set of partitions $\mathcal{U}_{W} \subsetneq \mathcal U$ , generated from all multiple drawings of size s from colors in W, would never introduce balls of colors outside of W as well. Therefore, given that an urn begins with only balls of color $i \in W$ , no path of samples exists which would produce balls of color $j \not\in W$ ; such a conclusion contradicts the irreducibility of the urn scheme.

Example 3. To illustrate this result, we present two affine urn schemes, represented by replacement matrices $\textbf{M}_1$ and $\textbf{M}_2$ (with core matrices $\textbf{A}_1$ and $\textbf{A}_2$ , respectively). We determine whether a square matrix is irreducible in the matricial sense by checking whether all elements of $(\textbf{I} + |\textbf{A}|)^{k-1}$ are positive; for further details, see [Reference Horn and Johnson11, Section 6.2].

\begin{alignat*}{2} \textbf{M}_1 & = \begin{matrix} 300 \cr 210 \cr 201 \cr 120 \cr 111 \cr 102 \cr 030 \cr 021 \cr 012 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 6\;\;\;\;\; & 0 \cr 4\;\;\;\;\; & 5\;\;\;\;\; & 0 \cr 3\;\;\;\;\; & 5\;\;\;\;\; & 1 \cr 5\;\;\;\;\; & 4\;\;\;\;\; & 0 \cr 4\;\;\;\;\; & 4\;\;\;\;\; & 1 \cr 3\;\;\;\;\; & 4\;\;\;\;\; & 2 \cr 6\;\;\;\;\; & 3\;\;\;\;\; & 0 \cr 5\;\;\;\;\; & 3\;\;\;\;\; & 1 \cr 4\;\;\;\;\; & 3\;\;\;\;\; & 2\cr 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \end{pmatrix}, \qquad & \textbf{M}_2 & = \begin{matrix} 300 \cr 210 \cr 201 \cr 120 \cr 111 \cr 102 \cr 030 \cr 021 \cr 012 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 6\;\;\;\;\; & 0 \cr 2\;\;\;\;\; & 5\;\;\;\;\; & 2 \cr 4\;\;\;\;\; & 4\;\;\;\;\; & 1 \cr 1\;\;\;\;\; & 4\;\;\;\;\; & 4 \cr 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \cr 5\;\;\;\;\; & 2\;\;\;\;\; & 2 \cr 0\;\;\;\;\; & 3\;\;\;\;\; & 6 \cr 2\;\;\;\;\; & 2\;\;\;\;\; & 5 \cr 4\;\;\;\;\; & 1\;\;\;\;\; & 4\cr 6\;\;\;\;\; & 0\;\;\;\;\; & 3 \end{pmatrix}, \\[5pt] \textbf{A}_1 & = \begin{matrix} 300 \cr 030 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 6\;\;\;\;\; & 0 \cr 6\;\;\;\;\; & 3\;\;\;\;\; & 0 \cr 3\;\;\;\;\; & 3\;\;\;\;\; & 3 \end{pmatrix}, & \textbf{A}_2 & = \begin{matrix} 300 \cr 030 \cr 003 \end{matrix} \begin{pmatrix} 3\;\;\;\;\; & 6\;\;\;\;\; & 0 \cr 0\;\;\;\;\; & 3\;\;\;\;\; & 6 \cr 6\;\;\;\;\; & 0\;\;\;\;\; & 3 \end{pmatrix}. \end{alignat*}

Here, the urn scheme represented by $\textbf{M}_1$ is reducible. If we start monochromatically in either color 1 or color 2, there is no path of samples from only balls of colors 1 and 2 which will produce balls of color 3. Furthermore, we see that its core matrix $\textbf{A}_1$ is reducible in the matricial sense. The urn scheme represented by $\textbf{M}_2$ , however, is irreducible. No matter which initiating color is chosen, there exist sampling paths which eventually produce every possible color. Such conclusions can also be made from the results from the core matrix $\textbf{A}_2$ .
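
As a numerical companion to this example (our own sketch, assuming NumPy), the matricial criterion quoted above can be checked directly: the core is irreducible exactly when all entries of $(\textbf{I}+|\textbf{A}|)^{k-1}$ are positive.

```python
import numpy as np

def is_irreducible(A):
    """Matrix irreducibility test: all entries of (I + |A|)^(k-1) are positive
    (Horn and Johnson, Section 6.2)."""
    k = A.shape[0]
    T = np.linalg.matrix_power(np.eye(k) + np.abs(A), k - 1)
    return bool(np.all(T > 0))

A1 = np.array([[3, 6, 0], [6, 3, 0], [3, 3, 3]])   # core of M_1
A2 = np.array([[3, 6, 0], [0, 3, 6], [6, 0, 3]])   # core of M_2
print(is_irreducible(A1))   # False: colors 1 and 2 alone never produce color 3
print(is_irreducible(A2))   # True
```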

When an urn is irreducible, the eigenvalue of $\textbf{A}$ with the largest real part, $\lambda_1$ , is real, positive, and simple (of multiplicity 1), a consequence of a classic Perron–Frobenius theorem [Reference Horn and Johnson11, Section 8.4].

Remark 4. When $\textbf{A}$ is nonnegative (balls are not removed for any sample), the consequence is direct. If $\textbf{A}$ has negative entries, they must be along the diagonal, and so we decompose $\textbf{A}$ into $\textbf{A}=\textbf{R}-s\textbf{I}$ , where $\textbf{R}$ is nonnegative and irreducible, and therefore has leading eigenvalue $(\lambda_1+s)$ that is simple. The desired results for $\textbf{A}$ thus follow.

In what follows, the results for an affine irreducible (k, s, b)-urn scheme are governed by the ratio of $\textrm{Re}(\lambda_2)$ to $\lambda_1$ . We call this ratio the core index and denote it by $\Lambda$ ; for an irreducible core, it is always strictly less than 1. A trichotomy arises:

  • Small-index core, where $\Lambda < \frac 1 2$ .

  • Critical-index core, where $\Lambda = \frac 1 2$ .

  • Large-index core, where $\Lambda > \frac 1 2$ .

Occasionally, we may just call the schemes small, critical, or large.

5. A multivariate martingale underlying the (k, s, b)-urn class

The conditional expectation in (5) leads to a martingale formulation. We use the operator $\nabla$ to denote backward differences. At draw n, we have

\begin{equation*} \mathbb{E}[\nabla\textbf{X}_n \mid \mathbb{F}_{n-1}] = \mathbb{E}\bigg[\frac1s\textbf{Q}_{n}\textbf{A}\mid\mathbb{F}_{n-1}\bigg] = \frac{1}{\tau_{n-1}}\textbf{X}_{n-1}\textbf{A}.\end{equation*}

Define $\textbf{Y}_n := \nabla\textbf{X}_n - ({1}/{\tau_{n-1}})\textbf{X}_{n-1}\textbf{A}.$ Then, $\textbf{Y}_n$ is $\mathbb{F}_{n}$ -measurable and

(6) \begin{equation} \mathbb{E}[\textbf{Y}_n \mid \mathbb{F}_{n-1}] = \textbf{0}.\end{equation}

Thus, we have the recurrence

(7) \begin{equation} \textbf{X}_n = \textbf{X}_{n-1}\bigg(\textbf{I} + \frac{1}{\tau_{n-1}}\textbf{A}\bigg) + \textbf{Y}_n.\end{equation}

In the proofs, we make use of the matrix products $\textbf{F}_{i,j}:=\prod_{i\le\ell < j}(\textbf{I} + ({1}/{\tau_{\ell}})\textbf{A})$ , $0\le i \le j$ , to unravel the martingale differences in the process $\textbf{X}_n$ . The product $\textbf{F}_{i,j}$ can also be written in terms of the polynomial function $f_{i,j}$ , where

(8) \begin{equation} f_{i,j}(z) := \prod_{i\le\ell < j}\bigg(1 + \frac{1}{\tau_{\ell}}z\bigg) = \frac{\Gamma(j+({1}/{b})(\tau_0+z))}{\Gamma(j+({\tau_0}/{b}))} \times \frac{\Gamma(i+({\tau_0}/{b}))}{\Gamma(i+({1}/{b})(\tau_0+z))}.\end{equation}

We order the eigenvalues of $\textbf{A}$ in the fashion specified in Section 3.1 and note that $\lambda_1={b}$ is a simple eigenvalue and $\textrm{Re}(\lambda_i) < \lambda_1$ for $i > 1$ , since $\textbf{A}$ is irreducible. Accordingly, there exist corresponding left and right eigenvectors of $\textbf{A}$ , to be called $\textbf{v}_1$ and $\textbf{w}_1$ , unique up to normalization. Since $\textbf{A}$ is row-balanced, all the components of $\textbf{w}_1$ are equal. We may set $\textbf{w}_1=\textbf{1}$ and normalize $\textbf{v}_1$ such that $\textbf{w}_1\textbf{v}_1^\top=1$ . This projection results in $\textbf{P}_{\lambda_1}=\textbf{w}_1^\top\textbf{v}_1$ , and so we have

(9) \begin{equation} \textbf{v}\textbf{P}_{\lambda_1} = \textbf{v}\textbf{1}^\top\textbf{v}_1 \quad \text{for any vector } \textbf{v} \in \mathbb{C}^k.\end{equation}

The following lemmas provide important properties that can be applied to the polynomial function (8). We omit the proofs as they are nearly identical to those in [Reference Janson14].

Lemma 2. For $1\le i\le j$ and $\lambda \in \sigma(\textbf{A})$ ,

\begin{equation*} \textbf{P}_\lambda\textbf{F}_{i,j} = \textbf{O}\bigg(\bigg(\frac{j}{i}\bigg)^{\textrm{Re}(\lambda)/b} \bigg(1+\ln\bigg(\frac{j}{i}\bigg)\bigg)^{\nu_\lambda}\bigg). \end{equation*}

Furthermore, for any $\nu \ge \nu_\lambda$ , as $i, j \to \infty$ with $i\le j$ , we have

\begin{align*} \textbf{P}_\lambda\textbf{F}_{i,j} & = \frac{1}{\nu!}\bigg(\frac{j}{i}\bigg)^{\lambda/b}\bigg(\frac{\ln({j}/{i})}{b}\bigg)^{\nu} \textbf{P}_\lambda\textbf{N}^\nu_\lambda + \textbf{o}\bigg(\bigg(\frac{j}{i}\bigg)^{\textrm{Re}(\lambda)/b}\ln^{\nu}\bigg(\frac{j}{i}\bigg)\bigg) \\[5pt] & \quad + \textbf{O}\bigg(\bigg(\frac{j}{i}\bigg)^{\textrm{Re}(\lambda)/b} \bigg(1+\ln^{\nu-1}\bigg(\frac{j}{i}\bigg)\bigg)\bigg). \end{align*}

Lemma 3. When $\lambda_1=b$ is a simple eigenvalue then, for $0\le i \le j$ ,

(10) \begin{equation} \textbf{P}_{\lambda_1}\textbf{F}_{i,j} = f_{i,j}(\lambda_1)\textbf{P}_{\lambda_1} = \frac{j+{\tau_0}/{b}}{i+{\tau_0}/{b}}\textbf{P}_{\lambda_1}. \end{equation}

Lemma 4. For any fixed $x \in (0,1]$ , as $n\to \infty$ , $\textbf{F}_{\lceil xn\rceil,n} \to x^{-({1}/{b})\textbf{A}}$ .

5.1. Asymptotic mean

While exact, the execution of the calculation in Corollary 1 may be quite cumbersome. Here, we present a shortcut to the asymptotics of this computation via a multivariate martingale.

Theorem 2. Suppose an affine (k, s,b)-urn scheme is built from an irreducible core $\textbf{A}$ with principal eigenvector $\textbf{v}_1$ and core index $\Lambda\le1$ . Then the expected value of the composition vector is $\mathbb{E}[\textbf{X}_n] = \lambda_1 n \textbf{v}_1 + \tau_0\textbf{v}_1 + \textbf{O}\big(n^\Lambda\ln^{\nu_2}(n)\big)$ .

Proof. The linear vectorial recurrence (7) is of the general form $\textbf{L}_n = \textbf{L}_{n-1}\textbf{G}_{n} + \textbf{H}_n$ , with solution (interpreting the product corresponding to $i=n$ as the identity matrix $\textbf{I}$ )

$${\bf L}_n = \sum_{i=1}^n\textbf{H}_i\prod_{j=i+1}^n\textbf{G}_j + \textbf{L}_0\prod_{j=1}^n\textbf{G}_j.$$

Write the recurrence in the form $\textbf{X}_n = \textbf{X}_{n-1}\textbf{F}_{n-1,n} + \textbf{Y}_n$ , to recognize the solution

(11) \begin{equation} \textbf{X}_n = \textbf{X}_{0}\textbf{F}_{0,n} + \sum_{i=1}^{n}\textbf{Y}_i\textbf{F}_{i,n}. \end{equation}

The factors $\textbf{X}_0$ and $\textbf{F}_{i,j}$ are nonrandom. We have $\mathbb{E}[\textbf{Y}_i]=0$ , and by taking expectations we get

(12) \begin{equation} \boldsymbol{\mu}_n = \mathbb{E}[\textbf{X}_n] = \textbf{X}_0\textbf{F}_{0,n} = \textbf{X}_0\textbf{I}\textbf{F}_{0,n} = \sum_{\lambda\in\sigma(\textbf{A})}\textbf{X}_0\textbf{P}_\lambda\textbf{F}_{0,n}. \end{equation}

Janson [Reference Janson14] analyzes the function $\textbf{F}_{0,n}$ carefully. When $\lambda \ne \lambda_1$ , [Reference Janson14, Lemma 4.3] implies that

\begin{equation*} \textbf{X}_0\textbf{P}_\lambda\textbf{F}_{0,n} = \textbf{O}\big(n^{\textrm{Re}(\lambda)/b}\ln^{\nu_\lambda}(n)\big) = \textbf{O}\big(n^{\textrm{Re}(\lambda_2)/b}\ln^{\nu_2}(n)\big). \end{equation*}

By (9), we get $\textbf{X}_0\textbf{P}_{\lambda_1} = \textbf{X}_0\textbf{1}^\top\textbf{v}_1 = \tau_0\textbf{v}_1$ , and so, by (10), we have

\begin{equation*} \textbf{X}_0\textbf{P}_{\lambda_1} \textbf{F}_{0,n} = \frac{n+{\tau_0}/{b}}{{\tau_0}/{b}}\textbf{X}_0\textbf{P}_{\lambda_1} = \frac{bn+\tau_0}{\tau_0}(\tau_0\textbf{v}_1) = (bn+\tau_0)\textbf{v}_1. \end{equation*}

Therefore, we have $\mathbb{E}[\textbf{X}_n] = (\lambda_1n+\tau_0)\textbf{v}_1+\textbf{O}\big(n^{\Lambda}\ln^{\nu_2}(n)\big)$ .
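
To illustrate Corollary 1 and Theorem 2 numerically, the following Python sketch (our own; the initial composition (3, 2, 1) is chosen only for illustration) iterates the exact product of Corollary 1 for the irreducible core $\textbf{A}_2$ of Example 3 and compares it with the leading term $(bn+\tau_0)\textbf{v}_1$ . For this core, $\Lambda=0$ and $\nu_2=0$ , so the two agree up to an $\textbf{O}(1)$ term.

```python
import numpy as np

# Irreducible core A_2 of Example 3: a (3,3,9)-urn with lambda_1 = b = 9
# and principal left eigenvector v_1 = (1/3, 1/3, 1/3) (each column sums to 9).
A = np.array([[3., 6., 0.],
              [0., 3., 6.],
              [6., 0., 3.]])
b = 9.0
X0 = np.array([3., 2., 1.])        # assumed initial composition, tau_0 = 6 >= s
tau0 = X0.sum()
v1 = np.full(3, 1.0 / 3.0)

# Exact mean from Corollary 1: mu_n = X_0 prod_{i=0}^{n-1} (I + A / tau_i).
n = 1000
mu = X0.copy()
for i in range(n):
    mu = mu @ (np.eye(3) + A / (b * i + tau0))

print(mu)                          # exact E[X_n]
print((b * n + tau0) * v1)         # leading term (b n + tau_0) v_1 of Theorem 2
```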

5.2. The covariance matrix

For the covariance matrix, we define $\hat{\textbf{P}} = \sum_{\lambda\ne\lambda_1}\textbf{P}_{\lambda} = \textbf{I} - \textbf{P}_{\lambda_1}$ and the symmetric matrix

(13) \begin{equation} \textbf{B} := \frac1{s}\textbf{A}^\top\textbf{diag}(\textbf{v}_1)\textbf{A}.\end{equation}

Lastly, we use ${\textrm{e}}^{{s}\textbf{A}} := \sum_{m=0}^{\infty}({1}/{m!})({s}\textbf{A})^m$ for all $s\in\mathbb{R}$ to define the integral

(14) \begin{equation} \boldsymbol{\Sigma}_{(\textbf{A})} := b\int_0^\infty{\textrm{e}}^{{s}\textbf{A}^\top}\hat{\textbf{P}}^\top\textbf{B} \hat{\textbf{P}}{\textrm{e}}^{{s}\textbf{A}}{\textrm{e}}^{-{b}s}\,{\textrm{d}} s.\end{equation}
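
The limiting matrix $\boldsymbol{\Sigma}_{(\textbf{A})}$ can be approximated numerically. The sketch below (our own, assuming NumPy and SciPy) evaluates the integral (14) by a trapezoidal rule for the irreducible core $\textbf{A}_2$ of Example 3; that core has $\textrm{Re}(\lambda_2)=0$ (a small index), so the integrand decays like ${\textrm{e}}^{-bs}$ and the integral may be safely truncated.

```python
import numpy as np
from scipy.linalg import expm

# Small-index irreducible core A_2 of Example 3 (b = 9, s = 3, Re(lambda_2) = 0).
A = np.array([[3., 6., 0.],
              [0., 3., 6.],
              [6., 0., 3.]])
b, s, k = 9.0, 3, 3

v1 = np.full(k, 1.0 / k)            # principal left eigenvector, components sum to 1
P1 = np.outer(np.ones(k), v1)       # P_{lambda_1} = w_1^T v_1, with w_1 = (1,...,1)
Phat = np.eye(k) - P1
B = A.T @ np.diag(v1) @ A / s       # the matrix B of (13)
C = Phat.T @ B @ Phat

# Sigma_(A) = b * int_0^infty e^{tA^T} C e^{tA} e^{-bt} dt  (composite trapezoid).
ts, dt = np.linspace(0.0, 5.0, 2001, retstep=True)   # e^{-bt} negligible beyond t = 5
vals = np.array([expm(t * A.T) @ C @ expm(t * A) * np.exp(-b * t) for t in ts])
Sigma_A = b * dt * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))
print(Sigma_A)                       # approximates the limit of Cov[X_n] / n
```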

Theorem 3. Consider a growing irreducible affine (k,s,b)-urn, and write $\boldsymbol{\Sigma}_n := \mathbb{C}\textrm{ov}[\textbf{X}_n]$ for the covariance matrix of the composition vector. For unordered samples taken either with or without replacement, as $n\to\infty$:

  (i) If the core index is small, then $({1}/{n})\boldsymbol{\Sigma}_n \to \boldsymbol{\Sigma}_{(\textbf{A})}$ , where $\boldsymbol{\Sigma}_{(\textbf{A})}$ is the limiting matrix defined in (14).

  (ii) If the core index is critical, then

    \begin{equation*} \frac1{n\ln^{2\nu_2+1}(n)}\boldsymbol{\Sigma}_n \to \frac1{{b}^{2\nu_2}(2\nu_2+1)(\nu_2!)^2} \sum_{\textrm{Re}(\lambda)={b}/2}(\textbf{N}^*_\lambda)^{\nu_2}\textbf{P}^*_\lambda\textbf{B}\textbf{P}_\lambda \textbf{N}^{\nu_2}_\lambda. \end{equation*}

Before we prove the theorems for the covariance, we provide some additional scaffolding. Using (12), we rewrite (11) as

(15) \begin{equation} \textbf{X}_n - \boldsymbol{\mu}_n = \sum_{i=1}^{n}\textbf{Y}_i\textbf{F}_{i,n}.\end{equation}

Thus, the covariance can be computed as

(16) \begin{align} \mathbb{C}\textrm{ov}[\textbf{X}_n] := \mathbb{E}[(\textbf{X}_n-\boldsymbol{\mu}_n)^\top(\textbf{X}_n-\boldsymbol{\mu}_n)] & = \mathbb{E}\Bigg[\sum_{i=1}^n\sum_{j=1}^n(\textbf{Y}_i\textbf{F}_{i,n})^\top(\textbf{Y}_j\textbf{F}_{j,n})\Bigg] \nonumber \\[5pt] & = \sum_{i=1}^n\sum_{j=1}^n\textbf{F}_{i,n}^\top\mathbb{E}[\textbf{Y}_i^\top\textbf{Y}_j]\textbf{F}_{j,n}. \end{align}

Since $\textbf{Y}_j$ is $\mathbb{F}_{j}$ -measurable and $\mathbb{E}[\textbf{Y}_i\mid\mathbb{F}_j]=\textbf{0}$ when $i>j$ , $\mathbb{E}[\textbf{Y}_i^\top\textbf{Y}_j] = \mathbb{E}[\mathbb{E}[\textbf{Y}_i^\top\mid\mathbb{F}_j]\textbf{Y}_j] = \textbf{0}$ . In similar fashion, we have $\mathbb{E}[\textbf{Y}_i^\top\textbf{Y}_j] = \textbf{0}$ when $i<j$ , and so (16) reduces to

(17) \begin{equation} \mathbb{C}\textrm{ov}[\textbf{X}_n] = \sum_{i=1}^n\textbf{F}_{i,n}^\top\mathbb{E}[\textbf{Y}_i^\top\textbf{Y}_i]\textbf{F}_{i,n}.\end{equation}

Since the urn is balanced, $\nabla\textbf{X}_n\textbf{1}^\top={b}$ for all $n\ge1$ . Thus, we have

\begin{equation*} \textbf{Y}_n\textbf{1}^\top = \nabla\textbf{X}_n\textbf{1}^\top - \mathbb{E}[\nabla\textbf{X}_n\textbf{1}^\top\mid\mathbb{F}_{n-1}] = {b} - {b} = 0,\end{equation*}

which implies that $\textbf{Y}_n\textbf{P}_{\lambda_1}=\textbf{0}$ . Thus, we can rewrite (17) as

(18) \begin{equation} \mathbb{C}\textrm{ov}[\textbf{X}_n] = \sum_{\lambda\in\sigma(\textbf{A})}\sum_{\varrho\in\sigma(\textbf{A})} \sum_{i=1}^{n}\textbf{F}_{i,n}^\top\textbf{P}_{\lambda}^\top\mathbb{E}[\textbf{Y}_{i}^\top\textbf{Y}_i] \textbf{P}_\varrho\textbf{F}_{i,n}.\end{equation}

We define $\textbf{T}_{i,n,\lambda,\varrho}:=\textbf{F}_{i,n}^\top\textbf{P}_{\lambda}^\top\mathbb{E}[\textbf{Y}_{i}^\top\textbf{Y}_i]\textbf{P}_\varrho\textbf{F}_{i,n}$ for convenience. When either $\lambda$ or $\varrho$ is equal to $\lambda_1$ , $\textbf{P}_{\lambda}^\top\mathbb{E}[\textbf{Y}_{i}^\top\textbf{Y}_i]\textbf{P}_\varrho = \textbf{0}$ , and so we may drop those cases and rewrite (18) as

(19) \begin{equation} \mathbb{C}\textrm{ov}[\textbf{X}_n] = \sum_{\lambda\ne\lambda_1}\sum_{\varrho\ne\lambda_1}\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho}.\end{equation}

From this result we produce three lemmas: the first gives the general asymptotic growth of the covariances, the second demonstrates convergence in $\mathcal{L}_2$ and an asymptotic approximation in $\mathcal{L}_1$ , and the last develops the limit involving the matrix $\textbf{B}$ defined in (13).

Lemma 5. If $\lambda_1$ is a simple eigenvalue then, for $n\ge 2$ ,

(20) \begin{equation} \mathbb{C}\textrm{ov}[\textbf{X}_n] = \begin{cases} \textbf{O}(n) & \text{for } \Lambda < \frac 1 2, \\[5pt] \textbf{O}(n\ln^{2\nu_2+1}(n)) & \text{for } \Lambda = \frac 1 2; \\[5pt] \textbf{O}(n^{2\Lambda}\ln^{2\nu_2}(n)) & \text{for } \Lambda > \frac 1 2. \end{cases} \end{equation}

Consequently, we have $\mathbb{C}\textrm{ov}[\textbf{X}_n]=\textbf{o}(n^2)$ whenever $\lambda_2 < \lambda_1$ .

Proof. By construction, $\textbf{Y}_n=(({1}/{s})\textbf{Q}_n - ({1}/{\tau_{n-1}})\textbf{X}_{n-1})\textbf{A}$ , and so

\begin{equation*} \mathbb{E}[\textbf{Y}_n^\top\textbf{Y}_n] = \frac{1}{s^2}\textbf{A}^\top\mathbb{E}[\textbf{Q}^\top_n\textbf{Q}_n]\textbf{A} = \textbf{{O}}(1). \end{equation*}

We provide detailed results for this in Lemma 7. This result, along with Lemma 2, implies that if $\lambda$ and $ \varrho$ are two eigenvalues of $\textbf{A}$ then, for $1\le i \le n$ ,

\begin{equation*} \textbf{T}_{i,n,\lambda,\varrho} = (\textbf{P}_{\lambda}\textbf{F}_{i,n})^\top \mathbb{E}[\textbf{Y}_{i}^\top\textbf{Y}_i]\textbf{P}_\varrho\textbf{F}_{i,n} = \textbf{O}\bigg(\bigg(\frac{n}{i}\bigg)^{(\textrm{Re}(\lambda)+\textrm{Re}(\varrho))/{b}} \bigg(1+\ln\bigg(\frac{n}{i}\bigg)\bigg)^{\nu_\lambda+\nu_\varrho}\bigg). \end{equation*}

If $\textrm{Re}(\lambda)+\textrm{Re}(\varrho) \ge {b}$ , we have

(21) \begin{equation} \textbf{T}_{i,n,\lambda,\varrho} = \textbf{O}\bigg(\bigg(\frac{n}{i}\bigg)^{(\textrm{Re}(\lambda)+\textrm{Re}(\varrho))/{b}} \ln^{\nu_\lambda+\nu_\varrho}(n)\bigg). \end{equation}

If $\textrm{Re}(\lambda)+\textrm{Re}(\varrho) < {b}$ , choose $\alpha$ such that $({1}/{b})(\textrm{Re}(\lambda)+\textrm{Re}(\varrho)) < \alpha < 1$ . Then, we are guaranteed that

(22) \begin{equation} \textbf{T}_{i,n,\lambda,\varrho} = \textbf{O}\bigg(\bigg(\frac{n}{i}\bigg)^{\alpha}\bigg). \end{equation}

Summing together over i, we get

(23) \begin{equation} \sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} = \begin{cases} \textbf{O}(n), & \textrm{Re}(\lambda) + \textrm{Re}(\varrho) < {b}, \\[5pt] \textbf{O}(n\ln^{\nu_\lambda+\nu_\varrho+1}(n)), & \textrm{Re}(\lambda) + \textrm{Re}(\varrho) = {b}, \\[5pt] \textbf{O}(n^{(\textrm{Re}(\lambda)+\textrm{Re}(\varrho))/{b}}\ln^{\nu_\lambda+\nu_\varrho}(n)), & \textrm{Re}(\lambda) + \textrm{Re}(\varrho) > {b}. \end{cases} \end{equation}

Summing over all combinations of $\lambda,\varrho\in\sigma(\textbf{A})\backslash\{\lambda_1\}$ , the term $\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho}$ is of highest order when $\lambda=\varrho=\lambda_2$ ; substituting this pair into (23) yields (20). Since $\textrm{Re}(\lambda_2) < \lambda_1 = b$ , it follows that $\mathbb{C}\textrm{ov}[\textbf{X}_n]=\textbf{o}(n^2)$ .

Lemma 6. As $n\to\infty$ , $({1}/{n})\textbf{X}_n\ \buildrel {\mathcal{L}_2} \over \longrightarrow\ b\textbf{v}_1$ . Furthermore, we have the asymptotic approximation

\begin{equation*} \textbf{X}_n - bn\textbf{v}_1 = \begin{cases} \textbf{O}_{\mathcal{L}_1}\big(\sqrt{n}\big) & \text{for } \Lambda < \frac{1}{2}, \\[5pt] \textbf{O}_{\mathcal{L}_1}\big(n^{{1}/{2}}\ln^{\nu_2+{1}/{2}}(n)\big) & \text{for } \Lambda = \frac{1}{2}, \\[5pt] \textbf{O}_{\mathcal{L}_1}\big(n^{\Lambda}\ln^{\nu_2}(n)\big) & \text{for } \Lambda > \frac{1}{2}. \end{cases} \end{equation*}

Proof. From Lemma 5, we have

\begin{equation*} \bigg\|\frac{1}{n}\textbf{X}_n - \frac{1}{n}\boldsymbol{\mu}_n\bigg\|_{\mathcal{L}_2}^2 = \frac{1}{n^{2}}\|\textbf{X}_n - \boldsymbol{\mu}_n\|_{\mathcal{L}_2}^2 = \sum_{i=1}^k\frac{1}{n^2}\mathbb{V}\textrm{ar}[X_{n,i}] \to 0, \end{equation*}

and $({1}/{n})\boldsymbol{\mu}_n \to b\textbf{v}_1$ by Theorem 2. Thus, $\|({1}/{n})\textbf{X}_n - b\textbf{v}_1\|_{\mathcal{L}_2}^2 \to 0$ . Now, given Lemma 5 and Theorem 2, for each $i \in [k]$ we obtain

\begin{align*} \mathbb{E}[(X_{n,i} - bnv_{1,i})^2] = \mathbb{E}[((X_{n,i}-\mu_{n,i}) + (\mu_{n,i}-bnv_{1,i}))^2] & = \mathbb{V}\textrm{ar}[X_{n,i}] + (\mu_{n,i} - bnv_{1,i})^2 \\[5pt] & = \begin{cases} O(n) & \text{for } \Lambda < \frac{1}{2}, \\[5pt] O\big(n\ln^{2\nu_2+1}(n)\big) & \text{for } \Lambda = \frac{1}{2}, \\[5pt] O\big(n^{2\Lambda}\ln^{2\nu_2}(n)\big) & \text{for } \Lambda > \frac{1}{2}. \end{cases} \end{align*}

So, by Jensen’s inequality, $\mathbb{E}[|X_{n,i} - bnv_{1,i}|] \le \sqrt{\mathbb{E}[(X_{n,i}-bv_{1,i}n)^2]}$ . It follows that $(\textbf{X}_{n} - bn\textbf{v}_1)$ has the asymptotic approximation claimed.
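
Lemma 6 lends itself to a quick Monte Carlo check. The simulation sketch below (our own; it assumes NumPy's multivariate hypergeometric sampler, and the initial composition (3, 2, 1) is chosen only for illustration) grows a (3, 3, 9)-urn with the irreducible core $\textbf{A}_2$ of Example 3 by repeatedly drawing $s=3$ balls without replacement and applying the affine rule (4); the scaled composition $\textbf{X}_n/n$ settles near $b\textbf{v}_1 = (3, 3, 3)$ , in line with the lemma.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Irreducible core A_2 of Example 3; b = lambda_1 = 9, v_1 = (1/3, 1/3, 1/3).
A = np.array([[3, 6, 0],
              [0, 3, 6],
              [6, 0, 3]])
s, b = 3, 9
X = np.array([3, 2, 1])            # assumed initial composition, tau_0 = 6 >= s

n = 20000
for _ in range(n):
    Q = rng.multivariate_hypergeometric(X, s)   # draw s balls without replacement
    X = X + (Q @ A) // s   # rule (4); exact, since every entry of A is a multiple of s
print(X / n)                                    # close to b * v_1 = (3, 3, 3)
```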

Lemma 7. If $\lambda_1$ is a simple eigenvalue then, as $n\to\infty$ , $\mathbb{E}\big[\textbf{Y}_n^\top\textbf{Y}_n\big] \to \textbf{B} - ({b^2}/{s})\textbf{v}_1^\top\textbf{v}_1$ . Hence, for any eigenvalue $\lambda \ne \lambda_1$ , $\mathbb{E}\big[\textbf{Y}_n^\top\textbf{Y}_n\big]\textbf{P}_\lambda \to \textbf{B}\textbf{P}_\lambda$ .

Proof. Since $\textbf{Y}_n=\nabla\textbf{X}_n - ({1}/{\tau_{n-1}})\textbf{X}_{n-1}\textbf{A}$ and $\textbf{X}_{n}=\textbf{X}_{n-1}+({1}/{s})\textbf{Q}_n\textbf{A}$ ,

\begin{align*} \mathbb{E}\big[\textbf{Y}^\top_n\textbf{Y}_n \mid \mathbb{F}_{n-1}\big] & = \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n) \mid \mathbb{F}_{n-1}] \\[5pt] & \quad - \mathbb{E}\big[(\nabla\textbf{X}_n)^\top\big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big) \mid \mathbb{F}_{n-1}\big] \\[5pt] & \quad - \mathbb{E}\big[\big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big)^\top(\nabla\textbf{X}_n) \mid \mathbb{F}_{n-1}\big] \\[5pt] & \quad + \mathbb{E}\big[\big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big)^\top \big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big) \mid \mathbb{F}_{n-1}\big] \\[5pt] & = \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n) \mid \mathbb{F}_{n-1}] \\[5pt] & \quad - \mathbb{E}[(s^{-1}\textbf{Q}_{n}\textbf{A})^\top \mid \mathbb{F}_{n-1}] \big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big) \\[5pt] & \quad - \big(\tau_{n-1}^{-1}\textbf{X}_{n-1}\textbf{A}\big)^\top \mathbb{E}[(s^{-1}\textbf{Q}_{n}\textbf{A}) \mid \mathbb{F}_{n-1}\big] \\[5pt] & \quad + \big(\tau_{n-1}^{-2}\textbf{X}_{n-1}\textbf{A}\big)^\top(\textbf{X}_{n-1}\textbf{A}) \\[5pt] & = \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n) \mid \mathbb{F}_{n-1}] - \tau_{n-1}^{-2}\textbf{A}^\top\textbf{X}_{n-1}^\top\textbf{X}_{n-1}\textbf{A}. \end{align*}

Thus, taking expectations on both sides, $\mathbb{E}\big[\textbf{Y}^\top_n\textbf{Y}_n\big] = \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n)] - \tau_{n-1}^{-2}\textbf{A}^\top\mathbb{E}\big[\textbf{X}_{n-1}^\top\textbf{X}_{n-1}\big]\textbf{A}$ .

We tackle the asymptotics of this expression in two parts. First, note that

(24) \begin{equation} \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n) \mid \mathbb{F}_{n-1}] = \frac1{s^2}\textbf{A}^\top\mathbb{E}\big[\textbf{Q}_{n}^\top\textbf{Q}_{n} \mid \mathbb{F}_{n-1}\big]\textbf{A}. \end{equation}

Also, $\mathbb{E}\big[\textbf{Q}_{n}^\top\textbf{Q}_{n} \mid \mathbb{F}_{n-1}\big] = {\mathcal{Q}}_n$ , with the entries

(25) \begin{equation} {\mathcal{Q}}_n[i,j] = \begin{cases} \dfrac{s(s-1)X_{n-1,i}^2}{\tau_{n-1}(\tau_{n-1}-1)}+\dfrac{s(\tau_{n-1}-s)X_{n-1,i}}{\tau_{n-1}(\tau_{n-1}-1)}, & i=j, \\[12pt] \dfrac{s(s-1)X_{n-1,i}X_{n-1,j}}{\tau_{n-1}(\tau_{n-1}-1)}, & i \ne j, \end{cases} \end{equation}

which follow from the known second moments of the multivariate hypergeometric distribution [Reference Kendall, Stuart and Ord16].

Let $n_{\Lambda}$ be a function that represents the appropriate asymptotic order, given $\Lambda$ and Lemma 6. Rewrite $X_{n-1,i}= b(n-1)v_{1,i}+\textbf{O}_{{\mathcal{L}}_1}(n_{\Lambda})$ . Then, when $i=j$ ,

\begin{align*} \mathbb{E}[{\mathcal{Q}}_n[i,i]] & = \mathbb{E}\bigg[\frac{s(s-1)(b(n-1)v_{1,i}+O_{{\mathcal{L}}_1}(n_{\Lambda}))^2}{\tau_{n-1}^2} + \frac{s(\tau_{n-1}-s)(b(n-1)v_{1,i}+O_{{\mathcal{L}}_1}(n_{\Lambda}))}{\tau_{n-1}^2}\bigg] \\[5pt] & = s(s-1)v_{1,i}^2 + sv_{1,i} + \mathbb{E}\bigg[s(s-1)\frac{2b(n-1)v_{1,i}O_{{\mathcal{L}}_1}(n_{\Lambda})+O_{{\mathcal{L}}_1}(n_{\Lambda}^2)} {(b(n-1)+\tau_0)^2}\bigg] \\[5pt] & \quad + \mathbb{E}\bigg[\frac{O_{{\mathcal{L}}_1}(n_{\Lambda})}{b(n-1)+\tau_0}\bigg] \to s(s-1)v_{1,i}^2 + sv_{1,i} \quad \text{as } n\to\infty. \end{align*}

For $i\ne j$ , we get $\mathbb{E}[{\mathcal{Q}}_n[i,j]] \to s(s-1)v_{1,i}v_{1,j}$ as $n\to\infty$ . Putting the two results together, we have $\mathbb{E}[{\mathcal{Q}}_{n}] \to s(s-1)\textbf{v}_1^\top\textbf{v}_1 + s\textbf{diag}(\textbf{v}_1)$ . We take the expectation of (24) to get

\begin{align*} \mathbb{E}[(\nabla\textbf{X}_n)^\top(\nabla\textbf{X}_n)] = \frac1{s^2}\textbf{A}^\top\mathbb{E}[{\mathcal{Q}}_{n}]\textbf{A} \to \frac{1}{s}\textbf{A}^\top((s-1)\textbf{v}_1^\top\textbf{v}_1+\textbf{diag}(\textbf{v}_1))\textbf{A}. \end{align*}

Since $\mathbb{C}\textrm{ov}[\textbf{X}_n]=\textbf{o}(n^2)$ , we have

\begin{align*} (\tau_0 + bn)^{-2}\mathbb{E}\big[\textbf{X}_{n-1}^\top\textbf{X}_{n-1}\big] & = (\tau_0 + bn)^{-2}\mathbb{V}\textrm{ar}[\textbf{X}_{n-1}] + (\tau_0 + bn)^{-2}(\mathbb{E}[\textbf{X}_{n-1}])^\top\mathbb{E}[\textbf{X}_{n-1}] \\[5pt] & \to \textbf{0} + \textbf{v}_1^\top\textbf{v}_1. \end{align*}

Define the matrix $\textbf{B}$ as provided by (13). Since $\textbf{v}_1\textbf{A}=\lambda_1\textbf{v}_1=b\textbf{v}_1$ , our convergence simplifies to

\begin{equation*} \mathbb{E}\big[\textbf{Y}_n^\top\textbf{Y}_n\big] \to \frac{1}{s}\textbf{A}^\top((s-1)\textbf{v}_1^\top\textbf{v}_1+\textbf{diag}(\textbf{v}_1))\textbf{A} - \textbf{A}^\top\textbf{v}_1^\top\textbf{v}_1\textbf{A} = \textbf{B}-\frac{b^2}{s}\textbf{v}_1^\top\textbf{v}_1. \end{equation*}

We complete the proof by noting that when $\lambda \ne \lambda_1$ , $\textbf{v}_1\textbf{P}_\lambda=\textbf{v}_1\textbf{P}_{\lambda_1}\textbf{P}_\lambda = \textbf{0}$ .

With these results, we now prove Theorem 3. We use techniques quite similar to those found in [Reference Janson14], applying them to the multiple drawing scheme.

Proof of Theorem 3(i). Let $\lambda, \varrho \in \sigma(\textbf{A})\backslash \{\lambda_1\}$ . Then, $\textrm{Re}(\lambda)$ and $\textrm{Re}(\varrho)$ are at most $\textrm{Re}(\lambda_2) < b/2$ . We convert the inner sum into an integral and get

(26) \begin{equation} \frac{1}{n}\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} = \int_0^1\textbf{T}_{\lceil xn\rceil,n,\lambda,\varrho}\,{\textrm{d}} x. \end{equation}

For each fixed $x\in(0,1]$ , Lemmas 4 and 7 imply that

\begin{equation*} \textbf{T}_{\lceil xn\rceil,n,\lambda,\varrho} = \textbf{F}_{\lceil xn\rceil,n}^\top\textbf{P}_{\lambda}^\top \mathbb{E}\big[\textbf{Y}_{\lceil xn\rceil}^\top\textbf{Y}_{\lceil xn\rceil}\big]\textbf{P}_\varrho \textbf{F}_{\lceil xn\rceil,n} \to x^{(-{1}/b)\textbf{A}^\top}\textbf{P}_{\lambda}^\top\textbf{B}\textbf{P}_{\varrho}x^{(-{1}/b)\textbf{A}}. \end{equation*}

Now, choose some $\alpha \in [0,1)$ such that ${\textrm{Re}(\lambda_2)}/b<{\alpha}/{2}$ . Then, by (22), there is a constant $C<\infty$ such that

\begin{equation*} \textbf{T}_{\lceil xn\rceil,n,\lambda,\varrho} \le C\bigg(\frac{n}{\lceil xn\rceil}\bigg)^\alpha \le Cx^{-\alpha}, \end{equation*}

which is integrable over (0,1]. Applying Lebesgue dominated convergence to (26) and a change of variables to $x={\textrm{e}}^{-bs}$ , we get

\begin{equation*} \frac{1}{n}\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} \to \int_0^1 x^{(-{1}/{b})\textbf{A}^\top}\textbf{P}_{\lambda}^\top\textbf{B}\textbf{P}_{\varrho}x^{(-{1}/{b})\textbf{A}}\,{\textrm{d}} x = b\int_0^\infty{\textrm{e}}^{s\textbf{A}^\top}\textbf{P}_{\lambda}^\top\textbf{B}\textbf{P}_{\varrho}{\textrm{e}}^{s\textbf{A}}{\textrm{e}}^{-bs}\,{\textrm{d}} s. \end{equation*}

Thus, (19) and the definition (14) give us the result

\begin{equation*} \frac{1}{n}\mathbb{C}\textrm{ov}[\textbf{X}_n] = \frac{1}{n}\sum_{\lambda\ne\lambda_1}\sum_{\varrho\ne\lambda_1}\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} \to b\int_0^\infty{\textrm{e}}^{s\textbf{A}^\top}\hat{\textbf{P}}^\top\textbf{B}\hat{\textbf{P}}{\textrm{e}}^{s\textbf{A}}{\textrm{e}}^{-bs}\,{\textrm{d}} s =: \boldsymbol{\Sigma}_{(\textbf{A})}, \end{equation*}

as desired.

Proof of Theorem 3(ii). We again use (19) and consider $\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho}$ for eigenvalues $\lambda,\varrho\in\sigma(\textbf{A})\backslash\{\lambda_1\}$ . By assumption, $\textrm{Re}(\lambda)+\textrm{Re}(\varrho) \le 2\textrm{Re}(\lambda_2) = b$ . If $\textrm{Re}(\lambda)+\textrm{Re}(\varrho) < b$ , then we have $\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho}=\textbf{O}(n)$ , based on (23). Thus, we only need to consider cases where $\textrm{Re}(\lambda)=\textrm{Re}(\varrho)=\textrm{Re}(\lambda_2)=b/2$ . Furthermore, we have $\nu_\lambda,\nu_\varrho \le \nu_2$ .

We again transform the sum into an integral, but first separate the term corresponding to $i=1$ from the sum and then use the change of variables $x=n^y = {\textrm{e}}^{y \ln(n)}$ . From these steps, we get

\begin{equation*} \sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} = \textbf{T}_{1,n,\lambda,\varrho} + \int_1^n\textbf{T}_{\lceil x\rceil,n,\lambda,\varrho}\,{\textrm{d}} x = \textbf{T}_{1,n,\lambda,\varrho} + \int_0^1\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho}n^y\ln(n)\,{\textrm{d}} y. \end{equation*}

From (21), we know that $\textbf{T}_{1,n,\lambda,\varrho}=\textbf{O}\big(n\ln^{2\nu_2}(n)\big)$ , and so

\begin{equation*} (n\ln^{2\nu_2+1}(n))^{-1}\sum_{i=1}^{n}\textbf{T}_{i,n,\lambda,\varrho} = \textbf{o}(1) + \int_0^1\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho}n^{y-1}(\ln(n))^{-2\nu_2}\,{\textrm{d}} y. \end{equation*}

We fix $y\in (0,1)$ . Then, by Lemma 2,

\begin{align*} \textbf{P}_\lambda\textbf{F}_{\lceil n^y\rceil,n} & = \frac{1}{\nu_2!}\bigg(\frac{n}{\lceil n^y\rceil}\bigg)^{\lambda/b} \bigg(\frac{1}{b}\ln\bigg(\frac{n}{\lceil n^y\rceil}\bigg)\bigg)^{\nu_2} \big(\textbf{P}_\lambda\textbf{N}^{\nu_2}_\lambda + \textbf{o}(1)\big) \\[5pt] & = \frac{1}{\nu_2!}n^{(1-y)\lambda/b}\bigg(\bigg(\frac{1-y}{b}\bigg)\ln(n)\bigg)^{\nu_2} \big(\textbf{P}_\lambda\textbf{N}^{\nu_2}_\lambda + \textbf{o}(1)\big). \end{align*}

We reach similar conclusions when using the eigenvalue $\varrho$ .

Since $\textrm{Re}(\lambda)+\textrm{Re}(\varrho) = b$ , we define $\delta = \textrm{Im}(\lambda)+\textrm{Im}(\varrho)$ , and so $\lambda+\varrho = b+\delta\textrm{i}\mkern1mu$ . Then, we have

(27) \begin{equation} \frac{n^{y-1}}{(\ln(n))^{2\nu_2}}\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho} = \frac{1}{(\nu_2!)^2}n^{({1}/{b})(1-y)\delta\textrm{i}\mkern1mu}\bigg(\frac{1-y}{b}\bigg)^{2\nu_2} \big(\textbf{N}^\top_\lambda\big)^{\nu_2}\textbf{P}_\lambda^\top\textbf{B}\textbf{P}_\varrho\textbf{N}^{\nu_2}_\varrho + \textbf{o}(1). \end{equation}

From Lemma 5, for $y\in (0,1]$ and $n\ge 2$ ,

\begin{equation*} \frac{n^{y-1}}{\ln^{2\nu_2}(n)}\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho} = \textbf{O}\bigg(\frac{n}{\lceil n^y\rceil}\bigg)n^{y-1} = \textbf{O}(1). \end{equation*}

Thus, the error bound $\textbf{o}(1)$ in (27) is also uniformly bounded, and so we apply Lebesgue dominated convergence to the new integral to obtain

(28) \begin{multline} \int_0^1\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho}n^{y-1}(\ln(n))^{-2\nu_2}\,{\textrm{d}} y \\[5pt] = \frac{1}{(\nu_2!)^2}\bigg(\int_0^1n^{({1}/{b})(1-y)\delta\textrm{i}}\bigg(\frac{1-y}{b}\bigg)^{2\nu_2}\,{\textrm{d}} y\bigg) \big(\textbf{N}^\top_\lambda\big)^{\nu_2}\textbf{P}_\lambda^\top\textbf{B}\textbf{P}_\varrho \textbf{N}^{\nu_2}_\varrho + \textbf{o}(1). \end{multline}

If $\delta=0$ $(\bar{\varrho}={\lambda})$ , then the integral in (28) simplifies to $b^{-2\nu_2}(2\nu_2+1)^{-1}$ . Furthermore, this situation yields $\textbf{P}_\lambda=\textbf{P}_{\bar{\varrho}}=\bar{\textbf{P}}_\varrho$ , and so $\textbf{P}_\lambda^\top=\textbf{P}_\varrho^*$ . In similar fashion, we get $\textbf{N}_\lambda^\top=\textbf{N}_\varrho^*$ , and thus (28) becomes

\begin{equation*} \frac{1}{\ln^{2\nu_2}(n)}\int_0^1\textbf{T}_{\lceil n^y\rceil,n,\bar{\varrho},\varrho}n^{y-1}\,{\textrm{d}} y = \frac{1}{b^{2\nu_2}(2\nu_2+1)(\nu_2!)^2}(\textbf{N}_\varrho^*)^{\nu_2}\textbf{P}_\varrho^*\textbf{B} \textbf{P}_\varrho\textbf{N}_\varrho^{\nu_2} + \textbf{o}(1). \end{equation*}

If $\delta \ne 0$ , then, by setting $u=1-y$ , we have

\begin{equation*} \int_0^1 n^{({1}/{b})(1-y)\delta\textrm{i}}\bigg(\frac{1-y}{b}\bigg)^{2\nu_2}\,{\textrm{d}} y = {b}^{-2\nu_2}\int_0^1{\textrm{e}}^{(({1}/{b})\delta\ln(n)\textrm{i})u}u^{2\nu_2}\,{\textrm{d}} u \to 0 \end{equation*}

as $n\to\infty$ , via integration by parts. Hence, when $\lambda \ne \bar{\varrho}$ , we get

\begin{equation*} \frac{1}{\ln^{2\nu_2}(n)}\int_0^1\textbf{T}_{\lceil n^y\rceil,n,\lambda,\varrho}n^{y-1}\,{\textrm{d}} y = \textbf{o}(1). \end{equation*}

As mentioned earlier, we may ignore pairs where $\textrm{Re}(\lambda) < b/2$ or $\textrm{Re}(\varrho) < b/2$. We have now shown that pairs with $\textrm{Re}(\lambda)=\textrm{Re}(\varrho)=b/2$ but $\lambda\ne\bar{\varrho}$ are asymptotically negligible as well. Therefore, we get

\begin{equation*} \frac{1}{n\ln^{2\nu_2+1}(n)}\mathbb{C}\textrm{ov}[\textbf{X}_n] \to \frac{1}{\lambda_1^{2\nu_2}(2\nu_2+1)(\nu_2!)^2}\sum_{\textrm{Re}(\lambda)=\lambda_1/2} \big(\textbf{N}^*_\lambda\big)^{\nu_2}\textbf{P}^*_\lambda\textbf{B}\textbf{P}_\lambda\textbf{N}^{\nu_2}_\lambda, \end{equation*}

as desired.

5.3. A martingale multivariate central limit theorem

Theorem 4. Consider an irreducible affine (k,s,b)-urn scheme with core index $\Lambda \le \frac{1}{2}$. As $n\to\infty$,

$$\frac{1}{\sqrt{\xi_n}\,}(\textbf{X}_{n}-b\textbf{v}_1n)\ {\buildrel \textrm{D} \over \longrightarrow}\ {\mathcal{N}}_k(\textbf{0},{\boldsymbol\Sigma}_\infty),$$

where

\begin{align*} {\xi_n} & = \begin{cases} n & \text{for } \Lambda < \frac{1}{2}, \\[5pt] n\ln^{2\nu_2+1}(n) & \text{for } \Lambda = \frac{1}{2}; \end{cases} \\[5pt] {\boldsymbol\Sigma}_\infty & = \begin{cases} {\boldsymbol\Sigma}_{(\textbf{A})} & \text{for } \Lambda < \frac{1}{2}, \\[5pt] \dfrac{b^{-2\nu_2}}{(2\nu_2+1)(\nu_2!)^2}\sum_{\textrm{Re}(\lambda)=b/2}\big(\textbf{N}^*_\lambda\big)^{\nu_2} \textbf{P}^*_\lambda\textbf{B}\textbf{P}_\lambda\textbf{N}^{\nu_2}_\lambda & \text{for } \Lambda = \frac{1}{2}. \end{cases} \end{align*}

Proof. The proof draws on the construction of $\textbf{Y}_n$. By construction and (6), $\textbf{Y}_i$ is a martingale difference sequence, and thus so is $\textbf{Y}_i\textbf{F}_{i,n}$. Furthermore, by (15), these martingale differences sum to $\textbf{X}_n - \boldsymbol{\mu}_n$.

To prove the conditional Lindeberg condition, choose $\varepsilon>0$ and rewrite $\textbf{Y}_i\textbf{F}_{i,n}$ as $\textbf{Y}_i\textbf{F}_{i,n} = (\textbf{X}_i-\boldsymbol{\mu}_i) - (\textbf{X}_{i-1}-\boldsymbol{\mu}_{i-1})$ . Then, we have

\begin{align*} \bigg\|\frac{1}{\sqrt{\xi_n}\,}\textbf{Y}_i\textbf{F}_{i,n}\bigg\|_{\mathcal{L}_2} & = \frac{1}{\sqrt{\xi_n}\,}\big\|(\textbf{X}_i-\boldsymbol{\mu}_i)-(\textbf{X}_{i-1}-\boldsymbol{\mu}_{i-1})\big\|_{\mathcal{L}_2} \\[5pt] & = \frac{1}{\sqrt{\xi_n}\,}\bigg\|\frac{1}{s}\textbf{Q}_i\textbf{A} + {\textbf{O}}\big(1+n^{\textrm{Re}(\lambda_2)/b}\ln^{\nu_2}(n)\big)\bigg\|_{\mathcal{L}_2} \to 0, \end{align*}

since componentwise we have $\textbf{Q}_i \le s\textbf{1}$. Consequently, there exists a natural number $n_0(\varepsilon)$ such that the set $\{\|\textbf{Y}_i\textbf{F}_{i,n}\|_{\mathcal{L}_2} > \sqrt{\xi_n}\,\varepsilon\}$ is empty for all $n > n_0(\varepsilon)$. Thus,

\begin{equation*} \sum_{i=1}^{n}\mathbb{E}\bigg[\frac{1}{\xi_n}(\textbf{Y}_i\textbf{F}_{i,n})^\top(\textbf{Y}_i\textbf{F}_{i,n}) 1_{\{\|\xi_n^{-{1}/{2}}\textbf{Y}_i\textbf{F}_{i,n}\|_{\mathcal{L}_2}>\varepsilon\}} \mid \mathbb{F}_{i-1}\bigg] \buildrel \textrm{a.s.} \over \longrightarrow 0, \end{equation*}

which implies convergence in probability as well. This verifies the conditional Lindeberg condition.

By Lemma 7, Theorem 3, and the construction of ${\mathcal Q}$, we have

\begin{equation*} \sum_{i=1}^{n}\mathbb{E}\bigg[\frac{1}{\xi_n}(\textbf{Y}_i\textbf{F}_{i,n})^\top(\textbf{Y}_i\textbf{F}_{i,n}) \mid \mathbb{F}_{i-1}\bigg] \buildrel \mathcal{L}_{1} \over\longrightarrow {\boldsymbol\Sigma}_\infty , \end{equation*}

which implies convergence in probability. By the martingale central limit theorem (applied from [Reference Hall and Heyde9, pp. 57--59]),

$$\frac{1}{\sqrt{\xi_n}\,}\sum_{i=1}^{n}\textbf{Y}_i\textbf{F}_{i,n} = \frac{1}{\sqrt{\xi_n}\,}(\textbf{X}_{n}-\boldsymbol{\mu}_n)\ {\buildrel \textrm{D} \over \longrightarrow} \ {\mathcal{N}}_k (\textbf{0},\boldsymbol{\Sigma}_\infty).$$

Applying Slutsky’s theorem [Reference Slutsky28] to $\xi_n^{-{1}/{2}}\textbf{O}\big(1+n^{\textrm{Re}(\lambda_2)/b}\ln^{\nu_2}(n)\big)$ , we get the result:

\begin{equation*} \frac{1}{\sqrt{\xi_n}\,}(\textbf{X}_{n}-b\textbf{v}_1n)\ {\buildrel \textrm{D} \over \longrightarrow}\ {\mathcal{N}}_k(\textbf{0},\boldsymbol{\Sigma}_\infty). \end{equation*}

We conclude this section with a strong law for the composition vector. The details of the proof are identical to those found in [Reference Janson14, p. 19].

Theorem 5. For an affine irreducible (k,s,b)-urn scheme, as $n\to\infty$ , $({1}/{n})\textbf{X}_n \buildrel \textrm{a.s.} \over \longrightarrow \lambda_1\textbf{v}_1$ .

6. Comments on large-index urns

As mentioned in [Reference Chauvin, Pouyanne and Sahnoun4, Reference Janson12, Reference Janson14], among others, $\textbf{X}_n$ possesses an almost-sure expansion based on the spectral decomposition of $\textbf{A}$. Deriving the covariance for large-index urns is more delicate, however, as the limit may depend on the initial composition of the urn.

Nevertheless, the recursion (4) yields a recurrence for the covariance matrix that is valid regardless of the core index.

Corollary 2. For a linear k-color urn model with unordered multiple drawings, the covariance matrix $\boldsymbol{\Sigma}_n = \mathbb{E}[(\textbf{X}_n-\boldsymbol{\mu}_{n})^\top(\textbf{X}_n-\boldsymbol{\mu}_{n})]$ satisfies the following recurrence relation:

\begin{align*} \boldsymbol{\Sigma}_n & = \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg)^\top\boldsymbol{\Sigma}_{n-1} \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg) \\[5pt] & \quad - \bigg(\frac{\tau_{n-1}-s}{s\tau_{n-1}^2(\tau_{n-1}-1)}\bigg)\textbf{A}^\top \big(\boldsymbol{\Sigma}_{n-1}+\boldsymbol{\mu}_{n-1}^\top\boldsymbol{\mu}_{n-1}-\tau_{n-1}\textbf{diag}(\boldsymbol{\mu}_{n-1})\big) \textbf{A}. \end{align*}

Proof. We first derive a recursion for $\mathbb{E}[\textbf{X}_n^\top\textbf{X}_n \mid \mathbb{F}_{n-1}]$ and then compute $\boldsymbol{\Sigma}_n = \mathbb{E}[\textbf{X}_n^\top\textbf{X}_n]-\boldsymbol{\mu}_{n}^\top\boldsymbol{\mu}_{n}$ .

To begin, consider $\mathbb{E}[\textbf{X}_n^\top\textbf{X}_n \mid \mathbb{F}_{n-1}]$ . Using (4), we have

\begin{align*} \mathbb{E}[\textbf{X}^\top_n\textbf{X}_n \mid \mathbb{F}_{n-1}] & = \mathbb{E} \bigg[\bigg(\textbf{X}_{n-1}+\frac{1}{s}\textbf{Q}_n\textbf{A}\bigg)^\top \bigg(\textbf{X}_{n-1}+\frac{1}{s}\textbf{Q}_n\textbf{A}\bigg) \mid \mathbb{F}_{n-1}\bigg] \\[5pt] & = \textbf{X}_{n-1}^\top\textbf{X}_{n-1} + \frac{1}{\tau_{n-1}}\big(\textbf{A}^\top\textbf{X}_{n-1}^\top\textbf{X}_{n-1} + \textbf{X}_{n-1}^\top\textbf{X}_{n-1}\textbf{A}\big) + \frac{1}{s^{2}}\textbf{A}^\top\mathbb{E}\big[\textbf{Q}_n^\top\textbf{Q}_n \mid \mathbb{F}_{n-1}\big]\textbf{A}. \end{align*}

Substituting (25) into this expression, we obtain

\begin{align*} \mathbb{E}[\textbf{X}^\top_n\textbf{X}_n \mid \mathbb{F}_{n-1}] & = \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg)^\top\textbf{X}_{n-1}^\top\textbf{X}_{n-1} \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg) \\[5pt] & \quad - \bigg(\frac{\tau_{n-1}-s}{s\tau_{n-1}^2(\tau_{n-1}-1)}\bigg)\textbf{A}^\top \big(\textbf{X}_{n-1}^\top\textbf{X}_{n-1}-\tau_{n-1}\textbf{diag}(\textbf{X}_{n-1})\big)\textbf{A}. \end{align*}

From here, we take the expected value of both sides and use $\mathbb{E}\big[\textbf{X}_j^\top\textbf{X}_j\big]=\boldsymbol{\Sigma}_j+\boldsymbol{\mu}_{j}^\top\boldsymbol{\mu}_{j}$ as well as the relationship $\boldsymbol{\mu}_j = \boldsymbol{\mu}_{j-1}(\textbf{I} + ({1}/{\tau_{j-1}})\textbf{A})$ to get

\begin{align*} \boldsymbol{\Sigma}_n & = \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg)^\top \big(\boldsymbol{\Sigma}_{n-1}+\boldsymbol{\mu}_{n-1}^\top\boldsymbol{\mu}_{n-1}\big) \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg) \\[5pt] & \quad - \bigg(\frac{\tau_{n-1}-s}{s\tau_{n-1}^2(\tau_{n-1}-1)}\bigg)\textbf{A}^\top \big(\boldsymbol{\Sigma}_{n-1}+\boldsymbol{\mu}_{n-1}^\top\boldsymbol{\mu}_{n-1}-\tau_{n-1}\textbf{diag}(\boldsymbol{\mu}_{n-1})\big) \textbf{A} \\[5pt] & \quad - \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg)^\top\boldsymbol{\mu}_{n-1}^\top\boldsymbol{\mu}_{n-1} \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg) \\[5pt] & = \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg)^\top\boldsymbol{\Sigma}_{n-1} \bigg(\textbf{I}+\frac{1}{\tau_{n-1}}\textbf{A}\bigg) \\[5pt] & \quad - \bigg(\frac{\tau_{n-1}-s}{s\tau_{n-1}^2(\tau_{n-1}-1)}\bigg)\textbf{A}^\top \big(\boldsymbol{\Sigma}_{n-1}+\boldsymbol{\mu}_{n-1}^\top\boldsymbol{\mu}_{n-1}-\tau_{n-1}\textbf{diag}(\boldsymbol{\mu}_{n-1})\big) \textbf{A}. \end{align*}
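
To illustrate how the recurrence of Corollary 2 is used in practice, the following Python sketch iterates it numerically together with the mean recursion $\boldsymbol{\mu}_j = \boldsymbol{\mu}_{j-1}(\textbf{I} + ({1}/{\tau_{j-1}})\textbf{A})$ quoted above. It assumes a $b$-balanced core, so that $\tau_j = \tau_0 + bj$, and treats composition vectors as row vectors; the function name is ours.

\begin{verbatim}
import numpy as np

def mean_and_covariance(A, x0, s, b, n):
    """Iterate the covariance recurrence of Corollary 2 and the mean
    recursion mu_j = mu_{j-1} (I + A / tau_{j-1}).

    A  : (k, k) core matrix (each row sums to b),
    x0 : deterministic initial composition (length k),
    s  : number of balls drawn per step,
    b  : balance, so that tau_j = tau_0 + b * j.
    Returns (mu_n, Sigma_n); composition vectors are row vectors.
    """
    A = np.asarray(A, dtype=float)
    k = A.shape[0]
    mu = np.asarray(x0, dtype=float).reshape(1, k)       # mu_0
    Sigma = np.zeros((k, k))                             # Sigma_0 = 0
    tau = float(mu.sum())                                # tau_0
    for _ in range(n):
        G = np.eye(k) + A / tau
        c = (tau - s) / (s * tau**2 * (tau - 1.0))
        inner = Sigma + mu.T @ mu - tau * np.diag(mu.ravel())
        Sigma = G.T @ Sigma @ G - c * (A.T @ inner @ A)  # Corollary 2
        mu = mu @ G                                      # mean recursion
        tau += b
    return mu.ravel(), Sigma
\end{verbatim}

For instance, with the data of Example 4 below ($\textbf{A}$ as given there, $\textbf{X}_0=(4,3,5)$, $s=2$, $b=16$, $n=2000$), the returned mean divided by $n$ should reproduce the value reported there from Corollary 1.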

7. Growth under sampling with replacement

If the sample is drawn with replacement, the results closely parallel those for sampling without replacement, and the proofs follow the same lines. We therefore only point out the salient differences and succinctly describe the main result.

To create a notational distinction between the two sampling schemes, we use tilded variables for the with-replacement counterparts. Letting $\tilde{\textbf{Q}}_n$ be the sample drawn with replacement at step n, we get a multinomial distribution with conditional probability

\begin{equation*} \mathbb{P}\big(\tilde{\textbf{Q}}_{n}=(s_1,\ldots,s_k) \mid \tilde{\mathbb{F}}_{n-1}\big) = \begin{pmatrix} s \cr s_1, s_2, \ldots, s_{k} \end{pmatrix} \frac{\tilde{X}_{n-1,1}^{s_1} \cdots \tilde{X}_{n-1,k}^{s_k}}{\tau_{n-1}^{s}}.\end{equation*}

The composition vector in this case has a mean identical to that under sampling without replacement. The covariance matrix develops slightly differently, but remains of the same order as described in (20), and can be computed via the recursion of Corollary 2, with

$$\frac{1}{s\tau_{n-1}^2} \quad \textrm{substituting for} \quad\frac{\tau_{n-1}-s}{s \tau_{n-1}^2(\tau_{n-1}-1)}.$$

Furthermore, for small- and critical-index urns, a central limit theorem follows identically to Theorem 4.
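
For concreteness, one step of the with-replacement scheme is easy to simulate: the sample is multinomial on the current proportions, and the urn is updated through the same affine rule as before. The following Python sketch illustrates only the sampling mechanism (the function name is ours); it is not used in any proof.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2024)

def step_with_replacement(x, A, s):
    """One growth step under sampling with replacement.

    x : current composition (length k), treated as a row vector,
    A : (k, k) core matrix,
    s : number of balls drawn (with replacement) per step.
    The sample is multinomial(s, x / tau), matching the conditional
    distribution displayed above; the urn is then updated through the
    affine rule X_n = X_{n-1} + (1/s) Q_n A.
    """
    x = np.asarray(x, dtype=float)
    q = rng.multinomial(s, x / x.sum())
    return x + (q @ A) / s
\end{verbatim}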

8. Examples

We give four examples of (k, s, b)-urns. The first two are small urns, one with a diagonalizable core and one with a nondiagonalizable core. We work out all the details in Example 4 and treat the remaining examples more briefly. Example 5 provides a useful application. Example 6 focuses on a critical urn, and we finish with a brief mention of the behavior of a large-index urn in Example 7.

Example 4. (Small diagonalizable core.) Consider an affine urn scheme with $s=2$ draws per sample and irreducible core

$$\textbf{A} = \begin{pmatrix} 6\;\;\;\;\; & 4\;\;\;\;\; & 6 \\[5pt] 2\;\;\;\;\; & 6\;\;\;\;\; & 8 \\[5pt] 4\;\;\;\;\; & 6\;\;\;\;\; & 6 \end{pmatrix}.$$

This is a (3,2,16)-urn. Theorem 1 completes the replacement matrix to

\begin{equation*} \textbf{M} = \begin{matrix} 200 \\[5pt] 110 \\[5pt] 101\\[5pt] 020 \\[5pt] 011 \\[5pt] 002 \end{matrix} \begin{pmatrix} 6\;\;\;\;\; & 4\;\;\;\;\; & 6 \\[5pt] 4\;\;\;\;\; & 5\;\;\;\;\; & 7 \\[5pt] 5\;\;\;\;\; & 5\;\;\;\;\; & 6 \\[5pt] 2\;\;\;\;\; & 6\;\;\;\;\; & 8 \\[5pt] 3\;\;\;\;\; & 6\;\;\;\;\; & 7 \\[5pt] 4\;\;\;\;\; & 6\;\;\;\;\; & 6 \end{pmatrix}. \end{equation*}

Suppose the urn starts in $\textbf{X}_0= (4,3,5)$ . With $n=2000$ , the computation in Corollary 1 gives

$$\frac{1}{2000}\textbf{X}_{2000} \approx (3.787, 5.527, 6.692).$$

The eigenvalues of $\textbf{A}$ are $\lambda_1=16$ , $\lambda_2=1+\sqrt{5}$ , and $\lambda_3=1-\sqrt{5}$ . With $\lambda_2 < \lambda_1/2$ , this is a small-index case. The principal left eigenvector is $\textbf{v}_1 = \big(\frac{13}{55}, \frac{19}{55}, \frac{23}{55}\big)$ . As $n\to\infty$ , we have

\begin{equation*} \frac{1}{n}\mathbb{E}[\textbf{X}_n] \to \boldsymbol{\mu}_\infty = 16\textbf{v}_1 = \bigg(\frac{208}{55},\frac{304}{55},\frac{368}{55}\bigg) \approx (3.782, 5.527, 6.691). \end{equation*}

We apply Theorem 3 to obtain $\boldsymbol{\Sigma}_\infty$ . We note that $\textbf{A}$ is diagonalizable, and so we may write $\textbf{A}=\textbf{T}\textbf{diag}(\lambda_1,\lambda_2,\lambda_3)\textbf{T}^{-1}$ , where

$$\textbf{T} = \frac{1}{2}\left( \begin{array}{c@{\quad}c@{\quad}c} 2 & -19\sqrt{5}-43 & 19\sqrt{5}-43 \\[5pt] 2 & \hphantom{-}13\sqrt{5}+27 & 27-13\sqrt{5} \\[5pt] 2 & \hphantom{-}2 & 2 \\[5pt] \end{array} \right). $$

Set $\textbf{P}_{\lambda_1}=\textbf{1}^\top\textbf{v}_1$, and take $\textbf{B}$ as in (13). Let ${\textrm{e}}^{s\textbf{A}}=\textbf{T}\textbf{diag}({\textrm{e}}^{16s},{\textrm{e}}^{(1+\sqrt{5})s},{\textrm{e}}^{(1-\sqrt{5})s})\textbf{T}^{-1}$. Then,

\begin{equation*} \boldsymbol{\Sigma}_{(\textbf{A})} = 16\int_0^\infty{\textrm{e}}^{s\textbf{A}^\top}\hat{\textbf{P}}^\top\textbf{B}\hat{\textbf{P}} {\textrm{e}}^{s\textbf{A}}{\textrm{e}}^{-16s}\,{\textrm{d}} s. \end{equation*}

We apply Theorem 4, and conclude that

$$ \frac{1}{\sqrt{n}\,}\bigg(\textbf{X}_{n}-\bigg(\frac{208}{55},\frac{304}{55},\frac{368}{55}\bigg)n\bigg)\ {\buildrel \textrm{D} \over \longrightarrow}\ {\mathcal{N}}_3 \left(\textbf{0},\, \frac{1}{3025} \begin{pmatrix} \hphantom{-}5552\;\;\;\;\; & -2864\;\;\;\;\; & -2688 \\[5pt] -2864\;\;\;\;\; & \hphantom{-}1808\;\;\;\;\; & \hphantom{-}1056 \\[5pt] -2688\;\;\;\;\; & \hphantom{-}1056\;\;\;\;\; & \hphantom{-}1632 \\[5pt] \end{pmatrix}\right). $$
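
As a sanity check on these limits (and only as an illustration), the scheme of this example is straightforward to simulate: the sample of two balls taken without replacement follows a multivariate hypergeometric law on the current composition, and the urn is updated by $\textbf{X}_n=\textbf{X}_{n-1}+\frac{1}{2}\textbf{Q}_n\textbf{A}$. The Python sketch below (with function names of our own choosing) averages $\textbf{X}_{2000}/2000$ over independent runs, which should land close to the limiting mean.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2024)
A = np.array([[6, 4, 6], [2, 6, 8], [4, 6, 6]])    # core of Example 4
s, n = 2, 2000

def simulate_once(x0):
    """One trajectory of the (3,2,16)-urn, sampling without replacement."""
    x = np.array(x0, dtype=np.int64)
    for _ in range(n):
        q = rng.multivariate_hypergeometric(x, s)  # sample of s balls
        # X_n = X_{n-1} + (1/s) Q_n A; the division is exact because
        # every entry of this particular core is even.
        x = x + (q @ A) // s
    return x

runs = np.array([simulate_once((4, 3, 5)) for _ in range(200)])
print(runs.mean(axis=0) / n)   # should be close to (3.782, 5.527, 6.691)
\end{verbatim}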

Example 5. (Small nondiagonalizable core.) A new series of weekly art events is about to begin in the city. You and three of your friends decide to attend the kick-off ceremony together, but one friend cannot make it to the first official event. To keep the group size at four, a new person is invited to join the group for the next event. This process repeats for each subsequent event: three of the prior attendees group together with a new person to attend. Former attendees may return to later events, and the group may be formed through individuals linked by an outside acquaintance (so the group need not include someone who knows all attending members).

To estimate, over a long run, the numbers of individuals in the collective who have attended exactly one, two, or three of these events, and the number who have attended at least four, we may apply an affine urn model. Note that we begin with four individuals who have attended one event (the kick-off), and with each new art event we sample three former attendees to return with a new member. We always add one person who has attended only one event (the new member), and if a former member is selected to attend the new event, they move from the list of those who have attended $i$ events to the list of those who have attended $(i+1)$ events, unless they have attended at least four events, in which case they remain in that category. Thus, the affine model with $s=3$ drawings has the initial state $\textbf{X}_0=(4,0,0,0)$ and the core matrix

$$\textbf{A}=\begin{pmatrix} -2\;\;\;\;\; & 3\;\;\;\;\; & 0\;\;\;\;\; & 0 \\[5pt] 1\;\;\;\;\; & -3\;\;\;\;\; & 3\;\;\;\;\; & 0 \\[5pt] 1\;\;\;\;\; & 0\;\;\;\;\; & -3\;\;\;\;\; & 3 \\[5pt] 1\;\;\;\;\; & 0\;\;\;\;\; & 0\;\;\;\;\; & 0 \end{pmatrix}.$$

Here, we have an irreducible affine (4,3,1)-urn. The eigenvalues of $\textbf{A}$ are $\lambda_1=1$ and $\lambda_\ell=-3$ for $\ell\ge2$ . With $\lambda_2 < \lambda_1 / 2$ , this is a small urn. The leading left eigenvector is $\textbf{v}_1=\big(\frac{1}{4},\frac{3}{16},\frac{9}{64},\frac{27}{64}\big)$ . Since $\textbf{A}$ is 1-balanced, we have $\boldsymbol{\mu}_{\infty}=\big(\frac{1}{4},\frac{3}{16},\frac{9}{64},\frac{27}{64}\big)$ .

Note that $\textbf{A}$ is not diagonalizable, so the analysis requires the Jordan decomposition $\textbf{A}=\textbf{T}\boldsymbol{J}\textbf{T}^{-1}$ , where

$$\textbf{T}= \begin{pmatrix} 1\;\;\;\;\; & -3\;\;\;\;\; & 1\;\;\;\;\; & 0 \\[5pt] 1\;\;\;\;\; & 1\;\;\;\;\; & -\frac{4}{3}\;\;\;\;\; & \frac{1}{3} \\[5pt] 1\;\;\;\;\; & 1\;\;\;\;\; & 0\;\;\;\;\; & -\frac{4}{9} \\[5pt] 1\;\;\;\;\; & 1\;\;\;\;\; & 0\;\;\;\;\; & 0 \\[5pt] \end{pmatrix}, \qquad \boldsymbol{J}= \begin{pmatrix} 1\;\;\;\;\; & 0\;\;\;\;\; & 0\;\;\;\;\; & 0 \\[5pt] 0\;\;\;\;\; & -3\;\;\;\;\; & 1\;\;\;\;\; & 0 \\[5pt] 0\;\;\;\;\; & 0\;\;\;\;\; &-3\;\;\;\;\; &1 \\[5pt] 0\;\;\;\;\; & 0\;\;\;\;\; & 0\;\;\;\;\; &-3 \end{pmatrix}.$$

Again, we set $\textbf{P}_{\lambda_1}=\textbf{1}_k^\top\textbf{v}_1$ and $\textbf{B}$ as in (13), and let

$${\textrm{e}}^{s\textbf{A}}=\textbf{T} \begin{pmatrix} {\textrm{e}}^s\;\;\;\;\; & 0\;\;\;\;\; & 0\;\;\;\;\; & 0 \\[5pt] 0\;\;\;\;\; & {\textrm{e}}^{-3s}\;\;\;\;\; & {\textrm{e}}^{-3s}s\;\;\;\;\; & \frac{1}{2}{\textrm{e}}^{-3s}s^2 \\[5pt] 0\;\;\;\;\; & 0\;\;\;\;\; & {\textrm{e}}^{-3s}\;\;\;\;\; & {\textrm{e}}^{-3s}s \\[5pt] 0\;\;\;\;\; & 0\;\;\;\;\; & 0\;\;\;\;\; & {\textrm{e}}^{-3s} \end{pmatrix} \textbf{T}^{-1}.$$

We apply Theorem 3 to get the limiting matrix $\boldsymbol{\Sigma}_\infty$, and then Theorem 4 to conclude that

\begin{multline*} \frac{1}{\sqrt{n}\,} \bigg(\textbf{X}_{n}-\bigg(\frac{1}{4},\frac{3}{16},\frac{9}{64},\frac{27}{64}\bigg)n\bigg) \\[5pt] {\buildrel \textrm{D} \over \longrightarrow} \ {\mathcal{N}}_4\left(\textbf{0}, \begin{pmatrix} \dfrac{9}{112}\;\;\;\;\; & -\dfrac{207}{3136}\;\;\;\;\; & -\dfrac{2043}{87\,808}\;\;\;\;\; & \dfrac{783}{87\,808} \\[12pt] -\dfrac{207}{3136}\;\;\;\;\; & \dfrac{11\,349}{87\,808}\;\;\;\;\; & -\dfrac{88\,983}{2\,458\,624}\;\;\;\;\; & -\dfrac{66\,501}{2\,458\,624} \\[12pt] -\dfrac{2043}{87\,808}\;\;\;\;\; & -\dfrac{88\,983}{2\,458\,624}\;\;\;\;\; & \dfrac{7\,480\,413}{68\,841\,472}\;\;\;\;\; & -\dfrac{3\,387\,177}{68\,841\,472} \\[12pt] \dfrac{783}{87\,808}\;\;\;\;\; & -\dfrac{66\,501}{2\,458\,624}\;\;\;\;\; & -\dfrac{3\,387\,177}{68\,841\,472}\;\;\;\;\; & \dfrac{4\,635\,333}{68\,841\,472} \end{pmatrix}\right). \end{multline*}

Example 6. (Critical core.) Consider an affine urn model with $s=2$ and replacement matrix with the core

$$\textbf{A}= \begin{pmatrix} 4\;\;\;\;\; & 0\;\;\;\;\; & 2 \\[5pt] 2\;\;\;\;\; & 4\;\;\;\;\; & 0 \\[5pt] 0\;\;\;\;\; & 2\;\;\;\;\; & 4 \end{pmatrix}.$$

The eigenvalues of $\textbf{A}$ are $\lambda_1=6$ , $\lambda_2=3+\textrm{i}\mkern1mu\sqrt{3}$ , and $\lambda_3=3-\textrm{i}\mkern1mu\sqrt{3}$ . Here, the core index is $\frac 1 2$ , and thus this urn is critical. The principal left eigenvector is $\textbf{v}_1=\frac{1}{3}\textbf{1}$ . For large n, we have $({1}/{n})\mathbb{E}[\textbf{X}_n] \to 6\textbf{v}_1 = (2, 2, 2)$ .

Note that $\textbf{A}$ is diagonalizable, and so $\nu_2=0$ and we have

\begin{align*} \textbf{P}_{\lambda_2} & = \frac{1}{6}\left( \begin{array}{c@{\quad}c@{\quad}c} 2 & -1-\textrm{i}\mkern1mu\sqrt{3} & -1+\textrm{i}\mkern1mu\sqrt{3} \\[5pt] -1+\textrm{i}\mkern1mu\sqrt{3} & 2 & -1-\textrm{i}\mkern1mu\sqrt{3} \\[5pt] -1-\textrm{i}\mkern1mu\sqrt{3} & -1+\textrm{i}\mkern1mu\sqrt{3} & 2 \\[5pt] \end{array} \right), \\[5pt] \textbf{P}_{\lambda_3} & = \frac{1}{6}\left( \begin{array}{c@{\quad}c@{\quad}c} 2 & -1+\textrm{i}\mkern1mu\sqrt{3} & -1-\textrm{i}\mkern1mu\sqrt{3} \\[5pt] -1-\textrm{i}\mkern1mu\sqrt{3} & 2 & -1+\textrm{i}\mkern1mu\sqrt{3} \\[5pt] -1+\textrm{i}\mkern1mu\sqrt{3} & -1-\textrm{i}\mkern1mu\sqrt{3} & 2 \\[5pt] \end{array} \right). \end{align*}

By Theorem 3, we get $({1}/({n\ln(n)}))\boldsymbol{\Sigma}_n \to \boldsymbol{\Sigma}_{\infty} = \textbf{P}^*_{\lambda_2}\textbf{B}\textbf{P}_{\lambda_2}+\textbf{P}^*_{\lambda_3}\textbf{B}\textbf{P}_{\lambda_3}$ . Applying Theorem 4, we conclude that

$$\frac{1}{\sqrt{n\ln(n)}\,}(\textbf{X}_{n}-(2,2,2)\, n)\ {\buildrel \textrm{D} \over \longrightarrow}\ {\mathcal{N}}_3\left(\textbf{0},\frac{1}{3} \begin{pmatrix} 4\;\;\;\;\; & -2\;\;\;\;\; & -2 \\[5pt] -2\;\;\;\;\; & 4\;\;\;\;\; & -2 \\[5pt] -2\;\;\;\;\; & -2\;\;\;\;\; & 4 \end{pmatrix}\right). $$
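
The projection matrices displayed above can be checked numerically. Since $\textbf{A}$ is diagonalizable with distinct eigenvalues, the spectral projector associated with $\lambda_2$ has the Lagrange form $\prod_{\mu\in\sigma(\textbf{A})\backslash\{\lambda_2\}}(\textbf{A}-\mu\textbf{I})/(\lambda_2-\mu)$. The Python sketch below computes it and transposes the result, reflecting our reading that the paper's projectors act on row vectors; $\textbf{P}_{\lambda_3}$ is then the entrywise conjugate.

\begin{verbatim}
import numpy as np

A = np.array([[4.0, 0, 2], [2, 4, 0], [0, 2, 4]])   # core of Example 6
lam1 = 6.0
lam2 = 3.0 + 1j * np.sqrt(3.0)
lam3 = 3.0 - 1j * np.sqrt(3.0)
I = np.eye(3)

# Lagrange form of the spectral projector of a diagonalizable matrix with
# distinct eigenvalues, transposed for the row-vector convention.
P2 = (((A - lam1 * I) @ (A - lam3 * I)) / ((lam2 - lam1) * (lam2 - lam3))).T
P3 = P2.conj()                 # lambda_3 is the conjugate of lambda_2

print(np.round(6 * P2, 6))     # should match 6 * P_{lambda_2} displayed above
\end{verbatim}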

Example 7. (Large core.) Consider an affine urn scheme with $s=3$ draws per sample and irreducible core

$$\textbf{A}= \begin{pmatrix} 9\;\;\;\;\; & 3\;\;\;\;\; & 0 \\[5pt] 0\;\;\;\;\; & 9\;\;\;\;\; & 3 \\[5pt] 3\;\;\;\;\; & 0\;\;\;\;\; & 9 \end{pmatrix}.$$

This is a (3,3,12)-urn. Suppose the urn starts in $\textbf{X}_0= (3,2,2)$ . With $n=2000$ , the computation in Corollary 1 gives $({1}/{2000})\textbf{X}_{2000} \approx {(3.992, 4.062, 3.947)}$ .

The eigenvalues of $\textbf{A}$ here are $\lambda_1=12$, $\lambda_2=\frac{15}{2}+\frac{3\sqrt{3}}{2}\textrm{i}\mkern1mu$, and $\lambda_3=\frac{15}{2}-\frac{3\sqrt{3}}{2}\textrm{i}\mkern1mu$. With $\textrm{Re}(\lambda_2) > \lambda_1/2$, this is a large-index case. The principal left eigenvector is $\textbf{v}_1 =\frac{1}{3} \textbf{1}$. As $n \to \infty$, we have $({1}/{n})\mathbb{E}[\textbf{X}_n] \to \boldsymbol{\mu}_\infty = 12\textbf{v}_1 = (4, 4, 4)$.
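
Although the Gaussian limit theory of Section 5 does not cover the large-index regime, the strong law of Theorem 5 still applies, and the scheme can be simulated directly. The sketch below (an illustration only, with a function name of our own choosing) runs single trajectories, again sampling without replacement via the multivariate hypergeometric law, and reports $\textbf{X}_n/n$, which should drift toward $(4,4,4)$ as $n$ grows, in line with the computation from Corollary 1 reported above.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2024)
A = np.array([[9, 3, 0], [0, 9, 3], [3, 0, 9]])    # core of Example 7
s = 3

def run(x0, n):
    """One trajectory of the (3,3,12)-urn, sampling without replacement."""
    x = np.array(x0, dtype=np.int64)
    for _ in range(n):
        q = rng.multivariate_hypergeometric(x, s)  # sample of s balls
        # X_n = X_{n-1} + (1/s) Q_n A; exact, since every entry of this
        # core is a multiple of s.
        x = x + (q @ A) // s
    return x

for n in (2000, 20000, 200000):
    print(n, run((3, 2, 2), n) / n)   # should approach (4, 4, 4)
\end{verbatim}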

Acknowledgements

We wish to thank the referee for their thorough examination of the manuscript, which led to a richer analysis on the connection between urn irreducibility and matrix-theory irreducibility.

Funding information

There are no funding bodies to thank relating to the creation of this article.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

Bailey, W. (1935). Generalised Hypergeometric Series. Cambridge University Press.
Bandyopadhyay, A. and Thacker, D. (2017). Pólya urn schemes with infinitely many colors. Bernoulli 23, 3243–3267.
Bose, A., Dasgupta, A. and Maulik, K. (2009). Multicolor urn models with reducible replacement matrices. Bernoulli 15, 279–295.
Chauvin, B., Pouyanne, N. and Sahnoun, R. (2011). Limit distributions for large Pólya urns. Ann. Appl. Prob. 21, 1–32.
Chen, M. and Kuba, M. (2013). On generalized Pólya urn models. J. Appl. Prob. 50, 1169–1186.
Chen, M. and Wei, C. (2005). A new urn model. J. Appl. Prob. 42, 964–976.
Crimaldi, I., Louis, P. and Minelli, G. (2023). Statistical test for an urn model with random multidrawing and random addition. Stoch. Process. Appl. 158, 342–360.
Graham, R., Knuth, D. and Patashnik, O. (1989). Concrete Mathematics: A Foundation for Computer Science, 2nd edn. Addison-Wesley, Reading, MA.
Hall, P. and Heyde, C. (1980). Martingale Limit Theory and Its Application. Academic Press, Inc., New York.
Hill, B., Lane, D. and Sudderth, W. (1980). A strong law for some generalized urn processes. Ann. Prob. 8, 214–226.
Horn, R. and Johnson, C. (2013). Matrix Analysis, 2nd edn. Cambridge University Press.
Janson, S. (2004). Functional limit theorems for multitype branching processes and generalized Pólya urns. Stoch. Process. Appl. 110, 177–245.
Janson, S. (2019). Random replacements in Pólya urns with infinitely many colours. Electron. Commun. Prob. 24, 1–11.
Janson, S. (2020). Mean and variance of balanced Pólya urns. Adv. Appl. Prob. 52, 1224–1248.
Johnson, N. and Kotz, S. (1977). Urn Models and Their Applications: An Approach to Modern Discrete Probability Theory. John Wiley, New York.
Kendall, M., Stuart, A. and Ord, K. (1987). Advanced Theory of Statistics, Vol. I, Distribution Theory. Oxford University Press.
Knuth, D. (2005). The Art of Computer Programming, Vol. 4, Fascicle 2, Generating All Tuples and Permutations. Addison-Wesley, Upper Saddle River, NJ.
Konzem, S. and Mahmoud, H. (2016). Characterization and enumeration of certain classes of Pólya urns grown by drawing multisets of balls. Methodology Comput. Appl. Prob. 18, 359–375.
Kotz, S. and Balakrishnan, N. (1997). Advances in urn models during the past two decades. In Advances in Combinatorial Methods and Applications to Probability and Statistics, ed. N. Balakrishnan. Birkhäuser, Boston, MA, pp. 203–257.
Kuba, M. (2016). Classification of urn models with multiple drawings. Preprint, arXiv:1612.04354 [math.PR].
Kuba, M. and Mahmoud, H. (2017). Two-color balanced affine urn models with multiple drawings. Adv. Appl. Math. 90, 1–26.
Kuba, M., Mahmoud, H. and Panholzer, A. (2013). Analysis of a generalized Friedman’s urn with multiple drawings. Discrete Appl. Math. 161, 2968–2984.
Lasmar, N., Mailler, C. and Selmi, O. (2018). Multiple drawing multi-colour urns by stochastic approximation. J. Appl. Prob. 55, 254–281.
Mahmoud, H. (2009). Pólya Urn Models. CRC Press, Boca Raton, FL.
Mahmoud, H. (2013). Drawing multisets of balls from tenable balanced linear urns. Prob. Eng. Inf. Sci. 27, 147–162.
Maki, D. and Thompson, M. (1973). Mathematical Models and Applications. Prentice-Hall, Hoboken, NJ.
Nomizu, K. (1979). Fundamentals of Linear Algebra, 2nd edn. Chelsea Publishing, New York.
Slutsky, E. (1925). Über stochastische Asymptoten und Grenzwerte. Metron 5, 3–89.
Sparks, J. (2023). Investigating multicolor affine urn models using multiple drawings. Dissertation, The George Washington University, Washington, DC.
Sparks, J., Balaji, S. and Mahmoud, H. (2022). The containment profile of hyperrecursive trees. J. Appl. Prob. 59, 278–296.