1. Introduction
Means have fascinated man for a long time. The ancient Greeks knew the arithmetic, geometric, and harmonic means of two positive numbers (which they may have learned from the Babylonians); they also studied other types of means that can be defined using proportions: see [Reference Heath17, pp. 85–89]. Newton and Maclaurin encountered the symmetric means (more about them later). Huygens introduced the notion of expected value and Jacob Bernoulli proved the first rigorous version of the law of large numbers: see [Reference Maistrov20, pp. 51, 73]. Gauss and Lagrange exploited the connection between the arithmetico-geometric mean and elliptic functions: see [Reference Borwein and Borwein6]. Kolmogorov and other authors considered means from an axiomatic point of view and determined when a mean is arithmetic under a change of coordinates (i.e. quasiarithmetic): see [Reference Hardy, Littlewood and Pólya16, pp. 157–163], [Reference Aczél and Dhombres1, Chapter 17]. Means and inequalities between them are the main theme of the classical book [Reference Hardy, Littlewood and Pólya16] by Hardy, Littlewood, and Pólya, and the book [Reference Bullen8] by Bullen is a comprehensive account of the subject. Going beyond the real line, there are notions of averaging that relate to the geometric structure of the ambient space, see e.g. [Reference Émery and Mokobodzki11, Reference Kim, Lawson and Lim18, Reference Navas22, Reference Stone25]. The paper [Reference Bennett, Holland and Székely4] contains an excellent summary of the history of means.
In this paper, we are interested in one of the most classical types of means: the elementary symmetric polynomial means, or symmetric means for short. Let us recall their definition. Given integers $n \ge k \ge 1$, the $k$th symmetric mean of a list of non-negative numbers $x_1,\,\dots,\,x_n$ is:
(1.1)\begin{equation} \mathsf{sym}_k(x_1,\,\dots,\,x_n) := \left( \frac{E^{(n)}_k(x_1,\,\dots,\,x_n)}{\binom{n}{k}} \right)^{1/k}, \end{equation}
where $E^{(n)}_k(x_1,\,\dots,\,x_n) := \sum \nolimits _{i_1<\cdots < i_k} x_{i_1} \cdots x_{i_k}$ is the elementary symmetric polynomial of degree $k$ in $n$ variables. Note that the extremal cases $k=1$ and $k=n$ correspond to the arithmetic and geometric means, respectively. The symmetric means are non-increasing as functions of $k$; this is Maclaurin's inequality (see [Reference Hardy, Littlewood and Pólya16, p. 52] or [Reference Bullen8, p. 327]). For much more information on symmetric means and their relatives, see [Reference Bullen8, Chapter V].
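The $k$th symmetric mean can be evaluated without enumerating the $\binom{n}{k}$ products in $E^{(n)}_k$. The following Python sketch (the helper names `normalized_esp` and `sym` are ours, chosen for illustration) uses the classical recurrence $E^{(m)}_j = E^{(m-1)}_j + x_m E^{(m-1)}_{j-1}$, rewritten for the normalized quantities $E^{(m)}_j / \binom{m}{j}$ so that no large binomial coefficients appear, and checks Maclaurin's inequality on a small example.

```python
import math


def normalized_esp(xs, k):
    """Return E_k(xs) / C(n, k) with n = len(xs), for 1 <= k <= n.

    With P_j^(m) := E_j(x_1, ..., x_m) / C(m, j), the recurrence
    E_j^(m) = E_j^(m-1) + x_m * E_{j-1}^(m-1) becomes
    P_j^(m) = ((m - j) * P_j^(m-1) + j * x_m * P_{j-1}^(m-1)) / m,
    which costs O(n k) operations and avoids huge binomial coefficients.
    """
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(xs)")
    p = [1.0] + [0.0] * k  # p[j] holds P_j for the prefix processed so far
    for m, x in enumerate(xs, start=1):
        # Update j in decreasing order so that p[j - 1] is still the old value.
        for j in range(min(m, k), 0, -1):
            p[j] = ((m - j) * p[j] + j * x * p[j - 1]) / m
    return p[k]


def sym(xs, k):
    """k-th symmetric mean (E_k / C(n, k)) ** (1 / k) of non-negative numbers."""
    if any(x < 0 for x in xs):
        raise ValueError("the x_i must be non-negative")
    return normalized_esp(xs, k) ** (1.0 / k)


xs = [1.0, 2.0, 3.0, 4.0]
means = [sym(xs, k) for k in range(1, len(xs) + 1)]
print(means)  # sym_1 is the arithmetic mean, sym_4 the geometric mean
# Maclaurin's inequality: k -> sym_k(xs) is non-increasing.
assert all(a >= b for a, b in zip(means, means[1:]))
```

For $\underline{x} = (1, 2, 3, 4)$ one has $E^{(4)}_2 = 35$ and $\binom{4}{2} = 6$, so `sym(xs, 2)` is $\sqrt{35/6} \approx 2.415$, between the arithmetic mean $2.5$ and the geometric mean $24^{1/4} \approx 2.213$.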
Let us now turn to probability theory. A law of large numbers in terms of symmetric means was obtained by Halász and Székely [Reference Halász and Székely15], confirming a conjecture of Székely [Reference Székely26]. Let $X_1$, $X_2$, … be a sequence of non-negative independent identically distributed random variables; from them, we form another sequence of random variables:
\begin{equation*} S_n := \left( \frac{E^{(n)}_k(X_1,\,\dots,\,X_n)}{\binom{n}{k}} \right)^{1/k}. \end{equation*}
The case $k=1$ corresponds to the setting of the usual law of large numbers. The case of constant $k>1$ is not significantly different from the classical setting. Things become more interesting if $k$ is allowed to depend on $n$, and it turns out to be advantageous to assume that $k/n$ converges to some number $c \in [0,\,1]$. In this case, Halász and Székely [Reference Halász and Székely15] proved that if $X = X_1$ is strictly positive and satisfies some integrability conditions, then $S_n$ converges almost surely to a non-random constant. Furthermore, they gave a formula for this limit, which we call the Halász–Székely mean with parameter $c$ of the random variable $X$. The Halász–Székely theorem was extended to the non-negative situation by van Es [Reference van Es28] (with appropriate extra hypotheses). The simplest example consists of a random variable $X$ that takes two non-negative values $x$ and $y$, each with probability $1/2$, and $c=1/2$; in this case, the Halász–Székely mean is $(\frac {\sqrt {x}+\sqrt {y}}{2})^{2}$. But this example is misleadingly simple: Halász–Székely means are in general unrelated to power means.
For a fixed parameter $c$, the Halász–Székely mean of a non-negative random variable $X$ only depends on its distribution, which we regard as a probability measure $\mu$ on the half-line $[0,\,+\infty )$. Now we shift our point of view and consider probability measures as the fundamental objects. Instead of speaking of the mean of a probability measure, we prefer the word barycenter, reserving the word mean for lists of numbers (with or without weights), functions, and random variables. This is more than a lexical change. The space of probability measures has a great deal of structure: it is a convex space and it can be endowed with several topologies. So we arrive at the notion of Halász–Székely barycenter (or HS barycenter) of a probability measure $\mu$ with parameter $c$, which we denote $[\mu ]_c$. This is the subject of this paper. It turns out that HS barycenters can be defined directly, without resorting to symmetric means or laws of large numbers (see Definition 2.3).
Symmetric means are intrinsically discrete objects and do not make sense as barycenters. In [Reference Bullen8, Remark, p. 323], Bullen briefly proposes a definition of a weighted symmetric mean, only to conclude that ‘the properties of this weighted mean are not satisfactory’ and therefore not worthy of further consideration. On the other hand, given a finite list $\underline {x} = (x_1,\,\dots,\,x_n)$ of non-negative numbers, we can compare the symmetric means of $\underline {x}$ with the HS barycenter of the associated probability measure $\mu := (\delta _{x_1}+\dots +\delta _{x_n})/n$. It turns out that these quantities obey certain precise inequalities (see Theorem 3.4). In particular, the $k$th symmetric mean of $\underline{x}$ is bounded from below by $[\mu]_{k/n}$.
Furthermore, if $\underline {x}^{(m)}$ denotes the $nm$-tuple obtained by concatenation of $m$ copies of $\underline {x}$, then the $km$th symmetric mean of $\underline{x}^{(m)}$ converges to $[\mu]_{k/n}$ as $m \to \infty$,
and we have precise bounds for the relative error of this approximation, depending only on the parameters and not on the numbers $x_i$ themselves.
Being a natural limit of symmetric means, the HS barycenters deserve to be studied in their own right. One can even argue that they give the ‘right’ notion of weighted symmetric means that Bullen was looking for. HS barycenters have rich theoretical properties. They are also cheap to compute, whereas the sum defining a symmetric mean has $\binom{n}{k}$ terms, which is exponentially many when $k/n$ stays away from $0$ and $1$.
Using our general inequalities and certain continuity properties of the HS barycenters, we are able to obtain, in a straightforward manner, an ergodic theorem that extends the laws of large numbers of Halász–Székely [Reference Halász and Székely15] and van Es [Reference van Es28].
A prominent feature of the symmetric mean (1.1) is that it vanishes whenever more than $n-k$ of the numbers $x_i$ vanish. Consequently, the HS barycenter $[\mu ]_c$ of a probability measure $\mu$ on $[0,\,+\infty )$ vanishes when $\mu (\{0\}) > 1-c$. In other words, once the mass of the leftmost point $0$ exceeds the critical value $1-c$, it imposes itself on the whole distribution and forces the mean to agree with it. Fortunately, in the subcritical regime $\mu (\{0\}) < 1-c$, the HS barycenter turns out to be much better behaved. As will be seen in § 2, in the critical case $\mu (\{0\}) = 1-c$ the HS barycenter can be either positive or zero, so the HS barycenter can actually vary discontinuously. Therefore, our regularity results and the ergodic theorem must take this critical phenomenon into account.
This article is organized as follows. In § 2, we formally define the HS barycenters and prove some of their basic properties. In § 3, we state and prove the fundamental inequalities relating HS barycenters to symmetric means. In § 4, we study the problem of continuity of the HS barycenters with respect to appropriate topologies on spaces of probability measures. In § 5, we apply the results of the previous sections and derive a general ergodic theorem (law of large numbers) for symmetric and HS means.
2. Presenting the HS barycenter
Hardy, Littlewood, and Pólya's axiomatization of (quasiarithmetic) means [Reference Hardy, Littlewood and Pólya16, § 6.19] is formulated in terms of distribution functions, using Stieltjes integrals. Since the first publication of their book in 1934, measures have become established as fundamental objects in mathematical analysis, probability theory, dynamical systems, etc. Spaces of measures have been investigated in depth (see e.g. the influential books [Reference Parthasarathy23, Reference Villani29]). The measure-theoretic point of view provides a convenient framework for the analytic study of means or, as we prefer to call them in this case, barycenters. The simplest example of a barycenter is of course the ‘arithmetic barycenter’ of a probability measure $\mu$ on Euclidean space $\mathbb {R}^{d}$, defined (under the appropriate integrability condition) as $\int x \, d \mu (x)$. Another example is the ‘geometric barycenter’ of a probability measure $\mu$ on the half-line $(0,\,+\infty )$, defined as $\exp ( \int \log x \, d\mu (x) )$. In this section, we introduce the Halász–Székely barycenters and study some of their basic properties.
2.1. Definitions and basic properties
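Throughout this paper we use the following notation: $\mathbb{R}_+ := [0,\,+\infty)$, $\mathbb{R}_{++} := (0,\,+\infty)$, and $\log^{+} x := \max\{\log x,\,0\}$.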
We routinely work with the extended line $[-\infty,\,+\infty ]$, endowed with the order topology.
Definition 2.1 The Halász–Székely kernel (or HS kernel) is the following function of three variables $x \in \mathbb {R}_+$, $y \in \mathbb {R}_{+ + }$, and $c \in [0,\,1]$:
Proposition 2.2 The HS kernel has the following properties (see also Figure 1):
(a) The function $K \colon [0,\,+\infty ) \times (0,\,+\infty ) \times [0,\,1] \to [-\infty,\,+\infty )$ is continuous, attaining the value $-\infty$ only at the points $(0,\,y,\,1)$.
(b) $K(x,\,y,\,c)$ is increasing with respect to $x$.
(c) $K(x,\,y,\,c)$ is decreasing with respect to $c,$ and strictly decreasing when $x\neq y$.
(d) $K(x,\,y,\,1) = \log x$ is independent of $y$.
(e) $K(x,\,y,\,c) \ge \log x,$ with equality if and only if $x=y$ or $c=1$.
(f) For each $y>0,$ the function $K(\mathord {\cdot },\,y,\,0)$ is affine, and its graph is the tangent line to $\log x$ at $x=y$.
(g) $K(\lambda x,\, \lambda y,\, c) = K(x,\,y,\,c) + \log \lambda,$ for all $\lambda >0$.
Proof. Most properties are immediate from Definition 2.1. To check monotonicity with respect to $c$, we compute the partial derivative when $c>0$:
since $\log t \le t-1$ (with equality only if $t=1$), we conclude that $K_c(x,\,y,\,c) \le 0$ (with equality only if $x=y$). Since $K$ is continuous, we obtain property (c). Property (e) is a consequence of properties (c) and (d).
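Several items of Proposition 2.2 can be checked numerically. The following Python sketch relies on an assumed closed form of the HS kernel, $K(x,y,c) = \log y + \frac{1}{c}\log\bigl(1 - c + c\,\frac{x}{y}\bigr)$ for $0 < c \le 1$ and $K(x,y,0) = \log y + \frac{x}{y} - 1$; this form is our hypothesis, chosen because it satisfies properties (d), (f) and (g) and reproduces the two-point example of the introduction, so it should not be read as a quotation of Definition 2.1. The function name `hs_kernel` is ours.

```python
import math


def hs_kernel(x, y, c):
    """ASSUMED closed form of the HS kernel K(x, y, c), x >= 0, y > 0, 0 <= c <= 1.

    Hypothesis: K = log y + (1/c) log(1 - c + c x / y) for c > 0, and the
    tangent line log y + x / y - 1 (property (f)) for c = 0.
    """
    if c == 0:
        return math.log(y) + x / y - 1.0
    inner = 1.0 - c + c * x / y
    if inner == 0.0:  # happens only for x == 0 and c == 1, cf. property (a)
        return -math.inf
    return math.log(y) + math.log(inner) / c


xs, ys, cs = [0.5, 1.0, 3.0], [0.7, 2.0], [0.0, 0.3, 0.7, 1.0]
for x in xs:
    for y in ys:
        # (d): K(x, y, 1) = log x, independently of y.
        assert math.isclose(hs_kernel(x, y, 1.0), math.log(x), abs_tol=1e-12)
        # (c): K is non-increasing in c.
        vals = [hs_kernel(x, y, c) for c in cs]
        assert all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))
        for c in cs:
            # (e): K(x, y, c) >= log x.
            assert hs_kernel(x, y, c) >= math.log(x) - 1e-12
            # (g): K(lam x, lam y, c) = K(x, y, c) + log lam.
            lam = 2.5
            assert math.isclose(hs_kernel(lam * x, lam * y, c),
                                hs_kernel(x, y, c) + math.log(lam), abs_tol=1e-12)
```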
Let $\mathcal {P}(\mathbb {R}_+ )$ denote the set of all Borel probability measures $\mu$ on $\mathbb {R}_+$. The following is the central concept of this paper:
Definition 2.3 Let $c \in [0,\,1]$ and $\mu \in \mathcal {P}(\mathbb {R}_+ )$. If $c=1$, then we require that the function $\log x$ is semi-integrableFootnote 1 with respect to $\mu$. The Halász–Székely barycenter (or HS barycenter) with parameter $c$ of the probability measure $\mu$ is:
(2.4)\begin{equation} [\mu]_c := \exp \inf_{y>0} \int K(x,\,y,\,c) \, d\mu(x), \end{equation}where $K$ is the HS kernel (2.2).
First of all, let us see that the definition is meaningful:
• If $c<1$, then for all $y>0$, the function $K( \mathord {\cdot },\, y,\, c)$ is bounded from below by $K(0,\,y,\,c)>-\infty$, and therefore, it has a well-defined integral (possibly $+\infty$); so $[\mu ]_c$ is a well-defined element of the extended half-line $[0,\,+\infty ]$.
• If $c=1$, then by part (d) of Proposition 2.2, the defining formula (2.4) becomes:
(2.5)\begin{equation} [\mu]_1 = \exp \displaystyle\int \log x \, d\mu(x) . \end{equation}The integral is a well-defined element of $[-\infty,\, +\infty ]$, so $[\mu ]_1$ is well defined in $[0,\,+\infty ]$.
Formula (2.5) means that the HS barycenter with parameter $c=1$ is the geometric barycenter; let us see that $c=0$ corresponds to the standard arithmetic barycenter:
Proposition 2.4 For any $\mu \in \mathcal {P}(\mathbb {R}_+ ),$ we have $[\mu ]_0 = \int x \, d\mu (x)$.
Proof. Let $a := \int x \, d\mu (x)$. If $a = \infty$, then for every $y>0$, the non-constant affine function $K(\mathord {\cdot },\,y,\,0)$ has infinite integral, so definition (2.4) gives $[\mu ]_0 = \infty$. On the other hand, if $a<\infty$, then $[\mu ]_0$ is defined as $\exp \inf _{y>0} (\log y + a/y -1) = a$.
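The last step is an elementary minimization. For $0 < a < \infty$, property (f) gives $K(x,y,0) = \log y + \frac{x}{y} - 1$, whose integral with respect to $\mu$ is $\log y + \frac{a}{y} - 1$, and

```latex
\frac{d}{dy}\Big(\log y + \frac{a}{y} - 1\Big) = \frac{1}{y} - \frac{a}{y^{2}} = \frac{y-a}{y^{2}},
\qquad\text{hence}\qquad
\inf_{y>0}\Big(\log y + \frac{a}{y} - 1\Big) = \log a + 1 - 1 = \log a .
```

The infimum is attained at $y = a$, where the derivative changes sign from negative to positive. If $a = 0$, that is $\mu = \delta_0$, the expression reduces to $\log y - 1$, which tends to $-\infty$ as $y \to 0^{+}$, so again $[\mu]_0 = 0 = a$.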
Let ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ denote the subset formed by those $\mu \in \mathcal {P}(\mathbb {R}_+ )$ such that:
(2.6)\begin{equation} \int \log(1+x) \, d\mu(x) < \infty, \end{equation}
or, equivalently, $\int \log ^{+} x \, d\mu < \infty$ (we will sometimes write ‘$d\mu$’ instead of ‘$d\mu (x)$’).
Proposition 2.5 Let $c \in (0,\,1]$ and $\mu \in \mathcal {P}(\mathbb {R}_+ )$. Then $[\mu ]_c < \infty$ if and only if $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$.
Proof. The case $c=1$ being clear, assume that $c \in (0,\,1)$. Note that for all $y>0$, the expression
is a bounded function of $x$, so the integrability of $K(\mathord{\cdot},\,y,\,c)$ is equivalent to that of $\log (x+1)$.
Next, let us see that the standard properties one might expect for something called a ‘barycenter’ are satisfied. For any $x \ge 0$, we denote by $\delta _x$ the probability measure such that $\delta _x(\{x\})=1$.
Proposition 2.6 For all $c \in [0,\,1]$ and $\mu \in \mathcal {P}(\mathbb {R}_+ ),$ the following properties hold:
(a) Reflexivity: $[\delta _x]_c = x,$ for every $x \ge 0$.
(b) Monotonicity with respect to the measure: If $\mu _1,$ $\mu _2 \in \mathcal {P}(\mathbb {R}_+ )$ have distribution functions $F_1$, $F_2$ such that $F_1 \ge F_2$Footnote 2, then $[\mu _1]_c \le [\mu _2]_c$.
(c) Internality: If $\mu (I)=1$ for an interval $I \subseteq \mathbb {R}_+,$ then $[\mu ]_c \in I$.
(d) Homogeneity: If $\lambda \ge 0,$ and $\lambda _* \mu$ denotes the pushforward of $\mu$ under the map $x \mapsto \lambda x,$ then $[\lambda _* \mu ]_c = \lambda [\mu ]_c$.
(e) Monotonicity with respect to the parameter: If $0 \le c' \le c \le 1,$ then $[\mu ]_{c'} \ge [\mu ]_{c}$.
Proof. The proofs use the properties of the HS kernel listed in Proposition 2.2. Reflexivity is obvious when $c=1$ or $x=0$, and in all other cases follows from property (e). Monotonicity with respect to the measure is a consequence of the fact that the HS kernel is increasing in $x$. The internality property of the HS barycenter follows from reflexivity and monotonicity. Homogeneity follows from property (g) of the HS kernel and the change of variables formula. Finally, monotonicity with respect to the parameter $c$ is a consequence of the corresponding property of the HS kernel.
As will become clear later (see Example 2.13), the internality and the monotonicity properties (w.r.t. $\mu$ and w.r.t. $c$) are not strict.
2.2. Computation and critical phenomenon
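In the remainder of this section, we discuss how to actually compute HS barycenters. In view of Proposition 2.5, we may focus on measures in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$. The mass of zero plays an important role. Given $c \in (0,\,1)$ and $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, we use the following terminology, where $\mu (0) := \mu (\{0\})$:
(2.8)\begin{equation} \begin{cases} \mu(0) < 1-c & \text{(subcritical case)},\\ \mu(0) = 1-c & \text{(critical case)},\\ \mu(0) > 1-c & \text{(supercritical case)}. \end{cases} \end{equation}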
The next result establishes a way to compute $[\mu ]_c$ in the subcritical case; the remaining cases will be dealt with later in Proposition 2.11.
Proposition 2.7 If $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}},$ $c \in (0,\, 1)$ and $\mu (0) < 1-c$ (subcritical case), then the equation
has a unique positive and finite solution $\eta = \eta (\mu,\,c),$ and the $\inf$ in formula (2.4) is attained uniquely at $y=\eta ;$ in particular,
Proof. Fix $\mu$ and $c$ as in the statement. We compute the partial derivative:
Since $\Delta$ is bounded, we are allowed to differentiate under the integral sign:
where $\psi (y) := \int \Delta (x,\,y) \, d\mu$. The partial derivative
is positive, except at $x=0$. Since $\mu \neq \delta _0$, the function $\psi$ is strictly increasing. Furthermore,
and so
using the assumption $\mu (0)<1-c$. Therefore, there exists a unique $\eta >0$ that solves the equation $\psi (\eta )=0$, or equivalently equation (2.9). By (2.12), the function $y \mapsto \int K(x,\,y,\,c) \, d\mu$ decreases on $(0,\,\eta ]$ and increases on $[\eta,\,+\infty )$, and so attains its infimum at $\eta$. Formula (2.10) follows from the definition of $[\mu ]_c$.
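Proposition 2.7 reduces the subcritical computation to a one-dimensional root-finding problem. The Python sketch below computes $[\mu]_c$ for a finitely supported $\mu$ by bisection on the sign of the $y$-derivative, and handles the critical and supercritical cases through Proposition 2.11. It rests on two assumptions, flagged in the code: the closed form $K(x,y,c) = \log y + \frac{1}{c}\log\bigl(1-c+c\,\frac{x}{y}\bigr)$ of the kernel, under which $\partial_y K = \frac{1-c}{y}\cdot\frac{y-x}{(1-c)y+cx}$, and the closed form $B(c) = c\,(1-c)^{(1-c)/c}$ of the function (2.17). Both agree with the two-point example of the introduction. The names `B` and `hs_barycenter` are hypothetical.

```python
import math


def B(c):
    """ASSUMED closed form of (2.17): B(c) = c (1 - c)^((1 - c) / c), with B(1) = 1."""
    return 1.0 if c == 1 else c * (1.0 - c) ** ((1.0 - c) / c)


def hs_barycenter(xs, ws, c):
    """Sketch: [mu]_c for mu = sum_i ws[i] delta_{xs[i]} (finitely many atoms, xs[i] >= 0).

    ASSUMPTION: K(x, y, c) = log y + (1/c) log(1 - c + c x / y) for c in (0, 1].
    """
    if not 0 <= c <= 1 or min(xs) < 0 or min(ws) < 0 or sum(ws) <= 0:
        raise ValueError("need 0 <= c <= 1, xs >= 0 and non-negative weights")
    total = sum(ws)
    ws = [w / total for w in ws]
    if c == 0:  # Proposition 2.4: the arithmetic barycenter
        return sum(w * x for x, w in zip(xs, ws))
    mass0 = sum(w for x, w in zip(xs, ws) if x == 0)
    p = 1.0 - mass0
    if math.isclose(mass0, 1.0 - c, rel_tol=0.0, abs_tol=1e-12):
        # Critical case (Proposition 2.11(a)): B(c) times the geometric barycenter of mu^+.
        return B(c) * math.exp(sum(w * math.log(x) for x, w in zip(xs, ws) if x > 0) / p)
    if mass0 > 1.0 - c:
        return 0.0  # supercritical case (Proposition 2.11(b))

    def psi(y):
        # Up to the positive factor (1 - c) / y, the y-derivative of the integral of
        # K(x, y, c) d mu(x) under the assumed kernel; strictly increasing in y.
        return sum(w / (1.0 - c) if x == 0 else w * (y - x) / ((1.0 - c) * y + c * x)
                   for x, w in zip(xs, ws))

    hi = max(x for x, w in zip(xs, ws) if w > 0)  # psi(hi) >= 0 (cf. Lemma 2.14)
    lo = hi
    for _ in range(1000):                         # psi(0+) < 0 in the subcritical case
        if psi(lo) < 0:
            break
        lo /= 2.0
    else:
        raise ArithmeticError("too close to the critical case")
    for _ in range(200):                          # geometric bisection for eta
        mid = math.sqrt(lo * hi)
        if psi(mid) < 0:
            lo = mid
        else:
            hi = mid
    eta = math.sqrt(lo * hi)
    log_value = math.log(eta) + sum(w * math.log(1.0 - c + c * x / eta)
                                    for x, w in zip(xs, ws)) / c
    return math.exp(log_value)
```

For instance, `hs_barycenter([1.0, 4.0], [1, 1], 0.5)` returns approximately $2.25 = \bigl((\sqrt{1}+\sqrt{4})/2\bigr)^2$, matching the introduction, while `c=0` and `c=1` give the arithmetic and geometric barycenters $2.5$ and $2$.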
Let us note that, as a consequence of (2.11), equation (2.9) is equivalent to:
Remark 2.8 If $\mu$ belongs to $\mathcal {P}(\mathbb {R}_+ )$ but not to ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, and still $c \in (0,\,1)$, then equation (2.9) (or its equivalent version (2.16)) still has a unique positive and finite solution $\eta = \eta (\mu,\,c)$, and formula (2.10) still holds. On the other hand, if $c=0$ and $\int x \, d\mu (x) < \infty$, then all conclusions of Proposition 2.7 still hold, with a similar proof.
We introduce the following auxiliary function, plotted in Figure 2:
The following alternative formula for the HS barycenter matches the original one from [Reference Halász and Székely15], and in some situations is more convenient:
Proposition 2.9 If $0 < c\le 1$ and $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, then:
Furthermore, if $\mu (0)<1-c$, then the $\inf$ is attained at the unique positive finite solution $\rho = \rho (\mu,\,c)$ of the equation
Proof. The formula is obviously correct if $c=1$. If $0< c<1$, we introduce the variable $r := \frac {1-c}{c} \, y$ in formula (2.4) and simplify. Similarly, (2.16) becomes (2.19).
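To illustrate the simplification, assume (as a hypothesis, not a quotation of Definition 2.1) that $K(x,y,c) = \log y + \frac{1}{c}\log\bigl(1-c+c\,\frac{x}{y}\bigr)$ for $0<c<1$. With $r = \frac{1-c}{c}\,y$ one has $1-c+c\,\frac{x}{y} = (1-c)\bigl(1+\frac{x}{r}\bigr)$ and $\log y = \log r + \log c - \log(1-c)$, hence

```latex
\int K(x,y,c)\,d\mu(x) = \log r + \log c + \frac{1-c}{c}\log(1-c) + \frac{1}{c}\int \log\Big(1+\frac{x}{r}\Big)\,d\mu(x),
\qquad
[\mu]_c = c\,(1-c)^{\frac{1-c}{c}} \inf_{r>0}\; r \exp\Big(\frac{1}{c}\int \log\Big(1+\frac{x}{r}\Big)\,d\mu(x)\Big).
```

Under this assumption the prefactor $c\,(1-c)^{(1-c)/c}$ plays the role of $B(c)$. As a sanity check, for $\mu = \frac12(\delta_x+\delta_y)$ and $c = \frac12$ the quantity inside the infimum is $r + x + y + \frac{xy}{r}$, whose infimum is $(\sqrt{x}+\sqrt{y})^2$; multiplying by $\frac14$ recovers the value $(\frac{\sqrt{x}+\sqrt{y}}{2})^2$ from the introduction.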
If $\mu$ is a Borel probability measure on $\mathbb {R}_{+ }$ not entirely concentrated at zero (i.e., $\mu \neq \delta _0$), then we denote by $\mu ^{+ }$ the probability measure obtained by conditioning on the event $\mathbb {R}_{+ + } =(0,\,\infty )$, that is, $\mu^{+}(E) := \mu(E \cap \mathbb{R}_{++})/\mu(\mathbb{R}_{++})$ for every Borel set $E \subseteq \mathbb{R}_+$.
Obviously, if $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, then $\mu ^{+ } \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ as well.
Proposition 2.10 Let $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ and $p := 1-\mu (0)$. If $c \in (0,\,1]$ and $c \le p$ (critical or subcritical cases), then
Proof. Note that $\mu \neq \delta _0$, so the positive part $\mu ^{+ }$ is defined. Using formula (2.18), we have:
At this point, the assumption $c \le p$ guarantees that the barycenter $[\mu ^{+ }]_{c/p}$ is well defined, and using (2.18) again we obtain (2.21).
Finally, we compute the HS barycenter in the critical and supercritical cases:
Proposition 2.11 Let $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ and $c \in (0,\,1]$.
(a) Critical case: If $\mu (0)=1-c,$ then $[\mu ]_c = B(c) [\mu ^{+ }]_1$.
(b) Supercritical case: If $\mu (0)>1-c,$ then $[\mu ]_c = 0$.
In both cases above, the infimum in formula (2.4) is not attained.
Proof. In the critical case, we use (2.21) with $p=c$ and conclude.
In the supercritical case, we can assume that $\mu \neq \delta _0$ (if $\mu = \delta_0$, then $[\mu]_c = 0$ by reflexivity). Note that $p< c$; thus, $\lim _{r \to 0^{+}} r^{1-\frac {p}{c}}=0$. Moreover, since $\log ^{+} x \in L^{1}(\mu )$, we have
Therefore, using (2.25), we obtain $[\mu ]_c = 0$.
Propositions 2.7 and 2.11 allow us to compute HS barycenters in all cases. For emphasis, let us list explicitly the situations where the barycenter vanishes:
Proposition 2.12 Let $c \in [0,\,1]$ and $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$. Then $[\mu ]_c = 0$ if and only if one of the following mutually exclusive situations occurs:
(a) $c = 0$ and $\mu = \delta _0$.
(b) $c>0,$ $\mu (0) = 1-c$, and $\int \log x \, d\mu ^{+ }(x) = -\infty$.
(c) $c>0$ and $\mu (0) > 1-c$.
Proof. The case $c=0$ being obvious, assume that $c>0$, and so $B(c)>0$. In the critical case, part (a) of Proposition 2.11 tells us that $[\mu ]_c = 0$ if and only if $[\mu ^{+ }]_1 = 0$, which by (2.5) is equivalent to $\int \log x \, d\mu ^{+ } = -\infty$. In the supercritical case, part (b) of the proposition ensures that $[\mu ]_c = 0$.
Example 2.13 Consider the family of probability measures:
If $0< c<1$, then
These formulas were first obtained by Székely [Reference Székely26]. It follows that the function
(whose graph is shown on [Reference van Es28, p. 680]) is discontinuous at the points with $p=c>0$, and only at those points. We will return to the issue of continuity in § 4.
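Assuming the family in Example 2.13 is $\mu_p = (1-p)\,\delta_0 + p\,\delta_1$ (so that $\mu_p(0) = 1-p$), the jump at $p = c$ can be seen numerically. The Python sketch below uses our reading of (2.21), namely $[\mu]_c = \frac{B(c)}{B(c/p)}\,[\mu^{+}]_{c/p}$ for $c \le p$, together with the assumed closed form $B(c) = c\,(1-c)^{(1-c)/c}$; since $\mu_p^{+} = \delta_1$, reflexivity gives $[\mu_p^{+}]_{c/p} = 1$. The function names are hypothetical.

```python
def B(c):
    """ASSUMED closed form of (2.17): B(c) = c (1 - c)^((1 - c) / c), with B(1) = 1."""
    return 1.0 if c == 1 else c * (1.0 - c) ** ((1.0 - c) / c)


def two_point_barycenter(p, c):
    """Sketch of [mu_p]_c for mu_p = (1 - p) delta_0 + p delta_1, 0 < c <= 1, 0 <= p <= 1."""
    if not (0 < c <= 1 and 0 <= p <= 1):
        raise ValueError("need 0 < c <= 1 and 0 <= p <= 1")
    if p < c:
        return 0.0  # supercritical: mu_p(0) = 1 - p > 1 - c
    return B(c) / B(c / p)  # critical or subcritical, with [delta_1]_{c/p} = 1


c = 0.5
for p in (0.45, 0.4999, 0.5, 0.5001, 0.6, 1.0):
    print(f"p = {p:<6}  [mu_p]_c = {two_point_barycenter(p, c):.6f}")
```

The printed values are $0$ for $p < c$, jump to $B(1/2) = 1/4$ at $p = c$, and then increase to $1$ at $p = 1$.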
The practical computation of HS barycenters usually requires numerical methods. In any case, it is useful to notice that the function $\eta$ from Proposition 2.7 satisfies the internality property:
Lemma 2.14 Let $\mu \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, $c \in (0,\, 1)$, and suppose that $\mu (0) < 1-c$. If $\mu (I)=1$ for an interval $I \subseteq \mathbb {R}_+$, then $\eta (\mu,\,c) \in I$.
The proof is left to the reader.
3. Comparison with the symmetric means
3.1. HS means as repetitive symmetric means
The HS barycenter of a probability measure, introduced in the previous section, can now be specialized to the case of equidistributed discrete probability measures. The HS mean of a tuple $\underline {x} = (x_1,\,\dots,\,x_n)$ of non-negative numbers with parameter $c \in [0,\,1]$ is defined as:
(3.1)\begin{equation} \mathsf{hsm}_c(\underline{x}) := \left[ \frac{\delta_{x_1} + \cdots + \delta_{x_n}}{n} \right]_c . \end{equation}
Using (2.18), we have more explicitly:
where $B$ is the function (2.17). On the other hand, recall that for $k \in \{1,\,\dots,\,n\}$, the $k$-th symmetric mean of the $n$-tuple $\underline {x}$ is:
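(3.3)\begin{equation} \mathsf{sym}_k(\underline{x}) := \left( \frac{E^{(n)}_k(x_1,\,\dots,\,x_n)}{\binom{n}{k}} \right)^{1/k}, \end{equation}where $E^{(n)}_k$ denotes the elementary symmetric polynomial of degree $k$ in $n$ variables.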
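Since they originate from a barycenter, the HS means are repetition invariant Footnote 3 in the sense that, for any integer $m \ge 1$,
(3.4)\begin{equation} \mathsf{hsm}_c\big(\underline{x}^{(m)}\big) = \mathsf{hsm}_c(\underline{x}), \end{equation}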
where $\underline {x}^{(m)}$ denotes the $nm$-tuple obtained by concatenation of $m$ copies of the $n$-tuple $\underline {x}$. No such property holds for the symmetric means, even allowing for adjustment of the parameter $k$. Nevertheless, if the number of repetitions tends to infinity, then the symmetric means tend to stabilize, and the limit is an HS mean; more precisely:
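Theorem 3.1 If $\underline {x} = (x_1,\,\dots,\,x_n),$ $x_i \ge 0,$ and $1 \le k \le n,$ then:
(3.5)\begin{equation} \lim_{m \to \infty} \mathsf{sym}_{km}\big(\underline{x}^{(m)}\big) = \mathsf{hsm}_{k/n}(\underline{x}). \end{equation}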
Furthermore, the relative error goes to zero uniformly with respect to the $x_i$'s.
This theorem will be proved in the next subsection.
It is worthwhile to note that the Navas barycenter [Reference Navas22] is obtained as a ‘repetition limit’ similar to (3.5).
Example 3.2 Using Propositions 2.7 and 2.11, one computes:
Therefore,
The last equality was deduced in [Reference Cellarosi, Hensley, Miller and Wellens10, p. 31] from the asymptotics of Legendre polynomials.
Repeated symmetric means are particular instances of Whiteley means; in fact, in the notation of [Reference Bullen8, p. 344],
Let us pose a problem:
Question 3.3 Is the sequence $m \mapsto \mathsf {sym}_{km} ( \underline {x}^{(m)} )$ always monotone decreasing?
There exists a partial result: when $k=1$ and $n=2$, [Reference Cellarosi, Hensley, Miller and Wellens10, Lemma 4.1] establishes eventual monotonicity.
3.2. Inequalities between symmetric means and HS means
The following is the first main result of this paper.
Theorem 3.4 If $\underline {x} = (x_1,\,\dots,\,x_n)$, $x_i \ge 0$, and $1 \le k \le n$, then
Let us postpone the proof to the next subsection. The factor on the right-hand side of (3.9) is asymptotically $1$ as $k \to \infty$; indeed:
Lemma 3.5 For all integers $n \ge k \ge 1$, we have
with equality if and only if $k=n$.
Proof. Let $c := k/n$ and
By the Binomial Theorem, $b \le 1$, which yields the first part of (3.10). For the lower estimate, we use the following Stirling bounds (see [Reference Feller12, p. 54, (9.15)]), valid for all $n\ge 1$,
Then, a calculation (cf. [Reference Feller12, p. 184, (3.11)]) gives:
from which the second part of (3.10) follows.
Theorem 3.4 and Lemma 3.5 imply that HS means (with rational values of the parameter) can be obtained as repetition limits of symmetric means:
Proof of Theorem 3.1. Applying Theorem 3.4 to the tuple $\underline {x}^{(m)}$, using observation (3.4) and Lemma 3.5, we have:
Letting $m\to \infty$, we obtain (3.5).
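Theorem 3.1 can be observed numerically. For the pair $\underline{x} = (1, 4)$, with $n = 2$ and $k = 1$, the introduction gives $\mathsf{hsm}_{1/2}(\underline{x}) = \bigl(\frac{\sqrt{1}+\sqrt{4}}{2}\bigr)^2 = 2.25$. The Python sketch below (helper names ours) evaluates $\mathsf{sym}_{m}(\underline{x}^{(m)})$ with a normalized recurrence for elementary symmetric polynomials and compares it with this limit; by the first inequality of Theorem 3.4, every value is at least $2.25$.

```python
import math


def sym(xs, k):
    """k-th symmetric mean of non-negative xs, via P_j = E_j / C(m, j) updated prefix by prefix."""
    p = [1.0] + [0.0] * k
    for m, x in enumerate(xs, start=1):
        for j in range(min(m, k), 0, -1):  # descending j keeps p[j - 1] from the old prefix
            p[j] = ((m - j) * p[j] + j * x * p[j - 1]) / m
    return p[k] ** (1.0 / k)


x = [1.0, 4.0]                                           # n = 2, k = 1
limit = ((math.sqrt(x[0]) + math.sqrt(x[1])) / 2) ** 2   # hsm_{1/2}(x) = 2.25
for m in (1, 10, 100, 200):
    s = sym(x * m, 1 * m)                                # sym_{km}(x^{(m)}), k = 1
    print(f"m = {m:>3}: sym = {s:.6f}, relative error = {s / limit - 1:.2e}")
```

For $m = 1$ the value is the arithmetic mean $2.5$; as $m$ grows the relative error shrinks toward $0$, in line with Lemma 3.5.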
Remark 3.6 If $k$ is fixed, then
and therefore the bound from Theorem 3.7 may be less satisfactory. But in this case, we may use the alternative bound coming from Maclaurin's inequality:
3.3. Proof of Theorem 3.4
The two inequalities in (3.9) will be proved independently of each other. They are essentially contained in the papers [Reference Bochi, Iommi and Ponce5] (the first inequality) and [Reference Halász and Székely15] (the second one), though neither was stated explicitly. In Theorems 3.7 and 3.8 below, we also characterize the cases of equality, and in particular show that each inequality is sharp in the sense that the corresponding factors cannot be improved.
Let us begin with the second inequality, which is more elementary. By symmetry, there is no loss of generality in assuming that the numbers $x_i$ are ordered.
Theorem 3.7 If $\underline {x} = (x_1,\,\dots,\,x_n)$ with $x_1 \ge \cdots \ge x_n \ge 0,$ and $1 \le k \le n,$ then:
Furthermore, equality holds if and only if $x_{k+1} = 0$ or $k=n$.
Proof. Our starting point is Vieta's formula:
Therefore, by Cauchy's formula, for any $r>0$:
That is,
Taking absolute values,
But these inequalities are valid for all $r>0$, and therefore:
So formulas (3.2) and (3.3) imply inequality (3.17).
Now let us investigate the possibility of equality. We consider three mutually exclusive cases, which correspond to the classification (2.8):
Using Proposition 2.11, in the critical case, we have:
while in the supercritical case, the two means vanish together. So, in both cases, inequality (3.17) becomes an equality. Now suppose we are in the subcritical case; then the $\inf$ at the RHS of (3.22) is attained at some $r > 0$: see Proposition 2.9. On the other hand, for this (and actually any) value of $r$, the second inequality in (3.21) must be strict, because the integrand is non-constant. We conclude that, in the subcritical case, inequality (3.22) is strict, and therefore, (3.17) is strict.
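The estimate used in the proof of Theorem 3.7 is Cauchy's bound for Taylor coefficients. Writing Vieta's formula as $\prod_{j}(1 + x_j z) = \sum_k E^{(n)}_k z^k$ (a normalization chosen here for illustration), one gets $E^{(n)}_k \le r^{-k}\max_{|z|=r}\bigl|\prod_j (1+x_j z)\bigr| = r^{-k}\prod_j (1+r x_j)$ for every $r > 0$, since all coefficients are non-negative. The Python sketch below (helper names ours) checks this on a log-spaced grid of radii.

```python
import itertools
import math


def esp(xs, k):
    """Elementary symmetric polynomial E_k(xs), by direct summation (small n only)."""
    return sum(math.prod(c) for c in itertools.combinations(xs, k))


def cauchy_bound(xs, k, r):
    """r^(-k) * prod_j (1 + r x_j): the Cauchy estimate for the coefficient of z^k."""
    return r ** (-k) * math.prod(1.0 + r * x for x in xs)


xs, k = [1.0, 2.0, 3.0, 4.0], 2
radii = [10 ** (e / 50) for e in range(-150, 151)]   # log-spaced r in [1e-3, 1e3]
assert all(esp(xs, k) <= cauchy_bound(xs, k, r) for r in radii)
print(esp(xs, k), min(cauchy_bound(xs, k, r) for r in radii))
```

Here $E^{(4)}_2 = 35$, and the bound stays strictly above it for every $r$, as expected in the subcritical case (no zero entries).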
The first inequality in (3.9) is a particular case of an inequality between two types of matrix means introduced in [Reference Bochi, Iommi and Ponce5], which we now explain. Let $A = (a_{i,j})_{i,j\in \{1,\dots,n\}}$ be an $n \times n$ matrix with non-negative entries. Recall that the permanent of $A$ is the ‘signless determinant’
\begin{equation*} \operatorname{per}(A) := \sum_{\sigma} \prod_{i=1}^{n} a_{i,\sigma(i)}, \end{equation*}
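where $\sigma$ runs over the permutations of $\{1,\, \dots,\, n\}$. Then the permanental mean of $A$ is defined as:
\begin{equation*} \mathsf{pm}(A) := \left( \frac{\operatorname{per}(A)}{n!} \right)^{1/n}. \end{equation*}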
On the other hand, the scaling mean of the matrix $A$ is defined as:
where $u$ and $v$ run on the set of strictly positive column vectors, and $\mathsf {gm}(\mathord {\cdot })$ denotes the geometric mean of the entries of the vector. Equivalently,
see [Reference Bochi, Iommi and Ponce5, Remark 2.6].Footnote 4 By [Reference Bochi, Iommi and Ponce5, Theorem 2.17],
(3.29)\begin{equation} \mathsf{sm}(A) \le \mathsf{pm}(A), \end{equation}
with equality if and only if $A$ has permanent $0$ or rank $1$. This inequality is far from trivial. Indeed, if the matrix $A$ is doubly stochastic (i.e. row and column sums are all $1$), then an easy calculation (see [Reference Bochi, Iommi and Ponce5, Prop. 2.4]) shows that $\mathsf {sm}(A) = \frac {1}{n}$, so (3.29) becomes $\mathsf {pm}(A) \ge \frac {1}{n}$, or equivalently,
(3.30)\begin{equation} \operatorname{per}(A) \ge \frac{n!}{n^{n}}. \end{equation}
This lower bound on the permanent of doubly stochastic matrices was conjectured in 1926 by van der Waerden and, after a protracted series of partial results, proved around 1980 independently by Egorichev and Falikman: see [Reference Zhan30, Chapter 5] for the exact references and a self-contained proof, and [Reference Gurvits14] for more recent developments. Our inequality (3.29), despite being a generalization of Egorichev–Falikman's (3.30), is actually a relatively simple corollary of it: we refer the reader to [Reference Bochi, Iommi and Ponce5, § 2] for more information.Footnote 5
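The van der Waerden bound (3.30) is easy to test on small matrices. The Python sketch below (names and matrices are illustrative choices) computes permanents by brute force over all permutations, builds a doubly stochastic matrix as a convex combination of permutation matrices, and compares with $n!/n^n$; the matrix with all entries $1/n$ attains the bound.

```python
import itertools
import math


def permanent(a):
    """Permanent of a square matrix: sum over permutations s of prod_i a[i][s[i]]."""
    n = len(a)
    return sum(math.prod(a[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))


def permutation_matrix(s):
    """0/1 matrix with a 1 in position (i, s[i]) for each row i."""
    n = len(s)
    return [[1.0 if s[i] == j else 0.0 for j in range(n)] for i in range(n)]


n = 4
# Convex combinations of permutation matrices are doubly stochastic; the
# permutations and weights below are arbitrary illustrative choices.
perms = [(0, 1, 2, 3), (1, 2, 3, 0), (3, 2, 1, 0)]
weights = [0.5, 0.3, 0.2]
mats = [permutation_matrix(s) for s in perms]
A = [[sum(w * m[i][j] for m, w in zip(mats, weights)) for j in range(n)]
     for i in range(n)]
J = [[1.0 / n] * n for _ in range(n)]            # all entries 1/n

bound = math.factorial(n) / n ** n               # van der Waerden: per(A) >= n!/n^n
print(permanent(A), permanent(J), bound)
```

The brute-force permanent costs $O(n \cdot n!)$ operations, which is adequate for such small checks.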
We are now in a position to complete the proof of Theorem 3.4, i.e., to prove the first inequality in (3.9). The next result also characterizes the cases of equality.
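Theorem 3.8 If $\underline {x} = (x_1,\,\dots,\,x_n)$ with $x_1 \ge \cdots \ge x_n \ge 0,$ and $1 \le k \le n,$ then:
(3.31)\begin{equation} \mathsf{hsm}_{k/n}(\underline{x}) \le \mathsf{sym}_k(\underline{x}). \end{equation}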
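Furthermore, equality holds if and only if:
(3.32)\begin{equation} k = n, \quad\text{or}\quad x_1 = \cdots = x_n, \quad\text{or}\quad x_k = 0. \end{equation}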
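Proof. Consider the non-negative $n \times n$ matrix:
(3.33)\begin{equation} A := \begin{pmatrix} x_1 & \cdots & x_1 & 1 & \cdots & 1 \\ \vdots & & \vdots & \vdots & & \vdots \\ x_n & \cdots & x_n & 1 & \cdots & 1 \end{pmatrix}, \end{equation}whose first $k$ columns are equal to $(x_1,\,\dots,\,x_n)^{\mathsf{T}}$ and whose remaining $n-k$ columns have all entries equal to $1$.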
Note that:
and so
Now let us compute the scaling mean of $A$ using formula (3.28). Assume that $k< n$. Given a column vector $v = \left (\begin {smallmatrix} v_1 \\ \vdots \\ v_n \end {smallmatrix}\right )$ with positive entries, we have:
On the other hand, by the inequality of arithmetic and geometric means,
with equality if $v_1 = \cdots = v_k = \frac {s}{k}$, $v_{k+1} = \cdots = v_n = \frac {r}{n-k}$. So, in order to minimize the quotient $\frac {\mathsf {gm}(Av)}{\mathsf {gm}(v)}$, it is sufficient to consider column vectors $v$ satisfying these conditions. We can also normalize $s$ to $1$, and (3.28) becomes:
by (3.2). This formula $\mathsf {sm}(A) = [\mathsf {hsm}_{k/n}(\underline {x})]^{k/n}$ also holds for $k=n$, taking the form $\mathsf {sm}(A) = (x_1 \cdots x_n)^{\frac {1}{n}}$; this can be checked either by adapting the proof above, or more simply by using the homogeneity and reflexivity properties of the scaling mean (see [Reference Bochi, Iommi and Ponce5]).
In conclusion, the matrix (3.33) has scaling and permanental means given by formulas (3.40) and (3.36), respectively, and the fundamental inequality (3.29) translates into $\mathsf {hsm}_{k/n}(\underline {x}) \le \mathsf {sym}_k(\underline {x})$, that is, (3.31).
Furthermore, by formulas (3.40) and (3.36), equality holds if and only if the matrix $A$ defined by (3.33) satisfies $\mathsf {sm}(A) = \mathsf {pm}(A)$. As mentioned before, $\mathsf {sm}(A) = \mathsf {pm}(A)$ if and only if $A$ has rank $1$ or permanent $0$ (see [Reference Bochi, Iommi and Ponce5, Theorem 2.17]). Note that $A$ has rank $1$ if and only if $k=n$ or $x_1=\dots =x_n$. On the other hand, by (3.36), $A$ has permanent $0$ if and only if $\mathsf {sym}_k(\underline {x})=0$, or equivalently $x_k=0$. So we have proved that equality $\mathsf {hsm}_{k/n}(\underline {x}) = \mathsf {sym}_k(\underline {x})$ is equivalent to condition (3.32).
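The permanent computation in this proof can be checked numerically. The Python sketch below makes two assumptions, labeled in the code: that the matrix (3.33) has its first $k$ columns equal to $(x_1,\dots,x_n)^{\mathsf T}$ and its remaining $n-k$ columns equal to $(1,\dots,1)^{\mathsf T}$ (consistent with the roles of $s$ and $r$ in the proof), and that $\mathsf{pm}(A) = (\operatorname{per}(A)/n!)^{1/n}$, the normalization under which (3.29) reduces to the van der Waerden bound for doubly stochastic matrices. Under these assumptions $\operatorname{per}(A) = k!\,(n-k)!\,E^{(n)}_k(\underline{x})$, hence $\mathsf{pm}(A)^{n/k} = \mathsf{sym}_k(\underline{x})$. The helper names are hypothetical.

```python
import itertools
import math


def permanent(a):
    """Permanent by brute force over all permutations (small n only)."""
    n = len(a)
    return sum(math.prod(a[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))


def esp(xs, k):
    """Elementary symmetric polynomial E_k(xs) by direct summation."""
    return sum(math.prod(c) for c in itertools.combinations(xs, k))


def hypothetical_matrix(xs, k):
    """ASSUMED form of (3.33): row i is (x_i, ..., x_i, 1, ..., 1) with k copies of x_i."""
    n = len(xs)
    return [[x] * k + [1.0] * (n - k) for x in xs]


def check(xs, k):
    n = len(xs)
    per = permanent(hypothetical_matrix(xs, k))
    # Each ordered choice of k rows for the first k columns contributes a product
    # of k entries x_i; the other rows fill the all-ones columns in (n - k)! ways.
    assert math.isclose(per, math.factorial(k) * math.factorial(n - k) * esp(xs, k))
    pm = (per / math.factorial(n)) ** (1 / n)    # ASSUMED normalization of pm
    sym_k = (esp(xs, k) / math.comb(n, k)) ** (1 / k)
    return pm ** (n / k), sym_k


print(check([5.0, 3.0, 2.0, 1.0], 2))   # the two numbers agree
```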
We close this section with some comments on related results.
Remark 3.9 In [Reference Halász and Székely15], the asymptotics of the integral (3.19) are determined using the saddle point method (see e.g. [Reference Simon24, Section 15.4]). However, for this method to work, the saddle must be steep, that is, the second derivative at the saddle must be large in absolute value. Major [Reference Major21, p. 1987] discusses this situation: if the second derivative vanishes, then ‘a more sophisticated method has to be applied and only weaker results can be obtained in this case. We shall not discuss this question in the present paper’. On the other hand, in the general situation covered by our Theorem 3.4, the saddle can be flat. (It must be noted that the setting considered by Major is different, since he allows random variables to be negative.)
Remark 3.10 Given an arbitrary $n \times n$ non-negative matrix $A$, the permanental and scaling means satisfy the following inequalities (see [Reference Bochi, Iommi and Ponce5, Theorem 2.17]),
The sequence $(n(n!)^{-1/n})$ is increasing and converges to $e$. In general, as $n$ tends to infinity the permanental mean does not necessarily converge to the scaling mean. However, there are some special classes of matrices for which this is indeed the case: for example, in the repetitive situation covered by the generalized Friedland limit [Reference Bochi, Iommi and Ponce5, Theorem 2.19]. Note that $\mathsf {hsm}_{k/n}(\underline {x})$ and $\mathsf {sym}_k(\underline {x})$ correspond to the $(n/k)$th powers of the scaling and permanental means of the matrix $A$, respectively. Therefore, (3.9) can be regarded as an improvement of (3.41) for this particular class of matrices.
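The behavior of $n(n!)^{-1/n}$ can be checked quickly in floating point. The Python sketch below (function name ours) uses `math.lgamma(n + 1)` $= \log n!$ to avoid overflowing $n!$, and verifies monotonicity for $n \le 2000$.

```python
import math


def ratio(n):
    """n * (n!)^(-1/n), computed through lgamma(n + 1) = log(n!) to avoid overflow."""
    return n * math.exp(-math.lgamma(n + 1) / n)


values = [ratio(n) for n in range(1, 2001)]
assert all(a < b for a, b in zip(values, values[1:]))   # increasing
print(values[0], values[9], values[-1], math.e)
```

By Stirling's formula the gap $e - n(n!)^{-1/n}$ is of order $\log(n)/n$, so convergence to $e$ is slow.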
Remark 3.11 A natural extension of symmetric means is Muirhead means, see [Reference Hardy, Littlewood and Pólya16, § 2.18], [Reference Bullen8, § V.6] for definition and properties. Accordingly, it should be possible to define a family of barycenters extending the HS barycenters, following [Reference Bochi, Iommi and Ponce5, § 5.2]. An analogue of inequality (3.31) holds in this extended setting, again as a consequence of the key inequality (3.29) between matrix means. However, we do not know whether inequality (3.17) can be extended to a comparable level of generality.
4. Continuity of the HS barycenter
In this section, we study the continuity of the HS barycenter as a two-variable function $(\mu,\,c) \mapsto [\mu ]_c$ defined on the space $\mathcal {P}(\mathbb {R}_+ ) \times [0,\,1]$. The most natural topology on $\mathcal {P}(\mathbb {R}_+ )$ is the weak topology (defined below). The barycenter function is not continuous with respect to this topology, but, on the positive side, it is lower semicontinuous, except in a particular situation. To obtain better results, we focus on subsets of measures satisfying the natural integrability conditions (usually (2.6), with different conditions for the extremal parameters $c=0$ and $c=1$), and endow these subsets with stronger topologies that are well adapted to the integrability assumptions.
In a preliminary subsection, we collect some general facts on topologies on spaces of measures. In the remaining subsections, we prove several results on the continuity of the HS barycenter; these results are then combined to prove our general ergodic theorem in § 5.
4.1. Convergence of measures
If $(X,\, \mathrm {d})$ is a separable complete metric space, let $C_\mathrm {b}(X)$ be the set of all continuous bounded real functions on $X$, and let $\mathcal {P}(X)$ denote the set of all Borel probability measures on $X$. Recall (see e.g. [Reference Parthasarathy23]) that the weak topology is a metrizable topology on $\mathcal {P}(X)$ according to which a sequence $(\mu _n)$ converges to some $\mu$ if and only if $\int \phi \, d \mu _n \to \int \phi \, d \mu$ for every test function $\phi \in C_\mathrm {b}(X)$; we then say that $(\mu _n)$ converges weakly to $\mu$, and denote this by $\mu _n \rightharpoonup \mu$. The space $\mathcal {P}(X)$ is Polish, and it is compact if and only if $X$ is compact. Although the space $C_\mathrm {b}(X)$ is huge (nonseparable with respect to its usual topology if $X$ is noncompact), by [Reference Parthasarathy23, Theorem II.6.6] we can nevertheless find a countable subset $\mathcal {C} \subseteq C_\mathrm {b}(X)$ such that, for all $(\mu _n)$ and $\mu$ in $\mathcal {P}(X)$,
\begin{equation} \mu_n \rightharpoonup \mu \iff \int \phi \, d\mu_n \to \int \phi \, d\mu \ \text{ for every } \phi \in \mathcal{C} . \tag{4.1} \end{equation}
The following result deals with sequences of integrals $\int \phi _n \, d\mu _n$ where not only the measures but also the integrands vary, and bears a resemblance to Fatou's Lemma:
Proposition 4.1 Suppose that $(\mu _n)$ is a sequence in $\mathcal {P}(X)$ converging weakly to some measure $\mu,$ and that $(\phi _n)$ is a sequence of continuous functions on $X$ converging uniformly on compact subsets to some function $\phi$. Furthermore, assume that the functions $\phi _n$ are bounded from below by a constant $-C$ independent of $n$. Then, $\liminf _{n \to \infty } \int \phi _n \, d\mu _n \geq \int \phi \, d\mu$.
Note that, as in Fatou's Lemma, the integrals in Proposition 4.1 can be infinite.
Proof. Without loss of generality, assume that $C=0$. Let $\lambda \in \mathbb {R}$ be such that $\lambda < \int \phi \, d\mu$. By the monotone convergence theorem, there exists $m \in \mathbb {N}$ such that $\int \min \{ \phi,\, m\} \, d\mu > \lambda$. For a function $\psi :X \to \mathbb {R}$, let $\hat {\psi }(x):=\min \{ \psi (x),\, m\}$. Note that
By Prokhorov's theorem (see e.g. [Reference Parthasarathy23, Theorem 6.7]), the sequence $(\mu _n)$ forms a tight set, that is, for every $\epsilon > 0$, there exists a compact set $K \subseteq X$ such that $\mu _n(X \setminus K) \le \epsilon /(2\,m)$ for all $n$. Since $(\phi _n)$ converges uniformly on compact subsets to $\phi$, we obtain:
We also have:
Since $(\mu _n)$ converges weakly to $\mu$, we have $\lim _{n \to \infty } \int \hat {\phi } \, d\mu _n =\int \hat {\phi } \, d\mu > \lambda$. Therefore, combining (4.2), (4.3) and (4.4), for sufficiently large values of $n$, we obtain $\int \phi _n \, d\mu _n > \lambda$. The result now follows.
The following direct consequence will be useful.
Corollary 4.2 Suppose that $(\mu _n)$ is a sequence in $\mathcal {P}(X)$ converging weakly to some measure $\mu,$ and that $(\phi _n)$ is a sequence of continuous functions on $X$ converging uniformly on compact subsets to some function $\phi$. Furthermore, assume that the functions $|\phi _n|$ are bounded by a constant $C$ independent of $n$. Then, $\int \phi _n \, d\mu _n \to \int \phi \, d\mu$.
We will also need a slightly stronger notion of convergence. Let $\mathcal {P}_1(X) \subseteq \mathcal {P}(X)$ denote the set of measures $\mu$ with finite first moment, that is,
\begin{equation} \int_X \mathrm{d}(x,\,x_0) \, d\mu(x) < \infty . \tag{4.5} \end{equation}
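Here and in what follows, $x_0 \in X$ is a fixed basepoint; the particular choice is irrelevant. We metrize $\mathcal {P}_1(X)$ with the Kantorovich metric (see e.g. [Reference Villani29, p. 207]):
\begin{equation} W_1(\mu,\,\nu) := \inf_{\pi \in \Pi(\mu,\nu)} \int_{X \times X} \mathrm{d}(x,\,y) \, d\pi(x,\,y) , \tag{4.6} \end{equation}
where $\Pi(\mu,\nu)$ denotes the set of Borel probability measures on $X \times X$ with marginals $\mu$ and $\nu$.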
Of course, the Kantorovich metric depends on the original metric $\mathrm {d}$ on $X$; in fact, it ‘remembers’ it, since $W_1(\delta _x ,\, \delta _y) = \mathrm {d}(x,\,y)$. The metric space $(\mathcal {P}_1(X),\, W_1)$ is called the $1$-Wasserstein space; it is separable and complete. In general, the topology on $\mathcal {P}_1(X)$ is stronger than the weak topology, and it is strictly stronger whenever the metric $\mathrm {d}$ is unbounded. In fact, we have the following characterizations of convergence:
Theorem 4.3 (Villani [Reference Villani29, Theorem 7.12])
For all $(\mu _n)$ and $\mu$ in $\mathcal {P}_1(X),$ the following statements are equivalent:
(a) $W_1(\mu _n,\, \mu ) \to 0$.
(b) if $\phi \colon X \to \mathbb {R}$ is a continuous function such that $\frac {|\phi |}{1+\mathrm {d}(\mathord {\cdot },\,x_0)}$ is bounded, then $\int \phi \, d\mu _n \to \int \phi \, d\mu$.
(c) $\mu _n \rightharpoonup \mu$ and $\int \mathrm {d}(\mathord {\cdot },\,x_0) \, d\mu _n \to \int \mathrm {d}(\mathord {\cdot },\,x_0) \, d\mu$.
(d) $\mu _n \rightharpoonup \mu$ and the following ‘tightness condition’ holds:
\begin{equation} \lim_{R \to \infty} \limsup_{n \to \infty} \displaystyle\int_{X \setminus B_R(x_0)} [1+\mathrm{d}(\mathord{\cdot},x_0)]\, d\mu_n = 0 , \tag{4.7} \end{equation}
where $B_R(x_0)$ denotes the open ball of centre $x_0$ and radius $R$.
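On the real line with $\mathrm{d}(x,y) = |x-y|$, the Kantorovich metric has the classical closed form $W_1(\mu,\nu) = \int_{\mathbb{R}} |F_\mu(t) - F_\nu(t)|\,dt$, where $F_\mu, F_\nu$ are the distribution functions. The Python sketch below (the function name is ours) uses this formula for finitely supported measures. It checks $W_1(\delta_x,\delta_y) = |x-y|$ and shows that $\mu_n = (1-\frac1n)\delta_0 + \frac1n\delta_n$ stays at distance $1$ from $\delta_0$ although $\mu_n \rightharpoonup \delta_0$: the first moments do not converge, so characterization (c) fails.

```python
def w1_real_line(xs, ws, ys, vs):
    """Kantorovich distance W_1 between finitely supported probability measures on R.

    mu = sum_i ws[i] * delta_{xs[i]},  nu = sum_j vs[j] * delta_{ys[j]},  d(x, y) = |x - y|.
    Uses W_1(mu, nu) = integral over R of |F_mu(t) - F_nu(t)| dt.
    """
    # Signed point masses: +w for atoms of mu, -v for atoms of nu, sorted by position.
    events = sorted([(x, w) for x, w in zip(xs, ws)] + [(y, -v) for y, v in zip(ys, vs)])
    total, diff = 0.0, 0.0  # diff = F_mu(t) - F_nu(t) on the current interval
    for (t0, dw), (t1, _) in zip(events, events[1:]):
        diff += dw
        total += abs(diff) * (t1 - t0)
    return total


print(w1_real_line([2.0], [1.0], [5.0], [1.0]))  # prints 3.0, i.e. |2 - 5|
for n in (10, 100, 1000):
    # mu_n -> delta_0 weakly, but its W_1-distance to delta_0 stays (numerically) equal to 1
    print(n, w1_real_line([0.0, float(n)], [1 - 1 / n, 1 / n], [0.0], [1.0]))
```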
The next lemma should be compared to Corollary 4.2:
Lemma 4.4 Suppose that $(\mu _n)$ is a sequence in $\mathcal {P}_1(X)$ converging with respect to $W_1$ to some measure $\mu \in \mathcal {P}_1(X),$ and that $(\phi _n)$ is a sequence of continuous functions on $X$ converging uniformly on bounded subsets to some function $\phi$. Furthermore, assume that the functions $\frac {|\phi _n|}{1+\mathrm {d}(\mathord {\cdot },\,x_0)}$ are bounded by a constant $C$ independent of $n$. Then, $\int \phi _n \, d\mu _n \to \int \phi \, d\mu$.
Proof. Fix $\epsilon >0$. By part (d) of Theorem 4.3, there exists $R>0$ such that, for all sufficiently large $n$,
Then, we write:
By part (b) of Theorem 4.3, the term ${\unicode{x2462}}$ tends to $0$ as $n \to \infty$. By the assumption of uniform convergence on bounded sets, $\left|{\unicode{x2461}}\right| \le \sup _{B_{2R}(x_0)} |\phi _n - \phi |$ tends to $0$ as well. Finally,
for all sufficiently large $n$. Since $\epsilon >0$ is arbitrary, we conclude that $\int \phi _n \, d\mu _n \to \int \phi \, d\mu$, as claimed.
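Let us now consider the specific case of the metric space $(X,\,\mathrm {d}) = (\mathbb {R}_+,\, \mathrm {d}_\mathrm {HS})$, where:
\begin{equation} \mathrm{d}_\mathrm{HS}(x,\,y) := |\log(1+x) - \log(1+y)| . \tag{4.11} \end{equation}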
Then the finite-first-moment condition (4.5) becomes our usual integrability condition (2.6), so the $1$-Wasserstein space $\mathcal {P}_1(X)$ becomes ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$.
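Assuming, as the condition $\log(1+f) \in L^1(\mathbb{P})$ in Lemma 5.4(a) suggests, that $\mathrm{d}_\mathrm{HS}(x,y) = |\log(1+x) - \log(1+y)|$, the map $g(x) = \log(1+x)$ is an isometry from $(\mathbb{R}_+, \mathrm{d}_\mathrm{HS})$ onto $(\mathbb{R}_+, |\cdot|)$, so $W_1$ on ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ can be computed by pushing the atoms forward by $g$. In the sketch below (names ours), the measures $(1-\frac1n)\delta_0 + \frac1n\delta_n$, which stay at distance $1$ from $\delta_0$ for the usual distance on $\mathbb{R}_+$, converge to $\delta_0$ in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, at distance $\log(1+n)/n$.

```python
import math


def w1_real_line(xs, ws, ys, vs):
    """W_1 for |x - y| between finitely supported probability measures on R (via CDFs)."""
    events = sorted([(x, w) for x, w in zip(xs, ws)] + [(y, -v) for y, v in zip(ys, vs)])
    total, diff = 0.0, 0.0  # diff = F_mu(t) - F_nu(t) on the current interval
    for (t0, dw), (t1, _) in zip(events, events[1:]):
        diff += dw
        total += abs(diff) * (t1 - t0)
    return total


def w1_hs(xs, ws, ys, vs):
    """W_1 for the assumed distance d_HS(x, y) = |log(1 + x) - log(1 + y)| on R_+.

    g(x) = log(1 + x) is an isometry onto R_+ with |.|, so we push the atoms forward by g.
    """
    return w1_real_line([math.log1p(x) for x in xs], ws, [math.log1p(y) for y in ys], vs)


for n in (10, 1000, 100_000):
    atoms, weights = [0.0, float(n)], [1 - 1 / n, 1 / n]
    # usual W_1 stays near 1; the HS distance equals log(1 + n) / n and tends to 0
    print(n, w1_real_line(atoms, weights, [0.0], [1.0]), w1_hs(atoms, weights, [0.0], [1.0]))
```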
4.2. Lower and upper semicontinuity
The HS barycenter is not continuous with respect to the weak topology, since the complement of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ is dense in $\mathcal {P}(\mathbb {R}_+ )$ and the barycenter is $\infty$ there (by Proposition 2.5). Nevertheless, lower semicontinuity holds, except in the critical configuration:
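Theorem 4.5 For every $(\mu,\,c) \in \mathcal {P}(\mathbb {R}_+ ) \times [0,\,1),$ we have:
\begin{equation} \liminf_{(\tilde\mu,\tilde c)\to(\mu,c)} [\tilde\mu]_{\tilde c} \ge [\mu]_c \iff \mu(0) \neq 1-c \ \text{ or } \ [\mu]_c = 0 . \tag{4.12} \end{equation}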
To be explicit, the inequality above means that for every $\lambda < [\mu ]_c$, there exists a neighborhood $\mathcal {U} \subseteq \mathcal {P}(\mathbb {R}_+ ) \times [0,\,1)$ of $(\mu,\,c)$ with respect to the product topology (weak $\times$ standard) such that $[\tilde \mu ]_{\tilde c} > \lambda$ for all $(\tilde \mu,\, \tilde c) \in \mathcal {U}$.
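Proof. Let us first note that:
\begin{equation} \mu(0) = 1-c \ \text{ and } \ [\mu]_c > 0 \implies \liminf_{(\tilde\mu,\tilde c)\to(\mu,c)} [\tilde\mu]_{\tilde c} < [\mu]_c . \tag{4.13} \end{equation}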
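Indeed, if $c_n := c +1/n$, then $(\mu,\, c_n) \to (\mu,\, c)$, and $c_n < 1$ for all sufficiently large $n$. Since $\mu(0) = 1-c > 1-c_n$, Proposition 2.11 gives $[\mu ]_{c_n}=0$. Thus,
\begin{equation} \liminf_{(\tilde\mu,\tilde c)\to(\mu,c)} [\tilde\mu]_{\tilde c} \le \lim_{n \to \infty} [\mu]_{c_n} = 0 < [\mu]_c . \tag{4.14} \end{equation}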
We now consider the converse implication: given $(\mu,\,c) \in \mathcal {P}(\mathbb {R}_+ ) \times [0,\,1)$ such that $\mu (0) \neq 1-c$ or $[\mu ]_c = 0$, we want to show that $\liminf _{(\tilde \mu, \tilde c) \to (\mu,c)} [\tilde \mu ]_{\tilde c} \ge [\mu ]_c$. There are some trivial cases:
• If $[\mu ]_c=0$, then the conclusion is obvious.
• If $\mu (0)> 1-c$, then $[\mu ]_c=0$ by Proposition 2.11, and again the conclusion is clear.
In what follows, we assume that $\mu (0)<1-c$ and $[\mu ]_c>0$. Fix a sequence $(\mu _n,\,c_n)$ converging to $(\mu,\,c)$. We need to prove that:
\begin{equation} \liminf_{n \to \infty} [\mu_n]_{c_n} \ge [\mu]_c . \tag{4.15} \end{equation}
We may also assume without loss of generality that $[\mu _n]_{c_n}<\infty$ for each $n$. We divide the proof into two cases:
Case $c>0$: We can also assume that $c_n >0$ for every $n$. By Proposition 2.5, the hypothesis $[\mu _n]_{c_n}<\infty$ means that $\mu _n \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$. Since $\{0\}$ is closed, the Portmanteau theorem [Reference Parthasarathy23, Theorem 6.1(c)] gives $\limsup _{n \to \infty } \mu _n(0) \leq \mu (0) < 1-c$. Thus, for sufficiently large values of $n$ we have $\mu _n(0) < 1-c_n$. In this setting, the HS barycenter $[\mu _n]_{c_n}$ can be computed by Proposition 2.7. Recall from (2.16) that $\eta (\tilde {\mu },\,\tilde {c})$ denotes the unique positive solution $\eta$ of the equation $\int \frac {x}{\tilde {c}x + (1-\tilde {c}) \eta } \, d\tilde {\mu } = 1$. Note that $\eta (\mu,\,c)$ is well defined even in the case $[\mu ]_{c}= \infty$ (see Proposition 2.8). We claim that:
\begin{equation} \lim_{n \to \infty} \eta(\mu_n,\,c_n) = \eta(\mu,\,c) . \tag{4.16} \end{equation}
Proof of the claim.
Fix numbers $\alpha _0$, $\alpha _1$ with $0< \alpha _0 < \eta (\mu,\,c) < \alpha _1$. Then a monotonicity property shown in the proof of Proposition 2.7 gives:
Note the uniform bounds:
So, using Corollary 4.2, we see that $\int \frac {x}{c_n x + (1-c_n) \alpha _i} \, d\mu _n \to \int \frac {x}{cx + (1-c) \alpha _i} \, d\mu$. In particular, for all sufficiently large $n$,
and thus, $\alpha _0 < \eta (\mu _n,\,c_n) < \alpha _1$, proving the claim (4.16).
For simplicity, write $y_n := \eta (\mu _n,\, c_n)$. By Proposition 2.2.(b),
for some finite $C$, since $\sup _n c_n < 1$ and $\inf _n y_n > 0$. This allows us to apply Proposition 4.1 and obtain:
where $y_\infty := \lim _{n\to \infty } y_n = \eta (\mu,\,c)$. Using formula (2.10), we obtain (4.15). This completes the proof of Theorem 4.5 in the case $c>0$.
Case $c=0$: Applying Proposition 4.1 with $\phi _n = \phi = x$, we obtain $\liminf \int x \, d\mu _n \ge \int x \, d\mu$, that is, $\liminf [\mu _n]_0 \ge [\mu ]_0$. This takes care of the terms with $c_n = 0$, so we can assume that $c_n >0$ for every $n$, as in the previous case.
In order to prove (4.15) in the case $c=0$, let us fix an arbitrary positive $\lambda < [\mu ]_0 = \int x \, d \mu$, and let us show that $[\mu _n]_{c_n} > \lambda$ for every sufficiently large $n$. By the monotone convergence theorem, there exists $m \in \mathbb {N}$ such that $\int \min (x,\,m) \, d\mu (x) > \lambda$. Let $\hat {\mu }$ (respectively $\hat {\mu }_n$) be the pushforward of the measure $\mu$ (respectively $\mu _n$) by the map $x \mapsto \min (x,\,m)$. Then $[\hat {\mu }]_0 > \lambda$ and, by Proposition 2.6.(c), $[\hat {\mu }_n]_{c_n} \le [\mu _n]_{c_n}$. Furthermore, we have $\hat \mu _n \rightharpoonup \hat \mu$: for every $f \in C_\mathrm {b}(\mathbb {R}_+ )$, the function $x \mapsto f(\min (x,\,m))$ is also bounded and continuous, and therefore
\begin{equation} \int f \, d\hat\mu_n = \int f(\min(x,\,m)) \, d\mu_n(x) \to \int f(\min(x,\,m)) \, d\mu(x) = \int f \, d\hat\mu . \tag{4.22} \end{equation}
So, to simplify the notation, we remove the hats and assume that the measures $\mu _n$, $\mu$ are all supported in the interval $[0,\,m]$.
The numbers $\eta (\mu _n,\,c_n)$ and $\eta (\mu,\,c)$ are well defined, as in the previous case, and furthermore they belong to the interval $[0,\,m]$, by Lemma 2.14. On the other hand, by Proposition 4.1,
It follows that $\eta (\mu _n,\,c_n) > \lambda$ for all sufficiently large $n$. We claim that $\eta (\mu _n,\,c_n) \to \eta (\mu,\,0)$, as before in (4.16). The proof is the same, except that the upper bound in (4.18) becomes infinite and must be replaced by the following estimate:
So a repetition of the previous arguments yields (4.16), then (4.20) and (4.21), and finally (4.15). Therefore, Theorem 4.5 has been proved in both cases $c>0$ and $c=0$.
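The proof above computes $[\mu]_c$ through the root $\eta(\mu,c)$ of $\int \frac{x}{cx+(1-c)\eta}\,d\mu = 1$. The following Python sketch does this for a finitely supported $\mu$, with $0<c<1$ and $\mu(0)<1-c$. Since formula (2.10) and the kernel are not reproduced in this section, we *assume* that $K(x,y,c) = \log y + c^{-1}\log(cx/y + 1 - c)$ and that $[\mu]_c = \exp\int K(x,\eta(\mu,c),c)\,d\mu(x)$. This is consistent with $K(x,1,c) = c^{-1}\log(cx+1-c)$ in the proof of Lemma 4.8, and the equation defining $\eta$ is exactly the critical-point equation of $y \mapsto \int K(x,y,c)\,d\mu(x)$. The root is found by bisection, using that the left-hand side is decreasing in $\eta$; all function names are ours.

```python
import math


def hs_kernel(x: float, y: float, c: float) -> float:
    """Assumed HS kernel K(x, y, c) = log y + (1/c) * log(c*x/y + 1 - c), for 0 < c < 1."""
    return math.log(y) + math.log(c * x / y + 1 - c) / c


def eta(atoms, weights, c, rel_tol=1e-13):
    """Positive root of sum_i w_i x_i / (c x_i + (1 - c) eta) = 1, found by bisection."""
    if sum(w for x, w in zip(atoms, weights) if x == 0) >= 1 - c:
        raise ValueError("need mu(0) < 1 - c (subcritical case)")

    def lhs(y):
        return sum(w * x / (c * x + (1 - c) * y) for x, w in zip(atoms, weights))

    lo, hi = 1.0, 1.0
    while lhs(lo) <= 1:  # lhs is decreasing and lhs(0+) = (1 - mu(0)) / c > 1
        lo /= 2
    while lhs(hi) >= 1:  # lhs(y) -> 0 as y -> infinity
        hi *= 2
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if lhs(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hs_barycenter(atoms, weights, c):
    """[mu]_c = exp(integral of K(x, eta, c) d mu(x)), under the assumptions stated above."""
    y = eta(atoms, weights, c)
    return math.exp(sum(w * hs_kernel(x, y, c) for x, w in zip(atoms, weights)))


# mu = (delta_1 + delta_4) / 2 and c = 1/2: eta solves 1/(1+eta) + 4/(4+eta) = 1, so eta = 2.
print(hs_barycenter([1.0, 4.0], [0.5, 0.5], 0.5))
```

For this example a hand computation gives $\int K(x,2,\frac12)\,d\mu = \log(2 \cdot 0.75 \cdot 1.5)$, so the printed value should be close to $9/4$, between the geometric mean $2$ and the arithmetic mean $5/2$.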
Next, let us investigate the behaviour of the HS barycenter on the product space ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$, where ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ is endowed with the topology defined at the end of § 4.1.
Theorem 4.6 For every $(\mu,\,c) \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$, we have:
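Proof of part (4.25) of Theorem 4.6. Let us start by proving the following implication:
\begin{equation} c = 0 \ \text{ and } \ [\mu]_0 < \infty \implies \limsup_{(\tilde\mu,\tilde c)\to(\mu,0)} [\tilde\mu]_{\tilde c} > [\mu]_0 . \tag{4.27} \end{equation}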
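Consider measures $\mu _n := \frac {n-1}{n} \, \mu + \frac {1}{n} \, \delta _n$. Clearly, $\mu _n \rightharpoonup \mu$; moreover,
\begin{equation} \int \log(1+x) \, d\mu_n(x) = \frac{n-1}{n} \int \log(1+x) \, d\mu(x) + \frac{\log(1+n)}{n} \to \int \log(1+x) \, d\mu(x) . \tag{4.28} \end{equation}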
So using characterization (c) of Theorem 4.3, we conclude that $\mu _n \to \mu$ in the topology of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$. On the other hand, $[\mu _n]_0 = \frac {n-1}{n} \, [\mu ]_0 + 1 \to [\mu ]_0 + 1$. This proves (4.27).
Next, let us prove the converse implication. So, let us fix $(\mu,\,c)$ such that $c\neq 0$ or $[\mu ]_0 = \infty$, and let us show that if $(\mu _n,\, c_n)$ is any sequence in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$ converging to $(\mu,\,c)$, then $\limsup _{n \to \infty } [\mu _n]_{c_n} \le [\mu ]_c$. This is obviously true if $[\mu ]_c = \infty$, so let us assume that $[\mu ]_c < \infty$. Then our assumption becomes $c>0$, so by removing finitely many terms from the sequence $(\mu _n,\, c_n)$, we may assume that $\inf _n c_n > 0$. Fix some finite number $\lambda > [\mu ]_c$. By Definition 2.3, there is some $y_0>0$ such that $\int K(x,\,y_0,\,c) \, d\mu (x) < \log \lambda$. The sequence of continuous functions $\frac {|K(x,\,y_0,\,c_n)|}{1+\log (1+x)}$ is uniformly bounded, as a direct calculation shows. Furthermore, $K(\mathord {\cdot },\,y_0,\, c_n) \to K(\mathord {\cdot },\,y_0,\,c)$ uniformly on compact subsets of $\mathbb {R}_+$. So Lemma 4.4 ensures that $\int K(x,\,y_0,\,c_n) \, d\mu _n(x) \to \int K(x,\,y_0,\,c) \, d\mu (x)$. Now it follows from Definition 2.3 that $[\mu _n]_{c_n} < \lambda$ for all sufficiently large $n$. Since $\lambda >[\mu ]_c$ is arbitrary, we conclude that $\limsup _{n \to \infty } [\mu _n]_{c_n} \le [\mu ]_c$, as we wanted to show.
Proof of part (4.26) of Theorem 4.6. First, we prove that, for all $(\mu,\,c) \in {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$,
Consider measures $\mu _n := \frac {n-1}{n} \, \mu + \frac {1}{n} \, \delta _0$. Clearly, $\mu _n \to \mu$ in the topology of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$. Since $\mu _n(0) \ge 1/n > 0$, Proposition 2.11.(b) gives $[\mu _n]_1=0$. This proves (4.29). For $c \in [0,\,1)$, the converse is a direct consequence of Theorem 4.5, since the topology on ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ is stronger. If $c=1$ and $[\mu ]_1=0$, then the result is obvious. If $[\mu ]_1>0$, then, as the example above shows, the result does not hold.
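The counterexample used to prove (4.27) is easy to reproduce numerically. Taking, for illustration, $\mu = \frac12\delta_1 + \frac12\delta_3$ (so $[\mu]_0 = 2$), the sketch below (helper names ours) evaluates the moment $\int \log(1+x)\,d\mu_n$, which controls convergence in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ through characterization (c) of Theorem 4.3 (with $x_0 = 0$ and the assumed distance $\mathrm{d}_\mathrm{HS}(x,0) = \log(1+x)$), together with the arithmetic barycenter $[\mu_n]_0 = \int x\,d\mu_n$.

```python
import math


def mix_with_far_atom(atoms, weights, n):
    """Return mu_n = ((n - 1)/n) mu + (1/n) delta_n as lists of atoms and weights."""
    return atoms + [float(n)], [w * (n - 1) / n for w in weights] + [1 / n]


def integrate(fn, atoms, weights):
    """Integral of fn against a finitely supported measure."""
    return sum(w * fn(x) for x, w in zip(atoms, weights))


atoms, weights = [1.0, 3.0], [0.5, 0.5]  # example measure mu, with [mu]_0 = 2
for n in (10, 1_000, 100_000):
    a, w = mix_with_far_atom(atoms, weights, n)
    hs_moment = integrate(math.log1p, a, w)    # tends to the corresponding moment of mu
    arithmetic = integrate(lambda x: x, a, w)  # [mu_n]_0 = (n-1)/n * 2 + 1, tends to 3
    print(n, hs_moment, arithmetic)
```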
4.3. Continuity for extremal values of the parameter
Theorem 4.6 shows that the HS barycenter map on ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$ is not continuous at the pairs $(\mu,\,0)$ (except if $[\mu ]_0 = \infty$), nor at the pairs $(\mu,\,1)$ (except if $[\mu ]_1 = 0$). Let us see that continuity can be ‘recovered’ if we impose extra integrability conditions and work with stronger topologies.
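If we use the standard distance $\mathrm {d}(x,\,y) := |x-y|$ on the half-line $X = \mathbb {R}_+ = [0,\,+\infty )$, then the resulting $1$-Wasserstein space is denoted $\mathcal {P}_1(\mathbb {R}_+ )$. On the other hand, using the distance
\begin{equation} \mathrm{d}(x,\,y) := |\log x - \log y| \quad \text{on } \mathbb{R}_{++} := (0,\,+\infty), \tag{4.30} \end{equation}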
the corresponding $1$-Wasserstein space will be denoted $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$. We consider the latter space as a subset of $\mathcal {P}(\mathbb {R}_+ )$, since any measure $\mu$ on $\mathbb {R}_{+ + }$ can be extended to $\mathbb {R}_+$ by setting $\mu (0) = 0$. The topologies we have just defined on the spaces $\mathcal {P}_1(\mathbb {R}_+ )$ and $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$ are stronger than the topologies induced by ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$; in other words, the inclusion maps below are continuous:
\begin{equation*} \mathcal {P}_1(\mathbb {R}_+ ) \hookrightarrow {{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \hookleftarrow \mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + }) . \end{equation*}
Note that the ‘arithmetic barycenter’ $[\mathord {\cdot }]_0$ is finite on $\mathcal {P}_1(\mathbb {R}_+ )$, while the ‘geometric barycenter’ $[\mathord {\cdot }]_1$ is finite and non-zero on $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$.
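The continuity of these inclusions can be seen from a pointwise comparison of the distances. Assuming that ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ carries the distance $|\log(1+x) - \log(1+y)|$ and $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$ the distance $|\log x - \log y|$, the mean value theorem gives:

```latex
\begin{align*}
|\log(1+x) - \log(1+y)| &= \frac{|x-y|}{1+\xi} \le |x-y|,
  && x, y \ge 0,\ \xi \text{ between } x \text{ and } y,\\
|\log(1+e^{s}) - \log(1+e^{t})| &= \frac{e^{\tau}}{1+e^{\tau}}\,|s-t| \le |s-t|,
  && s = \log x,\ t = \log y,\ \tau \text{ between } s \text{ and } t.
\end{align*}
```

Writing $W_1^{\mathrm{HS}}$ (our notation) for the Kantorovich metric built from $|\log(1+x) - \log(1+y)|$, and since $W_1$ is an infimum over couplings of the integrated distance, these pointwise bounds give $W_1^{\mathrm{HS}} \le W_1$ for the Kantorovich metrics of both $\mathcal {P}_1(\mathbb {R}_+ )$ and $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$; hence both inclusions are $1$-Lipschitz.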
Finally, let us establish continuity of the HS barycenter for the extremal values of the parameter with respect to these new topologies:
Proposition 4.7 Consider a sequence $(\mu _n,\,c_n)$ in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}} \times [0,\,1]$.
(a) If $(\mu _n,\, c_n) \to (\mu,\,0)$ in $\mathcal {P}_1(\mathbb {R}_+ ) \times [0,\,1),$ then $[\mu _n]_{c_n} \to [\mu ]_0$.
(b) If $(\mu _n,\, c_n) \to (\mu,\,1)$ in $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + }) \times (0,\,1],$ then $[\mu _n]_{c_n} \to [\mu ]_1$.
Proof of part (a) of Proposition 4.7. Note that if $c_n=0$ for every $n \in \mathbb {N}$, the result follows directly from the definition of the topology on $\mathcal {P}_1(\mathbb {R}_+ )$ (use characterization (c) of Theorem 4.3). Splitting the sequence into the terms with $c_n=0$ and those with $c_n>0$, we may therefore assume that $c_n>0$ for every $n \in \mathbb {N}$. It is a consequence of Theorem 4.5 that:
The same proof as that of part (4.25) of Theorem 4.6 can be used to prove
Indeed, in the topology of $\mathcal {P}_1(\mathbb {R}_+ )$ the HS kernels $K(x,\,y,\,c_n)$ satisfy the assumptions of Lemma 4.4. For this, it suffices to notice that, for any fixed value $y_0>0$, the sequence of continuous functions $\frac {|K(x,\,y_0,\,c_n)|}{1+x}$ is uniformly bounded, and that $K(x,\,y_0,\,c_n) \to K(x,\,y_0,\,0)$ uniformly on compact subsets of $\mathbb {R}_+$.
Proof of part (b) of Proposition 4.7. In the case that $c_n=1$ for every $n \in \mathbb {N}$, the result follows directly from the definition of the topology on $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$ (use characterization (c) of Theorem 4.3). Splitting the sequence into the terms with $c_n=1$ and those with $c_n<1$, we may assume that $c_n <1$ for every $n$. It is a consequence of part (4.25) of Theorem 4.6 that:
Recall that the HS barycenter is decreasing in the variable $c$; see Proposition 2.6.(e). In particular, $[\mu _n]_{c_n} \geq [\mu _n]_1$, for every $n \in \mathbb {N}$. Noticing that $\log x$ is a test function for the convergence in the topology of $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$, we obtain:
The following observation complements part (b) of Proposition 4.7, since it provides a sort of lower semicontinuity property at $c=1$ under a weaker integrability condition:
Lemma 4.8 Let $c \in [0,\,1)$ and let $\mu \in \mathcal {P}(\mathbb {R}_+ )$ be such that $\log ^{-}(x) \in L^{1}(\mu )$. Then:
Proof. By definition, $[\mu ]_c \ge \int K(x,\,1,\,c) \, d\mu (x)$. Note that if $x \ge 1$, then $K(x,\,1,\,c) = c^{-1} \log (cx+1-c) \ge c^{-1} \log x$, while if $x<1$, then $K(x,\,1,\,c) \ge \log x$ by Proposition 2.2.(e). In any case, $K(x,\,1,\,c) \ge c^{-1} \log ^{+}(x) - \log ^{-}(x)$, and the Lemma follows.
5. Ergodic theorems for symmetric and HS means
Symmetric means (3.3) are only defined for lists of non-negative numbers. On the other hand, HS barycenters are defined for probability measures on $\mathbb {R}_+$, and are therefore much more flexible objects. In particular, there is an induced concept of HS mean of a list of non-negative numbers, which we have already introduced in (3.1). We can also define the HS mean of a function:
Definition 5.1 If $(\Omega,\, \mathcal {F},\, \mathbb {P})$ is a probability space, $f \colon \Omega \to \mathbb {R}_+$ is a measurable non-negative function, and $c \in [0,\,1]$, then the Halász–Székely mean (or HS mean) with parameter $c$ of the function $f$ with respect to the probability measure $\mathbb {P}$ is:
that is, the HS barycenter with parameter $c$ of the pushforward measure on $\mathbb {R}_+$. In the case $c=1$, we require that $\log f$ is semi-integrable.
For arithmetic means, the classical ergodic theorem of Birkhoff asserts that, for ergodic systems, time averages converge almost surely to spatial averages. From the probabilistic viewpoint, Birkhoff's theorem extends the strong law of large numbers. We prove an ergodic theorem that applies simultaneously to symmetric and HS means and extends the results of [Reference Halász and Székely15, Reference van Es28]:
Theorem 5.2 Let $(\Omega,\, \mathcal {F},\, \mathbb {P})$ be a probability space, let $T \colon \Omega \to \Omega$ be an ergodic measure-preserving transformation, and let $f \colon \Omega \to \mathbb {R}_+$ be a non-negative measurable function. Then there exists a measurable set $R \subseteq \Omega$ with $\mathbb {P}(R)=1$ with the following properties. For any $\omega \in R,$ for any $c \in [0,\,1]$ such that
and
and for any sequence $(c_n)$ in $[0,\,1]$ tending to $c,$ we have:
furthermore, for any sequence $(k_n)$ of integers such that $1 \le k_n \le n$ and $k_n/n \to c,$ we have:
Remark 5.3 Since we allow HS means to take the value $\infty$, we do not need integrability conditions as in [Reference Halász and Székely15, Reference van Es28], except for the unavoidable hypothesis (5.3). In the supercritical case $\mathbb {P}(f^{-1}(0)) > 1-c$, both limits (5.4) and (5.5) are almost surely attained in finite time. In the critical case $\mathbb {P}(f^{-1}(0)) = 1-c$, almost sure convergence does not necessarily hold, and the values $\mathsf {sym}_{k_n} ( f(\omega ),\, \dots,\, f(T^{n-1} \omega ) )$ may oscillate. However, in the IID setting, van Es proved that the sequence of symmetric means converges in distribution, provided that the sequence $(\sqrt {n} (k_n/n -c))$ converges in $[-\infty,\, \infty ]$: see [Reference van Es28, Theorem A1 (b)].
As we will soon see, part (5.4) of Theorem 5.2 is obtained using the results about continuity of the HS barycenter with respect to various topologies proved in § 4, and then part (5.5) follows from the inequalities of Theorem 3.4 and Remark 3.6.
To begin the proof, let us fix $(\Omega,\, \mathcal {F},\, \mathbb {P})$, $T$, and $f$ as in the statement, and let $\mu := f_* \mathbb {P} \in \mathcal {P}(\mathbb {R}_+ )$ denote the pushforward measure. Given $\omega \in \Omega$, we consider the sequence of associated sample measures:
\begin{equation} \mu_n^{\omega} := \frac{1}{n} \sum_{i=0}^{n-1} \delta_{f(T^{i}\omega)} . \tag{5.6} \end{equation}
As the next result shows, these sample measures converge almost surely:
Lemma 5.4 There exists a measurable set $R \subseteq \Omega$ with $\mathbb {P}(R)=1$ such that for every $\omega \in R,$ the corresponding sample measures converge weakly to $\mu :$
\begin{equation} \mu_n^{\omega} \rightharpoonup \mu ; \tag{5.7} \end{equation}
furthermore, stronger convergences hold under additional integrability assumptions on $f:$
(a) if $\log (1+f) \in L^{1}(\mathbb {P}),$ then $\mu _n^{\omega } \to \mu$ in the topology of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}};$
(b) if $f \in L^{1}(\mathbb {P}),$ then $\mu _n^{\omega } \to \mu$ in the topology of $\mathcal {P}_1(\mathbb {R}_+ );$
(c) if $|\log f| \in L^{1}(\mathbb {P}),$ then $\mu _n^{\omega } \to \mu$ in the topology of $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$.
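Proof. Let $\mathcal {C} \subset C_\mathrm {b}(\mathbb {R}_+ )$ be a countable set of bounded continuous functions which is sufficient to test weak convergence, i.e., with property (4.1). For each $\phi \in \mathcal {C}$, applying Birkhoff's ergodic theorem to the function $\phi \circ f$, we obtain a measurable set $R \subseteq \Omega$ with $\mathbb {P}(R)=1$ such that for all $\omega \in R$,
\begin{equation} \lim_{n \to \infty} \int \phi \, d\mu_n^{\omega} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \phi(f(T^{i}\omega)) = \int \phi \circ f \, d\mathbb{P} = \int \phi \, d\mu . \tag{5.8} \end{equation}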
Since $\mathcal {C}$ is countable, we can choose a single measurable set $R$ of full probability that works for all $\phi \in \mathcal {C}$. Then we obtain $\mu _n^{\omega } \rightharpoonup \mu$ for all $\omega \in R$. To obtain stronger convergences, we apply Birkhoff's theorem to the functions $\log (1+f)$, $f$, and $|\log f|$, provided they are integrable, and reduce the set $R$ accordingly. If, for example, $f$ is integrable, then for all $\omega \in R$ we have:
\begin{equation} \lim_{n \to \infty} \int x \, d\mu_n^{\omega}(x) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^{i}\omega) = \int f \, d\mathbb{P} = \int x \, d\mu(x) . \tag{5.9} \end{equation}
Applying part (c) of Theorem 4.3 with $x_0 = 0$ and $\mathrm {d}(x,\,x_0) = x$, we conclude that $\mu _n^{\omega }$ converges to $\mu$ in the topology of $\mathcal {P}_1(\mathbb {R}_+ )$. The assertions about convergence in ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$ and $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$ are proved analogously, using instead the corresponding distances (4.11) and (4.30).
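Lemma 5.4 reduces to Birkhoff's theorem applied to countably many observables, and integrals against $\mu_n^\omega$ are just orbit averages. As a numerical illustration (the system is our choice, not taken from the paper), the sketch below uses the rotation $T(\omega) = \omega + \alpha \bmod 1$ of $[0,1)$ with Lebesgue measure and $\alpha = (\sqrt5-1)/2$, which is ergodic (indeed uniquely ergodic, so every starting point works), together with the hypothetical observable $f(\omega) = e^{\omega}$, for which $\int f\,d\mathbb{P} = e-1$ and $\int \log f\,d\mathbb{P} = 1/2$.

```python
import math

# Golden-ratio rotation number; the float value only approximates the irrational number.
ALPHA = (math.sqrt(5) - 1) / 2


def f(w: float) -> float:
    """Hypothetical observable f(w) = exp(w) on [0, 1); then log f(w) = w."""
    return math.exp(w)


def sample_integral(phi, omega: float, n: int) -> float:
    """Integral of phi against mu_n^omega = (1/n) sum_{i<n} delta_{f(T^i omega)}."""
    # T^i(omega) = omega + i*ALPHA mod 1, computed directly to avoid error accumulation.
    return sum(phi(f((omega + i * ALPHA) % 1.0)) for i in range(n)) / n


n, omega = 100_000, 0.1234
print(sample_integral(lambda x: x, omega, n), math.e - 1)  # case (b): f in L^1
print(sample_integral(math.log, omega, n), 0.5)            # case (c): |log f| in L^1
print(sample_integral(math.log1p, omega, n))               # case (a): log(1 + f) in L^1
```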
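Proof of Theorem 5.2. Let $R$ be the set given by Lemma 5.4. By the semi-integrable version of Birkhoff's theorem (see e.g. [Reference Krengel19, p. 15]), we can reduce $R$ if necessary and assume that for all $\omega \in R$,
\begin{equation} \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \log^{\pm}(f(T^{i}\omega)) = \int \log^{\pm} f \, d\mathbb{P} \in [0,\,\infty] . \tag{5.10} \end{equation}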
Fix a point $\omega \in R$ and a number $c \in [0,\,1]$ satisfying conditions (5.2) and (5.3). Consider any sequence $(c_n)$ in $[0,\,1]$ converging to $c$. Let us prove (5.4), or equivalently,
\begin{equation} \lim_{n \to \infty} [\mu_n^{\omega}]_{c_n} = [\mu]_c . \tag{5.11} \end{equation}
There are several cases to be considered, and in all but the last case we will use Lemma 5.4:
• First case: $0 \le c < 1$ and $[\mu ]_c = \infty$. Since $\mu _n^{\omega } \rightharpoonup \mu$, (5.11) is a consequence of Theorem 4.5.
• Second case: $0< c<1$ and $[\mu ]_c < \infty$. Then, by Proposition 2.5, $\log (1+f) \in L^{1}(\mathbb {P})$. Therefore, $\mu _n^{\omega } \to \mu$ in the topology of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, and Theorem 4.6 implies (5.11).
• Third case: $c=0$ and $[\mu ]_0 < \infty$. Then $f \in L^{1}(\mathbb {P})$, since $[\mu ]_0 = \int x \, d\mu = \int f \, d\mathbb {P}$, and hence $\mu _n^{\omega } \to \mu$ in the topology of $\mathcal {P}_1(\mathbb {R}_+ )$. So (5.11) follows from Proposition 4.7.(a).
• Fourth case: $c=1$ and $[\mu ]_1 =0$. Then $\log (1+f) \in L^{1}(\mathbb {P})$. Thus, $\mu _n^{\omega } \to \mu$ in the topology of ${{\mathcal {P}_\mathrm {HS}(\mathbb {R}_{+ })}}$, and Theorem 4.6 yields (5.11).
• Fifth case: $c=1$ and $0< [\mu ]_1 < \infty$. Then $|\log f| \in L^{1}(\mathbb {P})$. Therefore, $\mu _n^{\omega } \to \mu$ in the topology of $\mathcal {P}_\mathrm {g}(\mathbb {R}_{+ + })$, and (5.11) becomes a consequence of Proposition 4.7.(b).
• Sixth case: $c=1$ and $[\mu ]_1 = \infty$. Then $\log ^{-}(f)$ is integrable, but $\log ^{+}(f)$ is not. If $n$ is large enough then $c_n>0$, so Lemma 4.8 gives:
\begin{align} \log [\mu_n^{\omega}]_{c_n} & \ge \displaystyle\int ( c_n^{-1} \log^{+}(x) - \log^{-}(x) ) \, d\mu_n^{\omega}(x) \tag{5.12} \\ & = \displaystyle\frac{1}{c_n} \cdot \frac{1}{n} \displaystyle\sum_{i=0}^{n-1} \log^{+}(f(T^{i}\omega)) - \frac{1}{n} \sum_{i=0}^{n-1} \log^{-}(f(T^{i}\omega)), \tag{5.13} \end{align}
which by (5.10) tends to $\infty$. This proves (5.11) in the last case.
This proves part (5.4) of the theorem; let us now use it to prove part (5.5). Consider a sequence $(k_n)$ of integers such that $1 \le k_n \le n$ and $c_n := k_n/n$ tends to $c$. By Theorem 3.4,
If $[\mu ]_c = \infty$, then $[\mu _n^{\omega }]_{c_n} \to \infty$ by (5.4), and the first inequality forces the symmetric means to tend to $\infty$ as well. So let us assume that $[\mu ]_c$ is finite. If $c > 0$, then, by Lemma 3.5, the fraction on the right-hand side converges to $1$ as $n \to \infty$, and therefore we obtain the desired limit (5.5). If $c = 0$, then we appeal to Maclaurin's inequality in the form
So:
Since (5.11) also holds with $c_n \equiv 0$, all three terms converge to $[\mu ]_c$, thus proving (5.5) also in the case $c=0$.
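Part (5.5) can be observed on a simple periodic example (ours, for illustration): for the list made of $m$ copies of $1$ and $m$ copies of $4$, the sample measure is $\mu = \frac12\delta_1 + \frac12\delta_4$, and with $k = m$ we have $k/n = 1/2$. Assuming $K(x,y,c) = \log y + c^{-1}\log(cx/y + 1 - c)$ and $[\mu]_c = \exp\int K(x,\eta(\mu,c),c)\,d\mu$, the equation $\frac{1}{1+\eta} + \frac{4}{4+\eta} = 1$ gives $\eta = 2$ and $[\mu]_{1/2} = 9/4$, so the symmetric means should approach $9/4$. The sketch (function names ours) computes $E_k$ exactly with Python integers by the standard recursion and evaluates $\mathsf{sym}_k = (E_k/\binom{n}{k})^{1/k}$ in logarithmic scale; it also displays Maclaurin's inequality on a small list.

```python
import math


def elementary_symmetric(xs, k: int):
    """E_k(x_1, ..., x_n) via the O(n k) recursion; exact when the x_i are integers."""
    e = [1] + [0] * k  # e[j] = E_j of the elements processed so far
    for x in xs:
        # go downwards so that each x contributes at most once to every product
        for j in range(k, 0, -1):
            e[j] += e[j - 1] * x
    return e[k]


def symmetric_mean(xs, k: int) -> float:
    """sym_k(xs) = (E_k / binom(n, k))**(1/k), evaluated in log scale to avoid overflow."""
    ek = elementary_symmetric(xs, k)
    if ek == 0:
        return 0.0
    return math.exp((math.log(ek) - math.log(math.comb(len(xs), k))) / k)


for m in (10, 50, 200):
    xs = [1, 4] * m  # n = 2m, sample measure (delta_1 + delta_4) / 2
    print(2 * m, symmetric_mean(xs, m))  # k_n / n = 1/2; compare with 9/4

# Maclaurin's inequality: sym_1 >= sym_2 >= ... >= sym_n
ys = [1, 4, 2, 7, 3]
print([symmetric_mean(ys, k) for k in range(1, len(ys) + 1)])
```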
Like Birkhoff's ergodic theorem itself, Theorem 5.2 should admit generalizations in numerous directions. For example, part (5.4) can easily be adapted to flows or semiflows (actions of the group $\mathbb {R}$ or the semigroup $\mathbb {R}_+$). One can also consider actions of amenable groups, as in [Reference Austin2, Reference Navas22]. We shall not pursue these matters. In another direction, let us note that central limit theorems for symmetric means of i.i.d. random variables have been proved by Székely [Reference Székely27] and van Es [Reference van Es28].
A weaker version of Theorem 5.2, in which the function $f$ is assumed to be bounded away from zero and infinity, was obtained in [Reference Bochi, Iommi and Ponce5, Theorem 5.1] as a corollary of a fairly general pointwise ergodic theorem: the Law of Large Permanents [Reference Bochi, Iommi and Ponce5, Theorem 4.1]. We now briefly discuss a generalization of that result obtained by Balogh and Nguyen [Reference Balogh and Nguyen3, Theorem 1.6]. Suppose that $T$ is an ergodic measure-preserving action of the semigroup $\mathbb {N}^{2}$ on the space $(X,\, \mu )$. Given an observable $g: X \to \mathbb {R}_{+ + }$ and a point $x \in X$, we define an infinite matrix whose $(i,\,j)$-entry is $g(T^{(i,j)}x)$. Consider the square truncations of this matrix and the limit of the corresponding permanental means as the size of the square tends to infinity. Balogh and Nguyen prove that this limit exists $\mu$-almost everywhere and, moreover, identify it: it is a functional scaling mean, a far-reaching generalization of the matrix scaling mean (3.27) (see [Reference Bochi, Iommi and Ponce5, Section 3.1]).
Additional information
An extended version of this paper is available on arXiv: https://arxiv.org/abs/2103.05182. It contains two extra sections, 6 and 7. In Section 6, we study concavity properties of the HS barycenter. Among other results,
• we obtain a version of Newton's inequality for the HS barycenter;
• we prove that as a function of the measure, the HS barycenter is log-concave but not quasi-affine;
• we provide a formula for the Hessian of the function $\log [\mathord {\cdot }]_c$.
In Section 7, we introduce another barycenter, which provides the best quasi-affine approximation of the HS barycenter and is a deviation mean in a sense akin to that of Daróczy.
After this paper was written, the first-named author found more situations where HS barycenters arise. The hypergeometric means introduced by Carlson in [Reference Carlson9] include symmetric means and many others as particular cases: see [Reference Brenner and Carlson7, § 3]. In a forthcoming paper, the first-named author will show that Halász–Székely barycenters can be defined for all real values of the parameter $c$, and that they govern the limit behaviour of hypergeometric means. For instance, the fixed parameter $t$ in the convergence result [Reference Brenner and Carlson7, Theorem 3] can be replaced by any asymptotically linear sequence $t(n) \sim -cn$, with the corresponding arithmetic means being replaced by Halász–Székely means with parameter $c$.
Acknowledgements
We thank Juarez Bochi for helping us with computer experiments at a preliminary stage of this project.
The authors were partially supported by CONICYT PIA ACT172001. J.B. was partially supported by Proyecto Fondecyt 1180371. G.I. was partially supported by Proyecto Fondecyt 1190194. M.P. was partially supported by Proyecto Fondecyt 1180922.