1. Introduction
Since [Reference Kesten19], the theoretical properties of the stochastic recurrence equation (SRE) $\boldsymbol{X}_{t}=\boldsymbol{A}_{t} \boldsymbol{X}_{t-1}+\boldsymbol{B}_{t}$ have received much attention. This equation covers a large class of classical econometric processes such as the GARCH and ARMA models, and their numerous variants. A sufficient condition for the existence and uniqueness of a strictly stationary solution was proposed in [Reference Brandt5] in the case where $(\boldsymbol{A}_t,\boldsymbol{B}_t)_t$ is stationary and ergodic. Under an irreducibility condition, [Reference Bougerol and Picard4] established that this condition is also necessary when the sequence $(\boldsymbol{A}_{t},\boldsymbol{B}_{t})$ is independent and identically distributed (i.i.d.). The probabilistic properties of the stationary solution of the SRE model in the i.i.d. case are well known. In the scalar case, [Reference Kesten19] showed that $\mathbb{P}(\pm \boldsymbol{X}_1>x) \sim c_{\pm} x^{-a}$ as $x \rightarrow \infty$ for some positive constants $c_{\pm}$ . A thorough study of SRE models, in particular their tail behavior, is presented in [Reference Buraczewski, Damek and Mikosch6]. The SRE model is the particular case, with affine mappings, of the so-called stochastic iterated function system (IFS) $\boldsymbol{X}_{t}=\Psi(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$ . Most of the theoretical properties established for SRE models (stationarity, tail properties) can be extended to IFS equations.
One important application of SREs in time series analysis is the study of the stationarity properties of GARCH processes. Assuming i.i.d. innovations, [Reference Bougerol and Picard3] deduced from [Reference Brandt5] a necessary and sufficient condition for the existence of a unique stationary solution of a general GARCH(p, q) model. In recent years, the i.i.d. assumption on the innovations has often been replaced by a less restrictive conditional moment assumption (the model is then called ‘semi-strong’ GARCH). See [Reference Escanciano10] for the classical GARCH(p,q) model, and [Reference Francq and Thieu12, Reference Han and Kristensen17] for GARCH-X models. The GARCH-MIDAS models of [Reference Engle, Ghysels and Sohn9] constitute another class of IFS models which are not driven by an i.i.d. sequence. Another example is given by GARCH-X models which are IFS driven by a (generally non-i.i.d.) sequence of innovations and covariates. This motivates studying IFS equations driven by non-i.i.d. innovations.
However, strict stationarity generally does not suffice for establishing the asymptotic properties of estimators, such as the quasi-maximum likelihood estimator (QMLE). To our knowledge, all existing works on the QML inference of IFS models assume the existence of a small-order moment of the observed process. Surprisingly, however, the strictly stationary solutions of IFS equations with non-i.i.d. innovations may not admit any finite moment.
The aim of this paper is to establish that the stationary trajectories of the IFS equations enjoy an exponential control property. We also show that this property is sufficient to establish the consistency of the QMLE of semi-strong GARCH models.
The rest of the paper is organized as follows. In Section 2 we present our main result, and Section 3 is devoted to its proof. Section 4 investigates the estimation of the semi-strong GARCH(p, q) model. Complementary proofs are displayed in the Appendices.
2. Stochastic IFS without moments
Let $(E, \mathcal{E})$ be a measurable space and $(F, d)$ a complete and separable metric space (Polish space). Let $(\boldsymbol{\theta}_{t})_{t \in \mathbb{Z}}$ be a stationary and ergodic process valued in E, and let $\Psi\colon E \times F \rightarrow F$ be a function such that $x \mapsto \Psi(\theta, x)$ is Lipschitz continuous for all $\theta \in E$ . Let
$$\boldsymbol{\Lambda}_{t} = \Lambda(\boldsymbol{\Psi}_{t}) = \sup_{x, y \in F,\ x \neq y} \frac{d(\boldsymbol{\Psi}_{t}(x), \boldsymbol{\Psi}_{t}(y))}{d(x, y)},$$
where $\boldsymbol{\Psi}_{t}=\Psi(\boldsymbol{\theta}_t,\cdot)$ . Let $\boldsymbol{\Lambda}_{t}^{(0)}=1$ and $\boldsymbol{\Lambda}_{t}^{(r)} = \Lambda(\boldsymbol{\Psi}_{t}\circ\cdots\circ\boldsymbol{\Psi}_{t-r+1})$ for all $r>0$ .
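Consider the IFS
$$\boldsymbol{X}_{t} = \Psi(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}), \quad t \in \mathbb{Z}. \tag{1}$$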
A solution $(\boldsymbol{X}_{t})$ of (1) is said to be causal if, for every t, $\boldsymbol{X}_{t}$ is $\sigma(\boldsymbol{\theta}_{k},\,k\leq t)$ -measurable.
Under a slightly different form, the following result has been established in [Reference Elton8, Theorem 3] and [Reference Bougerol2, Theorem 3.1]; see also [Reference Straumann and Mikosch22, Theorem 2.8] and the review in [Reference Diaconis and Freedman7].
Theorem 1. Assume the following conditions hold: (i) there exists a constant $c \in F$ such that $\mathbb{E} \ln ^{+}d(\boldsymbol{\Psi}_{0}( c),c) < \infty$ ; (ii) $\mathbb{E} \ln ^{+} \boldsymbol{\Lambda}_{0}<\infty$ ; and (iii) $\lim_{r\rightarrow\infty}({1}/{r})\ln \boldsymbol{\Lambda}_{0}^{(r)} < 0$ almost surely (a.s.). Then there exists a unique stationary (causal and ergodic) solution $(\boldsymbol{X}_{t})_{t \in \mathbb{Z}}$ to (1).
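Moreover,
$$d(\boldsymbol{X}_{t},c) \leq \sum_{j=0}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c) \quad \text{a.s.} \tag{2}$$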
Note that $(\ln\boldsymbol{\Lambda}_{0}^{(r)})_{r\geq1}$ is a sub-additive sequence. Therefore, by the sub-additive ergodic theorem of [Reference Kingman20], the limit in assumption (iii) exists.
For the reader’s convenience, and because we have not been able to find (2) in exactly this form, we provide a proof of Theorem 1 in Appendix A.
Remark 1. If $(\boldsymbol{\theta}_t)$ is i.i.d., it is possible to prove in particular cases, including the affine mapping, that $d(\boldsymbol{X}_{1},c)$ has a power-law tail [Reference Buraczewski, Damek and Mikosch6, Theorem 5.3.6]. More generally, it can be shown that, under the conditions of Theorem 1, there exists $s>0$ such that $\mathbb{E}d(\boldsymbol{X}_{1},c)^s < \infty$ . This small moment property is often used in the statistical inference of IFS models, for example, to prove the consistency of estimators for GARCH models and their derivatives (see [Reference Berkes, Horváth and Kokoszka1] for the GARCH model and [Reference Francq, Wintenberger and Zakoian13] for the EGARCH and Log-GARCH models). If $(\boldsymbol{\theta}_t)$ is not i.i.d., the examples below show that the stationary solution may not admit any small-order moment.
Example 1. Let $\delta\in(0,1)$ and let $(\boldsymbol{z}_t)_{t\in\mathbb{Z}}$ be an i.i.d. non-negative real process with $\mathbb{E}\boldsymbol{z}_t=\frac12(1-\delta)$ and $\mathbb{E}\boldsymbol{z}_t^2=\infty$ . The process $(\boldsymbol{\theta}_t)$ defined by $\boldsymbol{\theta}_t=\sum_{k=0}^{\infty}\delta^k \boldsymbol{z}_{t-k}$ for all $t\in\mathbb{Z}$ satisfies $\mathbb{E}\boldsymbol{\theta}_t=\frac{1}{2}$ and is such that, for all $t\in\mathbb{Z}$ , $\boldsymbol{x}_t=1+\sum_{k=1}^{\infty}\prod_{j=1}^{k}\boldsymbol{\theta}_{t-j+1}$ exists a.s. Moreover, $(\boldsymbol{x}_t)$ is the unique stationary solution of $\boldsymbol{x}_t=\boldsymbol{\theta}_t \boldsymbol{x}_{t-1}+1$ , $t\in \mathbb{Z}$ . Note that $\boldsymbol{x}_t\geq\prod_{j=1}^{k}\boldsymbol{\theta}_{t-j+1} \geq \delta^{{k(k-1)}/{2}}(\boldsymbol{z}_{t-k+1})^k$ for all $k\in\mathbb{N}^*$ . For all $s>0$ , we thus have $\mathbb{E}\boldsymbol{x}_0^s\geq\mathbb{E}\delta^{{sk(k-1)}/{2}}(\boldsymbol{z}_{0})^{sk}=\infty$ for k such that $sk>2$ .
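The construction of Example 1 can be explored numerically. The following is a minimal sketch, not part of the original argument: it assumes a rescaled Pareto law for $\boldsymbol{z}_t$ (shape 1.5, hence finite mean and infinite variance), starts the recursions from arbitrary non-negative values instead of the stationary law, and uses hypothetical function names. It checks the pathwise chain of inequalities $\boldsymbol{x}_t\geq\prod_{j=1}^{k}\boldsymbol{\theta}_{t-j+1}\geq\delta^{k(k-1)/2}(\boldsymbol{z}_{t-k+1})^k$ on a simulated trajectory.

```python
import random


def simulate_example1(n, delta=0.5, shape=1.5, seed=0):
    """Simulate z_t, theta_t = delta*theta_{t-1} + z_t and x_t = theta_t*x_{t-1} + 1."""
    rng = random.Random(seed)
    # Pareto(shape) has mean shape/(shape-1) and infinite variance when 1 < shape <= 2.
    # The scale is chosen so that E z_t = (1 - delta)/2 (assumed law, for illustration).
    scale = 0.5 * (1.0 - delta) * (shape - 1.0) / shape
    z, theta, x = [], [], []
    theta_prev, x_prev = 0.0, 1.0  # arbitrary non-negative starting values
    for _ in range(n):
        z_t = scale * rng.paretovariate(shape)
        theta_t = delta * theta_prev + z_t
        x_t = theta_t * x_prev + 1.0
        z.append(z_t)
        theta.append(theta_t)
        x.append(x_t)
        theta_prev, x_prev = theta_t, x_t
    return z, theta, x


def lower_bounds(z, theta, t, k, delta=0.5):
    """Return (prod_{j=1..k} theta_{t-j+1}, delta^{k(k-1)/2} * z_{t-k+1}^k)."""
    prod = 1.0
    for j in range(1, k + 1):
        prod *= theta[t - j + 1]
    return prod, delta ** (k * (k - 1) / 2) * z[t - k + 1] ** k
```

For any index `t` and any `k` with `t - k + 1 >= 0`, the first returned value should not exceed `x[t]`, and the second should not exceed the first, up to rounding.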
The previous example is simple, but probably a little artificial. We now give an example of commonly used econometric models, for which it was recently proven that the strictly stationary solution does not admit any finite moment.
Example 2. Consider the following GARCH-MIDAS model [Reference Engle, Ghysels and Sohn9]:
where $(\boldsymbol{\eta}_{t})_t$ is a zero-mean and unit-variance i.i.d. sequence, $\alpha>0$ , $\beta\geq0$ , $\alpha+\beta<1$ , $a>0$ , and $b>0$ . Noting that $\boldsymbol{\epsilon}_t:= \boldsymbol{\sigma}_{t} \boldsymbol{\eta}_{t}$ is a GARCH process, we see that $(\boldsymbol{\tau}_{t})$ follows the SRE $\boldsymbol{\tau}_{t}=a+b \boldsymbol{r}_{t-1}^{2}=a+(b \boldsymbol{\epsilon}^2_{t-1})\boldsymbol{\tau}_{t-1}$ driven by the non-i.i.d. sequence $(\boldsymbol{\epsilon}_t)$ . It can be shown that, when $b\leq 1$ , the process $(\boldsymbol{r}_t)$ is strictly stationary but that, when $\boldsymbol{\eta}_0$ has unbounded support, $\mathbb{E}|\boldsymbol{r}_t|^s=\infty$ for any $s>0$ . See [Reference Francq, Kandji and Zakoian11, Proposition 1] for the proof of this result.
We now state our main result, which provides a way to circumvent the non-existence of small-order moments for models such as those of Examples 1 and 2. Section 4 will be devoted to the statistical study of a class of econometric models where the existence of moments is not guaranteed.
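Theorem 2. Under the conditions of Theorem 1, for all $t\in\mathbb{Z}$ ,
$$\text{(i)}\ \ \limsup_{n\rightarrow\infty}\frac{1}{n}\ln d(\boldsymbol{X}_{t+n},c) \leq 0 \quad \text{and} \quad \text{(ii)}\ \ \limsup_{n\rightarrow\infty}\frac{1}{n}\ln d(\boldsymbol{X}_{t-n},c) \leq 0 \quad \text{a.s.}$$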
Theorem 2 can be interpreted as an exponential control of the trajectory of the stationary solution. Note that the property $\mathbb{E}\ln^+d(\boldsymbol{X}_{1},c)<\infty$ (a weaker condition than the existence of a small-order moment) implies the results of Theorem 2 (see Appendix B). However, the converse is false (see [Reference Tanny23, Example (a)]).
As a consequence of the previous theorem, we obtain the following result. Its proof is provided in Appendix C.
Corollary 1. Under the conditions of Theorem 2, almost surely, $\lim_{|n|\rightarrow\infty}({1}/{|n|})\ln^+ d(\boldsymbol{X}_{t+n},c)$ exists and is equal to 0; if $\mathbb{E}\ln^-d(\boldsymbol{X}_{1},c)<\infty$ , then
$$\lim_{|n|\rightarrow\infty}\frac{1}{|n|}\ln d(\boldsymbol{X}_{t+n},c) = 0 \quad \text{a.s.} \tag{3}$$
3. Proof of the main result
To show Theorem 2, we first define an SRE which bounds the distance between $\boldsymbol{X}_{t}$ and c.
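Note that, by [Reference Kingman20],
$$\lim_{r\rightarrow\infty}\frac{1}{r}\ln\boldsymbol{\Lambda}_{0}^{(r)} = \lim_{r\rightarrow\infty}\frac{1}{r}\mathbb{E}\ln\boldsymbol{\Lambda}_{0}^{(r)} = \inf_{r\geq1}\frac{1}{r}\mathbb{E}\ln\boldsymbol{\Lambda}_{0}^{(r)} \quad \text{a.s.,} \tag{4}$$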
so by Theorem 1(iii) there exists a positive integer $r_0$ such that $\mathbb{E}\ln \boldsymbol{\Lambda}_{0}^{(r_0)}<0$ . It can be shown that $\mathbb{E}[\ln(\boldsymbol{\Lambda}_{0}^{(r_0)}+u)]\stackrel{u\downarrow0}{\longrightarrow}\mathbb{E}\ln\boldsymbol{\Lambda}_{0}^{(r_0)}$ [Reference Straumann and Mikosch22, proof of Theorem 2.10]. Therefore, there exists $u_0>0$ such that $\ln(u_0)\leq\gamma_0:=\mathbb{E}[\ln(\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0)]<0$ . We thus have, for all $v \in[\gamma_0,0)$ ,
$$\mathbb{E}\ln\big[\delta(v)\big(\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0\big)\big] = v < 0, \tag{5}$$
with $\delta(v)=\exp(v-\gamma_0)\geq 1$ .
Now, for any integer $p\in[0, r_0-1]$ , define $(\boldsymbol{a}_{p,t}(v),\boldsymbol{b}_{p,t})_{t\in\mathbb{Z}}$ by
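By Theorem 1(i) and (ii), and by the elementary inequality $\ln(\sum_{i=1}^{n} a_{i}) \leq \ln n+\sum_{i=1}^{n} \ln^+a_{i}$ for non-negative $\{a_{i}\}_{i=1}^{n}$ , we have $\mathbb{E}\ln^+\boldsymbol{a}_{p,t}(v)<\infty$ and $\mathbb{E}\ln^+\boldsymbol{b}_{p,t}<\infty$ . Therefore, in view of (5), there exists a unique stationary solution $(\boldsymbol{z}_{p,t}(v))_t$ to the equation
$$\boldsymbol{z}_{p,t}(v) = \boldsymbol{a}_{p,t}(v)\boldsymbol{z}_{p,t-1}(v) + \boldsymbol{b}_{p,t}, \quad t\in\mathbb{Z}. \tag{6}$$
Note that, by [Reference Brandt5],
$$\boldsymbol{z}_{p,t}(v) = \sum_{k=0}^{\infty}\Big(\prod_{i=0}^{k-1}\boldsymbol{a}_{p,t-i}(v)\Big)\boldsymbol{b}_{p,t-k} \quad \text{a.s.} \tag{7}$$
By iterating (6), we have
$$\boldsymbol{z}_{p,t}(v) = \sum_{k=0}^{n}\Big(\prod_{i=0}^{k-1}\boldsymbol{a}_{p,t-i}(v)\Big)\boldsymbol{b}_{p,t-k} + \Big(\prod_{i=0}^{n}\boldsymbol{a}_{p,t-i}(v)\Big)\boldsymbol{z}_{p,t-(n+1)}(v). \tag{8}$$
By (7) and (8), $\big(\prod_{i=0}^{n} \boldsymbol{a}_{p,t-i}(v)\big)\boldsymbol{z}_{p,t-(n+1)}(v)$ is the remainder of a convergent series, and hence almost surely converges to 0. That is,
$$\lim_{n\rightarrow\infty}\Big(\prod_{i=0}^{n}\boldsymbol{a}_{p,t-i}(v)\Big)\boldsymbol{z}_{p,t-(n+1)}(v) = 0 \quad \text{a.s.} \tag{9}$$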
We now give a technical lemma linking the processes $(\boldsymbol{X}_{t})$ and $(\boldsymbol{z}_{p,t}(v))_t$ .
Lemma 1. For all $v\in[\gamma_0,0)$ , $0\leq p\leq r_0-1$ , and $t\in\mathbb{Z}$ ,
$$d(\boldsymbol{X}_{r_0t+p},c) \leq \boldsymbol{z}_{p,t}(v) \quad \text{a.s.} \tag{10}$$
Proof of Lemma 1. For any integer n, let q and m denote the quotient and remainder of the Euclidean division of n by $r_0$ : $n=qr_0+m$ . By sub-multiplicativity we have
For all $q\in\mathbb{N}$ , we then obtain
It follows that
Since $\delta(v)\geq1$ and $u_0>0$ , we obtain
In view of the last two inequalities, together with (7) and (2), we have
which proves (10).
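The role played by the bounding SRE can be illustrated on a finite sample. The sketch below is an illustration under simplifying assumptions (deterministic coefficient lists, hypothetical function names): it checks that iterating a scalar affine recursion $z_t=a_tz_{t-1}+b_t$ from a starting value gives a finite sum of products of the $a_t$ weighted by the $b_t$, plus a remainder equal to the full product of the $a_t$ times the starting value.

```python
def iterate_sre(a, b, z_start=0.0):
    """Apply z <- a[t]*z + b[t] for t = 0, ..., len(a)-1 and return the final value."""
    z = z_start
    for a_t, b_t in zip(a, b):
        z = a_t * z + b_t
    return z


def truncated_series(a, b):
    """Return sum_{k=0}^{n-1} (prod_{i=0}^{k-1} a[t-i]) * b[t-k], with t = n-1."""
    t = len(a) - 1
    total, prod = 0.0, 1.0  # prod holds the running product a[t]*...*a[t-k+1]
    for k in range(len(a)):
        total += prod * b[t - k]
        prod *= a[t - k]
    return total


def full_product(a):
    """Return the product of all coefficients in a (the weight of the remainder term)."""
    prod = 1.0
    for a_t in a:
        prod *= a_t
    return prod
```

With `a = [0.5, 2.0, 0.25]` and `b = [1.0, 1.0, 3.0]`, both `iterate_sre(a, b)` and `truncated_series(a, b)` equal 3.75, and starting from `z_start = 4.0` adds `full_product(a) * 4.0 = 1.0`.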
Let $\textbf{Aff}$ denote the set of affine maps from $\mathbb{R}$ into $\mathbb{R}$ . An element $\boldsymbol{f}_{a,b}$ of $\textbf{Aff}$ can be written as $\boldsymbol{f}_{a,b}(x) =a x+b$ , $x \in \mathbb{R}$ , where $(a,b)\in\mathbb{R}^2$ .
Lemma 2. Let us define a function $\Phi$ from $\textbf{Aff}$ to $\mathbb{R}_+$ by $\Phi(\boldsymbol{f}_{a,b})=|a|+|b|$ .
(i) For any x with $|x| \geq 1$ , $|\boldsymbol{f}_{a,b}(x)|\leq\Phi(\boldsymbol{f}_{a,b})|x|$ .
(ii) If $|d|\geq1$ then $\Phi(\boldsymbol{f}_{a,b}\circ\boldsymbol{f}_{c,d}) \leq \Phi(\boldsymbol{f}_{a,b})\Phi(\boldsymbol{f}_{c,d})$ .
Since Lemma 2 is elementary, its proof is skipped. Note that $\Phi$ is the 1-norm in the vector space of affine maps.
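Although the proof is skipped, both parts of Lemma 2 are easy to check numerically. The following sketch (function names are hypothetical) represents $\boldsymbol{f}_{a,b}$ by the pair `(a, b)`, uses the identity $\boldsymbol{f}_{a,b}\circ\boldsymbol{f}_{c,d}=\boldsymbol{f}_{ac,\,ad+b}$, and tests the two inequalities on random draws satisfying $|x|\geq1$ and $|d|\geq1$.

```python
import random


def phi(a, b):
    """Phi(f_{a,b}) = |a| + |b|."""
    return abs(a) + abs(b)


def compose(f, g):
    """Coefficients of f o g: x -> a*(c*x + d) + b = (a*c)*x + (a*d + b)."""
    (a, b), (c, d) = f, g
    return (a * c, a * d + b)


def check_lemma2(trials=1000, seed=0, tol=1e-12):
    """Return True if both inequalities of Lemma 2 hold on all random draws."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.uniform(-3, 3) for _ in range(3))
        d = rng.choice([-1, 1]) * rng.uniform(1, 3)  # enforces |d| >= 1
        x = rng.choice([-1, 1]) * rng.uniform(1, 3)  # enforces |x| >= 1
        if abs(a * x + b) > phi(a, b) * abs(x) + tol:
            return False
        if phi(*compose((a, b), (c, d))) > phi(a, b) * phi(c, d) + tol:
            return False
    return True
```

The restriction $|d|\geq1$ in part (ii) cannot be dropped in general: with $\boldsymbol{f}_{0,1}$ and $\boldsymbol{f}_{0,0}$ the composition is $\boldsymbol{f}_{0,1}$, so the left-hand side equals 1 while the right-hand side equals 0.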
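Lemma 3. For all $p\in\{0,\dots, r_0-1\}$ and $t\in\mathbb{Z}$ , letting $Q_p(t)=r_0t+p$ ,
$$\text{(i)}\ \ \limsup_{n\rightarrow\infty}\frac{1}{n}\ln d(\boldsymbol{X}_{Q_p(t+n)},c) \leq 0 \quad \text{and} \quad \text{(ii)}\ \ \limsup_{n\rightarrow\infty}\frac{1}{n}\ln d(\boldsymbol{X}_{Q_p(t-n)},c) \leq 0 \quad \text{a.s.}$$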
Lemma 3 distinguishes between cases (i) and (ii) because their proofs are different.
Proof of Lemma 3. We start by proving (i). Let $\boldsymbol{f}_t$ be the random affine map defined by $\boldsymbol{f}_t(x) = \boldsymbol{a}_{p,t}(v)x + \boldsymbol{b}_{p,t}$ for all $x\in\mathbb{R}$ . Define also the maps $\boldsymbol{\gamma}_{t,n} = \boldsymbol{f}_t\circ\boldsymbol{f}_{t-1}\dotsb\circ\boldsymbol{f}_{t-n+1}$ and $\boldsymbol{\zeta}_{t,n}=\boldsymbol{f}_{t+n}\circ \boldsymbol{f}_{t+n-1}\dotsb\circ\boldsymbol{f}_{t+1}$ for all $(t,n)\in \mathbb{Z}\times\mathbb{N}^*$ . Note that
Since $\boldsymbol{b}_{p,t}\geq1$ , by Lemma 2(ii),
are sub-additive sequences. By arguments already used, we have $\mathbb{E}|\ln\Phi(\boldsymbol{\gamma}_{t,1})| = \mathbb{E}|\ln\Phi(\boldsymbol{\zeta}_{t,1})| = \mathbb{E}|\ln\Phi(\boldsymbol{f}_t)|<\infty$ . In view of (11) and Lemma 2(i),
Because $\boldsymbol{z}_{p,t}(v)$ does not depend on n, we have $\limsup\limits_{n \rightarrow \infty}({1}/{n})\ln\boldsymbol{z}_{p,t}(v) = 0$ a.s. Therefore,
Since, for any $n\in\mathbb{N}^*$ , $\boldsymbol{u}_{t,n}$ and $\boldsymbol{w}_{t,n}$ have the same law, by (12) and Kingman’s sub-additive ergodic theorem,
On the other hand, in view of (8), we have, by the positivity of the coefficients,
Therefore, $\lim_{n \rightarrow \infty}\boldsymbol{u}_{t,n}=\ln\boldsymbol{z}_{p,t}(v)$ a.s., which entails
By (13), (14), and (15), we get $\limsup\limits_{n \rightarrow \infty}({1}/{n})\ln\boldsymbol{z}_{p,t+n}(v) \leq 0$ a.s., which implies, by (10), part (i) of the lemma.
For (ii), by (10), (9), (5), and the ergodic theorem, we have
for all $v\in[\gamma_0,0)$ . Letting $v\rightarrow0^-$ , we get the result.
We are now ready to prove Theorem 2.
Proof of Theorem 2. For all $t\in\mathbb{Z}$ , let $t^\prime\in\mathbb{Z}$ and $p^\prime$ , $0\leq p^\prime\leq r_0-1$ , be such that $t=r_0t^\prime+p^\prime$ . Note that $\{t+k,k\in\mathbb{N}\}\subset\bigcup_{0\leq p\leq r_0-1}\{r_0(t^\prime+k)+p,k\in\mathbb{N}\}$ . This and Lemma 3(i) imply that
for
which establishes (i). Part (ii) follows from similar arguments.
4. Inference for semi-strong GARCH(p, q)
Consider the GARCH(p, q) model
where $\omega_{0} > 0$ , $\alpha_{0 i} \geqslant 0$ ( $i=1, \ldots, q$ ), and $\beta_{0 j} \geqslant 0$ ( $j=1, \ldots, p$ ). When $(\boldsymbol{\eta}_{t})$ is i.i.d., the model in (16) is a standard strong GARCH, for which the statistical inference has been thoroughly studied. In particular, [Reference Berkes, Horváth and Kokoszka1, Reference Francq and Zakoian14] studied the QMLE under the stationarity of $(\boldsymbol{\epsilon}_{t})$ , and [Reference Jensen and Rahbek18] explored the asymptotic behavior of the QMLE in the explosive case. In the stationary framework, [Reference Escanciano10] proved the consistency and asymptotic normality of the QMLE without i.i.d.-ness for $(\boldsymbol{\eta}_t)$ , but had to assume that $E|\boldsymbol{\epsilon}_t|^s<\infty$ for some small $s>0$ . The aim of this section is to relax this extra moment assumption.
4.1. Property of the strictly stationary solution
Let
with standard notation.
The model in (16) is a special case of (1) using $\boldsymbol{\theta}_{t}=(\boldsymbol{A}_{t},\boldsymbol{b}_t)$ , $\boldsymbol{X}_{t} = (\boldsymbol{\epsilon}_{t}^{2},\ldots,\boldsymbol{\epsilon}_{t-q+1}^{2}$ , $\boldsymbol{h}_{t}^{2},\ldots,\boldsymbol{h}_{t-p+1}^{2})'$ , $\Psi(\theta, x)=Ax+b$ , and $d(x,y)=\|x-y\|$ for any norm $\|\cdot\|$ on $\mathbb{R}^{p+q}$ . Note that $\boldsymbol{\Lambda}_{t}^{(r)}=\|\boldsymbol{A}_{t}\boldsymbol{A}_{t-1}\ldots\boldsymbol{A}_{t-r+1}\|$ .
In what follows, we do not assume that $(\boldsymbol{\eta}_t)$ is i.i.d.; we only assume that it is stationary and ergodic. If $\mathbb{E}\ln^+\boldsymbol{\eta}_1^2<\infty$ , Theorem 1 applies with $c=0_{p+q}$ . Therefore, in view of (4), there exists a unique non-anticipative strictly stationary solution $(\boldsymbol{\epsilon}_t)$ to model (16) if
$$\gamma(\textbf{A}) := \lim_{r\rightarrow\infty}\frac{1}{r}\ln\|\boldsymbol{A}_{0}\boldsymbol{A}_{-1}\cdots\boldsymbol{A}_{-r+1}\| < 0 \quad \text{a.s.}$$
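By Theorem 2, it follows that the strictly stationary solution of (16) satisfies
$$\limsup_{n\rightarrow\infty}\frac{1}{n}\ln\boldsymbol{\epsilon}_{t+n}^2 \leq 0 \quad \text{and} \quad \limsup_{n\rightarrow\infty}\frac{1}{n}\ln\boldsymbol{\epsilon}_{t-n}^2 \leq 0 \quad \text{a.s.} \tag{17}$$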
for all $t\in \mathbb{Z}$ .
In the GARCH(1,1) case, it is easy to check that $\gamma(\textbf{A}) = \mathbb{E}\ln(\alpha_{01}\boldsymbol{\eta}_{t}^2+\beta_{01})$ . For general GARCH(p,q) of the form (16), it seems impossible to compute $\gamma(\textbf{A})$ explicitly. This issue has been discussed in several works, e.g. [Reference Bougerol and Picard3, p. 117] and [Reference Buraczewski, Damek and Mikosch6, pp. 148, 149]. Both references recommend estimation by computer simulation.
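In the GARCH(1,1) case, the simulation approach reduces to a sample average. The sketch below is only an illustration: it assumes i.i.d. standard Gaussian innovations (the simplest special case; any stationary and ergodic sequence could be substituted), requires $\beta_{01}>0$ so that the logarithm is always finite, and uses a hypothetical function name.

```python
import math
import random


def lyapunov_garch11(alpha, beta, n=20000, seed=0):
    """Monte Carlo estimate of E ln(alpha * eta^2 + beta) for a GARCH(1,1).

    Assumes i.i.d. N(0, 1) innovations and beta > 0 (illustrative choices).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        total += math.log(alpha * eta * eta + beta)
    return total / n
```

For instance, with `alpha = 0.1` and `beta = 0.8` the estimate is negative, in line with Jensen's inequality since $\alpha_{01}+\beta_{01}<1$, whereas with `alpha = 3.0` and `beta = 1.0` every summand is non-negative and the estimate is positive.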
4.2. QML estimator
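Let $\{\boldsymbol{\epsilon}_{t}\}_{t=1}^{n}$ be a sample of size n of the unique non-anticipative strictly stationary solution of model (16). The vector of parameters $\boldsymbol{\theta} = (\boldsymbol{\theta}_{1}, \ldots, \boldsymbol{\theta}_{p+q+1})^{\top} =(\omega, \alpha_{1}, \ldots, \alpha_{q}, \beta_{1}, \ldots, \beta_{p})^{\top}$ belongs to a parameter space $\boldsymbol{\Theta} \subset\mathopen] 0,+\infty\mathclose[ \times \mathopen[0, \infty\mathclose[^{p+q}$ . The true value of the parameter is unknown and is denoted by $\boldsymbol{\theta}_{0} = (\omega_{0}, \alpha_{01}, \ldots, \alpha_{0 q}$ , $\beta_{01}, \ldots, \beta_{0 p})^{\top}$ . Conditionally on initial values $\boldsymbol{\epsilon}_{0}, \ldots, \boldsymbol{\epsilon}_{1-q}$ , $\tilde{\boldsymbol{\sigma}}_{0}^{2}, \ldots, \tilde{\boldsymbol{\sigma}}_{1-p}^{2}$ , the Gaussian quasi-likelihood is defined by
$$L_{n}(\boldsymbol{\theta}) = \prod_{t=1}^{n} \frac{1}{\sqrt{2 \pi \tilde{\boldsymbol{\sigma}}_{t}^{2}}} \exp\bigg(-\frac{\boldsymbol{\epsilon}_{t}^{2}}{2 \tilde{\boldsymbol{\sigma}}_{t}^{2}}\bigg),$$
where the $\tilde{\boldsymbol{\sigma}}_{t}^{2}$ are defined recursively, for $t \geqslant 1$ , by
$$\tilde{\boldsymbol{\sigma}}_{t}^{2} = \tilde{\boldsymbol{\sigma}}_{t}^{2}(\boldsymbol{\theta}) = \omega + \sum_{i=1}^{q} \alpha_{i} \boldsymbol{\epsilon}_{t-i}^{2} + \sum_{j=1}^{p} \beta_{j} \tilde{\boldsymbol{\sigma}}_{t-j}^{2}.$$
For instance, the initial values can be chosen as
$$\boldsymbol{\epsilon}_{0}^{2} = \cdots = \boldsymbol{\epsilon}_{1-q}^{2} = \tilde{\boldsymbol{\sigma}}_{0}^{2} = \cdots = \tilde{\boldsymbol{\sigma}}_{1-p}^{2} = c, \tag{18}$$
with $c=\omega$ or $\boldsymbol{\epsilon}_{1}^{2}$ . The standard estimator of the GARCH parameter $\boldsymbol{\theta}_0$ is the QMLE defined as any measurable solution $\hat{\boldsymbol{\theta}}_{n}$ of
$$\hat{\boldsymbol{\theta}}_{n} = \mathop{\arg\min}_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \tilde{\textbf{l}}_{n}(\boldsymbol{\theta}), \tag{19}$$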
where $\tilde{\textbf{l}}_{n}(\boldsymbol{\theta}) = n^{-1} \sum_{t=1}^{n} \tilde{\ell}_{t}$ and $\tilde{\ell}_{t} = \tilde{\ell}_{t}(\boldsymbol{\theta}) = ({\boldsymbol{\epsilon}_{t}^{2}}/{\tilde{\boldsymbol{\sigma}}_{t}^{2}})+\ln \tilde{\boldsymbol{\sigma}}_{t}^{2}$ .
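As an illustration of how the criterion $\tilde{\textbf{l}}_{n}(\boldsymbol{\theta})$ can be evaluated, here is a minimal sketch restricted to the GARCH(1,1) case. It assumes that the initial values $\boldsymbol{\epsilon}_0^2$ and $\tilde{\boldsymbol{\sigma}}_0^2$ are both set to a common constant `c` (defaulting to $\omega$); the function name and signature are hypothetical and not taken from any library.

```python
import math


def qml_criterion(eps, omega, alpha, beta, c=None):
    """Return n^{-1} * sum_t (eps_t^2 / s_t + ln s_t) for a GARCH(1,1).

    s_t follows s_t = omega + alpha * eps_{t-1}^2 + beta * s_{t-1},
    started from eps_0^2 = s_0 = c (assumed initialisation; c = omega by default).
    """
    if c is None:
        c = omega
    eps2_prev, sig2_prev = c, c
    total = 0.0
    for e in eps:
        sig2 = omega + alpha * eps2_prev + beta * sig2_prev
        total += e * e / sig2 + math.log(sig2)
        eps2_prev, sig2_prev = e * e, sig2
    return total / len(eps)
```

A QMLE would then be obtained by minimising this function over a compact parameter set with any numerical optimiser; that step is not shown here.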
Let $\mathcal{A}_{\boldsymbol{\theta}}(z) = \sum_{i=1}^{q} \alpha_{i} z^{i}$ and $\mathcal{B}_{\boldsymbol{\theta}}(z) = 1-\sum_{j=1}^{p} \beta_{j} z^{j}$ . It is not restrictive to assume that $q\geq1$ . By convention, $\mathcal{B}_{\boldsymbol{\theta}}(z)=1$ if $p=0$ . Let $\mathcal{F}_{t-1}$ be the $\sigma$ -field generated by $(\boldsymbol{\epsilon}_{t-1}, \boldsymbol{\epsilon}_{t-2}, \ldots)$ . To show the strong consistency, we make the following assumptions.
Assumption 1. $\boldsymbol{\theta}_{0} \in \boldsymbol{\Theta}$ and $\boldsymbol{\Theta}$ is compact.
Assumption 2. $\gamma(\textbf{A}_{0}) < 0$ and, for all $\boldsymbol{\theta} \in \boldsymbol{\Theta}$ , $\sum_{j=1}^{p} \beta_{j}<1$ .
Assumption 3. $(\boldsymbol{\eta}_t)$ is stationary and ergodic; $\boldsymbol{\eta}_{t}^{2}$ has a non-degenerate distribution with (i) $\mathbb{E}[\boldsymbol{\eta}_{t}^{2} \mid \mathcal{F}_{t-1}] = 1$ a.s. and (ii) $\mathbb{E}\ln\boldsymbol{\eta}_t^2>-\infty$ .
Assumption 4. If $p>0$ , $\mathcal{A}_{\boldsymbol{\theta}_0}(z)$ and $\mathcal{B}_{\boldsymbol{\theta}_{0}}(z)$ have no common root, $\mathcal{A}_{\boldsymbol{\theta}_{0}}(1) \neq 0$ , and $\alpha_{0 q} + \beta_{0 p} \neq 0$ .
Remark 2. Assumptions 1, 2, and 3 are standard (see [Reference Francq and Zakoian14] for comments on these assumptions). Assumption 3(i) is obviously less restrictive than the i.i.d. assumption with finite second-order moments. In Appendix D, we provide an explicit example of semi-strong GARCH based on a non-i.i.d. martingale difference innovation satisfying Assumption 3(i). This assumption was first used in [Reference Lee and Hansen21] for the inference of GARCH models, and [Reference Escanciano10] established the consistency of the QMLE under this assumption, with a small-order moment condition on the observed process instead of our Assumption 3(ii). Note that the latter assumption precludes densities with too much mass around zero, but is satisfied by most commonly used distributions. It is also weaker than the regularity condition on the $\boldsymbol{\eta}_t$ law ( $\lim _{t \rightarrow 0} t^{-\mu} \mathbb{P}\{\boldsymbol{\eta}_{0}^{2} \leqslant t\} = 0$ for some $\mu > 0$ ) used in [Reference Berkes, Horváth and Kokoszka1] (see Appendix E).
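Assumption 2 implies that the roots of $\mathcal{B}_{\boldsymbol{\theta}}(z)$ are outside the unit disc. Therefore, by the second inequality of (17), we can define $(\boldsymbol{\sigma}_{t}^{2})=\{\boldsymbol{\sigma}_{t}^{2}(\boldsymbol{\theta})\}$ as the (unique) strictly stationary, ergodic, and non-anticipative solution of
$$\boldsymbol{\sigma}_{t}^{2}(\boldsymbol{\theta}) = \omega + \sum_{i=1}^{q} \alpha_{i} \boldsymbol{\epsilon}_{t-i}^{2} + \sum_{j=1}^{p} \beta_{j} \boldsymbol{\sigma}_{t-j}^{2}(\boldsymbol{\theta}), \quad t \in \mathbb{Z};$$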
see Appendix F.
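Note that $\boldsymbol{\sigma}_{t}^{2}(\boldsymbol{\theta}_{0}) = \boldsymbol{h}_{t}$ . Let
$$\textbf{l}_{n}(\boldsymbol{\theta}) = n^{-1} \sum_{t=1}^{n} \ell_{t}, \qquad \ell_{t} = \ell_{t}(\boldsymbol{\theta}) = \frac{\boldsymbol{\epsilon}_{t}^{2}}{\boldsymbol{\sigma}_{t}^{2}} + \ln \boldsymbol{\sigma}_{t}^{2}.$$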
We are now able to establish the strong consistency of the QMLE.
Theorem 3. Let $(\hat{\boldsymbol{\boldsymbol{\theta}}}_{n})$ be a sequence of QMLE satisfying (19), with any initial condition (18). Then, under Assumptions 1–4, $\hat{\boldsymbol{\boldsymbol{\theta}}}_{n} \rightarrow \boldsymbol{\theta}_{0}$ a.s. as $n \rightarrow \infty$ .
Remark 3. [Reference Escanciano10] established the asymptotic normality of the QMLE under the assumption that a small-order moment exists. This moment condition is mainly used to justify the existence of the asymptotic covariance of the QMLE. To the best of our knowledge, the asymptotic normality has never been shown without a hypothesis that implies the existence of a small-order moment. In some cases, the asymptotic covariance matrix may not exist without a finite moment of sufficiently large order [15, Section 3.1]. Study of the asymptotic distribution of the semi-strong GARCH without any moment condition is left for future work.
Proof of Theorem 3. The proof relies on the following intermediate results.
(i) $\lim_{n\to \infty} \sup_{\boldsymbol{\theta}\in\boldsymbol{\Theta}} |{\textbf{l}}_n(\boldsymbol{\theta})-\tilde{{\textbf{l}}}_n(\boldsymbol{\theta})| = 0$ a.s.
(ii) If ${\sigma}_t^2(\boldsymbol{\theta})={\sigma}_t^2(\boldsymbol{\theta}_0)$ a.s., then $\boldsymbol{\theta}=\boldsymbol{\theta}_0$ .
(iii) If $\boldsymbol{\theta}\ne \boldsymbol{\theta}_0$ then $\mathbb{E}\{\ell_1(\boldsymbol{\theta}) - \ell_1(\boldsymbol{\theta}_0)\} > 0$ .
(iv) Any $\boldsymbol{\theta}\neq \boldsymbol{\theta}_0$ has a neighborhood $V(\boldsymbol{\theta})$ such that $\liminf_{n\to\infty}(\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) - \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)) > 0$ a.s.
To prove (i), note that [Reference Francq and Zakoian14, (4.7)] shows that, almost surely,
for some constants $C>0$ and $0<\rho<1$ (independent of n); (i) thus follows by Cesàro’s lemma, since the first inequality of (17) implies that $\rho^t\boldsymbol{\epsilon}_t^2 \rightarrow 0$ a.s. as $t \rightarrow \infty$ :
The proof of (ii) uses the same arguments as those of step (ii) in the proof of [Reference Francq and Zakoian14, Theorem 2.1].
Now let us turn to the proof of (iii). For strong GARCH models it is known that $\mathbb{E}\ell_1(\boldsymbol{\theta}_0)$ is finite. This may not be the case in our framework, so we give an alternative proof of (iii). We first establish the existence of $\mathbb{E}\{\ell_1(\boldsymbol{\theta}) - \ell_1(\boldsymbol{\theta}_0)\}$ . Let $W_t(\boldsymbol{\theta})={\sigma}_t^2(\boldsymbol{\theta}_0)/{\sigma}_t^2(\boldsymbol{\theta})$ and, for $K>0$ , $A_K=[K^{-1}, K]$ , write
where, for $x>0$ and $y\ge 0$ , $g(x,y)=-\log x+y(x-1)$ . Introducing the negative part $x^-=\max(-x, 0)$ of any real number x, we thus have
Noting that $W_t(\boldsymbol{\theta})$ is $\mathcal{F}_{t-1}$ -measurable and, by Assumption 3(i), $\mathbb{E}[g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2) \mid \mathcal{F}_{t-1}] = g(W_t(\boldsymbol{\theta}), 1)$ , the expectation of the first term on the right-hand side of (21) is well-defined and satisfies
since $g(x,1)\ge 0$ for any $x\ge 0$ , with equality only if $x=1$ . By (ii) we have that $W_t(\boldsymbol{\theta})=1$ a.s. if and only if $\boldsymbol{\theta}=\boldsymbol{\theta}_0$ . We thus have, by Beppo Levi’s theorem,
To deal with the expectation of the second term on the right-hand side of (21), we use the fact that, for $y > 0$ , $g(x,y) \ge g(1/y,y)$ . It follows that
because, by Assumption 3(ii), $\mathbb{E}[\{g(1/\boldsymbol{\eta}_t^2,\boldsymbol{\eta}_t^2)\}^-] < \infty$ and thus the convergence holds by Lebesgue’s dominated convergence theorem. This completes the proof of (iii).
Now we prove (iv). As for (iii), the possible non-existence of $\mathbb{E}\ell_1(\boldsymbol{\theta})$ requires a modification of the standard proof. For any $\boldsymbol{\theta}\in \boldsymbol{\Theta}$ we have
Hence, using (i),
For any $\boldsymbol{\theta}\in \boldsymbol{\Theta}$ and any positive integer k, let $V_k(\boldsymbol{\theta})$ be the open ball of center $\boldsymbol{\theta}$ and radius $1/k$ . Then
By arguments already given, under Assumption 3(ii),
Therefore, $\mathbb{E}(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0))$ exists in $\mathbb{R}\cup \{+\infty\}$ , and the ergodic theorem applies [Reference Francq and Zakoian16, Exercises 7.3 and 7.4]. From (23) we obtain
The latter term in parentheses converges to $\ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0)$ as $k\to \infty$ , and, by standard arguments using the positive and negative parts of $\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0)$ , we have
which, by (iii), is strictly positive. In view of (22), the proof of (iv) is complete.
Now we complete the proof of the theorem. The set $\boldsymbol{\Theta}$ is covered by the union of an arbitrary neighborhood $V(\boldsymbol{\theta}_0)$ of $\boldsymbol{\theta}_0$ and, for any $\boldsymbol{\theta}\ne \boldsymbol{\theta}_0$ , by neighborhoods $V(\boldsymbol{\theta})$ satisfying (iv). Obviously, $\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta}_0)\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) \le \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)$ a.s. Moreover, by the compactness of $\boldsymbol{\Theta}$ , there exists a finite subcover of the form $V(\boldsymbol{\theta}_0), V(\boldsymbol{\theta}_1), \ldots, V(\boldsymbol{\theta}_M)$ . By (iv), for $i=1, \ldots, M$ , there exists $n_i$ such that, for $n\ge n_i$ , $\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta}_i)\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) > \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)$ a.s. Thus, for $n\ge \max_{i=1, \ldots, M} (n_i)$ ,
from which we deduce that $\widehat{\boldsymbol{\theta}}_n$ belongs to $V(\boldsymbol{\theta}_0)$ for sufficiently large n.
Appendix A. Proof of Theorem 1
Proof. For all $t \in \mathbb{Z}$ and $n \in \mathbb{N}$ , let
$$\boldsymbol{X}_{t, n+1} = \Psi(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1, n}), \tag{24}$$
with $\boldsymbol{X}_{t, 0}=c$ . Note that $\boldsymbol{X}_{t, n} = \psi_{n}(\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots, \boldsymbol{\theta}_{t-n+1})$ for some measurable function $\psi_{n}\colon(E^{n},\mathcal{B}_{E^{n}})\rightarrow(F,\mathcal{B}_{F})$ , with the usual notation. For all n, the sequence $(\boldsymbol{X}_{t, n})_{t \in \mathbb{Z}}$ is thus stationary and ergodic. If, for all t, the limit $\boldsymbol{X}_{t} = \lim _{n \rightarrow \infty} \boldsymbol{X}_{t, n}$ exists a.s., then by taking the limit of both sides of (24), it can be seen that the process $(\boldsymbol{X}_{t})$ is a solution of (1). When it exists, the limit is a measurable function of the form $\boldsymbol{X}_{t} = \psi(\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots)$ and is therefore stationary, ergodic, and causal. For the measurability of $\boldsymbol{X}_{t}$ , we can consider the $\boldsymbol{X}_{t,n}$ as functions of ( $\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots$ ) and argue that, in a metric space, a limit of measurable functions is measurable. The existence of $\lim_{n \rightarrow \infty} \boldsymbol{X}_{t, n}$ was proved in [Reference Elton8], which showed that, a.s., the sequence $(\boldsymbol{X}_{t, n})_{n \in \mathbb{N}}$ is a Cauchy sequence in the complete space F.
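By iterating (24) we have $\boldsymbol{X}_{t, n} = \boldsymbol{\Psi}_{t}\circ\cdots\circ\boldsymbol{\Psi}_{t-n+1}(c)$ . It follows that
$$d(\boldsymbol{X}_{t, n+1}, \boldsymbol{X}_{t, n}) \leq \boldsymbol{\Lambda}_{t}^{(n)} d(\boldsymbol{\Psi}_{t-n}(c), c).$$
For $n<m$ , we thus have
$$d(\boldsymbol{X}_{t, m}, \boldsymbol{X}_{t, n}) \leq \sum_{j=n}^{m-1} \boldsymbol{\Lambda}_{t}^{(j)} d(\boldsymbol{\Psi}_{t-j}(c), c). \tag{25}$$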
Note that
under (i) and (ii), by using Kingman’s sub-additive ergodic theorem [Reference Kingman20] and [Reference Francq and Zakoian16, Exercise 4.12]. We conclude, from the Cauchy criterion for the convergence of series with positive terms, that $\sum_{j=1}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c)$ is a.s. finite under (i) and (ii). It follows that $(\boldsymbol{X}_{t, n})_{n \in \mathbb{N}}$ is a.s. a Cauchy sequence in F. The existence of a stationary and ergodic solution to (1) follows.
Assume that there exists another stationary process $(\boldsymbol{X}_{t}^{*})$ such that $\boldsymbol{X}_{t}^{*} = \boldsymbol{\Psi}_{t}(\boldsymbol{X}_{t-1}^{*})$ . For all $N \geq 0$ ,
Since $\boldsymbol{\Lambda}_{t}^{(N+1)} \rightarrow 0$ a.s. as $N \rightarrow \infty$ , and $d(\boldsymbol{X}_{t-N},\boldsymbol{X}_{t-N}^{*}) = O_{P}(1)$ by stationarity, the right-hand side of (26) tends to zero in probability. Since the left-hand side does not depend on N, we have $\mathbb{P}(d(\boldsymbol{X}_{t},\boldsymbol{X}_{t}^{*}) > \epsilon) = 0$ for all $\epsilon>0$ , and thus $\mathbb{P}(\boldsymbol{X}_{t}=\boldsymbol{X}_{t}^{*}) = 1$ , which establishes the uniqueness. In view of (25), we have $d(\boldsymbol{X}_{t},c) \leq \sum_{j=0}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c)$ and (2) follows.
Appendix B. Proof of the comment following Theorem 2
For all $\epsilon>0$ , since $\mathbb{P}(\ln d(\boldsymbol{X}_{1},c)>\epsilon)=\mathbb{P}(\ln^+ d(\boldsymbol{X}_{1},c)>\epsilon)$ ,
It follows by the Borel–Cantelli lemma that $\limsup n^{-1}\ln d(\boldsymbol{X}_{t+n},c) \leq 0$ a.s. The second result is obtained by the same arguments.
Appendix C. Proof of Corollary 1
We have, for all $n\geq 1$ , $\sup_{k\geq n}\max(0,\ln d(\boldsymbol{X}_{t+k},c))= \max(0,\sup_{k\geq n}\ln d(\boldsymbol{X}_{t+k},c))$ . It follows that
Since, in addition, $\ln^+ d(\boldsymbol{X}_{t+n},c)$ is non-negative, $\lim_{n\rightarrow\infty}({1}/{n})\ln^+ d(\boldsymbol{X}_{t+n},c)$ exists and is equal to 0 a.s. We get $\lim_{n\rightarrow\infty}({1}/{n})\ln^+ d(\boldsymbol{X}_{t-n},c)=0$ a.s. by the same arguments, which gives the first part of the corollary.
For (3), we have $\ln d(\boldsymbol{X}_{t+n},c) = \ln^+ d(\boldsymbol{X}_{t+n},c) - \ln^- d(\boldsymbol{X}_{t+n},c)$ . Since $({1}/{|n|})\ln^+ d(\boldsymbol{X}_{t+n},c)$ converges a.s. to 0 and $({1}/{|n|})\ln^- d(\boldsymbol{X}_{t+n},c)$ also converges a.s. to 0 as $|n|\rightarrow\infty$ [Reference Francq and Zakoian16, Exercise 2.13], $({1}/{|n|})\ln d(\boldsymbol{X}_{t+n},c)$ converges a.s. to 0 as $|n|\rightarrow\infty$ .
Appendix D. Construction of a semi-strong GARCH
We first define a non-i.i.d. martingale difference process. Consider a sequence $(\boldsymbol{x}_{t})_{t\in\mathbb{Z}}$ of i.i.d. random variables with standard normal distribution. Since, for all $z\in\mathbb{R}_+$ , $\boldsymbol{x}_{t}\sqrt{2z}-z\sim\mathcal{N}(-z,2z)$ , using the moment-generating function of the Gaussian distribution, we have
$$\mathbb{E}\exp\big(\boldsymbol{x}_{t}\sqrt{2z}-z\big) = \exp\big(-z+\tfrac{1}{2}\cdot 2z\big) = 1. \tag{27}$$
If $(\boldsymbol{z}_t)$ is a positive process, independent of $(\boldsymbol{x}_{t})$ , we also have $\mathbb{E}\boldsymbol{\eta}_t^2=1$ , where $\boldsymbol{\eta}_t^2=\exp(\boldsymbol{x}_{t}\sqrt{2\boldsymbol{z}_t}-\boldsymbol{z}_t)$ . This is the case if, for instance, $\boldsymbol{z}_t$ follows a causal AR(1) model of the form $\boldsymbol{z}_t=\phi \boldsymbol{z}_{t-1}+\boldsymbol{u}_t$ with $\phi\in (0,1)$ and $\boldsymbol{u}_t$ i.i.d. with positive variance. It is easy to see that $\mbox{Cov}(\boldsymbol{z}_1,\boldsymbol{z}_0)\neq 0$ , and thus
It follows that $(\boldsymbol{\eta}_t^2)$ is not i.i.d. We now define $(\boldsymbol{\eta}_t)$ . Let $(\boldsymbol{r}_t)$ be an i.i.d. sequence of Rademacher variables (uniform distribution on $\{-1,1\}$ ), independent of the two sequences $(\boldsymbol{x}_{t})$ and $(\boldsymbol{u}_{t})$ . We define $(\boldsymbol{\eta}_t)$ by $\boldsymbol{\eta}_t = \boldsymbol{r}_t\sqrt{\boldsymbol{\eta}_t^2}$ .
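Let $(\mathcal{F}_{t})$ be the canonical filtration of $(\boldsymbol{\eta}_t)$ , i.e. $\mathcal{F}_{t}=\sigma(\boldsymbol{\eta}_k, k\leq t)$ . Define a second filtration $\mathcal{H}_t=\sigma(\boldsymbol{r}_{k},\boldsymbol{x}_{k+1},\boldsymbol{u}_{k+1}, k\leq t)$ . Since $\mathcal{F}_t\subset\mathcal{H}_t$ , $\boldsymbol{\eta}_t^2$ is $\mathcal{H}_{t-1}$ -measurable, and $\boldsymbol{r}_{t}$ is independent of $\mathcal{H}_{t-1}$ , we have $\mathbb{E}(\boldsymbol{\eta}_t\mid\mathcal{F}_{t-1})=\mathbb{E}\{\sqrt{\boldsymbol{\eta}_t^2}\,\mathbb{E}(\boldsymbol{r}_t\mid\mathcal{H}_{t-1})\mid\mathcal{F}_{t-1}\}=0$ a.s.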
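Define a new filtration $\mathcal{I}_t=\sigma(\boldsymbol{r}_{k},\boldsymbol{x}_{k},\boldsymbol{u}_{k+1}, k\leq t)$ . Since $\mathcal{F}_t\subset\mathcal{I}_t$ , $\boldsymbol{z}_{t}$ is $\mathcal{I}_{t-1}$ -measurable, and $\boldsymbol{x}_{t}$ is independent of $\mathcal{I}_{t-1}$ , by (27) we have $\mathbb{E}(\boldsymbol{\eta}_t^2\mid\mathcal{F}_{t-1})=\mathbb{E}\{\mathbb{E}(\exp(\boldsymbol{x}_{t}\sqrt{2\boldsymbol{z}_t}-\boldsymbol{z}_t)\mid\mathcal{I}_{t-1})\mid\mathcal{F}_{t-1}\}=1$ a.s.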
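We have thus shown the existence of a non-degenerate unit martingale difference sequence, that is, a stationary and ergodic sequence $(\boldsymbol{\eta}_{t})$ satisfying the conditions $\mathbb{E}(\boldsymbol{\eta}_t\mid\mathcal{F}_{t-1})=0$ and $\mathbb{E}(\boldsymbol{\eta}_t^2\mid\mathcal{F}_{t-1})=1$ a.s.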
It is then easy to define a semi-strong GARCH with innovations $(\boldsymbol{\eta}_{t})$ .
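The construction of this appendix can be simulated directly. The following is a minimal sketch, not the author's code: the function name `simulate_innovations`, the uniform law of $\boldsymbol{u}_t$ on $(0, u_{\max})$, and the default values $\phi=0.5$ and $u_{\max}=0.2$ are illustrative assumptions, chosen only so that $(\boldsymbol{z}_t)$ stays positive and the fourth moment of $\boldsymbol{\eta}_t$ stays moderate.

```python
import math
import random


def simulate_innovations(n, phi=0.5, u_max=0.2, burn_in=200, seed=0):
    """Simulate eta_t = r_t * sqrt(exp(x_t * sqrt(2 z_t) - z_t)).

    z_t   : positive causal AR(1), z_t = phi * z_{t-1} + u_t
    u_t   : i.i.d. uniform on (0, u_max)  (assumed law, not from the paper)
    x_t   : i.i.d. standard normal, independent of (u_t)
    r_t   : i.i.d. Rademacher signs, independent of (x_t) and (u_t)
    """
    rng = random.Random(seed)
    # Start the AR(1) recursion at its stationary mean, then discard a burn-in.
    z = u_max / (2.0 * (1.0 - phi))
    etas = []
    for t in range(n + burn_in):
        u = rng.uniform(0.0, u_max)
        z = phi * z + u
        x = rng.gauss(0.0, 1.0)
        r = rng.choice((-1.0, 1.0))
        eta_sq = math.exp(x * math.sqrt(2.0 * z) - z)
        if t >= burn_in:
            etas.append(r * math.sqrt(eta_sq))
    return etas


if __name__ == "__main__":
    etas = simulate_innovations(100_000)
    n = len(etas)
    # Both empirical moments should be close to their theoretical values 0 and 1.
    print("sample mean of eta_t   :", sum(etas) / n)
    print("sample mean of eta_t^2 :", sum(e * e for e in etas) / n)
```

The sample moments only illustrate the unconditional consequences $\mathbb{E}\boldsymbol{\eta}_t=0$ and $\mathbb{E}\boldsymbol{\eta}_t^2=1$ ; they do not by themselves verify the conditional moment conditions.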
Appendix E. Complement to Remark 2
Since $\mathbb{E}(\ln^+(\boldsymbol{\eta}_1^2)) < \infty$ by Assumption 3(i), to establish Assumption 3(ii) it is sufficient to prove that $\mathbb{E}(\ln^-(\boldsymbol{\eta}_1^2)) < \infty$ . We have $\mathbb{E}(\ln^-(\boldsymbol{\eta}_1^2)) = \int_0^\infty\mathbb{P}(\ln^+({1}/{\boldsymbol{\eta}_1^2})\geq s)\,{\textrm{d}} s = \int_0^\infty\mathbb{P}(\ln({1}/{\boldsymbol{\eta}_1^2}) \geq s)\,{\textrm{d}} s = \int_0^\infty\mathbb{P}({1}/{\boldsymbol{\eta}_1^2}\geq\exp(s))\,{\textrm{d}} s = \int_0^\infty\mathbb{P}(\boldsymbol{\eta}_1^2\leq\exp(-s))\,{\textrm{d}} s$ . Under the condition of [Reference Berkes, Horváth and Kokoszka1], $\mathbb{P}(\boldsymbol{\eta}^2_1\leq \exp(-s))=o(\exp(-\mu s))$ as $s\rightarrow\infty$ , which gives the result.
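The last step can be spelled out as follows. This is a sketch assuming $\mu>0$ ; the threshold $s_0$ is a notation introduced here, denoting any value beyond which $\mathbb{P}(\boldsymbol{\eta}^2_1\leq \exp(-s))\leq\exp(-\mu s)$ , which exists by the $o(\exp(-\mu s))$ bound. Bounding the probability by 1 on $[0,s_0]$ gives

$$\int_0^\infty\mathbb{P}(\boldsymbol{\eta}_1^2\leq\exp(-s))\,{\textrm{d}} s \leq s_0+\int_{s_0}^\infty\exp(-\mu s)\,{\textrm{d}} s = s_0+\frac{\exp(-\mu s_0)}{\mu}<\infty.$$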
Appendix F. Proof of the existence of a unique strictly stationary solution to (20)
Rewriting (20) in vector form as $\boldsymbol{\underline{\sigma}}_{t}^{2}=\underline{\boldsymbol{c}}_{t}+B \boldsymbol{\underline{\sigma}}_{t-1}^{2}$ , where
we have, by the second inequality of (17), that $\limsup_{n\to\infty}({1}/{n})\ln \|\underline{\boldsymbol{c}}_{n}\|\leq 0$ . By Assumption 2, we deduce that
From this, we deduce by the Cauchy root test that the series $\boldsymbol{\hat{\sigma}}_{t}^{2}:=\sum_{n=0}^{\infty}B^n \underline{\boldsymbol{c}}_{t-n}$ converges almost surely. We note that $(\boldsymbol{\hat{\sigma}}_{t}^{2})$ is a strictly stationary, ergodic, and non-anticipative solution of (20).
To show the uniqueness, assume that there exists another stationary solution $(\boldsymbol{\underline{\sigma}}_{t}^{2}{*})$ of (20). For all $n \geq 0$ , we have $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\|=\|B^n\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}-B^n\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|\leq\|B^n\|\|\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}\|+\|B^n\|\|\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|$ . Since $\|B^n\|\rightarrow 0$ a.s. as $n \rightarrow \infty$ and $\|\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}\|$ and $\|\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|$ converge in law by stationarity, Slutsky’s theorem entails that $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\|$ converges in law to 0 as $n \rightarrow \infty$ . Since $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\|$ does not depend on $n$ , we conclude that $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\| = 0$ a.s.
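The series solution of the vector recursion $\boldsymbol{\underline{\sigma}}_{t}^{2}=\underline{\boldsymbol{c}}_{t}+B \boldsymbol{\underline{\sigma}}_{t-1}^{2}$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the $2\times 2$ matrix `B` (with maximum row sum 0.7, hence $\|B^n\|\to 0$ ), the i.i.d. uniform entries standing in for $\underline{\boldsymbol{c}}_{t}$ , and the helper names `mat_vec`, `vec_add`, and `truncated_series` are hypothetical and do not come from (20). It verifies that the truncated sum $\sum_{n=0}^{N-1}B^n \underline{\boldsymbol{c}}_{t-n}$ satisfies the recursion up to a remainder of order $\|B^N\|$ .

```python
import random


def mat_vec(B, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(b * x for b, x in zip(row, v)) for row in B]


def vec_add(a, b):
    """Entrywise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def truncated_series(B, c, t, n_terms):
    """Return sum_{n=0}^{n_terms-1} B^n c[t-n].

    The sum is accumulated from the oldest term forward, which amounts to
    running the recursion s <- c[s_index] + B s from a zero initial value.
    """
    total = [0.0] * len(B)
    for s in range(t - n_terms + 1, t + 1):
        total = vec_add(c[s], mat_vec(B, total))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    B = [[0.5, 0.2], [0.1, 0.3]]  # illustrative contraction, not from the paper
    c = [[rng.random(), rng.random()] for _ in range(301)]
    sigma_t = truncated_series(B, c, 300, 200)
    sigma_prev = truncated_series(B, c, 299, 200)
    rhs = vec_add(c[300], mat_vec(B, sigma_prev))
    # The gap is the remainder B^200 c[100], which is numerically negligible.
    print("recursion gap:", max(abs(a - b) for a, b in zip(sigma_t, rhs)))
```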
Acknowledgements
I am most thankful to the Editor and to two referees for their constructive comments and suggestions. I also want to acknowledge Christian Francq and Jean-Michel Zakoïan for their guidance and feedback.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.