1 Introduction
The kinetic framework is a general paradigm that aims to extend Boltzmann’s kinetic theory for dilute gases to other types of microscopic interacting systems. This approach has been highly informative, and became a cornerstone of the theory of nonequilibrium statistical mechanics for a large body of systems [Reference Spohn43, Reference Spohn44]. In the context of nonlinear dispersive waves, this framework was initiated in the first half of the past century [Reference Peierls41] and developed into what is now called wave turbulence theory [Reference Zakharov, L’vov and Falkovich51, Reference Nazarenko39]. There, waves of different frequencies interact nonlinearly at the microscopic level, and the goal is to extract an effective macroscopic picture of how the energy densities of the system evolve.
The description of such an effective evolution comes via the wave kinetic equation (WKE), which is the analogue of Boltzmann’s equation for nonlinear wave systems [Reference Spohn46]. Such kinetic equations have been derived at a formal level for many systems of physical interest (nonlinear Schrödinger (NLS) and nonlinear wave (NLW) equations, water waves, plasma models, lattice crystal dynamics, etc.; compare [Reference Nazarenko39] for a textbook treatment) and are used extensively in applications (thermal conductivity in crystals [Reference Spohn45], ocean forecasting [Reference Janssen31, Reference Burns49], and more). This kinetic description is conjectured to appear in the limit where the number of (locally interacting) waves goes to infinity and an appropriate measure of the interaction strength goes to zero (weak nonlinearityFootnote 1 ). In such kinetic limits, the total energy of the whole system often diverges.
The fundamental mathematical question here, which also has direct consequences for the physical theory, is to provide a rigorous justification of such wave kinetic equations starting from the microscopic dynamics given by the nonlinear dispersive model at hand. The importance of such an endeavour stems from the fact that it allows an understanding of the exact regimes and the limitations of the kinetic theory, which has long been a matter of scientific interest (see [Reference Denissenko, Lukaschuk and Nazarenko20, Reference Aubourg, Campagne, Peureux, Ardhuin, Sommeria, Viboud and Mordant1]). A few mathematical investigations have recently been devoted to studying problems in this spirit [Reference Faou23, Reference Buckmaster, Germain, Hani and Shatah7, Reference Lukkarinen and Spohn35], yielding some partial results and useful insights.
This manuscript continues the investigation initiated in [Reference Buckmaster, Germain, Hani and Shatah7], aimed at providing a rigorous justification of the wave kinetic equation corresponding to the nonlinear Schrödinger equation,
As we shall explain later, the sign of the nonlinearity has no effect on the kinetic description, so we choose the defocussing sign for concreteness. The natural setup for the problem is to start with a spatial domain given by a torus ${\mathbb T}^d_L$ of size L, which approaches infinity in the thermodynamic limit we seek. This torus can be rational or irrational, which amounts to rescaling the Laplacian into
and taking the spatial domain to be the standard torus of size L, namely ${\mathbb T}^d_L=[0,L]^d$ with periodic boundary conditions. With this normalisation, an irrational torus would correspond to taking the $\beta _j$ to be rationally independent. Our results cover both cases, and in part of them $\beta $ is assumed to be generic – that is, avoiding a set of Lebesgue measure $0$ .
The strength of the nonlinearity is related to the characteristic size $\lambda $ of the initial data (say in the conserved $L^2$ space). Adopting the ansatz $v=\lambda u$ , we arrive at the following equation:
The kinetic description of the long-time behaviour is akin to a law of large numbers, and therefore one has to start with a random distribution of the initial data. Heuristically, a randomly distributed, $L^{2}$ -normalised field would (with high probability) have a roughly uniform spatial distribution, and consequently an $L_x^{\infty }$ norm $\sim L^{-d/2}$ . This makes the strength of the nonlinearity in (NLS) comparable to $\lambda ^2 L^{-d}$ (at least initially; see Footnote 2), which motivates us to introduce the quantity $\alpha =\lambda ^2L^{-d}$
and phrase the results in terms of $\alpha $ instead of $\lambda $ . The kinetic conjecture states that at sufficiently long time scales, the effective dynamics of the Fourier-space mass density $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2 \left (k \in \mathbb Z^d_L=L^{-1}\mathbb Z^d\right )$ is well approximated – in the limit of large L and vanishing $\alpha $ – by an appropriately scaled solution $n(t, \xi )$ of the following WKE:
where we used the shorthand notations $\phi _j:=\phi \left (\xi _j\right )$ and $\left \lvert \xi \right \rvert ^2_{\beta }=\sum _{j=1}^d \beta _j \left (\xi ^{\left (j\right )}\right )^2$ for $\xi =\left (\xi ^{(1)},\cdots ,\xi ^{(d)}\right )$ . More precisely, one expects this approximation to hold at the kinetic timescale $T_{\mathrm {kin}}\sim \alpha ^{-2}=\frac {L^{2d}}{\lambda ^4}$ , in the sense that
Of course, for such an approximation to hold at time $t=0$ , one has to start with a well-prepared initial distribution for $\widehat u_{\text {in}}(k)$ as follows: denoting by $n_{\text {in}}$ the initial data for (WKE), we assume (equation (1.2)) that $\widehat u_{\text {in}}(k)=\sqrt {n_{\text {in}}(k)}\,\eta _k(\omega )$ ,
where $\eta _{k}(\omega )$ are mean- $0$ complex random variables satisfying $\mathbb E \left \lvert \eta _k\right \rvert ^2=1$ . In what follows, $\eta _k(\omega )$ will be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either the normalised complex Gaussian or the uniform distribution on the unit circle $\lvert z\rvert =1$ .
Before stating our results, it is worth remarking on the regime of data and solutions covered by this kinetic picture in comparison to previously studied and well-understood regimes in the nonlinear dispersive literature. For this, let us look back at the (pre-ansatz) NLS solution v, whose conserved energy is given by
We are dealing with solutions having an $L^{\infty }$ -norm of $O\left (\sqrt \alpha \right )$ (with high probability) and whose total mass is $O\left (\alpha L^d\right )$ , in a regime where $\alpha $ is vanishingly small and L is asymptotically large. These bounds on the solutions are true initially, as we have already explained, and will be propagated in our proof. In particular, the mass and energy are very large and will diverge in this kinetic limit, as is common in taking thermodynamic limits [Reference Ruelle42, Reference Minlos37]. Moreover, the potential part of the energy is dominated by the kinetic part (the former of size $O\left (\alpha ^2 L^d\right )$ and the latter of size $O\left (\alpha L^d\right )$ ), which explains why there is no distinction between the defocussing and focussing nonlinearities in the kinetic limit. It would be interesting to see how the kinetic framework can be extended to regimes of solutions which are sensitive to the sign of the nonlinearity; this has been investigated in the physics literature (e.g., [Reference Dyachenko, Newell, Pushkarev and Zakharov22, Reference Fitzmaurice, Gurarie, McCaughan and Woyczynski25, Reference Zakharov, Korotkevich, Pushkarev and Resio50]).
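The sizes quoted in this paragraph can be recovered by a short heuristic count. The sketch below assumes that u is $L^2$ -normalised with $\left \lVert u\right \rVert _{L^{\infty }}\sim L^{-d/2}$ , that its Fourier support sits at frequencies $\lvert k\rvert \sim 1$ , and that $\alpha =\lambda ^2L^{-d}$ ; all constants are ignored, so this is an order-of-magnitude check rather than a proof.

```latex
\begin{align*}
\|v\|_{L^\infty} &= \lambda \|u\|_{L^\infty} \sim \lambda L^{-d/2} = \sqrt{\alpha},\\
\|v\|_{L^2}^2 &= \lambda^2 \|u\|_{L^2}^2 \sim \lambda^2 = \alpha L^d,\\
\int_{\mathbb T^d_L} |\nabla v|^2 \, dx &\sim \|v\|_{L^2}^2 \sim \alpha L^d
  \qquad (\text{frequencies } |k| \sim 1),\\
\int_{\mathbb T^d_L} |v|^4 \, dx &\lesssim \|v\|_{L^\infty}^2 \|v\|_{L^2}^2
  \sim \alpha \cdot \alpha L^d = \alpha^2 L^d .
\end{align*}
```

Under these assumptions the quartic part is smaller than the quadratic part by a factor of order $\alpha $ , which vanishes in the limit considered here.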
1.1 Statement of the results
It is not a priori clear how the limits $L\to \infty $ and $\alpha \to 0$ need to be taken for formula (1.1) to hold or whether there is an additional scaling law (between $\alpha $ and L) that needs to be satisfied in the limit. In comparison, such scaling laws are imposed in the rigorous derivation of Boltzmann’s equation [Reference Lanford34, Reference Cercignani, Illner and Pulvirenti10, Reference Gallagher, Saint-Raymond and Texier26], which is derived in the so-called Boltzmann–Grad limit [Reference Grad27]: namely, the number N of particles goes to $\infty $ while their radius r goes to $0$ in such a way that $Nr^{d-1}\sim O(1)$ . To the best of our knowledge, this central point has not been adequately addressed in the wave-turbulence literature.
Our results seem to suggest some key differences depending on the chosen scaling law. Roughly speaking, we identify two special scaling laws for which we are able to justify the approximation (1.1) up to time scales $L^{-\varepsilon } T_{\text {kin}}$ for any arbitrarily small $\varepsilon>0$ . For other scaling laws, we identify significant absolute divergences in the power-series expansion for $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2$ at much earlier times. We can therefore only justify this approximation at such shorter times (which are still better than those in [Reference Buckmaster, Germain, Hani and Shatah7]). In these cases, whether or not formula (1.1) holds up to time scales $L^{-\varepsilon } T_{\text {kin}}$ depends on whether such series converge conditionally instead of absolutely, and thus would require new methods and ideas, as we explain later.
We start by identifying the two favourable scaling laws. We use the notation $\sigma +$ for any numerical constant $\sigma $ (e.g., $\sigma =-\varepsilon $ or $\sigma =-1-\frac {\varepsilon }{2}$ , where $\varepsilon $ is as in Theorem 1.1) to denote a constant that is strictly larger than and sufficiently close to $\sigma $ .
Theorem 1.1. Set $d\geq 2$ and let $\beta \in [1,2]^d$ be arbitrary. Suppose that $n_{\mathrm {in}} \in {\mathcal S}\left ({\mathbb R}^d \to [0, \infty )\right )$ is SchwartzFootnote 3 and $\eta _{k}(\omega )$ are independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either complex Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle $\lvert z\rvert =1$ . Assume well-prepared initial data $u_{\mathrm {in}}$ for (NLS) as in equation (1.2).
Fix $0<\varepsilon <1$ (in most interesting cases $\varepsilon $ will be small); recall that $\lambda $ and L are the parameters in (NLS) and let $\alpha =\lambda ^2L^{-d}$ be the characteristic strength of the nonlinearity. If $\alpha $ has the scaling law $\alpha \sim L^{(-\varepsilon )+}$ or $\alpha \sim L^{\left (-1-\frac {\varepsilon }{2}\right )+}$ , then we have
for all $L^{0+} \leq t \leq L^{-\varepsilon } T_{\mathrm {kin}}$ , where $T_{\mathrm {kin}}=\alpha ^{-2}/2$ , ${\mathcal K}$ is defined in (WKE) and $o_{\ell ^{\infty }_k}\left (\frac {t}{T_{\mathrm {kin}}}\right )_{L \to \infty }$ is a quantity that is bounded in $\ell ^{\infty }_k$ by $L^{-\theta } \frac {t}{T_{\mathrm {kin}}}$ for some $\theta>0$ .
We remark that in the time interval of the approximation we have been discussing, the right hand sides of formulas (1.1) and (1.3) are equivalent. Also note that any type of scaling law of the form $\alpha \sim L^{-s}$ gives an upper bound of $t\leq L^{-\varepsilon }T_{\mathrm {kin}}\sim L^{2s-\varepsilon }$ for the times considered. Consequently, for the two scaling laws in Theorem 1.1, the time t always satisfies $t\ll L^{2}$ , and it is for this reason that the rationality type of the torus is not relevant. As will be clear later, no similar results can hold for $t\gg L^2$ in the case of a rational torus,Footnote 4 as this would require rational quadratic forms to be equidistributed on scales $\ll 1$ , which is impossible. However, if the aspect ratios $\beta $ are assumed to be generically irrational, then one can access equidistribution scales that are as small as $L^{-d+1}$ for the resulting irrational quadratic forms [Reference Bourgain4, Reference Buckmaster, Germain, Hani and Shatah7]. This allows us to consider scaling laws for which $T_{\mathrm {kin}}$ can be as big as $L^{d-}$ on generically irrational tori.
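The exponent bookkeeping in this remark can be checked mechanically. The following is a minimal sketch, working purely at the level of powers of L: it assumes $\alpha \sim L^{-s}$ and $T_{\mathrm {kin}}\sim \alpha ^{-2}$ , and ignores constants as well as the infinitesimal losses hidden in the $\sigma +$ notation. The helper names `time_exponent` and `below_rational_threshold` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def time_exponent(s, eps):
    """Exponent p such that L^{-eps} * T_kin ~ L^p, assuming alpha ~ L^{-s}
    and T_kin ~ alpha^{-2} (constants ignored)."""
    return 2 * s - eps


def below_rational_threshold(s, eps):
    """True when the time range stays at or below the scale L^2.

    The endpoint case p = 2 becomes a strict inequality once the small loss
    in the sigma+ notation is taken into account.
    """
    return time_exponent(s, eps) <= 2


eps = Fraction(1, 10)                          # illustrative value of epsilon
first_law = time_exponent(eps, eps)            # alpha ~ L^{-eps}
second_law = time_exponent(1 + eps / 2, eps)   # alpha ~ L^{-1-eps/2}
print(first_law, second_law)
```

Exact rational arithmetic is used so that the endpoint case of the second scaling law is not blurred by floating-point rounding.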
Remark 1.2. Strictly speaking, in evaluating equation (1.3) one has to first ensure the existence of the solution u. This is guaranteed if $d\in \{2,3,4\}$ (when (NLS) is $H^1$ -critical or subcritical). When $d\geq 5$ we shall interpret equation (1.3) such that the expectation is taken only when the long-time smooth solution u exists. Moreover, from our proof it follows that the probability that this existence fails is at most $O\left (e^{-L^{\theta }}\right )$ , which quickly becomes negligible when $L\to \infty $ .
The following theorem covers general scaling laws, including the ones that can only be accessed for the generically irrational torus. By a simple calculation of exponents, we can see that it implies Theorem 1.1.
Theorem 1.3. With the same assumptions as in Theorem 1.1, we impose the following conditions on $(\alpha , L, T)$ for some $\delta>0$ :
Then formula (1.3) holds for all $L^{\delta } \leq t \leq T$ .
It is best to read this theorem in terms of the $\left (\log _L \left (\alpha ^{-1}\right ),\log _L T\right )$ plot in Figure 1. The kinetic conjecture corresponds to justifying the approximation in formula (1.1) up to time scales $T\lesssim T_{\mathrm {kin}}=\alpha ^{-2}$ . As we shall explain later, the time scale $T\sim T_{\mathrm {kin}}$ represents a critical scale for the problem from a probabilistic point of view. This is depicted by the red line in the figure, and the region below this line corresponds to a probabilistically subcritical regime (see Section 1.2.1). The shaded blue region corresponds to the $(\alpha , T)$ region in Theorem 1.3, neglecting $\delta $ losses. This region touches the line $T=\alpha ^{-2}$ at the two points corresponding to $\left (\alpha ^{-1}, T\right )=(1, 1)$ and $\left (L, L^2\right )$ , whereas the two scaling laws of Theorem 1.1, where $\left (\alpha ^{-1},T\right )\sim (L^{\varepsilon -},L^{\varepsilon -})$ and $\left (\alpha ^{-1},T\right )\sim \left (L^{1+\frac {\varepsilon }{2}-},L^{2-}\right )$ , approach these two points when $\varepsilon $ is small.
These results rely on a diagrammatic expansion of the NLS solution in Feynman diagrams akin to a Taylor expansion. The shaded blue region depicting the result of Theorem 1.3 corresponds to the cases when such a diagrammatic expansion is absolutely convergent for very large L. In the complementary region between the blue region and the line $T=T_{\text {kin}}$ , we show that some (arbitrarily high-degree) terms of this expansion do not converge to $0$ as their degree goes to $\infty $ , which means that the diagrammatic expansion cannot converge absolutely in this region. Therefore, the only way for the kinetic conjecture to be true in the scaling regimes not included in Theorem 1.1 is for those terms to exhibit a highly nontrivial cancellation, which would make the series converge conditionally but not absolutely.
Finally, we remark on the restriction in formula (1.4). The upper bounds on T on the left are necessary from number-theoretic considerations: indeed, if $T\gg L^2$ for a rational torus, or if $T\gg L^d$ for an irrational one, the exact resonances of the NLS equation dominate the quasi-resonant interactions that lead to the wave kinetic equation. One should therefore not expect the kinetic description to hold in those ranges of T (see Lemma 3.2 and Section 4). The second set of restrictions in formula (1.4) corresponds exactly to the requirement that the size of the Feynman diagrams of degree n can be bounded by $\rho ^n$ with some $\rho \ll 1$ . In fact, if one aims only at proving existence with high probability (not caring about the asymptotics of $\mathbb {E}\left \lvert \widehat {u}(t,k)\right \rvert ^2$ ), then the restrictions on the left of formula (1.4) will not be necessary, and one obtains control for longer times. See also the following remark:
Remark 1.4 (Admissible scaling laws).
The foregoing restrictions on T impose the limits of the admissible scaling laws, in which $\alpha \to 0$ and $L \to \infty $ , for which the kinetic description of the long-time dynamics can appear. Indeed, since $T_{\mathrm {kin}}=\alpha ^{-2}$ , then the necessary (up to $L^{\delta }$ factors) restrictions $T\ll L^{2-\delta }$ (resp., $T\ll L^{d-\delta }$ ) on the rational (resp., irrational) torus already mentioned imply that one should only expect the previous kinetic description in the regime where $\alpha \gtrsim L^{-1}$ (resp., $\gtrsim L^{-d/2}$ ). In other words, the kinetic description requires the nonlinearity to be weak, but not too weak! In the complementary regime of very weak nonlinearity, the exact resonances of the equation dominate the quasi-resonances – a regime referred to as discrete wave turbulence (see [Reference L’vov and Nazarenko36, Reference Kartashova32, Reference Nazarenko39]), in which different effective equations, like the (CR) equation in [Reference Faou, Germain and Hani24, Reference Buckmaster, Germain, Hani and Shatah6], can arise.
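The thresholds on $\alpha $ in this remark amount to a one-line inversion of $T_{\mathrm {kin}}=\alpha ^{-2}$ . The following display spells it out, ignoring the $L^{\delta }$ factors as the remark itself does:

```latex
\begin{align*}
\text{rational torus:}\quad
  & T_{\mathrm{kin}} = \alpha^{-2} \lesssim L^{2}
    \iff \alpha \gtrsim L^{-1},\\
\text{generic irrational torus:}\quad
  & T_{\mathrm{kin}} = \alpha^{-2} \lesssim L^{d}
    \iff \alpha \gtrsim L^{-d/2}.
\end{align*}
```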
1.2 Ideas of the proof
As Theorem 1.1 is a consequence of Theorem 1.3, we will focus on Theorem 1.3. The proof of Theorem 1.3 contains three components: (1) a long-time well-posedness result, where we expand the solution to (NLS) into Feynman diagrams for sufficiently long time, up to a well-controlled error term; (2) computation of $\mathbb E\left \lvert \widehat u_k(t)\right \rvert ^2 \left (k \in \mathbb Z^d_L\right )$ using this expansion, where we identify the leading terms and control the remainders; and (3) a number-theoretic result that justifies the large box approximation, where we pass from the sums appearing in the expansion in the previous component to the integral appearing on the right-hand side of (WKE).
The main novelty of this work is in the first component, which is the hardest. The second component follows similar lines to those in [Reference Buckmaster, Germain, Hani and Shatah7]. Regarding the third component, the main novelty of this work is to complement the number-theoretic results in [Reference Buckmaster, Germain, Hani and Shatah7] (which dealt only with the generically irrational torus) by the cases of general tori (in the admissible range of time $T\ll L^2$ ). This provides an essentially full (up to $L^{\varepsilon }$ losses) understanding of the number-theoretic issues arising in wave-turbulence derivations for (NLS). Therefore, we will limit this introductory discussion to the first component.
1.2.1 The scheme and probabilistic criticality
Though technically involved, the basic idea of the long-time well-posedness argument is in fact quite simple. Starting from (NLS) with initial data of equation (1.2), we write the solution as the expansion (1.5), $u=u^{(0)}+u^{(1)}+\cdots +u^{(N)}+\mathcal R_{N+1}$ ,
where $u^{(0)}=e^{-it\Delta _{\beta }}u_{\mathrm {in}}$ is the linear evolution, $u^{(n)}$ are iterated self-interactions of the linear solution $u^{(0)}$ that appear in a formal expansion of u and $\mathcal R_{N+1}$ is a sufficiently regular remainder term.
Since $u^{(0)}$ is a linear combination of independent random variables, and each $u^{(n)}$ is a multilinear combination, each of them will behave strictly better (both linearly and nonlinearly) than its deterministic analogue (i.e., with all $\eta _k=1$ ). This is due to the well-known large deviation estimates, which yield a ‘square root’ gain coming from randomness, akin to the central limit theorem (for instance, $\left \lVert u_{\mathrm {in}}\right \rVert _{L^{\infty }}$ is bounded by $L^{-d/2}\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ in the probabilistic setting, as opposed to $1\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ deterministically by Sobolev embedding, assuming compact Fourier support). This gain leads to a new notion of criticality for the problem, which can be definedFootnote 5 as the edge of the regime of $(\alpha , T)$ for which the iterate $u^{(1)}$ is better bounded than the iterate $u^{(0)}$ . It is not hard to see that $u^{(1)}$ can have size up to $O(\alpha\sqrt{T})$ (in appropriate norms), compared to the $O(1)$ size of $u^{(0)}$ (see, e.g., formula (2.25) for $n=1$ ). This justifies the notion that $T\sim T_{\mathrm {kin}}=\alpha ^{-2}$ corresponds to probabilistically critical scaling, whereas the time scales $T\ll T_{\mathrm {kin}}$ are subcritical.Footnote 6
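The 'square root' gain from randomness can be seen in a toy computation. The sketch below is not the estimate used in the paper: it only compares the size of a sum of n independent phases, uniform on the unit circle, with the deterministic case in which every term equals 1. The function names and the sample sizes are illustrative choices.

```python
import cmath
import math
import random


def random_phase_sum(n, rng):
    """Modulus of a sum of n independent phases, uniform on the unit circle."""
    total = sum(cmath.exp(2j * math.pi * rng.random()) for _ in range(n))
    return abs(total)


def mean_random_size(n, trials, seed=0):
    """Monte Carlo average of the modulus of the random sum over several trials."""
    rng = random.Random(seed)
    return sum(random_phase_sum(n, rng) for _ in range(trials)) / trials


n = 1000
random_size = mean_random_size(n, trials=200)   # expected to be of order sqrt(n)
deterministic_size = float(n)                    # all terms equal to 1: the sum is n
print(random_size, deterministic_size)
```

In this toy model the random sum is of order $\sqrt {n}$ while the deterministic one is of order n, which mirrors the gap between the probabilistic and deterministic $L^{\infty }$ bounds quoted above.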
As it happens, a certain notion of criticality might not capture all the subtleties of the problem. As we shall see, some higher-order iterates $u^{(n)}$ will not be better bounded than $u^{(n-1)}$ in the full subcritical range $T\ll \alpha ^{-2}$ we have postulated, but instead only in a subregion thereof. This is what defines our admissible blue region in Figure 1.
We should mention that the idea of using the gain from randomness goes back to Bourgain [Reference Bourgain3] (in the random-data setting) and to Da Prato and Debussche [Reference Da Prato and Debussche14] (later, in the stochastic PDE setting). They first noticed that the ansatz $u=u^{(0)}+\mathcal R$ allows one to put the remainder $\mathcal R$ in a higher regularity space than the linear term $u^{(0)}$ . This idea has since been applied to many different situations (see, e.g., [Reference Bourgain and Bulut5, Reference Burq and Tzvetkov8, Reference Colliander and Oh11, Reference Deng15, Reference Dodson, Lührmann and Mendelson21, Reference Kenig and Mendelson33, Reference Nahmod and Staffilani38]), though most of these works either involve only the first-order expansion (i.e., $N=0$ ) or involve higher-order expansions with only suboptimal bounds (e.g., [Reference Bényi, Oh and Pocovnicu2]). To the best of our knowledge, the present paper is the first work where the sharp bounds for these $u^{(j)}$ terms are obtained to arbitrarily high order (at least in the dispersive setting).
Remark 1.5. There are two main reasons why the high-order expansion (1.5) gives the sharp time of control, in contrast to previous works. The first is that we are able to obtain sharp estimates for the terms $u^{(j)}$ with arbitrarily high order, which were not known previously due to the combinatorial complexity associated with trees (see Section 1.2.2).
The second reason is more intrinsic. In higher-order versions of the original Bourgain–Da Prato–Debussche approach, the remainder usually stops improving in regularity beyond a certain point, due to the presence of high-low interactions (heuristically, a gain in powers of the low frequency does not translate into a gain in regularity). This is a major difficulty in random-data theory, and in recent years a few methods have been developed to address it, including regularity structures [Reference Hairer29], paracontrolled calculus [Reference Gubinelli, Imkeller and Perkowski28] and random averaging operators [Reference Deng, Nahmod and Yue18]. Fortunately, in the current problem this issue is absent, since the well-prepared initial data (1.2) bound the high-frequency components (where $\lvert k\rvert \sim 1$ ) and low-frequency components (where $\left \lvert k\right \rvert \sim L^{-1}$ ) uniformly, so the high-low interaction is simply controlled in the same way as the high-high interaction, allowing one to gain regularity indefinitely as the order increases.
1.2.2 Sharp estimates of Feynman trees
We start with the estimate for $u^{(n)}$ . As is standard with the cubic nonlinear Schrödinger equation, we first perform the Wick ordering by defining
Note that $M_0$ is essentially the mass which is conserved. Now w satisfies the renormalised equation
and $\left \lvert \widehat {w_k}(t)\right \rvert ^2=\left \lvert \widehat {u_k}(t)\right \rvert ^2$ . This gets rid of the worst resonant term, which would otherwise lead to a suboptimal time scale.
Let $w^{(n)}$ be the nth-order iteration of the nonlinearity in equation (1.6), corresponding to the $u^{(n)}$ in equation (1.5). Since this nonlinearity is cubic, by induction it is easy to see that $w^{(n)}$ can be written (say, in Fourier space) as a linear combination of termsFootnote 7 $\mathcal J_{\mathcal{T}\,}$ , where $\mathcal{T}\,$ runs over all ternary trees with exactly n branching nodes (we will say it has scale $\mathfrak s(\mathcal{T}\,\,)=n$ ). After some further reductions, the estimate for $\mathcal J_{\mathcal{T}\,}$ can be reduced to the estimate for terms of the form
where $\eta _k(\omega )$ is as in equation (1.2), $(k_1,\ldots ,k_{2n+1})\in \left (\mathbb {Z}_L^d\right )^{2n+1}$ , S is a suitable finite subset of $\left (\mathbb {Z}_L^d\right )^{2n+1}$ and the $(2n+1)$ subscripts correspond to the $(2n+1)$ leaves of $\mathcal{T}\,$ (see Definition 2.2 and Figure 2).Footnote 8
To estimate $\Sigma _k$ defined in formula (1.7) we invoke the standard large deviation estimate (see Lemma 3.1), which essentially asserts that $\left \lvert \Sigma _k\right \rvert \lesssim (\#S)^{1/2}$ with overwhelming probability, provided that there is no pairing in $(k_1,\ldots ,k_{2n+1})$ , where a pairing $\left (k_i,k_j\right )$ means $k_i=k_j$ and the signs of $\eta _{k_i}$ and $\eta _{k_j}$ in formula (1.7) are opposites. Moreover, in the case of a pairing $\left (k_i,k_j\right )$ we can essentially replace $\eta _{k_i}^{\pm } \eta _{k_j}^{\mp }=\left \lvert \eta _{k_i}\right \rvert ^2\approx 1$ , so in general we can bound, with overwhelming probability,
It thus suffices to bound the number of choices for $(k_1,\ldots ,k_{2n+1})$ given the pairings, as well as the number of choices for the paired $k_j$ s given the unpaired $k_j$ s.
In the no-pairing case, such counting bounds are easy to prove, since the set S is well adapted to the tree structure of $\mathcal{T}\,$ ; what makes the counting nontrivial is the pairings, especially those between leaves that are far away or from different levels (see Figure 3, where a pairing is depicted by an extra link between the two leaves). Nevertheless, we have developed a counting algorithm that specifically deals with the given pairing structure of $\mathcal{T}\,$ and ultimately leads to sharp counting bounds and consequently sharp bounds for $\Sigma _k$ (see Proposition 3.5).
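The combinatorial growth behind these counting problems can be illustrated on the bare tree shapes. The following sketch enumerates ternary trees with exactly n branching nodes as nested tuples and checks that each has $2n+1$ leaves, as stated earlier in this subsection. It models shapes only: the trees in the paper carry further data (such as signs and pairings of leaves) that this sketch ignores, and the function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees(n):
    """All ternary tree shapes with exactly n branching nodes.

    A leaf is the empty tuple (); a branching node is a triple of subtrees.
    """
    if n == 0:
        return ((),)
    shapes = []
    # Distribute the remaining n - 1 branching nodes among the three children.
    for a in range(n):
        for b in range(n - a):
            c = n - 1 - a - b
            for left in trees(a):
                for mid in trees(b):
                    for right in trees(c):
                        shapes.append((left, mid, right))
    return tuple(shapes)


def leaves(tree):
    """Number of leaves of a tree shape."""
    return 1 if tree == () else sum(leaves(child) for child in tree)


counts = [len(trees(n)) for n in range(6)]
print(counts)
```

The number of shapes grows exponentially in n, which is one reason why bounds that are uniform in the tree are needed when working to arbitrarily high order.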
1.2.3 An $\ell ^2$ operator norm bound
In contrast to the tree terms $\mathcal J_{\mathcal{T}\,}$ , the remainder term $\mathcal R_{N+1}$ has no explicit random structure. Indeed, the only way it feels the ‘chaos’ of the initial data is through the equation it satisfies, which in integral form and spatial Fourier variables looks like
where $\mathcal J_{\sim N}$ is a sum of Feynman trees $\mathcal J_{\mathcal{T}\,}$ (already described) of scale $\mathfrak s (\mathcal{T}\,\,)\sim N$ , and $\mathcal L$ , $\mathcal Q$ and $\mathcal C$ are, respectively, linear, bilinear and trilinear operators in $\mathcal R_{N+1}$ . The main point here is that one would like to propagate the estimates on $\mathcal J_{\sim N}$ to $\mathcal R_{N+1}$ itself; this is how we make rigorous the so-called ‘propagation of chaos or quasi-Gaussianity’ claims that are often adopted in formal derivations. In another aspect, qualitative results on propagation of quasi-Gaussianity, in the form of absolute continuity of measures, have been obtained in some cases (with different settings) by exploiting almost-conservation laws (e.g., [Reference Tzvetkov48]).
Since we are bootstrapping a smallness estimate on $\mathcal R_{N+1}$ , any quadratic and cubic form of $\mathcal R_{N+1}$ will be easily bounded. It therefore suffices to propagate the bound for the term $\mathcal L(\mathcal R_{N+1})$ , which reduces to bounding the $\ell ^2\to \ell ^2$ operator norm for the linear operator $\mathcal L$ . By definition, the operator $\mathcal L$ will have the form $v\mapsto \mathcal {IW}\left (\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}, v\right )$ , where $\mathcal {I}$ is the Duhamel operator, $\mathcal {W}$ is the trilinear form coming from the cubic nonlinearity and $\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}$ are trees of scale $\leq N$ ; thus in Fourier space it can be viewed as a matrix with random coefficients. The key to obtaining the sharp estimate for $\mathcal L$ is then to exploit the cancellation coming from this randomness, and the most efficient way to do this is via the $TT^*$ method.
In fact, the idea of applying the $TT^*$ method to random matrices has already been used by Bourgain [Reference Bourgain3]. In that paper one is still far above (probabilistic) criticality, so applying the $TT^*$ method once already gives adequate control. In the present case, however, we are aiming at obtaining sharp estimates, so applying $TT^*$ once will not be sufficient.
The solution is thus to apply $TT^*$ sufficiently many times (say, $D\gg 1$ ), which leads to the analysis of the kernel of the operator $(\mathcal L\mathcal L^*)^D$ . At first sight this kernel seems to be a complicated multilinear expression which is difficult to handle; nevertheless, we make one key observation, namely that this kernel can essentially be recast in the form of formula (1.7) for some large auxiliary tree $\mathcal{T}\,=\mathcal{T}\;^D$ , which is obtained from a single root node by attaching copies of the trees $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ successively a total of $2D$ times (see Figure 4). With this observation, the arguments in the previous section then lead to sharp bounds of the kernel of $(\mathcal L\mathcal L^*)^D$ , up to some loss that is a power of L independent of D; taking the $1/(2D)$ power and choosing D sufficiently large makes this power negligible and implies the sharp bound for the operator norm of $\mathcal L$ (see Section 3.3).
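The reason why a D-independent loss becomes negligible can be written out in two lines. In the sketch below, A stands for a hypothetical target size of the operator norm and $C_0$ for the exponent of the loss; both symbols are introduced here for illustration only. The first line uses the standard Hilbert-space identities $\left \lVert \mathcal L\mathcal L^*\right \rVert =\left \lVert \mathcal L\right \rVert ^2$ and, for the self-adjoint operator $\mathcal L\mathcal L^*$ , $\left \lVert (\mathcal L\mathcal L^*)^D\right \rVert =\left \lVert \mathcal L\mathcal L^*\right \rVert ^D$ .

```latex
\begin{align*}
\|\mathcal L\|_{\ell^2\to\ell^2}^{2D}
  &= \|\mathcal L\mathcal L^*\|_{\ell^2\to\ell^2}^{D}
   = \big\|(\mathcal L\mathcal L^*)^{D}\big\|_{\ell^2\to\ell^2},\\
\big\|(\mathcal L\mathcal L^*)^{D}\big\|_{\ell^2\to\ell^2} \le L^{C_0} A^{2D}
  &\implies \|\mathcal L\|_{\ell^2\to\ell^2} \le L^{C_0/(2D)}\, A .
\end{align*}
```

Since $C_0$ does not depend on D, choosing $D\geq C_0/(2\theta )$ reduces the loss to at most $L^{\theta }$ for any prescribed small $\theta>0$ .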
1.2.4 Sharpness of estimates
We remark that the estimates we prove for $\mathcal J_{\mathcal{T}\,}$ are sharp up to some finite power of L (independent of $\mathcal{T}\,$ ). More precisely, from Proposition 2.5 we know that for any ternary tree $\mathcal{T}\,$ of scale n and possible pairing structure (see Definition 3.3), with overwhelming probability,
where $\rho $ is some quantity depending on $\alpha $ , L and T (see formula (2.24)), k is the spatial Fourier variable and $h^b$ is a time-Sobolev norm defined in equation (2.22); on the other hand, we will show that for some particular choice of trees $\mathcal{T}\,$ of scale n and some particular choice of pairings, with high probability,
The timescale T of Theorem 1.3 is the largest that makes $\rho \ll 1$ ; thus if one wants to go beyond T in cases other than Theorem 1.1, it would be necessary to address the divergence of formula (1.9) with $\rho \gg 1$ by exploiting the cancellation between different tree terms or different pairing choices (see Section 3.4).
1.2.5 Discussions
Shortly after the completion of this paper, work of Collot and Germain [Reference Collot and Germain12] was announced that studies the same problem, but only in the rational-torus setting. In the language of this paper, their result corresponds to the validity of equation (1.3) for $L\leq t\leq L^{2-\delta }$ , under the assumption $\alpha \leq L^{-1-\delta }$ . This is a special case of Theorem 1.3, essentially corresponding to the rectangle below the horizontal line $\log _LT=2$ and to the right of the vertical line $\log _L\left (\alpha ^{-1}\right )=1$ in Figure 1. We also mention later work by the same authors [Reference Collot and Germain13], where they consider a generic nonrectangular torus (as opposed to the rectangular tori here and in [Reference Collot and Germain12]) and prove the existence of solutions (but without justifying equation (1.3)) up to time $t\leq L^{-\delta }T_{\mathrm {kin}}$ for a wider range of power laws between $\alpha $ and L.
While the present paper was being peer-reviewed, we submitted new work to arXiv [Reference Deng and Hani16], in which we provide the first full derivation of (WKE) from (NLS). Those results reach the kinetic time scale $t=\tau \cdot T_{\mathrm {kin}}$ , where $\tau $ is independent of L (compared to Theorem 1.1 here, where $\tau \leq L^{-\varepsilon }$ ), for the scaling law $\alpha \sim L^{-1}$ on generic (irrational) rectangular tori and the scaling laws $\alpha \sim L^{-\gamma }$ (where $\gamma <1$ and is close to $1$ ) on arbitrary rectangular tori.
Shortly after completing [Reference Deng and Hani16], we received a preprint of a forthcoming deep work by Staffilani and Tran [Reference Staffilani and Tran47]. It concerns a high-dimensional (on $\mathbb {T}^d$ for $d\geq 14$ ) KdV equation under a time-dependent Stratonovich stochastic forcing, which effectively randomises the phases without injecting energy into the system. The authors derive the corresponding wave kinetic equation up to the kinetic time scale, for the scaling law $\alpha \sim L^{-0}$ (i.e., first taking $L\to \infty $ and then taking $\alpha \to 0$ ). They also prove a conditional result without such forcing, where the condition is verified for some particular initial densities converging to the equilibrium state (stationary solution to the wave kinetic equation) in the limit.
1.3 Organisation of the paper
In Section 2 we explain the diagrammatic expansion of the solution into Feynman trees, and state the a priori estimates on such trees and remainder terms, which yield the long-time existence of such expansions. Section 3 is devoted to the proof of those a priori estimates. In Section 4 we prove the main theorems already mentioned, and in Section 5 we prove the necessary number-theoretic results that allow us to replace the highly oscillating Riemann sums by integrals.
1.4 Notation
Most notation will be standard. Let $z^+=z$ and $z^-=\overline {z}$ . Define $\left \lvert k\right \rvert _{\beta }$ by $\left \lvert k\right \rvert _{\beta }^2=\beta _1k_1^2+\cdots +\beta _dk_d^2$ for $k=(k_1,\ldots ,k_d)$ . The spatial Fourier series of a function $u: {\mathbb T}_L^d \to \mathbb C$ is defined on $\mathbb Z^d_L:=L^{-1}\mathbb Z^{d}$ by
The temporal Fourier transform is defined by
Let $\delta>0$ be fixed throughout the paper. Let N, s and $b>\frac {1}{2}$ be fixed, such that N and s are large enough and $b-\frac {1}{2}$ is small enough, depending on d and $\delta $ . The quantity C will denote any large absolute constant, not dependent on $\big(N,s,b-\frac {1}{2}\big)$ , and $\theta $ will denote any small positive constant, which is dependent on $\big(N,s,b-\frac {1}{2}\big)$ ; these may change from line to line. The symbols $O(\cdot )$ , $\lesssim $ and so on will have their usual meanings, with implicit constants depending on $\theta $ . Let L be large enough depending on all these implicit constants. If some statement S involving $\omega $ is true with probability $\geq 1-Ke^{-L^{\theta }}$ for some constant K (depending on $\theta $ ), then we say this statement S is L-certain.
When a function depends on many variables, we may use notations like
to denote a function f of variables $(x_i:i\in A)$ and $y_1,\ldots ,y_m$ .
2 Tree expansions and long-time existence
2.1 First reductions
Let $\widehat {u}_k(t)$ be the Fourier coefficients of $u(t)$ , as in equation (1.10). Then with $c_k(t):= e^{2\pi i\left \lvert k\right \rvert _{\beta }^2t} \widehat u_k(t)=\left (\mathcal F_{{\mathbb T}^d_L} e^{-it\Delta _{\beta }} u\right )(k)$ , we arrive at the following equation for the Fourier modes:
where $ \Omega (k_1,k_2,k_3,k) =\left \lvert k_1\right \rvert _{\beta }^2-\left \lvert k_2\right \rvert _{\beta }^2+\left \lvert k_3\right \rvert _{\beta }^2-\left \lvert k\right \rvert _{\beta }^2. $ Note that the sum can be written as
which, defining $M=\sum _{k_3} \left \lvert c_{k_3}\right \rvert ^2$ (which is conserved), allows us to write
Here and later, $\sum ^{\times }$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$ , $k_1-k_2+k_3=k$ and $k\not \in \{k_1,k_3\}$ . Introducing $b_k(t)=c_k(t)e^{-2i\left (L^{-d}\lambda \right )^{2}Mt}$ , we arrive at the following equation for $b_k(t)$ :
In Theorem 1.3 we will be studying the solution $u(t)$ , or equivalently the sequence $(b_k(t))_{k \in \mathbb Z^d_L}$ , on a time interval $[0,T]$ . It will be convenient, to simplify some notation later, to work on the unit time interval $[0,1]$ . For this we introduce the final ansatz
which satisfies the equation
Here we have also used the relation $\alpha =\lambda ^2L^{-d}$ . Recall the well-prepared initial data (1.2), which transform into the initial data for $a_k$ :
where $\eta _{k}(\omega )$ are the same as in equation (1.2).
2.2 The tree expansion
Let $\boldsymbol a(t) =(a_k(t))_{k \in \mathbb Z^d_L}$ and $\boldsymbol {a}_{\mathrm {in}} =\boldsymbol a(0)$ . Let $J=[0,1]$ ; we will fix a smooth compactly supported cutoff function $\chi $ such that $\chi \equiv 1$ on J. Then by equation (2.3), we know that for $t\in J$ we have
where the Duhamel term is defined by
Since we will only be studying $\boldsymbol {a}$ for $t\in J$ , from now on we will replace $\boldsymbol {a}$ by the solution to equation (2.5) for $t\in \mathbb {R}$ (the existence and uniqueness of the latter will be clear from a proof to follow). We will be analysing the temporal Fourier transform of this (extended) $\boldsymbol {a}$ , so let us first record a formula for $\mathcal {I}$ on the Fourier side:
Lemma 2.1. Let $\mathcal {I}$ be defined as in equation (2.6), and recall that $\widetilde {G}$ means the temporal Fourier transform of G; then we have
Proof. See [Reference Deng, Nahmod and Yue17].
Now define $\mathcal J_n$ recursively by
and define
By plugging in equation (2.5), we get that $\mathcal R_{N+1}$ satisfies the equation
where the relevant terms are defined as
Next we will derive a formula for the time Fourier transform of $\mathcal J_n$ ; for this we need some preparation regarding multilinear forms associated with ternary trees.
Definition 2.2.
1. Let $\mathcal{T}~$ be a ternary tree. We use $\mathcal {L}$ to denote the set of leaves and l their number, $\mathcal {N}=\mathcal{T}\,\backslash \mathcal L$ the set of branching nodes and n their number, and $\mathfrak {r} \in \mathcal N$ the root node. The scale of a ternary tree $\mathcal{T}\,$ is defined as $\mathfrak s(\mathcal{T}\,\,)=n$ (the number of branching nodes).Footnote 9 A tree of scale n has $l=2n+1$ leaves and a total of $3n+1$ vertices.
2. (Signs on a tree) For each node $\mathfrak {n}\in \mathcal {N}$ , let its children from left to right be $\mathfrak {n}_1$ , $\mathfrak {n}_2$ , $\mathfrak {n}_3$ . We fix the sign $\iota _{\mathfrak {n}}\in \{\pm \}$ as follows: first $\iota _{\mathfrak {r}}=+$ , then for any node $\mathfrak {n}\in \mathcal {N}$ , define $\iota _{\mathfrak {n}_1}=\iota _{\mathfrak {n}_3}=\iota _{\mathfrak {n}}$ and $\iota _{\mathfrak {n}_2}=-\iota _{\mathfrak {n}}$ .
3. (Admissible assignments) Suppose we assign to each $\mathfrak {n}\in \mathcal{T}\,$ an element $k_{\mathfrak {n}}\in \mathbb {Z}_L^d$ . We say such an assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is admissible if for any $\mathfrak {n}\in \mathcal {N}$ we have $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}-k_{\mathfrak {n}_2}+k_{\mathfrak {n}_3}$ and either $k_{\mathfrak {n}}\not \in \left \{k_{\mathfrak {n}_1},k_{\mathfrak {n}_3}\right \}$ or $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}=k_{\mathfrak {n}_2}=k_{\mathfrak {n}_3}$ . Clearly an admissible assignment is completely determined by the values of $k_{\mathfrak {l}}$ for $\mathfrak {l}\in \mathcal {L}$ . For any assignment, we denote $\Omega _{\mathfrak {n}}:=\Omega \left (k_{\mathfrak {n}_1},k_{\mathfrak {n}_2},k_{\mathfrak {n}_3},k_{\mathfrak {n}}\right )$ . Suppose we also fixFootnote 10 $d_{\mathfrak {n}}\in \{0,1\}$ for each $\mathfrak {n}\in \mathcal {N}$ ; then we can define $q_{\mathfrak {n}}$ for each $\mathfrak {n}\in \mathcal{T}\,$ inductively by
(2.16) $$ \begin{align} q_{\mathfrak{n}}=0\text{ if }\mathfrak{n}\in\mathcal L\quad\text{or}\quad q_{\mathfrak{n}}=d_{\mathfrak{n}_1}q_{\mathfrak{n}_1}-d_{\mathfrak{n}_2}q_{\mathfrak{n}_2}+d_{\mathfrak{n}_3}q_{\mathfrak{n}_3}+\Omega_{\mathfrak{n}}\text{ if }\mathfrak{n}\in\mathcal{N}.\end{align} $$
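The objects in Definition 2.2 can be made concrete with a small sketch. The code below is only an illustration under simplifying assumptions: the class `Node`, the helper names and `example_tree` are hypothetical, frequencies are integer vectors (that is, $L=1$), $\beta$ has integer entries, and every $d_{\mathfrak {n}}$ is set to $1$. It builds a ternary tree, assigns the signs $\iota _{\mathfrak {n}}$, propagates the leaf values upward via $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}-k_{\mathfrak {n}_2}+k_{\mathfrak {n}_3}$, checks admissibility and evaluates $q_{\mathfrak {n}}$ by the recursion (2.16).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[int, ...]  # integer vectors stand in for points of Z_L^d (assumption: L = 1)


@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)  # [] for a leaf, else 3 children
    k: Optional[Vec] = None  # decoration k_n (given on leaves, computed on branching nodes)
    sign: int = 1            # iota_n, stored as +1 or -1
    d: int = 1               # d_n in {0, 1}; the default 1 is an arbitrary illustrative choice
    q: int = 0               # q_n from (2.16)


def leaves(node: Node) -> List[Node]:
    """Leaves of the tree, listed from left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def branching_nodes(node: Node) -> List[Node]:
    """All non-leaf nodes of the tree."""
    if not node.children:
        return []
    return [node] + [b for child in node.children for b in branching_nodes(child)]


def assign_signs(node: Node, sign: int = 1) -> None:
    """Root gets +; children 1 and 3 inherit the parent's sign, child 2 flips it."""
    node.sign = sign
    if node.children:
        left, middle, right = node.children
        assign_signs(left, sign)
        assign_signs(middle, -sign)
        assign_signs(right, sign)


def norm_sq(k: Vec, beta: Vec) -> int:
    """|k|_beta^2 = beta_1 k_1^2 + ... + beta_d k_d^2."""
    return sum(b * x * x for b, x in zip(beta, k))


def omega(k1: Vec, k2: Vec, k3: Vec, k: Vec, beta: Vec) -> int:
    """Resonance factor Omega(k1, k2, k3, k)."""
    return norm_sq(k1, beta) - norm_sq(k2, beta) + norm_sq(k3, beta) - norm_sq(k, beta)


def decorate(node: Node, beta: Vec) -> bool:
    """Fill k_n and q_n bottom-up; return True iff the assignment is admissible."""
    if not node.children:
        node.q = 0
        return True
    ok = all([decorate(child, beta) for child in node.children])
    c1, c2, c3 = node.children
    node.k = tuple(a - b + c for a, b, c in zip(c1.k, c2.k, c3.k))
    ok = ok and (node.k not in (c1.k, c3.k) or c1.k == c2.k == c3.k == node.k)
    node.q = (c1.d * c1.q - c2.d * c2.q + c3.d * c3.q
              + omega(c1.k, c2.k, c3.k, node.k, beta))
    return ok


def example_tree() -> Node:
    """A tree of scale 2 in dimension d = 1: the left child of the root branches again."""
    inner = Node(children=[Node(k=(1,)), Node(k=(2,)), Node(k=(4,))])
    return Node(children=[inner, Node(k=(0,)), Node(k=(5,))])
```

For the tree returned by `example_tree()` with $\beta =(1)$, the inner node receives $k=3$ and $q=4$, and the root receives $k=8$ and $q=4-30=-26$; the tree has $n=2$ branching nodes and $2n+1=5$ leaves, in agreement with the count in the definition.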
Proposition 2.3. For each ternary tree $\mathcal{T}\,$ , define $\mathcal J_{\mathcal{T}\,}$ inductively by
where $\bullet $ represents the tree with a single node and $\mathcal{T}\,_1$ , $\mathcal{T}\,_2$ , $\mathcal{T}\,_3$ are the subtrees rooted at the three children of the root node of $\mathcal{T}\,$ . Then we have
Moreover, for any $\mathcal{T}\,$ of scale $\mathfrak s(\mathcal{T}\,\,)=n$ we have the formula
where the sum is taken over all admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ such that $k_{\mathfrak {r}}=k$ , and the function $\mathcal {K}=\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ satisfies
where $q_{\mathfrak {n}}$ is defined in equation (2.16).
Proof. First, equation (2.18) follows from the definitions in equations (2.9) and (2.17) and an easy induction. We now prove formulas (2.19) and (2.20) inductively, noting also that $(a_k)_{\mathrm {in}}=\sqrt {n_{\mathrm {in}}(k)}\cdot \eta _k(\omega )$ . For $\mathcal{T}\,=\bullet $ , equation (2.19) follows from equation (2.17) with $\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {r}})=\widetilde {\chi }(\tau )$ that satisfies formula (2.20). Now suppose formulas (2.19) and (2.20) are true for smaller trees; then by formulas (2.7) and (2.17) and Lemma 2.1, up to unimportant coefficients, we can write
where $\sum ^*$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$ , $k_1-k_2+k_3=k$ and either $k\not \in \{k_1,k_3\}$ or $k=k_1=k_2=k_3$ , the signs $(\iota _1,\iota _2,\iota _3)=(+,-,+)$ , and $\sigma =\tau _1-\tau _2+\tau _3+T\Omega (k_1,k_2,k_3,k)$ . Now applying the induction hypothesis, we can write $\left (\widetilde {\mathcal J_{\mathcal{T}\,}}\right )_{k}(\tau )$ in the form of equation (2.19) with the function
where $\mathfrak {r}$ is the root of $\mathcal{T}\,$ with children $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ and $\mathcal{T}\,_j$ is the subtree rooted at $\mathfrak {r}_j$ .
It then suffices to prove that $\mathcal {K}_{\mathcal{T}\,}$ defined by equation (2.21) satisfies formula (2.20). By the induction hypothesis, we may fix a choice $d_{\mathfrak {n}}$ for each nonleaf node $\mathfrak {n}$ of each $\mathcal{T}\,_j$ , and let $d_{\mathfrak {r}}=d$ . Then plugging formula (2.20) into equation (2.21), we get
which upon integration in $\tau _j$ gives equation (2.20). This completes the proof.
2.3 Statement of main estimates
Define the $h^b$ space by
and similarly the $h^{s,b}$ space for $\boldsymbol a(t)=(a_k(t))_{k \in \mathbb Z^d_L}$ by
We shall estimate the solution u in an appropriately rescaled $X^{s, b}$ space, which is equivalent to estimating the sequence $\boldsymbol a(t)=\left (a_k(t)\right )_{k \in \mathbb Z^d_L}$ in the space $h^{s, b}$ . Define the quantity
By the definition of $\delta>0$ in formula (1.4), we can verify that $\alpha T^{1/2}\leq \rho \leq L^{-\delta }$ .
Proposition 2.4 Well-posedness bounds
Let $\rho $ be defined as in formula (2.24); then L-certainly, for all $1\leq n\leq 3N$ , we have
Proposition 2.4 follows from the following two bounds, which will be proved in Section 3:
Proposition 2.5 Bounds of tree terms
We have, L-certainly, that
for any ternary tree of scale n, where $0\leq n\leq 3N$ .
Proposition 2.6 An operator norm bound
We have, L-certainly, that for any trees $\mathcal{T}\,_1,\mathcal{T}\,_2$ with $\left \lvert \mathcal{T}\,_j\right \rvert =3n_j+1$ and $0\leq n_1,n_2\leq N$ , the operators
satisfy the bounds
Remark 2.7. The bound (2.29) is a result of the probabilistic subcriticality of the problem. Similar bounds are also used in recent work by the first author, Nahmod and Yue [Reference Deng, Nahmod and Yue19] to get sharp probabilistic local well-posedness of nonlinear Schrödinger equations. The proof in both cases relies on high-order $TT^*$ arguments, although in [Reference Deng, Nahmod and Yue19] one needs to use the more sophisticated tensor norms due to the different ansatz caused by the inhomogeneity of initial data.
Proof of Proposition 2.4 (assuming Propositions 2.5 and 2.6)
Assume we have already excluded an exceptional set of probability $\lesssim e^{-L^{\theta }}$ . The bound (2.25) follows directly from formulas (2.18) and (2.27); it remains to bound $\mathcal {R}_{N+1}$ . Recall that $\mathcal {R}_{N+1}$ satisfies equation (2.11), so it suffices to prove that the mapping
is a contraction mapping from the set $\mathcal {Z}=\left \{v:\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^{N}\right \}$ to itself. We will prove only that it maps $\mathcal {Z}$ into $\mathcal {Z}$ , as the contraction part follows in a similar way. Now suppose $\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^N$ ; then by formulas (2.18) and (2.27), we have
so $\left \lVert \mathcal J_{\sim N}\right \rVert _{h^{s,b}}\ll \rho ^N$ . Next we may use formula (2.29) to bound
As for the terms $\mathcal Q(v)$ and $\mathcal C(v)$ , we apply the simple bound
(which easily follows from formula (2.7)), where $\sum _{\mathrm {cyc}}$ means summing in permutations of $(u,v,w)$ . As $\alpha T\leq L^{d}$ , we conclude (also using Proposition 2.5) that
since $\rho \leq L^{-\delta }$ and $N\gg \delta ^{-1}$ . This completes the proof.
3 Proof of main estimates
In this section we prove Propositions 2.5 and 2.6.
3.1 Large deviation and basic counting estimates
We start by making some preparations, namely the large deviation and counting estimates that will be used repeatedly in the proof later.
Lemma 3.1. Let $\{\eta _k(\omega )\}$ be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle. Let $F=F(\omega )$ be defined by
where $a_{k_1\cdots k_n}$ are constants; then F can be divided into finitely many terms, and for each term there is a choice of $X=\left \{i_1,\ldots ,i_p\right \}$ and $Y=\left \{j_1,\ldots ,j_p\right \}$ , which are two disjoint subsets of $\{1,2,\ldots ,n\}$ , such that
holds with
where a pairing $\left (k_{i},k_{j}\right )$ means $\left (\iota _i+\iota _j,\iota _ik_i+\iota _jk_j\right )=0$ .
Proof. First assume $\eta _k$ is Gaussian. Then by the standard hypercontractivity estimate for an Ornstein–Uhlenbeck semigroup (see, e.g., [Reference Oh and Thomann40]), we know that formula (3.2) holds with M replaced by $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ . Now to estimate $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ , by dividing the sum (3.1) into finitely many terms and rearranging the subscripts, we may assume in a monomial of equation (3.1) that
and the $k_{j_s}$ are different for $1\leq s\leq r$ . Such a monomial has the form
where the factors for different s are independent. We may also assume $b_s=c_s$ for $1\leq s\leq q$ and $b_s\neq c_s$ for $q+1\leq s\leq r$ , and for $1\leq j\leq j_q$ we may assume $\iota _j$ has the same sign as $(-1)^j$ . Then we can further rewrite this monomial as a linear combination of
for $1\leq p\leq q$ . Therefore, $F(\omega )$ is a finite linear combination of expressions of the form
Due to independence and the fact that $\mathbb {E}\left (\left \lvert \eta \right \rvert ^{2b}-b!\right )=\mathbb {E}\left (\eta ^b\left (\overline {\eta }\right )^c\right )=0$ for a normalised Gaussian $\eta $ and $b\neq c$ , we conclude that
which is bounded by the right-hand side of equation (3.3), by choosing $X=\left \{1,3,\ldots ,j_p-1\right \}$ and $Y=\left \{2,4,\ldots ,j_p\right \}$ , as under our assumptions $(k_{2i-1},k_{2i})$ is a pairing for $2i\leq j_p$ .
Now assume $\eta _k$ is uniformly distributed on the unit circle. Let $\{g_k(\omega )\}$ be independent, identically distributed normalised Gaussians as in the first part, and consider the random variable
We can calculate
where $1\leq i\leq q$ and $1\leq j\leq n$ , and similarly for H,
The point is that we always have
In fact, in order for either side to be nonzero, for any particular k we must have
Let both be equal to m; then by independence, the factor that the $\eta _k^{\pm }$ s contribute to the expectation on the left-hand side will be $\mathbb {E}\left \lvert \eta _k\right \rvert ^{2m}=1$ , while for the right-hand side it will be $\mathbb {E}\left \lvert g_k\right \rvert ^{2m}=m!\geq 1$ .
This implies that $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )\leq \mathbb {E}\left (\left \lvert H\right \rvert ^{2q}\right )$ for any positive integer q; since formula (3.2) holds for H, we have
with an absolute constant C. This gives an upper bound for $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )$ , and by Chebyshev's inequality, we deduce formula (3.2) for F.
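The moment comparison used in the second half of this proof, $\mathbb {E}\left \lvert \eta _k\right \rvert ^{2m}=1$ on the unit circle against $\mathbb {E}\left \lvert g_k\right \rvert ^{2m}=m!$ for a normalised complex Gaussian, can be checked numerically. The following Monte Carlo sketch is an illustration only and plays no role in the argument; the function names, sample size and seed are hypothetical choices, and the Gaussian is sampled with independent real and imaginary parts of variance $\frac {1}{2}$ so that $\mathbb {E}\left \lvert g\right \rvert ^2=1$.

```python
import math
import random


def gaussian_moment(m: int, samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E|g|^(2m) for a complex Gaussian g with E|g|^2 = 1."""
    sigma = math.sqrt(0.5)  # each of Re g and Im g has variance 1/2
    total = 0.0
    for _ in range(samples):
        re = rng.gauss(0.0, sigma)
        im = rng.gauss(0.0, sigma)
        total += (re * re + im * im) ** m
    return total / samples


def circle_moment(m: int, samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of E|eta|^(2m) for eta uniform on the unit circle."""
    total = 0.0
    for _ in range(samples):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        eta = complex(math.cos(phi), math.sin(phi))
        total += abs(eta) ** (2 * m)
    return total / samples
```

With a fixed seed and about $10^5$ samples, `gaussian_moment(m, ...)` should land close to $m!$ for small m, while `circle_moment(m, ...)` equals $1$ up to rounding, which is the inequality $1\leq m!$ exploited in the proof.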
Lemma 3.2. Let $\beta =(\beta _1,\ldots ,\beta _d)\in [1,2]^d$ and $0<T\leq L^d$ . Assume that $\beta $ is generic for $T\geq L^{2}$ . Then, uniformly in $(k,a,b,c)\in \left (\mathbb {Z}_L^d\right )^4$ and $m\in \mathbb {R}$ , the sets
satisfy the bounds
where in the first inequality of formula (3.10) we also assume $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ .
Moreover, with $\rho $ defined as in formula (2.24), we have
without any assumption on $(k,a,b)$ .
Proof. We first consider $S_3$ . Let $k-x=p$ and $k-z=q$ ; then we may write $p=\left (L^{-1}u_1,\ldots , L^{-1}u_d\right )$ and similarly for q, where each $u_i$ and $v_i$ is an integer and belongs to a fixed interval of length $O\left (L^{1+\theta }\right )$ . Moreover, from $(x,y,z)\in S_3$ we deduce that
We may assume $u_iv_i=0$ for $1\leq i\leq r$ , and $\sigma _i:=u_iv_i\neq 0$ for $r+1\leq i\leq d$ ; then the number of choices for $(u_i,v_i:1\leq i\leq r)$ is $O\left (L^{r+\theta }\right )$ . It is known (see [Reference Deng, Nahmod and Yue17, Reference Deng, Nahmod and Yue18]) that given $\sigma \neq 0$ , the number of integer pairs $(u,v)$ such that u and v each belongs to an interval of length $O\left (L^{1+\theta }\right )$ and $uv=\sigma $ is $O\left (L^{\theta }\right )$ . Therefore, if $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ , then $\#S_3$ is bounded by $O\left (L^{r+\theta }\right )$ times the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ that satisfy
Using the assumption $T\leq L^{d}$ , it suffices to show that the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ satisfying formula (3.12) is at most $O\left (1+L^{2(d-r)+\theta }T^{-1}\right )$ . This latter bound is trivial if $d-r=1$ or $L^2T^{-1}\geq 1$ , so we may assume $d-r\geq 2$ , $T\geq L^{2}$ and $\beta _i$ is generic. It is well known in Diophantine approximation (see, e.g., [Reference Cassels9]) that for generic $\beta _i$ we have
so the distance between any two points $(\sigma _i:r+1\leq i\leq d)$ and $(\sigma _i':r+1\leq i\leq d)$ satisfying formula (3.12) is at least $\left (L^2T^{-1}\right )^{-\frac {1}{d-r-1}-\theta }$ . Since all these points belong to a box which has size $O(1)$ in one direction and size $O\left (L^{2+\theta }\right )$ in other orthogonal directions, we deduce that the number of solutions to formula (3.12) is at most $1+L^{\theta } L^{2(d-r-1)}L^2T^{-1}$ , as desired.
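The divisor-type bound quoted in this proof (for fixed $\sigma \neq 0$, few integer pairs $(u,v)$ in given intervals satisfy $uv=\sigma $) can be explored by brute force. The sketch below is an illustration rather than the argument of the cited references; the function name and its interval parameters are hypothetical.

```python
def count_factor_pairs(sigma: int, lo_u: int, hi_u: int, lo_v: int, hi_v: int) -> int:
    """Count integer pairs (u, v) with lo_u <= u <= hi_u, lo_v <= v <= hi_v and u * v == sigma."""
    if sigma == 0:
        raise ValueError("sigma must be nonzero")
    count = 0
    for u in range(lo_u, hi_u + 1):
        # u must be a nonzero divisor of sigma, and the cofactor must lie in the v-interval
        if u != 0 and sigma % u == 0 and lo_v <= sigma // u <= hi_v:
            count += 1
    return count
```

For instance, $\sigma =12$ has six positive divisors, so there are twelve pairs when both intervals are $[-12,12]$, and only the two pairs $(3,4)$ and $(4,3)$ when both intervals are $[1,4]$.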
Next, without any assumption on $(k,a,b)$ , we need to prove formula (3.11). By definition (2.24) we can check that $Q^2\geq L^{2d}\left (\min \left (T,L^2\right )\right )^{-1}$ , so it suffices to prove the first inequality of formula (3.10), assuming $T\leq L^2$ . But this again follows from formula (3.12), noting that now $\left \lvert \sigma _j\right \rvert \leq L^{2+\theta }$ is no longer true, but each $\sigma _j$ still has at most $L^{2+\theta }$ possible values.
Finally we consider $S_2$ , which is much easier. In fact, formula (3.11) follows from formula (3.10), so we only need to prove the latter. Now if $T\leq L$ , we trivially have $\#S_2\leq L^{d+\theta }$ , as y will be fixed once x is; if $T\geq L$ , then we may assume $x_d-y_d\neq 0$ if the sign $\pm $ is $-$ , and then fix the first coordinates $x_j (1\leq j\leq d-1)$ and hence $y_j (1\leq j\leq d-1)$ . Then we have that $x_d\pm y_d$ is fixed, and $x_d^2\pm y_d^2$ belongs to a fixed interval of length $O\left (T^{-1}\right )$ . Since $x_d,y_d\in L^{-1}\mathbb {Z}$ , we know that $x_d$ has at most $1+L^2T^{-1}$ choices, which implies what we want to prove.
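The last step of the $S_2$ count can be illustrated in one dimension. Take the sign $-$ and fix $x_d-y_d=c\neq 0$ with $c\in L^{-1}\mathbb {Z}$; then $x_d^2-y_d^2=c(2x_d-c)$, so confining this quantity to an interval of length $T^{-1}$ confines $x_d$ to an interval of length $(2\lvert c\rvert T)^{-1}\leq L(2T)^{-1}$, which contains at most $1+L^2T^{-1}$ points of $L^{-1}\mathbb {Z}$. The sketch below checks this by exact rational arithmetic. It is an illustration under simplifying assumptions: L and T are integers, and the function name and the search window `a_range` are hypothetical.

```python
from fractions import Fraction
from typing import Iterable


def count_last_coordinate(L: int, T: int, g: int, m: Fraction, a_range: Iterable[int]) -> int:
    """Count x = a/L (a in a_range) with x^2 - y^2 in [m, m + 1/T], where y = x - g/L and g != 0."""
    if g == 0:
        raise ValueError("g must be nonzero (x_d - y_d != 0)")
    c = Fraction(g, L)
    lo = Fraction(m)
    hi = lo + Fraction(1, T)
    count = 0
    for a in a_range:
        x = Fraction(a, L)
        value = c * (2 * x - c)  # equals x^2 - y^2 when y = x - c
        if lo <= value <= hi:
            count += 1
    return count
```

With $L=10$, $T=20$, $c=\frac {1}{10}$ and $m=0$, the condition reads $0\leq (2a-1)/100\leq \frac {1}{20}$, which leaves $a\in \{1,2,3\}$; this is within the bound $1+L^2T^{-1}=6$.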
3.2 Bounds for $\mathcal {J}_n$
In this section we prove Proposition 2.5. We will need to extend the notion of ternary trees to paired, coloured ternary trees:
Definition 3.3 Tree pairings and colourings
Let $\mathcal{T}\,$ be a ternary tree as in Definition 2.2. We will pair some of the leaves of $\mathcal{T}\,$ such that each leaf belongs to at most one pair. The two leaves in a pair are called partners of each other, and the unpaired leaves are called single. We assume $\iota _{\mathfrak {l}}+\iota _{\mathfrak {l}'}=0$ for any pair $(\mathfrak {l},\mathfrak {l}')$ . The set of single leaves is denoted $\mathcal {S}$ . The number of pairs is denoted by p, so that $\lvert {\mathcal S}\rvert =l-2p$ . Moreover, we assume that some nodes in $\mathcal {S}\cup \{\mathfrak {r}\}$ are coloured red, and let $\mathcal R$ be the set of red nodes. We shall denote $r=\lvert \mathcal R\rvert $ .
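To get a feel for the pairings in Definition 3.3, one can enumerate them for small trees. The sketch below lists all ways of pairing leaves so that each leaf lies in at most one pair and paired leaves have opposite signs, and recovers the single leaves, whose number is $l-2p$. It is a minimal illustration: leaves are identified with their left-to-right positions, signs are stored as $\pm 1$, and the function names are hypothetical.

```python
from typing import FrozenSet, Iterator, List, Sequence, Tuple

Pairing = Tuple[Tuple[int, int], ...]


def pairings(signs: Sequence[int]) -> Iterator[Pairing]:
    """Yield every partial pairing of leaf positions in which paired leaves have opposite signs."""

    def rec(i: int, used: FrozenSet[int], current: List[Tuple[int, int]]) -> Iterator[Pairing]:
        if i == len(signs):
            yield tuple(current)
            return
        if i in used:  # leaf i is already the partner of an earlier leaf
            yield from rec(i + 1, used, current)
            return
        yield from rec(i + 1, used, current)  # leave leaf i single
        for j in range(i + 1, len(signs)):
            if j not in used and signs[i] + signs[j] == 0:
                yield from rec(i + 1, used | {j}, current + [(i, j)])

    yield from rec(0, frozenset(), [])


def single_leaves(signs: Sequence[int], pairing: Pairing) -> List[int]:
    """Positions of the leaves that belong to no pair."""
    paired = {i for pair in pairing for i in pair}
    return [i for i in range(len(signs)) if i not in paired]
```

For a tree of scale 1 the leaf signs are $(+,-,+)$ and there are three pairings (none, first with second, second with third); for leaf signs $(+,-,+,-,+)$ there are thirteen.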
We shall use red colouring to denote that the frequency assignments to the corresponding red vertex are fixed in the counting process. We also introduce the following definition:
Definition 3.4 Strong admissibility
Suppose we fix $n_{\mathfrak {m}}\in \mathbb {Z}_L^d$ for each $\mathfrak {m}\in \mathcal R$ . An assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is called strongly admissible with respect to the given pairing, colouring and $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ if it is admissible in the sense of Definition 2.2, and
The key to the proof of Proposition 2.5 is the following combinatorial counting bound:
Proposition 3.5. Let $\mathcal{T}\,$ be a paired and coloured ternary tree such that $\mathcal R\neq \varnothing $ , and let $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ be fixed. We also fix $\sigma _{\mathfrak {n}}\in \mathbb {R}$ for each $\mathfrak {n}\in \mathcal {N}$ . Let $l=\lvert \mathcal L\rvert $ be the total number of leaves, p be the number of pairs and $r=\lvert \mathcal R\rvert $ be the number of red nodes. Then the number of strongly admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ which also satisfy
is – recalling Q defined in formula (3.11) – bounded by
Proof. We proceed by induction. The base cases directly follow from formula (3.11). Now suppose the desired bound holds for all smaller trees, and consider $\mathcal{T}\,$ . Let $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ be the children of the root node $\mathfrak {r}$ and $\mathcal{T}\,_j$ be the subtree rooted at $\mathfrak {r}_j$ . Let $l_j$ be the number of leaves in $\mathcal{T}\,_j$ , $p_j$ the number of pairs within $\mathcal{T}\,_j$ and $p_{ij}$ the number of pairings between $\mathcal{T}\,_i$ and $\mathcal{T}\,_j$ , and let $r_j=\left \lvert \mathcal {R}\cap \mathcal{T}\,_j\right \rvert $ ; then we have
Also note that $\lvert k_{\mathfrak {n}}\rvert \lesssim L^{\theta }$ for all $\mathfrak {n}\in \mathcal{T}\,$ .
The proof is entirely algorithmic and involves many cases. The general strategy is to perform the following four operations, which we refer to as $\mathcal {O}_j (0\leq j\leq 3)$ , in a suitable order. In operation $\mathcal {O}_0$ we apply formula (3.11) to count the number of choices for the values among $\left \{k_{\mathfrak {r}},k_{\mathfrak {r}_1},k_{\mathfrak {r}_2},k_{\mathfrak {r}_3}\right \}$ that are not already fixed (this step may be trivial if three of these four vectors are already fixed, that is, coloured, or if one of them is already fixed and $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ ). In operations $\mathcal {O}_j (1\leq j\leq 3)$ , we apply the induction hypothesis to the subtree $\mathcal{T}\,_j$ and count the number of choices for $\left (k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,_j\right )$ . Let the number of choices associated with $\mathcal {O}_j (0\leq j\leq 3)$ be $M_j$ , with superscripts indicating different cases. Throughout the process we may colour a new node $\mathfrak {n}$ red once $k_{\mathfrak {n}}$ has been fixed by the previous operations, namely when $\mathfrak {n}=\mathfrak {r}$ and we have performed $\mathcal {O}_0$ before, when $\mathfrak {n}=\mathfrak {r}_j$ and we have performed $\mathcal {O}_0$ or $\mathcal {O}_j$ before, or when $\mathfrak {n}$ is a leaf that has a partner in $\mathcal{T}\,_j$ and we have performed $\mathcal {O}_j$ before.
(1) Suppose $\mathfrak {r}\not \in \mathcal R$ ; then we may assume that there is a red leaf from $\mathcal{T}\,_1$ .Footnote 11 We first perform $\mathcal {O}_1$ and get a factor
Now $\mathfrak {r}_1$ is coloured red, as is any leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_1$ . There are then two cases.
(1.1) Suppose now there is a leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ , that is red. Then we perform $\mathcal {O}_2$ and get a factor
Now $\mathfrak {r}_2$ is coloured red, as is any leaf of $\mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are again two cases.
(1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor
then colour $\mathfrak {r}_3$ red and apply $\mathcal {O}_0$ to get a factor $M_0^{(1.1.1)}:=1$ . Thus
which is what we need.
(1.1.2) Suppose after step (1.1) there is no red leaf in $\mathcal{T}\,_3$ ; then $r_3=p_{13}=p_{23}=0$ . We perform $\mathcal {O}_0$ and get a factor $M_0^{(1.1.2)}:=L^{\theta } Q^{1}$ (perhaps with slightly enlarged $\theta $ ; the same applies later). Now we may colour $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Thus
which is what we need.
(1.2) Now suppose that after step (1) there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=p_{12}=p_{13}=0$ . There are two cases.
(1.2.1) Suppose there is a single leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ . Then we will perform $\mathcal {O}_0$ and get a factor $M_0^{(1.2.1)}:=L^{\theta } Q^{2}$ . Now we may colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Now any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ is coloured red, so we may perform $\mathcal {O}_2$ and get a factor
Thus
which is what we need.
(1.2.2) Suppose there is no single leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then all leaves in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ are paired to one another, which implies that $k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ and that $\mathfrak {r}_2$ and $\mathfrak {r}_3$ have opposite signs, and hence by the admissibility condition we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . This allows us to perform $\mathcal {O}_0$ and colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red with $M_0^{(1.2.2)}:=1$ , then perform $\mathcal {O}_3$ and colour red any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , then perform $\mathcal {O}_2$ (for which we use the second bound in formula (3.15)). This leads to the factors
and thus
which is better than what we need.
(2) Now suppose $\mathfrak {r}\in \mathcal R$ ; then $r=r_1+r_2+r_3+1$ . There are two cases.
(2.1) Suppose there is one single leaf that is not red, say from $\mathcal{T}\,_1$ . There are again two cases.
(2.1.1) Suppose there is a red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say $\mathcal{T}\,_2$ . Then we perform $\mathcal {O}_2$ and get a factor
We now colour red $\mathfrak {r}_2$ and any leaf in $\mathcal{T}\,_1\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are a further two cases.
(2.1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor
Now we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.1)}:=1$ , then colour red $\mathfrak {r}_1$ as well as any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.1.2) Suppose after step (2.1.1) there is no red leaf in $\mathcal{T}\,_3$ ; then $r_3=p_{23}=0$ . We perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.2)}:=L^{\theta } Q^{1}$ . Then we colour $\mathfrak {r}_1$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.2) Suppose in the beginning there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=0$ . There are again two cases.
(2.1.2.1) Suppose there is a leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ , that is either single or paired with a leaf in $\mathcal{T}\,_1$ . Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.1)}:=L^{\theta } Q^{2}$ . After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
We then colour red any leaf of $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_2$ to get a factor
Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_2$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.2.2) Suppose there is no leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ that is either single or paired with a leaf in $\mathcal{T}\,_1$ ; then in the same way as in case (1.2.2), we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . Moreover, we have $p_{12}=p_{13}=0$ . Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.2)}:=1$ . After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
We then colour red any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ and perform $\mathcal {O}_2$ to get a factor
Finally, we perform $\mathcal {O}_1$ , again using the second part of estimate (3.15), to get a factor
Thus
which is better than what we need.
(2.2) Now suppose that in the beginning all single leaves are red – that is, $\mathcal R=\mathcal {S}\cup \{\mathfrak {r}\}$ . Then we can argue in exactly the same way as in case (2.1), except that in the last step where we perform $\mathcal {O}_1$ , it may happen that the root $\mathfrak {r}_1$ as well as all leaves of $\mathcal{T}\,_1$ are red at that time, so we lose one power of Q in view of the weaker bound from the induction hypothesis. However, since $\mathcal R=\mathcal {S}\cup \{\mathfrak {r}\}$ , we are in fact allowed to lose this power, so we can still close the inductive step, in the same way as in case (2.1). This completes the proof.
Corollary 3.6. In Proposition 3.5, suppose $\mathcal {R}=\{\mathfrak {r}\}$ . Then formula (3.15) can be improved to
Proof. In the proof of Proposition 3.5, we are now in case (2.1.2). In each subcase, either (2.1.2.1) or (2.1.2.2), we perform the operation $\mathcal {O}_0$ first. In case (2.1.2.1), by formula (3.10) – noting that the extra conditions are satisfied – we can replace the bound $M_0^{(2.1.2.1)}$ by $M_0':= L^{\theta } L^{2d}T^{-1}$ , so we get
In case (2.1.2.2) we get an improvement: we have $M\leq L^{\theta } Q^{l-p-2}$ , which also implies formula (3.16), since we can check $Q\leq L^{2d}T^{-1}\leq Q^2$ by definition.
Now we are ready to prove Proposition 2.5.
Proof of Proposition 2.5. We start with equation (2.19). Let $\lvert \mathcal{T}\,\rvert =3n+1$ . Due to the rapid decay of $\sqrt {n_{\mathrm {in}}}$ , we may assume in the summation that $\lvert k_{\mathfrak {l}}\rvert \leq L^{\theta }$ for any $\mathfrak {l}\in \mathcal L$ , and so $\lvert k\rvert \leq L^{\theta }$ also. For any fixed value of $\tau $ , we may apply Lemma 3.1 to estimate $\left (\widetilde {\mathcal J_{\mathcal{T}\,}}\right )_k(\tau )$ L-certainly. Namely, L-certainly, we have, for some choice of pairing and with colouring $\mathcal R=\{\mathfrak {r}\}$ and $n_{\mathfrak {r}}=k$ ,
where $\sum ^{**}$ represents summation under the condition that the unique admissible assignment determined by $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S})$ and $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L\backslash \mathcal {S})$ is strongly admissible. Next we would like to assume formula (3.17) for all $\tau $ , which can be done by the following trick. First, due to the decay factor in formula (2.20) and the assumption $\lvert k_{\mathfrak {l}}\rvert \leq L^{\theta }$ , we may assume $\lvert \tau \rvert \leq L^{d+\theta }$ ; moreover, choosing a large power D, we may divide the interval $\left [-L^{d+\theta },L^{d+\theta }\right ]$ into subintervals of length $L^{-D}$ and pick one point $\tau _j$ from each interval. Due to the differentiability of $\mathcal {K}_{\mathcal{T}\,}$ (see formula (2.20)), we can bound the difference
by a large negative power of L, provided $\tau $ is in the same interval as $\tau _j$ . Therefore, as long as formula (3.17) is true for each $\tau _j$ , we can assume it is true for each $\tau $ up to negligible errors. Since the number of $\tau _j$ s is at most $O\left (L^{2D}\right )$ and formula (3.17) holds L-certainly for each fixed $\tau _j$ , we conclude that, L-certainly, formula (3.17) holds for all $\tau $ .
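The counting behind this discretization step can be illustrated with a toy computation. The sketch below assumes, purely for illustration, that each L-certain event fails with probability at most $e^{-L^{\theta }}$ (constants suppressed) and uses an unrealistically large $\theta $ so that the numbers are visible; the function names are hypothetical and not part of the paper.

```python
import math

def grid_point_count(L, d, theta, D):
    """Number of subintervals of length L**(-D) needed to cover
    the range |tau| <= L**(d + theta)."""
    half_width = L ** (d + theta)
    return math.ceil(2 * half_width * L ** D)

def log_union_failure_bound(L, d, theta, D):
    """Logarithm of (number of grid points) * exp(-L**theta), i.e. of the
    union bound over all grid points, under the ASSUMED per-point failure
    probability exp(-L**theta)."""
    return math.log(grid_point_count(L, d, theta, D)) - L ** theta

# Toy parameters (theta = 0.5 is far larger than the small theta of the text).
L, d, theta, D = 10**6, 3, 0.5, 10
count = grid_point_count(L, d, theta, D)
print(count <= L ** (2 * D))                        # grid has at most L^(2D) points
print(log_union_failure_bound(L, d, theta, D) < 0)  # union bound is still tiny
```

The point of the sketch is only that a polynomial number of grid points cannot compete with the exponentially small failure probability.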
Now, by expanding the square in formula (3.17), it suffices to bound the quantity
where $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is the unique admissible assignment determined by $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S})$ and $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L\backslash \mathcal {S})$ , and $\left (k_{\mathfrak {n}}':\mathfrak {n}\in \mathcal{T}\,\right )$ is the one determined by $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S})$ and $\left (k_{\mathfrak {l}}':\mathfrak {l}\in \mathcal L\backslash \mathcal {S}\right )$ . The conditions in the summations $\sum ^{**}$ and $\sum ^{**'}$ correspond to these two assignments being strongly admissible. By formula (2.20), we have (for some choice of $d_{\mathfrak {n}}$ )
where $q_{\mathfrak {n}}$ and $q_{\mathfrak {n}}'$ are defined from the assignments $(k_{\mathfrak {n}})$ and $\left (k_{\mathfrak {n}}'\right )$ , respectively, via equation (2.16). Thus the integral in $\tau $ gives
and it suffices to bound
Since all the qs are bounded by $L^{\theta }$ , and $T\leq L^d$ , we may fix the integer parts of each $Tq_{\mathfrak {n}}$ and $Tq_{\mathfrak {n}}'$ for each $\mathfrak {n}\in \mathcal {N}$ , and reduce the foregoing sum to a counting bound, at the price of losing a power $L^{C\left (b-\frac {1}{2}\right )}$ . Now by definition (2.16), each $q_{\mathfrak {n}}$ is a linear combination of $\Omega _{\mathfrak {n}}$ s, and conversely, each $\Omega _{\mathfrak {n}}$ is a linear combination of $q_{\mathfrak {n}}$ s. So once the integer parts of each $Tq_{\mathfrak {n}}$ and $Tq_{\mathfrak {n}}'$ are fixed, we have also fixed $\sigma _{\mathfrak {n}}\in \mathbb {R}$ and $\sigma _{\mathfrak {n}}'\in \mathbb {R}$ , such that
Therefore we are reduced to counting the number of $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S})$ , $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L\backslash \mathcal {S})$ and $\left (k_{\mathfrak {l}}':\mathfrak {l}\in \mathcal L\backslash \mathcal {S}\right )$ such that the assignments $(k_{\mathfrak {n}})$ and $\left (k_{\mathfrak {n}}'\right )$ are both strongly admissible and satisfy formula (3.18). Now let $\lvert \mathcal L\rvert =l=2n+1$ and p be the number of pairs; then $\lvert \mathcal {S}\rvert =2n+1-2p$ . First we count the number of choices for $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S})$ and $(k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L\backslash \mathcal {S})$ , where we apply Corollary 3.6 with $\mathcal R=\{\mathfrak {r}\}$ and get the factor $M:=L^{\theta } Q^{2n-p-2}L^{2d}T^{-1}$ ; then, with $k_{\mathfrak {l}}$ fixed for all $\mathfrak {l}\in \mathcal {S}$ , we count the number of choices for $\left (k_{\mathfrak {l}}':\mathfrak {l}\in \mathcal L\backslash \mathcal {S}\right )$ by applying Proposition 3.5 with $\mathcal R=\mathcal {S}\cup \{\mathfrak {r}\}$ and get the factor $M':= L^{\theta }Q^p$ . In the end we have, L-certainly,
by the definition of Q in formula (3.11), as desired.
3.3 Bounds for $\mathcal {P}_{\pm }$
In this section we prove Proposition 2.6. The proofs for both $\mathcal {P}_{\pm }$ are similar, so we consider only $\mathcal {P}_+$ .
Proof of Proposition 2.6. There are three steps.
Step 1: First reductions. We start with some simple observations. The operator $\mathcal {P}_+$ is given by $\mathcal {P}_+(v)=\mathcal {IW}\left (\mathcal J_{\mathcal{T}\,_1},\mathcal J_{\mathcal{T}\,_2},v\right )$ , where $\mathcal {I}$ and $\mathcal {W}$ are defined in formulas (2.6) and (2.7). Now in formula (2.7) we may assume $\lvert k_1\rvert ,\lvert k_2\rvert \leq L^{\theta }$ , for the same reason as in the proof of Proposition 2.5. We thus have
so instead of $h^{s,b}$ bounds we only need to consider $h^{0,b}$ bounds. Next notice that if $\mathcal {I}$ is defined by equation (2.6) and $\mathcal {I}_1$ is defined by $\mathcal {I}_1F=\chi \cdot (\mathrm {sgn}*(\chi \cdot F))$ , then we have the identity $2\mathcal {I}F(t)=\mathcal {I}_1F(t)-\chi (t)\mathcal {I}_1F(0)$ , so for $b>\frac {1}{2}$ we have $\left \lVert \mathcal {I}F\right \rVert _{h^{s,b}}\lesssim \left \lVert \mathcal {I}_1F\right \rVert _{h^{s,b}}$ . Therefore, in estimating $\mathcal {P}_+$ we may replace the operator $\mathcal {I}$ that appears in the formula for $\mathcal {P}_+$ by $\mathcal {I}_1$ . The advantage is that $\mathcal {I}_1$ has a formula
where $I_1$ is as in Lemma 2.1, so we may get rid of the $I_0$ term. From now on we will stick to the renewed definition of $\mathcal {I}$ . Next, by Proposition 2.5 we have the trivial bound
Note also that $\alpha T\leq L^{d}$ and $\rho \leq L^{-\varepsilon }$ , so by interpolation it suffices to L-certainly bound the $h^{0,b}\to h^{0,1-b}$ norm of (the renewed version of) $\mathcal {P}_+$ by $L^{\theta }\rho ^{n_1+n_2+1}$ .
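The identity $2\mathcal {I}F(t)=\mathcal {I}_1F(t)-\chi (t)\mathcal {I}_1F(0)$ used in Step 1 can be checked numerically. The sketch below assumes that equation (2.6), which is not reproduced in this section, has the Duhamel form $\mathcal {I}F(t)=\chi (t)\int _0^t\chi (t')F(t')\,\mathrm {d}t'$ , and it uses an illustrative bump function with $\chi (0)=1$ (the only property of $\chi $ the identity needs). The test function `F` and all helper names are hypothetical.

```python
import math

def chi(t):
    """Illustrative smooth bump supported in (-2, 2) with chi(0) = 1."""
    u = (t / 2.0) ** 2
    return math.exp(1.0 - 1.0 / (1.0 - u)) if u < 1.0 else 0.0

def F(t):
    """Arbitrary test function (an assumption, not from the paper)."""
    return math.cos(3.0 * t) + t

N = 4000                                        # number of cells on [-2, 2]
H = 4.0 / N                                     # cell width
MID = [-2.0 + (i + 0.5) * H for i in range(N)]  # cell midpoints
G = [chi(x) * F(x) for x in MID]                # samples of chi * F

def duhamel(t):
    """I F(t) = chi(t) * int_0^t chi F  (ASSUMED form of (2.6)), midpoint rule."""
    if t >= 0:
        s = sum(g for x, g in zip(MID, G) if 0.0 < x < t)
    else:
        s = -sum(g for x, g in zip(MID, G) if t < x < 0.0)
    return chi(t) * s * H

def sgn_conv(t):
    """I_1 F(t) = chi(t) * int sgn(t - t') chi(t') F(t') dt', midpoint rule."""
    s = sum((1.0 if x < t else -1.0) * g for x, g in zip(MID, G))
    return chi(t) * s * H

for t in (-1.5, -0.5, 0.25, 1.0):
    lhs = 2.0 * duhamel(t)
    rhs = sgn_conv(t) - chi(t) * sgn_conv(0.0)
    print(t, abs(lhs - rhs))   # differences at rounding-error level
```

Because the evaluation points are cell boundaries of the grid, the discrete sums satisfy the identity exactly up to rounding, mirroring the splitting of $\int \mathrm {sgn}(t-t')$ into the parts before and after $t$ .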
Now, using Lemma 2.1 and noticing that the bound (2.8) is symmetric in $\sigma $ and $\tau $ , we have the formula
where $J=J(\tau ,\eta )$ and all its derivatives are bounded by $\left \langle \tau -\eta \right \rangle ^{-10}$ . By elementary estimates we have
and thus it suffices to L-certainly bound the $\ell ^2\to \ell ^2$ norm of the operator
uniformly in $\tau $ and $\tau '$ .
Step 2: Second reductions. At this point we apply arguments similar to those in the proof of Proposition 2.5. Namely, we first restrict $\lvert \tau \rvert ,\lvert \tau '\rvert \leq L^{\theta ^{-1}}$ (otherwise we can gain a power of either $\lvert \tau \rvert ^{\frac {1}{2}\left (b-\frac {1}{2}\right )}$ or $\lvert \tau '\rvert ^{\frac {1}{2}\left (b-\frac {1}{2}\right )}$ from the extra room when applying formula (3.20), which turns into a large power of L and closes the whole estimate), and then divide this interval into subintervals of length $L^{-\theta ^{-1}}$ and apply differentiability to reduce to $O\left (L^{C\theta ^{-1}}\right )$ choices of $(\tau ,\tau ')$ . Therefore, it suffices to fix $\tau $ and $\tau '$ and L-certainly bound $\left \lVert \mathcal {X}\right \rVert _{\ell ^2\to \ell ^2}$ . Let $\tau -\tau '=\zeta $ be fixed.
Now use equation (2.19) for the $\mathcal J_{\mathcal{T}\,_j}$ factors, assuming also $\left \lvert k_{\mathfrak {l}}\right \rvert \leq L^{\theta }$ in each tree, and integrate in $(\sigma _1,\sigma _2)$ . This leads to a further reduced expression for $\mathcal {X}$ , which can be described as follows. First let the tree $\mathcal{T}\,$ be defined such that its root is $\mathfrak {r}$ and its three subtrees from left to right are $\mathcal{T}\,_1$ , $\mathcal{T}\,_2$ and a single node $\mathfrak {r}'$ . Then we have
where the matrix coefficients are given by
where the sum is taken over all admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ which satisfy $k_{\mathfrak {r}}=k$ , $k_{\mathfrak {r}'}=k'$ and $\left \lvert k_{\mathfrak {n}}\right \rvert \leq L^{\theta }$ for $\mathfrak {n}\not \in \{\mathfrak {r},\mathfrak {r}'\}$ , and the coefficient satisfies $\left \lvert \mathcal {K}\right \rvert \leq L^{\theta }$ and $\left \lvert \partial \mathcal {K}\right \rvert \leq L^{\theta } T$ . Moreover, we observe that $\mathcal {K}$ and $q_{\mathfrak r}$ depend on the variables $k_{\mathfrak {r}}=k$ and $k_{\mathfrak {r}'}=k'$ only through the quantity $\left \lvert k\right \rvert _{\beta }^2-\left \lvert k'\right \rvert _{\beta }^2$ .
Next we argue in the same way as in the proof of Proposition 2.5 and fix the integer parts of $Tq_{\mathfrak {n}}$ for $\mathfrak {n}\in \mathcal {N}\setminus \{\mathfrak {r}\}$ , as well as the integer part of $Tq_{\mathfrak {r}}-\zeta $ , at a cost of $(\log L)^{O(1)}$ . All these can be assumed to be $\leq L^{\theta ^{-1}}$ due to the decay $\left \langle Tq_{\mathfrak {r}}-\zeta \right \rangle ^{-5}$ and the bounds on $\tau $ and $\tau '$ . This is equivalent to fixing some real numbers $\sigma _{\mathfrak {n}}=O\left (L^{\theta ^{-1}}\right )$ and requiring the assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ to satisfy $\lvert \Omega _{\mathfrak {n}}-\sigma _{\mathfrak {n}}\rvert \leq T^{-1}$ for each $\mathfrak {n}\in \mathcal {N}$ . Let this final operator, obtained by all the previous reductions, be $\mathcal {G}$ . Schematically, the operator $\mathcal {G}$ can be viewed as ‘attaching’ two trees $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ to a single node $\mathfrak {r}'$ .
Step 3: The high-order $\mathcal {G}\mathcal {G}^*$ argument. For this, we consider the adjoint operator $\mathcal {G}^*$ . A similar argument gives a formula for $\mathcal {G}^*$ , which is associated with a tree $\mathcal{T}\,^*$ formed by attaching the two trees $\mathcal{T}\,_2$ and $\mathcal{T}\,_1$ (with $\mathcal{T}\,_2$ on the left of $\mathcal{T}\,_1$ ) to a single node $\mathfrak {r}'$ , in the same way that $\mathcal {G}$ is associated with $\mathcal{T}\,$ . Given a large positive integer D, we will consider $(\mathcal {G}\mathcal {G}^*)^{D}$ , which is associated with a tree $\mathcal{T}\,^D$ . The precise description is as follows.
First, $\mathcal{T}\,^D$ is a tree with root node $\mathfrak {r}_0=\mathfrak {r}$ , and its first two subtrees (from the left) are $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ . The third subtree of $\mathfrak {r}_0$ has root $\mathfrak {r}_1$ , whose first two subtrees (from the left) are $\mathcal{T}\,_2$ and $\mathcal{T}\,_1$ . The third subtree of $\mathfrak {r}_1$ has root $\mathfrak {r}_2$ , whose first two subtrees (from the left) are $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ , and so on. This process repeats and eventually stops at $\mathfrak {r}_{2D}=\mathfrak {r}'$ , which is a single node, finishing the construction of $\mathcal{T}\,^D$ . As usual, denote by $\mathcal L^D$ and $\mathcal N^D$ the set of leaves and branching nodes, respectively. Then the kernel of $(\mathcal {G}\mathcal {G}^*)^{D}$ is given by
where $\left \lvert \mathcal {K}^{(D)}\right \rvert \leq L^{\theta }$ and $\left \lvert \partial \mathcal {K}^{(D)}\right \rvert \leq L^{\theta } T$ , and the sum is taken over all admissible assignments $\left (k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,^D\right )$ that satisfy $(k_{\mathfrak {r}},k_{\mathfrak {r}'})=(k,k')$ , $\lvert k_{\mathfrak {l}}\rvert \leq L^{\theta }$ for $\mathfrak {r}'\neq \mathfrak {l}\in \mathcal L$ and $\lvert \Omega _{\mathfrak {n}}-\sigma _{\mathfrak {n}}\rvert \leq T^{-1}$ for $\mathfrak {n}\in \mathcal {N}^D$ , where $\sigma _{\mathfrak {n}}=O\left (L^{\theta ^{-1}}\right )$ are fixed. Moreover, $\mathcal {K}^{(D)}$ depends on the variables $k_{\mathfrak {r}}=k$ and $k_{\mathfrak {r}'}=k'$ only through the quantities $\left \lvert k_{\mathfrak {r}_j}\right \rvert _{\beta }^2-\left \lvert k_{\mathfrak {r}_{j+1}}\right \rvert _{\beta }^2$ for $0\leq j\leq 2D-1$ .
Now note that each $\left ((\mathcal {G}\mathcal {G}^*)^{D}\right )_{kk'}$ is an explicit multilinear Gaussian expression. Since for fixed k (or $k'$ ) the number of choices for $k'$ (or k) is $O\left (L^{d+\theta }\right )$ , by Schur’s estimate we know
So it suffices to L-certainly bound $\left \lvert \left ((\mathcal {G}\mathcal {G}^*)^{D}\right )_{kk'}\right \rvert $ uniformly in k and $k'$ . We first consider this estimate with fixed $(k,k')$ . Applying Lemma 3.1, we can fix some pairings of $\mathcal{T}\,^D$ and the set $\mathcal {S}^D$ of single leaves, and argue as in the proof of Proposition 2.5 to conclude L-certainly that
where the condition for summation, as in the proof of Proposition 2.5, is that the unique admissible assignment $\left (k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,^D\right )$ determined by $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S}^D\right )$ and $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L^D\backslash \mathcal {S}^D\right )$ satisfies all the conditions already listed, and the same happens for $\left (k_{\mathfrak {n}}':\mathfrak {n}\in \mathcal{T}\,^D\right )$ corresponding to $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S}^D\right )$ and $\left (k_{\mathfrak {l}}':\mathfrak {l}\in \mathcal L^D\backslash \mathcal {S}^D\right )$ . We know that $\mathcal{T}\,^D$ is a tree of scale $2D(n_1+n_2+1)$ , and so $\left \lvert \mathcal L^D\right \rvert =4D(n_1+n_2+1)+1$ ; let the number of pairings be p, and then $\left \lvert \mathcal {S}^D\right \rvert =4D(n_1+n_2+1)-2p$ . By Proposition 3.5 we can bound the number of choicesFootnote 12 for $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S}^D\right )$ and $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal L^D\backslash \mathcal {S}^D\right )$ by $M=L^{\theta } Q^{4D\left (n_1+n_2+1\right )-p}$ , and bound the number of choices for $\left (k_{\mathfrak {l}}':\mathfrak {l}\in \mathcal L^D\backslash \mathcal {S}^D\right )$ given $\left (k_{\mathfrak {l}}:\mathfrak {l}\in \mathcal {S}^D\right )$ by $M'=L^{\theta } Q^{p}$ . In the end, for any fixed $(k,k')$ , we have that L-certainly,
Finally we need to L-certainly make this bound uniform in all choices of $(k,k')$ . This is not obvious, since we impose no upper bound on $\lvert k\rvert $ and $\lvert k'\rvert $ , so the number of exceptional sets we remove in the L-certain condition could presumably be infinite. However, note that the coefficient $\boldsymbol {\mathcal {K}}$ depends on k and $k'$ only through the quantities $\left \lvert k\right \rvert _{\beta }^2-\left \lvert k_{\mathfrak {r}_{j}}\right \rvert _{\beta }^2$ . Let $\mathcal {D}=\mathcal L\backslash \{\mathfrak {r'}\}$ ; then $\lvert k_{\mathfrak {l}}\rvert \leq L^{\theta }$ for $\mathfrak {l}\in \mathcal {D}$ , and the condition for summation creates the restriction that $\left \lvert \left \lvert k\right \rvert _{\beta }^2-\left \lvert k_{\mathfrak {r}_{j}}\right \rvert _{\beta }^2\right \rvert \leq L^{\theta ^{-1}}$ . The reduction from infinitely many possibilities for k (and hence $k'$ ) to finitely many is done by invoking the following result, whose proof will be left to the end:
Claim 3.7. Let $k\in \mathbb {Z}_L^d$ , and consider the function
Then there exist finitely many functions $f_1,\ldots ,f_A$ , where $A\leq L^{C\theta ^{-1}}$ , such that for any $k\in \mathbb {Z}_L^d$ there exists $1\leq j\leq A$ such that $\left \lvert f_{(k)}-f_j\right \rvert \leq L^{-\theta ^{-1}}$ on $\mathrm {Dom}\left (f_{(k)}\right )$ .
Remark 3.8. We may view Claim 3.7 as a ‘finiteness’ or ‘compactness’ lemma. Similar results are also used in [Reference Deng, Nahmod and Yue18] and [Reference Deng, Nahmod and Yue19] for similar purposes.
Now it is not hard to see that Claim 3.7 allows us to obtain a bound of the form just proved that is uniform in $(k,k')$ , after removing at most $O\left (L^{C\theta ^{-1}}\right )$ exceptional sets, each with probability $\lesssim e^{-L^{\theta }}$ . This then implies
hence
By fixing D to be a sufficiently large positive integer, we deduce the correct operator bound for $\mathcal {G}$ , and hence for $\mathcal {X}$ and $\mathcal {P}_+$ . This completes the proof of Proposition 2.6.
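The mechanism of Step 3 is that Schur's estimate on the kernel of $(\mathcal {G}\mathcal {G}^*)^{D}$ loses a factor polynomial in the matrix size, and taking the $2D$ -th root makes this loss negligible for large D, since $\left \lVert \mathcal {G}\right \rVert ^{2D}=\left \lVert (\mathcal {G}\mathcal {G}^*)^{D}\right \rVert $ . The following is a minimal finite-dimensional sketch of that mechanism for a small real random matrix (so that $\mathcal {G}^*$ is the transpose); it is an illustration with hypothetical helper names, not the operator of the paper.

```python
import math
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def schur_bound(M):
    """Schur test: ||M|| <= sqrt(max abs row sum * max abs column sum)."""
    rows = max(sum(abs(x) for x in row) for row in M)
    cols = max(sum(abs(x) for x in col) for col in zip(*M))
    return math.sqrt(rows * cols)

def norm_bound_via_power(G, D):
    """Upper bound for ||G|| obtained from the Schur bound on (G G^T)^D."""
    GGt = matmul(G, transpose(G))
    M = GGt
    for _ in range(D - 1):
        M = matmul(M, GGt)
    return schur_bound(M) ** (1.0 / (2 * D))

def spectral_norm_estimate(G, iters=500):
    """Power iteration on G^T G; returns ||G v|| for a unit vector v,
    hence a lower estimate of the operator norm ||G||."""
    Gt = transpose(G)
    v = [1.0] * len(G[0])
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in G]    # G v
        v = [sum(a * x for a, x in zip(row, w)) for row in Gt]   # G^T G v
        nrm = math.sqrt(sum(x * x for x in v))
        v = [x / nrm for x in v]
    w = [sum(a * x for a, x in zip(row, v)) for row in G]
    return math.sqrt(sum(x * x for x in w))

random.seed(0)
G = [[random.uniform(-1, 1) for _ in range(6)] for _ in range(6)]
print("norm estimate:", spectral_norm_estimate(G))
for D in (1, 2, 8):
    print("D =", D, "Schur-type bound:", norm_bound_via_power(G, D))
```

In this toy setting the bound for larger D is never worse than the bound for $D=1$ and approaches the true operator norm, which is the finite-dimensional shadow of fixing D sufficiently large.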
Proof of Claim 3.7. We will prove the result for any linear function $g(m)=x\cdot m+X$ , where $x\in \mathbb {R}^d$ and $X\in \mathbb {R}$ are arbitrary. We may also assume $m\in \mathbb {Z}^d$ instead of $\mathbb {Z}_L^d$ ; the domain $\mathrm {Dom}(g)$ will then be the set E of m such that $\lvert m\rvert \leq L^{1+\theta }$ and $\lvert g(m)\rvert \leq L^{2+\theta ^{-1}}$ .
Let the affine dimension $\dim (E)=r\leq d$ ; then E contains a maximal affinely independent set $\left \{q_j:0\leq j\leq r\right \}$ . The number of choices for these $q_j$ is at most $L^{d+1}$ , so we may fix them. Let $\mathcal L$ be the primitive lattice generated by $\left \{q_j-q_0:1\leq j\leq r\right \}$ , and fix a reduced basis $\left \{\ell _j:1\leq j\leq r\right \}$ of $\mathcal L$ . For any $m\in E$ there is a unique integer vector $k=(k_1,\ldots ,k_r)\in \mathbb {Z}^r$ such that $\lvert k\rvert \lesssim L^{1+\theta }$ , $m-q_0=k_1\ell _1+\cdots +k_r\ell _r$ , and as a linear function we can write $g(m)=y\cdot k+Y$ , where $y\in \mathbb {R}^r$ and $Y=g(q_0)\in \mathbb {R}$ .
Now let the $k\in \mathbb {Z}^r$ corresponding to $m=q_j$ be $k^{\left (j\right )}$ , where $1\leq j\leq r$ ; then since $q_0\in E$ and $q_j\in E$ , we conclude that $\left \lvert y\cdot k^{\left (j\right )}\right \rvert \leq L^{3+\theta ^{-1}}$ . As the $k^{\left (j\right )}$ are linearly independent integer vectors in $\mathbb {Z}^r$ with norm bounded by $L^{1+\theta }$ , we conclude that $\lvert y\rvert \leq L^{C+\theta ^{-1}}$ , and consequently $\lvert Y\rvert \leq L^{C+\theta ^{-1}}$ . We may then approximate $g(m)$ for $m\in E$ by $y_j\cdot k+Y_j$ , where $y_j$ and $Y_j$ are one of the $L^{C\theta ^{-1}}$ choices that approximate y and Y up to error $L^{-\theta ^{-1}}$ , and choose $g_{\left (j\right )}=y_j\cdot k+Y_j$ .
3.4 The worst terms
In this section we exhibit terms $\mathcal J_{\mathcal{T}\,}$ that satisfy the lower bound (1.9). These are the terms corresponding to trees $\mathcal{T}\,$ and pairings (see Remark 3.9) as shown in Figure 5, where $\mathcal{T}\,$ is formed from a single node by successively attaching two leaf nodes, and the ‘left’ node attached at each step is paired with the ‘right’ node attached in the next step. Let the scale $\mathfrak s(\mathcal{T}\,\,)=r$ ; then $\mathcal{T}\,$ has exactly $r-1$ pairings. For simplicity we will consider the rational case $\beta _j=1$ and $T\leq L^{2-\delta }$ ; the irrational case is similar.
Here it is more convenient to work with the time variable t (instead of its Fourier dual $\tau $ ). To show formula (1.9), since $b>1/2$ , we just need to bound $(\mathcal J_{\mathcal{T}\,})_k(t)$ from below for some k and some $t\in [0,1]$ ; moreover, since $\chi \equiv 1$ on $[0,1]$ , and using the recursive definition (2.17), we can write
where (due to admissibility) the variables in the summation satisfy
and the coefficient $\mathcal {B}$ is given by
with $\Omega _j$ being the resonance factors, namely
In equation (3.23) we may replace $\left \lvert \eta _{\ell _j}\right \rvert ^2$ by 1, so the factor in the big parentheses, which we denote by $\mathcal {A}_{kxyz}$ , involves no randomness. Therefore, with high probability,
In the sum, we may fix $q\in \mathbb {Z}_L^d$ with $0<\lvert q\rvert \lesssim L^{-1}$ , which has $O(1)$ choices, and write
By Poisson summation, and noticing that $\left \lvert Tt_jq\right \rvert \lesssim L^{1-\delta }$ , we conclude that up to constants,
By making the change of variables $s_j=t_{j}-t_{j+1} (1\leq j\leq r-1)$ and $s_0=t-t_1$ , $s_r=t_r$ , we can reduce to
By choosing some particular $(k,q,z)$ , we may assume $q\cdot (k-q)=q\cdot (q-z)=0$ , and if we also choose $n_{\mathrm {in}}$ such that $\widehat {n_{\mathrm {in}}}$ is positive, say $n_{\mathrm {in}}(k)=e^{-\left \lvert k\right \rvert ^2}$ , and $t=\min \left (1,LT^{-1}\right )$ , then we have
and hence, with high probability,
for any fixed r – thus formula (1.9).
Remark 3.9. Here, strictly speaking, we are further decomposing $\mathcal J_{\mathcal{T}\,}$ into the sum of terms $\mathcal J_{\mathcal{T}\,,\mathcal {P}}$ , where $\mathcal {P}$ represents the pairing structure of $\mathcal{T}\,$ . In the proof of Proposition 2.5, we are actually making the same decomposition (by identifying the set of pairings) and proving the same bound for each $\mathcal J_{\mathcal{T}\,,\mathcal {P}}$ . On the other hand, the example here shows that individual terms $\mathcal J_{\mathcal{T}\,,\mathcal {P}}$ can be very large in absolute value. Thus to get any improvement on the results of this paper, one would need to explore the subtle cancellations between the $\mathcal J_{\mathcal{T}\,,\mathcal {P}}$ terms with different $\mathcal{T}\,$ or different $\mathcal {P}$ .
4 Proof of the main theorem
In this section we prove Theorem 1.3 (which also implies Theorem 1.1). Since we may alter the value of T, in proving Theorem 1.3 we may restrict to the case $T/2\leq t\leq T$ .
First note that $\mathbb E\left \lvert \widehat u(k, t)\right \rvert ^2=\mathbb E \left \lvert a_k(s)\right \rvert ^2$ , where $s:=\frac {t}{T} \in [1/2,1]$ . By mass conservation, we have $L^{-d}\sum _{k \in \mathbb Z^d_L}\left \lvert a_k\right \rvert ^2=O(1)$ and hence $\left \lVert a_k\right \rVert _{\ell ^{\infty }}\lesssim L^{d/2}$ . Therefore, if we denote by $\Gamma $ the intersection of all the L-certain events in Propositions 2.4 and 2.5, we have, for $0\leq s \leq 1$ (denoting by $\mathbb E_{\Gamma } G= \mathbb E \mathbf 1_{\Gamma } G$ ),
By using Proposition 2.4, we can bound the last three terms by
As for the first term on the second line of equation (4.1), since $(\mathcal J_0)_k(s)=\chi (t)\sqrt {n_{\mathrm {in}}}\cdot \eta _k(\omega )$ , by direct calculations and arguments similar to those in the proof of Proposition 2.5 we can bound, for any tree $\mathcal{T}\,$ with $\mathfrak s (\mathcal{T}\,\,)=n$ ,
where M is the quantity estimated in Proposition 3.5 (i.e., the number of strongly admissible assignments satisfying formula (3.14)), with all but one leaf of $\mathcal{T}\,$ being paired, and $\mathcal {R}=\{\mathfrak {r}\}$ . By Corollary 3.6 we have
It then suffices to calculate the main term, which is the first line of equation (4.1). Up to an error of size $O\left (e^{-L^{\theta }}\right )$ , we can replace $\mathbb {E}_{\Gamma }$ by $\mathbb {E}$ ; also, we can easily show that $\mathrm {Re}\,\mathbb {E}\overline {(\mathcal J_0)_k(s)}(\mathcal J_1)_k(s)=0$ . For $\lvert s\rvert \leq 1$ , clearly $\mathbb {E}\left \lvert (\mathcal J_0)_k(s)\right \rvert ^2=n_{\mathrm {in}}$ ; as for the other two terms, namely $\mathbb {E}\left \lvert (\mathcal J_1)_k(s)\right \rvert ^2$ and $2\mathbb {E}\mathrm {Re}\overline {(\mathcal J_0)_k(s)}(\mathcal J_2)_k(s)$ , we compute as follows: Recall that $(a_{\mathrm {in}})_{k}=\sqrt {n_{\mathrm {in}}(k)}\eta _k(\omega )$ and
and therefore we have
where we used $T<L^{2d-\delta }$ for the third term and estimated the second term by $L^{2d-2+\theta }$ for general $\beta _j$ and by $L^{d+\theta }$ if $\beta _j$ are irrational (e.g., using Lemma 3.2 with $m=0$ and $T=L^2$ and $L^d$ , respectively).
A similar computation for $2\mathbb {E}\mathrm {Re}\overline {(\mathcal J_0)_k(s)}(\mathcal J_2)_k(s)$ (see [Reference Buckmaster, Germain, Hani and Shatah7]) gives
where
with $\Omega \left (\vec k\right )=\Omega (k,k_1,k_2,k_3)=\left \lvert k_1\right \rvert _{\beta }^2-\left \lvert k_2\right \rvert _{\beta }^2+\left \lvert k_3\right \rvert _{\beta }^2-\left \lvert k\right \rvert _{\beta }^2$ . Therefore, we conclude that
In the following section, we derive the asymptotic formula for the sum ${\mathscr S}_t$ – namely, we show that ${\mathscr S}_t(\phi )=\mathscr K_t(\phi )+O\left (t^{-1}L^{2d-\theta }\right )$ for some $\theta>0$ , where $\mathscr K_t$ is given by
Finally, the proof is completed by using the fact that for a smooth function f,
5 Number-theoretic results
The purpose of this section is to prove the asymptotic formula for $\mathscr {S}_t$ defined in formula (4.2). The sum $\mathscr {S}_t$ should be regarded as a Riemann sum that approximates the integral $\mathscr K_t$ in formula (4.3). However, this approximation is far from trivial, because of the highly oscillating factor $\left \lvert \frac {\sin \left (\pi t\Omega \left (\vec \xi \right )\right )}{\pi t\Omega \left (\vec \xi \right )} \right \rvert ^2$ , which makes the problem intimately related to the equidistribution properties of the values of the quadratic form $\Omega $ .
Theorem 5.1. Let $\phi \in {\mathcal S}\left ({\mathbb R}^d\right )$ with $d\geq 3$ . For any $\delta>0$ , there exists $\theta>0$ such that the following asymptotics hold:
1. (General tori) For any $\beta _i\in [1, 2]^d$ and any $t< L^{2-\delta }$ ,
$$ \begin{align*} {\mathscr S}_t=\mathscr K_t +O\left(L^{2d-\theta} t^{-1}\right). \end{align*} $$
2. (Generic tori) For generic $\beta _i\in [1, 2]^d$ and any $t< L^{d-\delta }$ ,
$$ \begin{align*}{\mathscr S}_t=\mathscr K_t +O\left(L^{2d-\theta} t^{-1}\right). \end{align*} $$
It is not hard to see that $\mathscr K_t=O\left (L^{2d}\right )t^{-1}$ , which justifies the sufficiency of the error-term bound.
Remark 5.2. It is interesting that in the case of the rational torus for which $\beta _j=1$ , this asymptotic ceases to be true at the end point $t=L^2$ . This corresponds to $\mu =1$ in formula (5.1), whose asymptotic was studied in [Reference Faou, Germain and Hani24, Reference Buckmaster, Germain, Hani and Shatah6] and yields a logarithmic divergence when $d=2$ and a different multiplicative constant for $d\geq 3$ compared to the asymptotic in our theorem.
Proof of Theorem 5.1. The proof of part (2) is contained in [Reference Buckmaster, Germain, Hani and Shatah7], so we will focus only on the first part, which is less sophisticated. To simplify the notation, we will drop the subscript t from ${\mathscr S}_t$ and $\mathscr K_t$ . We use a refinement of [Reference Buckmaster, Germain, Hani and Shatah7] which basically covers the case $t<L^{1-\delta }$ . First, observe that $\Omega \left (\vec k\right )=-2\mathcal Q (k_1-k,k_3-k)$ , where $\mathcal Q(x,y):=\sum _{j=1}^d \beta _j x_j y_j$ . Therefore, changing variables $N_1=L(k_1-k)\in \mathbb Z^d$ and $N_2=L(k_3-k)\in \mathbb Z^d$ , we write the sum ${\mathscr S}$ in the form
where $W\in {\mathcal S}\left ({\mathbb R}^{2d}\right )$ . Thus we have
Step 1: Truncating in N. We first notice that the main contribution of the sum ${\mathscr S}$ (resp., the integral $\mathscr K$ ) comes from the region $|N|\lesssim L^{1+\delta _1} \left (\text {resp., }\lvert (\xi _1, \xi _2)\rvert \lesssim L^{\delta _1}\right )$ , where $\delta _1=\frac {\delta }{100}$ . This uses the fact that W is a Schwartz function with sufficient decay. We can therefore without loss of generality include in the sum ${\mathscr S}$ (resp., the integral $\mathscr K$ ) a factor $\chi \left (\frac {N}{L^{1+\delta _1}}\right ) \left (\text {resp., }\chi \left (\frac {z}{L^{\delta _1}}\right )\right )$ , where $\chi \in C_c^{\infty }\left ({\mathbb R}^d\right )$ is 1 on the ball $B\left (0, \frac {1}{10}\right )$ and vanishes outside $B\left (0, \frac {2}{10}\right )$ .
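The algebraic identity $\Omega \left (\vec k\right )=-2\mathcal Q (k_1-k,k_3-k)$ used at the start of the proof follows by substituting $k_2=k_1+k_3-k$ and expanding the squares. The sketch below checks it in exact rational arithmetic on random inputs; it assumes that the sum defining ${\mathscr S}_t$ carries the momentum constraint $k_1-k_2+k_3=k$ (formula (4.2) is not reproduced in this section), and it uses integer vectors since the identity is insensitive to the lattice scaling. Function names are hypothetical.

```python
from fractions import Fraction
import random

def beta_norm_sq(k, beta):
    """|k|_beta^2 = sum_j beta_j k_j^2."""
    return sum(b * x * x for b, x in zip(beta, k))

def omega(k, k1, k2, k3, beta):
    """Resonance factor |k1|^2 - |k2|^2 + |k3|^2 - |k|^2 in the beta-norm."""
    return (beta_norm_sq(k1, beta) - beta_norm_sq(k2, beta)
            + beta_norm_sq(k3, beta) - beta_norm_sq(k, beta))

def Q(x, y, beta):
    """Bilinear form Q(x, y) = sum_j beta_j x_j y_j."""
    return sum(b * xi * yi for b, xi, yi in zip(beta, x, y))

def check_identity(trials=200, d=3, seed=0):
    """Return True if Omega = -2 Q(k1 - k, k3 - k) on all random samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        beta = [Fraction(rng.randint(100, 200), 100) for _ in range(d)]  # in [1, 2]
        k, k1, k3 = ([rng.randint(-20, 20) for _ in range(d)] for _ in range(3))
        k2 = [a + c - b for a, b, c in zip(k1, k, k3)]   # ASSUMED: k1 - k2 + k3 = k
        lhs = omega(k, k1, k2, k3, beta)
        rhs = -2 * Q([a - b for a, b in zip(k1, k)],
                     [c - b for c, b in zip(k3, k)], beta)
        if lhs != rhs:
            return False
    return True

print(check_identity())
```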
Step 2: Isolating the main term. We now use the fact that the Fourier transform of g is given by the tent function $\widehat g(x)=1-\lvert x\rvert $ on the interval $[-1,1]$ and vanishes otherwise to write (using the notation $e(x):=e^{2\pi i x}$ )
where A is the contribution of $\lvert \tau \rvert \leq L^{-1-\delta _1}$ and B is the contribution of the complementary region, which could be empty if $\mu <L^{-1-\delta _1}$ , in which case we assume $B=0$ . This decomposition can be understood as the analogue of the classical minor versus major arc splitting in the circle method. For the major arc ${\mathscr S}_A$ , we use Poisson summation to replace the sum in N by an integral which will give the needed asymptotic up to acceptable errors:
where ${\mathscr S}_{A1}$ and ${\mathscr S}_{A2}$ are respectively the second and third terms in the second-to-last equality.
The remainder of the proof is to show that ${\mathscr S}_{A1}, {\mathscr S}_{A2}$ and ${\mathscr S}_{B}$ are error terms.
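The Fourier representation behind Step 2 rests on the classical fact that the tent function $1-\lvert \tau \rvert $ on $[-1,1]$ integrates against $e(\tau x)$ to $\left (\frac {\sin \pi x}{\pi x}\right )^2$ . The sketch below checks this numerically in its unscaled form (the scaling by $\mu $ and the sum over N from the text are omitted) and mimics the A/B splitting of the $\tau $ -integral at an illustrative cutoff; the function names and the cutoff value are assumptions made for the example.

```python
import math

def g(x):
    """g(x) = (sin(pi x) / (pi x))^2, with g(0) = 1."""
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

def tent_integral(x, lo=0.0, hi=1.0, n=20000):
    """Midpoint rule for the integral of (1 - |tau|) e(tau x) over
    lo <= |tau| <= hi. By symmetry the imaginary part vanishes, so this
    equals 2 * int_lo^hi (1 - tau) cos(2 pi tau x) d tau."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        tau = lo + (i + 0.5) * h
        total += (1.0 - tau) * math.cos(2.0 * math.pi * tau * x)
    return 2.0 * total * h

for x in (0.0, 0.3, 1.0, 3.7):
    print(x, g(x), tent_integral(x))

# Splitting the tau-integral at a cutoff, in the spirit of the A/B decomposition.
cutoff = 0.1
A = tent_integral(2.5, 0.0, cutoff)
B = tent_integral(2.5, cutoff, 1.0)
print(A + B, g(2.5))
```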
Step 3: Showing that ${\mathscr S}_{A1}$ and ${\mathscr S}_{A2}$ are error terms. To estimate ${\mathscr S}_{A1}$ , we use the stationary phase estimate
and the fact that the term is only nonzero if $\mu> L^{-1-\delta _1}$ to bound
For ${\mathscr S}_{A2}$ , we use nonstationary phase techniques relying on the fact that the phase function $\Phi (z)=4\tau L^2 \mathcal Q(z) -Lc\cdot z$ satisfies $\left \lvert \nabla _z \Phi (z)\right \rvert =\lvert L(4\tau L (z_2, z_1)-c)\rvert \gtrsim L\lvert c\rvert $ for $c\neq 0$ , since $\lvert z\rvert \leq \frac {L^{\delta _1}}{5}$ . Therefore, one can integrate by parts in z sufficiently many times and show that $\left \lvert {\mathscr S}_{A2}\right \rvert \ll L^{2d-\delta }t^{-1}$ as well.
Step 4: Showing that ${\mathscr S}_{B}$ is an error term. Here we assume without loss of generality that $L^{-1-\delta _1} <\mu \leq L^{-\delta }$ (otherwise ${\mathscr S}_{B}=0$ ). Therefore,
Recall that $\mathcal Q(N)=\sum _{j=1}^d\beta _j (N_1)_j(N_2)_j$ , so we perform the following change of variables:
Therefore, the sum in $(N_1)_j, (N_2)_j \in \mathbb Z^{2}$ becomes a sum
We will estimate the contribution of the first sum, and it will be obvious from the proof that the other sums are estimated similarly. Also, by symmetry, we only need to consider the sums for which $p_j, q_j \geq 0$ , which reduces us to
Let $G(s, n)=\sum _{p=0}^ne\left (s p^2\right )$ be the Gauss sum, and abusing notation, also denote by $G(s, x)=G(s, [x])$ for $x\in {\mathbb R}$ , where $[x]$ is the floor function. Then
Integrating by parts in all the variables (or equivalently, performing an Abel summation), one obtains
where ‘l.o.t.’ denotes lower-order terms that can be bounded in a similar or simpler way than the main term. Here and in what follows, $\partial _{x_1}\cdots \partial _{y_d}\widetilde W\left (\frac {(x,y)}{L}\right )$ is understood as $\partial _{x_1}\cdots \partial _{y_d}\left (\widetilde W\left (\frac {(x,y)}{L}\right )\right )$ .
We now recall the Gauss sum estimate for $G(s, n)$ : let $0\leq a<q\leq n$ be integers such that $(a, q)=1$ and $\left \lvert s-\frac {a}{q}\right \rvert <\frac {1}{qn}$ (for any s and n, such a pair exists by Dirichlet’s approximation theorem); then
Here $s=\tau \beta _j$ , with $\tau \in \left [L^{-1-\delta _1}, L^{-\delta }\right ]$ , $\beta _j \in [1,2]$ . This means that either $\lvert n\rvert <L^{2\delta }$ or $\frac {a}{q}\lesssim L^{-\delta }$ $(\Rightarrow q\gtrsim L^{\delta })$ , and in either case we get $\lvert G(s, n)\rvert \lesssim L^{\left (1+\delta _1-\frac {\delta }{2}\right )}$ , since $\lvert n\rvert \lesssim L^{1+\delta _1}$ (note that this argument works when $a>0$ ; if $a=0$ we have the better bound $\lvert G(s,n)\rvert \lesssim \left \lvert s\right \rvert ^{-1/2}\leq L^{2/3}$ ).
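The two ingredients of this step, the Gauss sum $G(s,n)=\sum _{p=0}^n e\left (sp^2\right )$ and the rational approximation supplied by Dirichlet's theorem, can be explored numerically. The sketch below is a brute-force illustration with hypothetical function names; it does not implement the estimate quoted in the text, and only exhibits the classical square-root cancellation $\lvert G(a/q,q-1)\rvert =\sqrt q$ for an odd prime q with $(a,q)=1$ , together with a search for a Dirichlet approximant.

```python
import cmath
import math

def e(x):
    """e(x) = exp(2 pi i x)."""
    return cmath.exp(2j * math.pi * x)

def gauss_sum(s, n):
    """G(s, n) = sum_{p=0}^{n} e(s p^2)."""
    return sum(e(s * p * p) for p in range(n + 1))

def dirichlet_approx(s, n):
    """Brute-force search for coprime (a, q) with 1 <= q <= n and
    |s - a/q| < 1/(q n); Dirichlet's theorem guarantees one exists."""
    for q in range(1, n + 1):
        a = round(q * s)
        if abs(q * s - a) < 1.0 / n:
            g = math.gcd(a, q)
            return a // g, q // g
    raise ValueError("no approximant found (floating-point issue?)")

# Square-root cancellation over a full period for the prime q = 7.
print(abs(gauss_sum(3 / 7, 6)), math.sqrt(7))

# A Dirichlet approximant for an irrational-looking s, and the size of G(s, n).
s, n = math.sqrt(2) - 1, 50
a, q = dirichlet_approx(s, n)
print((a, q), abs(s - a / q) < 1.0 / (q * n))
print(abs(gauss_sum(s, n)), "versus the trivial bound", n + 1)
```

Reducing the fraction to lowest terms preserves the inequality, since dividing q by the common factor only enlarges $\frac {1}{qn}$ .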
As a result, we have
This gives
Now using Hua’s lemma (compare [Reference Iwaniec and Kowalski30]), we have $\left \lVert G\left (\tau , n_j\right )\right \rVert _{L^4\left [0, 1\right ]}\lesssim n_j^{1/2+\delta _1}\lesssim L^{\left (1+\delta _1\right )\left (\frac {1}{2}+\delta _1\right )}$ , which gives
provided that $\theta <\min \left (1, \frac {(d-2)\delta }{2d}\right )$ and recalling that $\delta _1=\frac {\delta }{100}$ .
Acknowledgements
The authors would like to thank Andrea Nahmod, Sergey Nazarenko and Jalal Shatah for illuminating discussions. They also thank the anonymous referee for valuable suggestions that helped improve the exposition.
Conflict of Interest
None.
Financial support
The first author was supported by NSF grant DMS 1900251. The second author was supported by NSF grants DMS-1852749 and DMS-1654692, a Sloan Fellowship and the Simons Collaboration Grant on Wave Turbulence. The results of this work were announced on 1 November, 2019, at a Simons Collaboration Grant meeting.