
On several notions of complexity of polynomial progressions

Published online by Cambridge University Press:  20 January 2022

BORYS KUCA*
Affiliation:
Department of Mathematics and Statistics, University of Jyväskylä, Survontie 9, 40500 Jyväskylä, Finland

Abstract

For a polynomial progression

$$ \begin{align*}(x,\; x+P_1(y),\ldots,\; x+P_{t}(y)),\end{align*} $$
we define four notions of complexity: Host–Kra complexity, Weyl complexity, true complexity and algebraic complexity. The first two describe the smallest characteristic factor of the progression, the third refers to the smallest-degree Gowers norm controlling the progression, and the fourth concerns algebraic relations between terms of the progressions. We conjecture that these four notions are equivalent, which would give a purely algebraic criterion for determining the smallest Host–Kra factor or the smallest Gowers norm controlling a given progression. We prove this conjecture for all progressions whose terms only satisfy homogeneous algebraic relations and linear combinations thereof. This family of polynomial progressions includes, but is not limited to, arithmetic progressions, progressions with linearly independent polynomials $P_1, \ldots ,\!P_t$ and progressions whose terms satisfy no quadratic relations. For progressions that satisfy only linear relations, such as
$$ \begin{align*}(x,\; x+y^2,\; x+2y^2,\; x+y^3,\; x+2y^3),\end{align*} $$
we derive several combinatorial and dynamical corollaries: first, an estimate for the count of such progressions in subsets of $\mathbb {Z}/N\mathbb {Z}$ or totally ergodic dynamical systems; second, a lower bound for multiple recurrence; and third, a popular common difference result in $\mathbb {Z}/N\mathbb {Z}$ . Lastly, we show that Weyl complexity and algebraic complexity always agree, which gives a straightforward algebraic description of Weyl complexity.

Type
Original Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

1 Introduction

A polynomial $P\in \mathbb {R}[y]$ is integral if $P(\mathbb {Z})\subset \mathbb {Z}$ and $P(0)=0$ . For $t\in \mathbb {N}_+$ , an integral polynomial progression of length $t+1$ is a tuple $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ given by

$$ \begin{align*} \vec{P}(x,y) = (x, \; x+P_1(y), \ldots, \; x+P_{t}(y)) \end{align*} $$

for distinct integral polynomials $P_1, \ldots , P_{t}$ . We say, moreover, that a set $A\subset \mathbb {N}$ contains $\vec {P}(x,y)$ for some $x,y\in \mathbb {N}$ if $\vec {P}(x,y)\in A^{t+1}$ . A major result on integral polynomial progressions is the polynomial Szemerédi theorem by Bergelson and Leibman, which extends the famous theorem of Szemerédi on arithmetic progressions.

Theorem 1.1. (Polynomial Szemerédi theorem [BL96])

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression, and suppose that $A\subseteq \mathbb {N}_+$ is dense (this means that $\limsup \nolimits _{N\to \infty }({|A\cap [N]|}/{N})>0$ , where $[N]=\{1, \ldots , N\}$ ). Then A contains $\vec {P}(x,y)$ for some $x,y\in \mathbb {N}_+$ .

Theorem 1.1 can be deduced from the following ergodic-theoretic statement using the Furstenberg correspondence principle.

Theorem 1.2. [BL96, HK05a]

Let $(X,\mathcal {X},\mu , T)$ be an invertible measure-preserving dynamical system, $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. If $\mu (A)>0$ for $A\in \mathcal {X}$ , then

(1) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]}\mu(A\cap T^{P_1(n)}A \cap \cdots \cap T^{P_{t}(n)}A)> 0, \end{align} $$

where $[N]=\{1, \ldots , N\}$ and ${\mathbb{E}}_{x\in X} = ({1}/{|X|})\sum _{x\in X}$ for any set X.

To prove Theorem 1.1, one thus needs to understand limits of multiple ergodic averages of the form

(2) $$ \begin{align} \mathop{\mathbb{E}}\limits_{n\in [N]}T^{P_1(n)} f_1 \cdots T^{P_{t}(n)} f_{t} \end{align} $$

for $f_1, \ldots , f_{t}\in L^\infty (\mu )$. By a remarkable result of Host and Kra [HK05a, HK05b], there exists a family of factors $(\mathcal {Z}_s)_{s\in \mathbb {N}}$, henceforth called Host–Kra factors, with the property that weak or $L^2$ limits of expressions of the form (2) remain unchanged if we project any of the functions $f_i$ onto one of the factors $\mathcal {Z}_s$ for some s dependent on $\vec {P}$ and i. (The definitions of factors, Weyl systems, nilsystems, and other concepts from ergodic theory and higher-order Fourier analysis used in the introduction will be provided in subsequent sections.)

Definition 1.3. (Characteristic factors)

Let $(X,\mathcal {X},\mu ,T)$ be an invertible measure-preserving dynamical system, $t\in \mathbb {N}_+$ and ${\vec {P}\in \mathbb {R}[x,y]^{t+1}}$ be an integral polynomial progression.

Suppose that $1\leqslant i\leqslant t$ . A factor $\mathcal {Y}$ of $\mathcal {X}$ is characteristic for the $L^2$ -convergence of $\vec {P}$ at i if for all choices of $f_1, \ldots , f_{t}\in L^\infty (\mu )$ , the $L^2$ -limit of (2) is 0 whenever $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Y})=0$ .

Similarly, suppose that $0\leqslant i\leqslant t$ . A factor $\mathcal {Y}$ of $\mathcal {X}$ is characteristic for the weak convergence of $\vec {P}$ at i if for all choices of $f_0, \ldots , f_{t}\in L^\infty (\mu )$ , the weak limit of (2), that is, the expression

(3) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]}\int_X f_0 \cdot T^{P_1(n)} f_1 \cdots T^{P_{t}(n)} f_{t} \,d\mu, \end{align} $$

is 0 whenever $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Y})=0$ .

Theorem 1.4. [HK05a, Lei05a]

Let $t\in \mathbb {N}_+$. For each integral polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$, there is $s\in \mathbb {N}$ such that for all invertible ergodic systems $(X,\mathcal {X},\mu ,T)$, the factor $\mathcal {Z}_s$ is characteristic for the $L^2$ convergence of $\vec {P}$ at i for all $1\leqslant i\leqslant t$.

The utility of Host–Kra factors comes from the fact that they are inverse limits of nilsystems [HK05b], and so understanding (2) for arbitrary systems comes down to proving certain equidistribution results on spaces called nilmanifolds that possess rich algebraic structure. Importantly, $\mathcal {Z}_s$ is a factor of $\mathcal {Z}_{s+1}$ for each $s\in \mathbb {N}$, hence it is natural to inquire about the smallest value of s for which the factor $\mathcal {Z}_s$ is characteristic for $\vec {P}$ at i.

Definition 1.5. (Host–Kra complexity)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ . The progression $\vec {P}$ has Host–Kra complexity s at i, denoted $\mathcal {HK}_i(\vec {P})$ , if s is the smallest natural number such that the factor $\mathcal {Z}_s$ is characteristic for the weak convergence of $\vec {P}$ at i for all invertible totally ergodic dynamical systems $(X,\mathcal {X},\mu , T)$ . We say $\vec {P}$ has Host–Kra complexity s if $\max _i \mathcal {HK}_i(\vec {P}) = s$ .

Investigating complexity has been of particular interest for a class of dynamical systems called Weyl systems, leading to another notion of complexity, a variant of which is given below.

Definition 1.6. (Weyl complexity)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ . The progression $\vec {P}$ has Weyl complexity s at i, denoted $\mathcal {W}_i(\vec {P})$ , if s is the smallest natural number such that the factor $\mathcal {Z}_s$ is characteristic for the weak convergence of $\vec {P}$ at i for all Weyl systems $(X,\mathcal {X},\mu , T)$ . We say $\vec {P}$ has Weyl complexity s if $\max _i \mathcal {W}_i(\vec {P})=s$ .

In previous works [BLL07, Fra08, Fra16, Lei09], the aforementioned notions of complexity have been defined for a polynomial family $\mathcal {P}=\{P_1, \ldots , P_t\}$ rather than for a progression $\vec {P}$. However, we want to extend the definitions of complexity to ‘index 0’, that is, the x term in $\vec {P}$, which is why we prefer to define it for $\vec {P}$ rather than $\mathcal {P}$. Similarly, complexity has previously been defined for $L^2$ convergence rather than weak convergence. However, the existence of an $L^2$ limit (Theorem 1.4) and basic functional analysis imply that weak and $L^2$ limits are identical.

Host–Kra factors are deeply related to a family of seminorms called Gowers–Host–Kra seminorms. For $s\in \mathbb {N}_+$ and $f\in L^\infty (\mu )$ , the Gowers–Host–Kra seminorm of f of degree s is denoted by ${\lvert \kern -0.25ex\lvert \kern -0.25ex\lvert f \rvert \kern -0.25ex\rvert \kern -0.25ex\rvert }_s$ and satisfies the property

(4) $$ \begin{align} {\lvert\kern-0.25ex\lvert\kern-0.25ex\lvert f \rvert\kern-0.25ex\rvert\kern-0.25ex\rvert}_{s+1} = 0 \iff \operatorname{\mathrm{\mathbb{E}}}(f\mid\mathcal{Z}_s) = 0 \end{align} $$

as well as the monotonicity property

(5) $$ \begin{align} {\lvert\kern-0.25ex\lvert\kern-0.25ex\lvert f \rvert\kern-0.25ex\rvert\kern-0.25ex\rvert}_1 \leqslant {\lvert\kern-0.25ex\lvert\kern-0.25ex\lvert f \rvert\kern-0.25ex\rvert\kern-0.25ex\rvert}_2 \leqslant {\lvert\kern-0.25ex\lvert\kern-0.25ex\lvert f \rvert\kern-0.25ex\rvert\kern-0.25ex\rvert}_3 \leqslant \cdots. \end{align} $$

Gowers–Host–Kra seminorms have natural finitary analogues. For the transformation $Tx = x+1$ on $X=\mathbb {Z}/N\mathbb {Z}$ with N prime and the uniform probability measure $\mu $ , the weak limit (3) becomes

(6) $$ \begin{align} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}}f_0(x) f_1(x+P_1(y)) \cdots f_{t}(x+P_{t}(y)). \end{align} $$

The Gowers–Host–Kra seminorm of any $f:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ is a norm (for $s>1$ ) called the Gowers norm and denoted by $U^s$ , and it takes the form

(7) $$ \begin{align} \Vert f\Vert_{U^s}&=\bigg(\mathop{\mathbb{E}}\limits_{x, h_1, \ldots, h_s\in\mathbb{Z}/N\mathbb{Z}}\prod_{w\in\{0,1\}^s} {\mathcal{C}}^{|w|}f(x+w_1 h_1 + \cdots + w_s h_s)\bigg)^{{1}/{2^s}}, \end{align} $$

where ${\mathcal {C}}: z\mapsto \overline {z}$ is the conjugation operator and $|w|=w_1+ \cdots +w_s$ . As a result, $\Vert f\Vert _{U^s}~=~0$ for some $s>1$ if and only if $\Vert f\Vert _{U^2} = 0$ if and only if $f = 0$ , and so inquiring about the smallest characteristic factor of this system in the sense of Definition 1.3 makes little sense. We can, however, ask which Gowers norm ‘controls’ $\vec {P}$ in a more finitary way, and this leads to another notion of complexity.
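To make formula (7) concrete, the following Python sketch evaluates the $U^s$ norm by brute force for small N. It is only an illustration under stated assumptions: the name `gowers_norm` and the two test functions are hypothetical choices made here, not notation from the paper, and the running time of order $N^{s+1}2^s$ restricts it to toy sizes.

```python
import cmath
from itertools import product


def gowers_norm(f, s):
    """Brute-force U^s norm of f: Z/NZ -> C (given as a list), following (7)."""
    N = len(f)
    total = 0
    for x, *hs in product(range(N), repeat=s + 1):
        term = 1
        for w in product((0, 1), repeat=s):
            value = f[(x + sum(wi * hi for wi, hi in zip(w, hs))) % N]
            # C^{|w|}: conjugate exactly when |w| = w_1 + ... + w_s is odd.
            term *= value.conjugate() if sum(w) % 2 else value
        total += term
    average = total / N ** (s + 1)
    # The average is real and non-negative in exact arithmetic;
    # clamp tiny negative rounding errors before taking the root.
    return max(average.real, 0.0) ** (1 / 2 ** s)


N = 7
indicator = [1, 0, 1, 1, 0, 0, 0]  # indicator function of {0, 2, 3}
character = [cmath.exp(2j * cmath.pi * x / N) for x in range(N)]

u1, u2, u3 = (gowers_norm(indicator, s) for s in (1, 2, 3))
print(u1, u2, u3)  # non-decreasing, as in (5)
print(gowers_norm(character, 1), gowers_norm(character, 2))
```

For the indicator function, the degree-1 value is the density $3/7$, since (7) with $s=1$ reduces to $|\mathbb{E} f|$. For the nontrivial character, the degree-1 average vanishes while the $U^2$ norm equals 1, up to rounding; this shows that $U^1$ is only a seminorm, in line with the restriction $s>1$ above.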

Definition 1.7. (True complexity)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ . The progression $\vec {P}$ has true complexity s at i, denoted $\mathcal {T}_i(\vec {P})$ , if s is the smallest natural number with the following property: for every $\epsilon>0$ , there exist $\delta>0$ and $N_0\in \mathbb {N}$ such that for all primes $N>N_0$ and all functions $f_0, \ldots , f_{t}:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ satisfying $\max _i\Vert f_i\Vert _\infty \leqslant 1$ , we have

$$ \begin{align*} \bigg|\mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}}f_0(x)f_1(x+P_1(y)) \cdots f_{t}(x+P_{t}(y))\bigg| < \epsilon \end{align*} $$

whenever $\Vert f_i\Vert _{U^{s+1}}<\delta $ . We say $\vec {P}$ has true complexity s if $\max _i \mathcal {T}_i(\vec {P})=s$ .

We have so far defined three notions of complexity: Host–Kra, Weyl and true complexity. These are all defined in terms of ergodic theory or higher-order Fourier analysis and have to do with ‘controlling’ expressions like (2) and (6) by characteristic factors, Gowers–Host–Kra seminorms and Gowers norms. We shall now introduce one more notion, defined purely in terms of algebraic properties of polynomial progressions, and conjecture that all four concepts of complexity are in fact the same.

Definition 1.8. (Algebraic relations and algebraic complexity)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. An algebraic relation of degree $(j_0, \ldots , j_{t})$ satisfied by $\vec {P}$ is a tuple $(Q_0, \ldots , Q_{t})\in \mathbb {R}[u]^{t+1}$ such that

(8) $$ \begin{align} Q_0(x)+ Q_1(x+P_1(y)) + \cdots + Q_{t}(x+P_{t}(y)) = 0, \end{align} $$

where $\deg Q_i = j_i$ for each $0\leqslant i\leqslant t$ . The progression $\vec {P}$ has algebraic complexity s at i for some $0\leqslant i\leqslant t$ , denoted $\mathcal {A}_i(\vec {P})$ , if s is the smallest natural number such that for any algebraic relation $(Q_0, \ldots , Q_{t})$ satisfied by $\vec {P}$ , the degree of $Q_i$ is at most s. It has algebraic complexity s if $\max _i \mathcal {A}_i(\vec {P}) = s$ .

Conjecture 1.9. (The four notions of complexity are the same)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ . Then

$$ \begin{align*} \mathcal{HK}_i(\vec{P}) = \mathcal{W}_i(\vec{P}) = \mathcal{T}_i(\vec{P}) = \mathcal{A}_i(\vec{P})\leqslant t-1. \end{align*} $$

The heuristic for Conjecture 1.9 is as follows: evaluating expressions like (3) and (6) comes down to understanding the distribution of certain polynomial sequences on nilmanifolds, and the only obstructions to equidistribution come from algebraic relations of the form (8).

Several substatements of Conjecture 1.9, such as the equivalence of Weyl and Host–Kra complexity and the upper bound on complexities, have previously been conjectured in [BLL07, Fra08, Fra16, Lei09]. Similarly, the equivalence of true and algebraic complexity has been studied and proved for linear forms [GW10, GW11a, GW11b, GW11c] as well as certain subclasses of polynomial progressions [Kuc21a, Kuc21b, Pel19]. However, we have not seen the full statement of Conjecture 1.9 anywhere in the literature. In particular, we have not found a conjecture relating Host–Kra and Weyl complexity to algebraic complexity, even though the aforementioned papers researching the topic mention that algebraic relations form a source of obstructions preventing a progression from having a characteristic small-degree Host–Kra factor.

Before we state our main result, we have to distinguish between two large families of progressions.

Definition 1.10. (Homogeneous and inhomogeneous relations and progressions)

Let $t\in \mathbb {N}_+$ and ${\vec {P}\in \mathbb {R}[x,y]^{t+1}}$ be an integral polynomial progression. An algebraic relation $(Q_0, \ldots , Q_{t})\in \mathbb {R}[u]^{t+1}$ is homogeneous of degree d if it is of the form

$$ \begin{align*} (Q_0(u), \ldots, Q_{t}(u)) = (a_0 u^d, \ldots, a_{t} u^{d}) \end{align*} $$

for some $a_0, \ldots , a_{t}\in \mathbb {R}$ (some but not all of which may be zero), and inhomogeneous otherwise. The progression $\vec {P}$ is homogeneous if all the algebraic relations that it satisfies are linear combinations of its homogeneous algebraic relations, and it is called inhomogeneous otherwise.

An example of a homogeneous progression is $(x, \; x+y,\; x+2y,\; x+y^3)$ , which only satisfies a homogeneous relation

(9) $$ \begin{align} x - 2(x+y) + (x+2y) = 0. \end{align} $$

Other examples include arithmetic progressions, progressions with $P_1, \ldots , P_t$ being linearly independent such as $(x,\; x+y,\; x+y^2)$ , or progressions whose terms satisfy no quadratic relations, such as $(x,\; x+y^2,\; x+2y^2,\; x+y^3,\; x+2y^3)$ . By contrast, the progression $(x, \; x+y,\; x+2y,\; x+y^2)$ is inhomogeneous because it satisfies both (9) and the inhomogeneous relation

(10) $$ \begin{align} x^2 + 2x - 2(x+y)^2 + (x+2y)^2 - 2(x+y^2) = 0 \end{align} $$

which cannot be broken down into a sum of homogeneous relations. These two progressions will accompany us as running examples throughout the paper.
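Relations (9) and (10) can be checked mechanically. The Python sketch below evaluates both left-hand sides on an $11\times 11$ grid of integer points; since each expression has degree at most 2 in each variable, vanishing on such a grid forces it to vanish identically. The function names are hypothetical and chosen for this illustration only. The last function replaces $y^2$ by $y^3$ in (10) to show that this particular combination is not a relation for $(x,\; x+y,\; x+2y,\; x+y^3)$; this is a single failed candidate, not a proof that no inhomogeneous relation exists.

```python
def relation_9(x, y):
    """Left-hand side of the homogeneous relation (9)."""
    return x - 2 * (x + y) + (x + 2 * y)


def relation_10(x, y):
    """Left-hand side of the inhomogeneous relation (10)."""
    return x**2 + 2 * x - 2 * (x + y) ** 2 + (x + 2 * y) ** 2 - 2 * (x + y**2)


def cubic_candidate(x, y):
    """Same combination as (10), with the last term x + y^2 replaced by x + y^3."""
    return x**2 + 2 * x - 2 * (x + y) ** 2 + (x + 2 * y) ** 2 - 2 * (x + y**3)


grid = range(-5, 6)
print(all(relation_9(x, y) == 0 for x in grid for y in grid))   # True
print(all(relation_10(x, y) == 0 for x in grid for y in grid))  # True
print(cubic_candidate(0, 2))                                    # -8
```

The nonzero value arises because the quadratic terms in $y$ no longer cancel: the combination simplifies to $2y^2 - 2y^3$.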

Our main result is the following theorem.

Theorem 1.11. (Conjecture 1.9 holds for homogeneous progressions)

Let $t\in \mathbb {N}_+ $ . If $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is a homogeneous polynomial progression, then it satisfies Conjecture 1.9.

Having defined Host–Kra complexity using totally ergodic systems, we would like to extend our results to ergodic systems. We have, however, encountered an algebraic obstacle in doing so that prevents us from performing this generalization for all homogeneous progressions. We introduce a subfamily of homogeneous polynomial progressions for which this extension is possible, borrowing the terminology of Frantzikinakis from [Fra08].

Definition 1.12. (Eligible progressions)

A homogeneous polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is eligible if for every $r\in \mathbb {N}_+$ and every $0\leq j\leq r-1$ , the family

$$ \begin{align*} \vec{\tilde{P}}(x,y) = (x,\; x+\tilde{P}_{1,j}(y), \ldots, x+\tilde{P}_{t,j}(y)), \end{align*} $$

where $\tilde {P}_{i,j}(y) = ({P_i(r(y-1)+j) - P_i(j)})/{r}$ , is homogeneous, and $\mathcal {A}_i(\vec {P}) = \mathcal {A}_i(\vec {\tilde {P}})$ for every $0\leq i \leq t$ .

The condition in Definition 1.12 may seem artificial at first glance, but this turns out to be the condition that we need to pass from totally ergodic to ergodic systems. While we believe that all homogeneous progressions satisfy this condition, we have not been able to prove this.

We now state the corollary that gives us the smallest characteristic Host–Kra factor for eligible progressions on ergodic systems. The main difference is that if a system has complexity 0, then the $\mathcal {Z}_0$ factor has to be replaced by the rational Kronecker factor $\mathcal {K}_{\mathrm {rat}}$ .

Corollary 1.13. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an eligible homogeneous polynomial progression, and suppose that $\mathcal {A}_i(\vec {P}) = s$ for some $0\leqslant i\leqslant t$ and $s\in \mathbb {N}$ . For all invertible ergodic dynamical systems $(X, \mathcal {X}, \mu , T)$ , the factor $\mathcal {Z}_s$ is characteristic for the weak and $L^2$ convergence of $\vec {P}$ at i if $s>0$ , and $\mathcal {K}_{\mathrm {rat}}$ is characteristic for the weak and $L^2$ convergence of $\vec {P}$ at i if $s=0$ .

Since all polynomial progressions of algebraic complexity at most $1$ are homogeneous and eligible, the next corollary follows.

Corollary 1.14. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a polynomial progression of algebraic complexity at most $1$. For all invertible ergodic dynamical systems $(X, \mathcal {X}, \mu , T)$, the factor $\mathcal {Z}_1$ is characteristic for the weak and $L^2$ convergence of $\vec {P}$ at i if $\mathcal {A}_i(\vec {P}) = 1$, and $\mathcal {K}_{\mathrm {rat}}$ is characteristic for the weak and $L^2$ convergence of $\vec {P}$ at i if $\mathcal {A}_i(\vec {P}) = 0$.

Theorem 1.11 as well as Corollaries 1.13 and 1.14 can be viewed as extensions of [BLL07, FK05, FK06, Fra08, HK05a, HK05b, Lei09], which find characteristic factors for linear configurations, linearly independent polynomials, progressions of length 4, examine Weyl complexity for arbitrary integral polynomial progressions, and give an upper bound for Host–Kra complexity for general integral progressions. Theorem 1.11 also partly extends [Alt21, GT10, GW10, GW11a, GW11b, GW11c, Kuc21a, Kuc21b, Man18, Man21, Pel19], which among other things determine true complexity for certain families of linear forms and integral polynomial progressions.

In particular, we extend our earlier work from [Kuc21b]. In that paper, we prove equidistribution results on nilmanifolds for progressions of the form $(x, \; x+Q(y),\; x+R(y),\; x+Q(y)+R(y))$ with $\deg Q < \deg R$, or $(x,\; x+Q(y),\; x+2Q(y),\; x+R(y),\; x+2R(y))$ with $\deg Q < (\deg R)/2$, both of which are homogeneous. These equidistribution results follow from inducting on the filtration of a certain nilmanifold associated with the progression; the induction scheme involved is quite sensitive to the progression in question. Here, we achieve a much more general equidistribution result (part (i) of Theorem 1.17) by obtaining a solid understanding of the algebra behind homogeneous progressions and introducing a more flexible induction scheme.

From the fact that all progressions of algebraic complexity 1 are homogeneous and eligible, we deduce the following counting result.

Corollary 1.15. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression of algebraic complexity at most $1$. Suppose that $Q_1, \ldots , Q_d\in \mathbb {R}[y]$ are integral polynomials such that $P_i(y) = \sum _{j=1}^d a_{ij} Q_j(y)$ with $a_{ij}\in \mathbb {Z}$ for each $0\leqslant i\leqslant t$ and $1\leqslant j\leqslant d$. Let $L_i(y_1, \ldots , y_d) = \sum _{j=1}^d a_{ij} y_j$. Then the following statements are true.

  (i) For any $f_0, \ldots , f_t:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ with $\max _i \Vert f_i\Vert _\infty \leqslant 1$, we have

    $$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} \prod_{i=0}^t f_i(x+P_i(y)) = \mathop{\mathbb{E}}\limits_{x,y_1, \ldots, y_d\in\mathbb{Z}/N\mathbb{Z}} \prod_{i=0}^t f_i(x+L_i(y_1, \ldots, y_d)) + o(1), \end{align*} $$
    where the error term $o(1)$ is taken as $N\to \infty $ over primes and does not depend on the choice of $f_0, \ldots , f_t$.
  (ii) For any invertible totally ergodic dynamical system $(X, \mathcal {X}, \mu , T)$ and $f_0, \ldots , f_t\in L^\infty (\mu )$, we have

    $$ \begin{align*} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]} \int_X \prod_{i=0}^t T^{P_i(n)} f_i \,d\mu= \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n_1, \ldots, n_d\in [N]} \int_X \prod_{i=0}^t T^{L_i(n_1, \ldots, n_d)} f_i \,d\mu. \end{align*} $$

We shall illustrate Corollary 1.15 for the specific example of

$$ \begin{align*}\vec{P}(x,y) = (x,\; x+y^2,\; x+2 y^2,\; x + y^3,\; x + 2 y^3).\end{align*} $$

Taking $Q_1(y) = y^2$ and $Q_2(y) = y^3$ as in the statement of Corollary 1.15, we let $\vec {L}(x,y_1, y_2) = (x,\; x+y_1,\; x+2 y_1,\; x+y_2,\; x + 2 y_2)$ . For any $A\subset \mathbb {Z}/N\mathbb {Z}$ , we then have

$$ \begin{align*} &|\{(x,y)\in(\mathbb{Z}/N\mathbb{Z})^2: (x,\; x+y^2,\; x+2 y^2,\; x + y^3,\; x + 2 y^3)\in A^5\}|\\ &\quad= |\{(x, y_1, y_2)\in (\mathbb{Z}/N\mathbb{Z})^3: (x,\; x+y_1,\; x+2 y_1,\; x+y_2,\; x + 2 y_2)\in A^5 \}|/ \\ &\qquad N + o(N^2) \end{align*} $$

upon setting $f_0 = \cdots = f_t = 1_A$ . If $(X,\mathcal {X}, \mu , T)$ is a totally ergodic system and $A\in \mathcal {X}$ , then we similarly obtain that

$$ \begin{align*} &\lim_{N\to\infty} \mathop{\mathbb{E}}\limits_{n\in[N]}\mu(A\cap T^{n^2}A\cap T^{2n^2}A\cap T^{n^3}A\cap T^{2n^3}A)\\ &\quad=\lim_{N\to\infty} \mathop{\mathbb{E}}\limits_{n,m\in [N]}\mu(A\cap T^{n}A\cap T^{2n}A\cap T^{m}A\cap T^{2m}A). \end{align*} $$
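The finitary identity above can be explored numerically. The Python sketch below counts the polynomial configuration and the linearized configuration in a subset of $\mathbb{Z}/N\mathbb{Z}$ by brute force. The function names, the prime $N = 53$ and the seeded random set are assumptions made for this illustration. Because the error term in Corollary 1.15(i) is only asymptotic, the two normalized counts need not be close at such a small modulus.

```python
import random
from itertools import product


def polynomial_count(A, N):
    """Number of (x, y) with (x, x+y^2, x+2y^2, x+y^3, x+2y^3) in A^5 mod N."""
    return sum(
        all((x + shift) % N in A for shift in (0, y**2, 2 * y**2, y**3, 2 * y**3))
        for x, y in product(range(N), repeat=2)
    )


def linear_count(A, N):
    """Number of (x, y1, y2) with (x, x+y1, x+2y1, x+y2, x+2y2) in A^5 mod N."""
    return sum(
        all((x + shift) % N in A for shift in (0, y1, 2 * y1, y2, 2 * y2))
        for x, y1, y2 in product(range(N), repeat=3)
    )


N = 53  # a small prime, chosen only to keep the brute-force search fast
random.seed(0)
A = {x for x in range(N) if random.random() < 0.5}

# Both quantities are normalized to averages, as in Corollary 1.15(i).
print(polynomial_count(A, N) / N**2)
print(linear_count(A, N) / N**3)
```

Two sanity checks hold exactly at every modulus: for $A = \mathbb{Z}/N\mathbb{Z}$ the counts are $N^2$ and $N^3$, and for $A=\{0\}$ with N prime both counts equal 1.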

For progressions of algebraic complexity 1, we also prove the following result, which generalizes [Fra08, Theorem C], [GT10, Theorem 1.12], and results from [BHK05]. In additive combinatorics, problems of this type are known as finding popular common differences; in ergodic theory, one speaks of establishing lower bounds for multiple recurrence.

Theorem 1.16. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression of algebraic complexity at most $1$ , with the following property: there exist linearly independent integral polynomials $Q_1, \ldots , Q_k$ such that

(11) $$ \begin{align} \{a_1 Q_1 + \cdots + a_k Q_k:\; a_1, \ldots, a_k\in\mathbb{Z}\} = \{b_1 P_1 + \cdots + b_t P_t:\; b_1, \ldots, b_t\in\mathbb{Z}\}. \end{align} $$

Then the following statements are true.

  (i) Let $(X, \mathcal {X}, \mu , T)$ be an ergodic invertible measure-preserving system and $A\in \mathcal {X}$. Suppose that $\mu (A)>0$. Then for every $\epsilon>0$, the set

    $$ \begin{align*} \{ n\in\mathbb{N}: \mu(A\cap T^{P_1(n)} A \cap \cdots \cap T^{P_t(n)}A)> \mu(A)^{t+1} - \epsilon\} \end{align*} $$
    is syndetic, that is, it has bounded gaps.
  (ii) Suppose that $A\subset \mathbb {N}$ has upper density $\alpha> 0$. Then for every $\epsilon> 0$, the set

    $$ \begin{align*} \{ n\in\mathbb{N}: \mu(A\cap (A + P_1(n)) \cap \cdots \cap (A + P_t(n)))> \alpha^{t+1} - \epsilon\} \end{align*} $$
    is syndetic.
  (iii) For any $\alpha , \epsilon> 0$ and prime N, and any subset $A\subset \mathbb {Z}/N\mathbb {Z}$ of size $|A|\geqslant \alpha N$, we have

    $$ \begin{align*} |\{ n\in\mathbb{Z}/N\mathbb{Z}: |A\cap (A + P_1(n)) \cap \cdots \cap (A + P_t(n))|> (\alpha^{t+1} - \epsilon)N\}|\gg_{\alpha, \epsilon} N. \end{align*} $$

The definition of homogeneity (Definition 1.10) is equivalent to a certain linear algebraic property that will be described in detail in §4; this property makes it possible to explicitly describe closures of orbits of nilsequences evaluated at terms of homogeneous polynomial progressions, from which we deduce Theorem 1.11. Homogeneous polynomial progressions are, moreover, the largest family of integral polynomial progressions for which such an explicit description is possible, and even the simplest examples of inhomogeneous progressions lead to complications absent in the homogeneous case. The following result makes this precise. As with all other results in this section, all the concepts in Theorem 1.17 are explained in subsequent sections.

Theorem 1.17. (Dichotomy between homogeneous and inhomogeneous progressions)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Suppose that G is a connected, simply-connected, nilpotent Lie group with a rational filtration $G_{\bullet }$ and $\Gamma $ is a cocompact lattice. There exists a subnilmanifold $G^P/\Gamma ^P$ of $G^{t+1}/\Gamma ^{t+1}$ with the following property.

  (i) If $\vec {P}$ is homogeneous, then for every irrational polynomial sequence $g:\mathbb {Z}\to G$ adapted to $G_{\bullet }$, the sequence

    $$ \begin{align*} g^P(x,y) = (g(x),\; g(x+P_1(y)),\ldots,\; g(x+P_t(y))) \end{align*} $$
    is equidistributed on $G^P/\Gamma ^P$.
  (ii) If $\vec {P}$ is inhomogeneous, then for every irrational polynomial sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$, the closure of $g^P$ is a union of finitely many translates of a subnilmanifold of $G^P/\Gamma ^P$. Moreover, for every $\vec {P}$, we can find a filtered nilmanifold $G/\Gamma $ and an irrational polynomial sequence $g:\mathbb {Z}\to G$ such that $g^P$ is equidistributed on a proper subnilmanifold of $G^P/\Gamma ^P$.

While we have not been able to prove Conjecture 1.9 in full for inhomogeneous progressions, we are able to say a bit more about the relationship between various notions of complexity in the general case.

Theorem 1.18. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ . Then

$$ \begin{align*} \mathcal{W}_i(\vec{P}) = \mathcal{A}_i(\vec{P})\leqslant \min(\mathcal{T}_i(\vec{P}), \mathcal{HK}_i(\vec{P})). \end{align*} $$

Of the various statements made in Theorem 1.18, the fact that Host–Kra complexity bounds Weyl complexity is a simple consequence of definitions and will be explained in §11. Similarly, the fact that algebraic complexity is bounded from above by true complexity has been shown in [Kuc21b, Theorem 1.13]. It is the equivalence of Weyl and algebraic complexities that is a new statement here.

1.1 Outline of the paper

We start the paper by introducing basic ergodic-theoretic definitions and results concerning nilsystems in §2, and we explain why analysing expressions like (3) comes down to answering equidistribution questions on nilmanifolds. We then show in §3 that in studying equidistribution on nilmanifolds, we can restrict ourselves to nilmanifolds that are quotients of connected groups at the expense of replacing a linear sequence by a polynomial one.

Section 4 develops a notation and basic theory for certain vector spaces associated with polynomial progressions, and it explains key differences between homogeneous and inhomogeneous progressions. In particular, it contains the proof of the upper bound on algebraic complexity for homogeneous progressions from Theorem 1.11. Definitions introduced in this section allow us to state the infinitary version of an equidistribution result for homogeneous polynomial progressions on nilmanifolds (Theorem 5.3) in §5, from which we deduce that for homogeneous progressions, Host–Kra complexity is bounded from above by algebraic complexity (Corollary 5.4). We further use Theorem 5.3 to deduce Corollaries 1.13, 1.14 and 1.15(ii).

In §6 we introduce finitary analogues of tools from §2. These are needed in §7, in which we show that proving the equivalence of true and algebraic complexity for homogeneous progressions comes down to proving Theorem 6.7, a finitary version of Theorem 5.3. We also explain in §7 how to prove Corollary 1.15(i). Theorem 6.7, the main technical part of this paper, is derived in §8. Unfortunately, Theorem 6.7 fails for inhomogeneous progressions, as explained in §9. In §10 we propose a method to handle inhomogeneous progressions. While we succeed in proving an analogue of Theorem 5.3 for the inhomogeneous progression $(x, \; x+y,\; x+2y,\; x+y^2)$ in Proposition 10.1, we have been unable to extend this construction to all inhomogeneous progressions. Subsequently, we show in §11 that Weyl and algebraic complexity are always equal, which is the main statement of Theorem 1.18. We conclude the paper by proving Theorem 1.16 in §12.

2 Infinitary nilmanifold theory

2.1 Basic definitions from ergodic theory

Let $(X,\mathcal {X},\mu ,T)$ be an invertible measure-preserving dynamical system (henceforth, we shall simply call it a system). The background in ergodic theory that we need can be found in [HK05b, HK18], among others; here, we only reiterate the most important definitions.

Definition 2.1. A factor of a system $(X, \mathcal {X}, \mu , T)$ can be defined in three equivalent ways:

  (i) it is a T-invariant sub-$\sigma $-algebra of $\mathcal {X}$;

  (ii) it is a system $(Y,\mathcal {Y},\nu ,S)$ together with a factor map $\pi : X'\to Y'$, that is, a measurable map defined for a measurable T-invariant set $X'$ of full measure, satisfying $S\circ \pi = \pi \circ T$ on $X'$ and $\mu \circ \pi ^{-1} = \nu $;

  (iii) it is a T-invariant subalgebra of $L^\infty (\mu )$.

For $r\in \mathbb {N}$ , we let $\mathcal {K}_r$ be the factor spanned by all $T^r$ -invariant functions in $L^\infty (\mu )$ . In particular, $\mathcal {K}_1 = \mathcal {I}$ is the factor spanned by T-invariant functions, and the rational Kronecker factor $\mathcal {K}_{\mathrm {rat}} = \bigvee \nolimits _{r\in \mathbb {N}}\mathcal {K}_r$ is the factor spanned by all the functions in $L^\infty (\mu )$ that are $T^r$ -invariant for some $r\in \mathbb {N}$ . A system is ergodic if $\mathcal {K}_1=\mathcal {I}$ is the trivial factor spanned by constant functions, and it is totally ergodic if $\mathcal {K}_{\mathrm {rat}}$ is the trivial factor.

Of particular interest to us is a sequence of factors $(\mathcal {Z}_s)_{s\in \mathbb {N}}$ defined in [HK05b], which we refer to as Host–Kra factors. In accordance with Definition 2.1, we shall sometimes think of $\mathcal {Z}_s$ as a sub-$\sigma $-algebra of $\mathcal {X}$ , and at other times we will consider a factor map $\pi _s: X\to Z_s$ and a factor $(Z_s, \mathcal {Z}_s, \lambda , S)$ of $(X, \mathcal {X}, \mu , T)$ . If we concurrently talk about Host–Kra factors of two distinct spaces X and Y, we may write $Z_s(X)$ and $Z_s(Y)$ to mean Host–Kra factors of X and Y, respectively. We do not explicitly use the definition of Host–Kra factors anywhere in the paper, and so we leave the interested reader to look it up in [HK05b, HK18]. Instead, we rely on two properties of this family of factors that concern their utility and structure, respectively. First, these factors are characteristic for the convergence of polynomial progressions, as proved in Theorem 1.4. Rephrasing Theorem 1.4 in terms of Definition 1.5, we can say that each integral polynomial progression has a finite Host–Kra complexity. Second, each factor $\mathcal {Z}_s$ is an inverse limit (the system $(X,\mathcal {X},\mu ,T)$ is an inverse limit of a sequence of factors $(X,\mathcal {X}_i,\mu ,T)$ if the $\mathcal {X}_i$ form an increasing sequence of factors of $\mathcal {X}$ such that $\mathcal {X} = \bigvee \nolimits _{i\in \mathbb {N}}\mathcal {X}_i$ up to sets of measure zero) of s-step nilsystems, which are objects of primary importance to us.

2.2 Nilsystems

Let G be a Lie group with connected component $G^0$ and identity $1$ . A filtration on G of degree s is a chain of subgroups

$$ \begin{align*} G = G_0 = G_1 \geqslant G_2 \geqslant \cdots \geqslant G_s \geqslant G_{s+1} = G_{s+2} = \cdots = 1 \end{align*} $$

satisfying $[G_i, G_j]\leqslant G_{i+j}$ for each $i,j\in \mathbb {N}$ . We denote it by $G_{\bullet } = (G_i)_{i=0}^\infty $ . A natural example of filtration is the lower central series, given by $G_{k+1} = [G, G_k]$ for each $k\geqslant 1$ , where the commutator of two elements $a,b\in G$ is defined as $[a,b]=a^{-1}b^{-1}ab$ , and $[A,B]$ is the subgroup of G generated by all the commutators $[a,b]$ with $a\in A, b\in B$ . The group G is s-step nilpotent if $G_{s+1} = 1$ , where $G_{s+1}$ is the $(s+1)$ th element of the lower central series of G. The only zero-step nilpotent group is the trivial group, and one-step nilpotent groups are precisely abelian groups.
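The commutator convention and the lower central series can be illustrated on a small example. The sketch below uses the integer Heisenberg group, which is our choice of illustration and does not appear in the text; the names `mul`, `inv` and `commutator` are hypothetical helpers. It checks on a few sample elements that every commutator is central and that every commutator with an element of $[G,G]$ is trivial, which is consistent with the group being two-step nilpotent.

```python
# Heisenberg group over the integers: the triple (x, y, z) stands for the
# upper unitriangular matrix [[1, x, z], [0, 1, y], [0, 0, 1]].
IDENTITY = (0, 0, 0)


def mul(a, b):
    """Matrix product written in coordinates."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2] + a[0] * b[1])


def inv(a):
    """Inverse element, so that mul(a, inv(a)) == IDENTITY."""
    x, y, z = a
    return (-x, -y, x * y - z)


def commutator(a, b):
    """[a, b] = a^{-1} b^{-1} a b, the convention used in the text."""
    return mul(mul(inv(a), inv(b)), mul(a, b))


samples = [(1, 0, 0), (0, 1, 0), (2, -3, 5), (-1, 4, 7)]
# G_2 = [G, G]: every commutator has the form (0, 0, z), hence is central.
first_level = [commutator(a, b) for a in samples for b in samples]
# G_3 = [G, G_2]: commutators with elements of G_2 are trivial.
second_level = [commutator(a, c) for a in samples for c in first_level]
```

This is only a finite check on sample elements, not a proof; it is meant to make the definitions of $[a,b]$ and of $G_{k+1} = [G, G_k]$ concrete.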

For the rest of the paper, we let G be a nilpotent Lie group and $\Gamma \leqslant G$ be a cocompact lattice. We call the quotient $X=G/\Gamma $ a nilmanifold. The group G acts on X by left translation, and for each $a\in G$ , we call the map $T_a(g\Gamma ) = (ag)\Gamma $ a nilrotation. Setting ${\mathcal {G}}/\Gamma $ to be the Borel $\sigma $ -algebra of X and $\nu $ to be the Haar measure with respect to left translation, we call the system $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ a nilsystem.

A subgroup $H\leqslant G$ is rational if $H/(H\cap \Gamma )$ is closed in $G/\Gamma $ . A filtration $G_{\bullet }$ is rational if $G_i$ is a rational subgroup for each $i\in \mathbb {N}$ . We shall assume throughout the paper that each filtration that we discuss is rational.

In the case when $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ is an ergodic nilsystem, which will always be our case anyway, we can make two simplifying assumptions about the group G. By passing to the universal cover, we assume that G is simply connected. Replacing the nilsystem with several simpler nilsystems, we further assume that G is spanned by $G^0$ and a. These assumptions, justified in [HK18, Ch. 11], hold for the rest of the paper.

We also denote $\Gamma _i = G_i\cap \Gamma $ and $\Gamma ^0 = G^0\cap \Gamma $ . The rationality of $G_i$ in G means that $\Gamma _i$ is cocompact in $G_i$ .

Proposition 2.2. (Conditions for total ergodicity of nilsystems [HK18, Corollaries 7 and 8])

Let $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ be an ergodic nilsystem. There exists $r\in \mathbb {N}_+$ such that ${T_a^j(G^0/\Gamma ^0)}$ is totally ergodic with respect to $T_a^r$ for all $0\leqslant j<r$ .

Moreover, the following are equivalent:

  (i) $T_a$ is totally ergodic;

  (ii) $G/\Gamma $ is connected;

  (iii) $G=G^0\Gamma $ .

Nilsystems allow a particularly simple description of factors. If $G_{\bullet }$ is the lower central series filtration, then

(12) $$ \begin{align}Z_s = \frac{G}{G_{s+1}\Gamma} \end{align} $$

for all $s\in \mathbb {N}_+$ (see [HK18, Ch. 11]). For $s=0$ , we have $Z_0 = G/(G^0\Gamma )\cong (\mathbb {Z}/r\mathbb {Z})$ , where r is the smallest positive integer for which $a^r\in G^0$ . It follows from Proposition 2.2 that $Z_0$ is trivial if and only if the nilsystem is totally ergodic.

Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. By Theorem 1.4, there exists $s\in \mathbb {N}$ such that for every ergodic system $(X, \mathcal {X}, \mu , T)$ and all choices of $f_0, \ldots , f_t\in L^\infty (\mu )$ , we have

(13) $$ \begin{align} &\lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]}\int_X f_0 \cdot T^{P_1(n)} f_1 \cdots T^{P_{t}(n)} f_{t} \,d\mu\nonumber\\[3pt] &\quad= \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]}\int_{Z_s} \operatorname{\mathrm{\mathbb{E}}} (f_0\mid\mathcal{Z}_s) \cdot S^{P_1(n)}\operatorname{\mathrm{\mathbb{E}}} (f_1\mid\mathcal{Z}_s) \cdots S^{P_{t}(n)} \operatorname{\mathrm{\mathbb{E}}} (f_t\mid\mathcal{Z}_s)\, d\lambda. \end{align} $$

Using the fact that $Z_s$ is an inverse limit of ergodic s-step nilsystems, we can approximate the average (13) arbitrarily well by projections onto ergodic nilsystems. Hence we are left with understanding averages of the form

(14) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in[N]} \int_{G/\Gamma} \tilde{f}_0(b\Gamma)\cdot \tilde{f}_1(a^{P_1(n)}b\Gamma) \cdots \tilde{f}_t(a^{P_t(n)} b\Gamma) \,d\nu(b\Gamma) \end{align} $$

where $\tilde {f}_i$ is the projection of $f_i$ onto an ergodic s-step nilsystem $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ for all $0\leqslant i\leqslant t$ . If T is totally ergodic, then so is the nilrotation $T_a$ .

2.3 Polynomial sequences

Let $G_{\bullet }$ be a filtration on G of degree s. A polynomial sequence $g:\mathbb {Z}\to G$ adapted to $G_{\bullet }$ is a sequence

(15) $$ \begin{align} g(n) = \prod_{i=0}^s g_i^{{{n}\choose{i}}} \end{align} $$

with the property that $g_i\in G_i$ for each i. By [GT12, Proposition 6.2], such sequences form a group, denoted by ${{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ . One may ask why we define such a polynomial sequence as (15) rather than in the seemingly more natural form

(16) $$ \begin{align} g(n) = \prod_{i=0}^s g_i^{n^i}. \end{align} $$

The reason is that if g is written in the form (15), then we have the following nice statement.

Lemma 2.3. [CS12, Lemma 2.8]

Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ . The sequence $g(n) = \prod _{i=0}^s g_i^{{n}\choose {i}}$ takes values in $H\leqslant G$ if and only if $g_0, \ldots , g_s\in H$ .

Proof. The converse direction is straightforward, and we prove the forward direction by induction on $0\leqslant k\leqslant s$ . For $k=0$ , we observe that $g_0 = g(0)\in H$ . Suppose that the statement holds for some $0\leqslant k <s$ , that is, $g_0, \ldots , g_k\in H$ . Then $g(k+1) =(\prod _{i=0}^k g_i^{{k+1}\choose {i}}) g_{k+1}$ . Since $g(k+1), g_0, \ldots , g_k$ are all in H, it follows that $g_{k+1}\in H$ .

Lemma 2.3 is not true if g is written in the form (16); for instance, $g(n) = {{n}\choose {2}} =\tfrac 12n^2 - \tfrac 12n$ takes values in $\mathbb {Z}$ even though $\tfrac 12, -\tfrac 12\notin \mathbb {Z}$ .
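The induction in the proof of Lemma 2.3 can be traced in the simplest abelian setting, where G is the additive group of integers and $g(n) = \sum _i g_i {{n}\choose {i}}$ . The sketch below is an illustration under that assumption; the helper names `taylor_coefficients` and `evaluate` are hypothetical. It recovers the coefficients $g_0, \ldots , g_s$ from the values $g(0), \ldots , g(s)$ exactly as in the proof, using only integer arithmetic.

```python
from math import comb


def taylor_coefficients(values):
    """Given values = [g(0), ..., g(s)] of an integer-valued sequence of
    degree at most s, return [g_0, ..., g_s] with
    g(n) = sum_i g_i * C(n, i), following the induction in Lemma 2.3."""
    coeffs = []
    for k, value in enumerate(values):
        # g(k) = sum_{i<k} g_i * C(k, i) + g_k, since C(k, k) = 1.
        known = sum(g_i * comb(k, i) for i, g_i in enumerate(coeffs))
        coeffs.append(value - known)
    return coeffs


def evaluate(coeffs, n):
    """Evaluate sum_i g_i * C(n, i) at a non-negative integer n."""
    return sum(g_i * comb(n, i) for i, g_i in enumerate(coeffs))


# g(n) = C(n, 2) = n^2/2 - n/2 has integer Taylor coefficients (0, 0, 1)
# although its coefficients in the basis 1, n, n^2 are not integers.
binomial_example = taylor_coefficients([0, 0, 1])
# g(n) = n^2 equals n + 2 * C(n, 2).
square_example = taylor_coefficients([n * n for n in range(3)])
```

Because each step only subtracts integer multiples of previously found coefficients, integer values force integer Taylor coefficients, which is the content of the lemma in this special case.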

In a similar manner, we define for any $D\in \mathbb {N}_+$ the group ${{\mathrm{poly}}}(\mathbb {Z}^D, G_{\bullet })$ of D-parameter polynomial sequences $g:\mathbb {Z}^D\to G$ adapted to $G_{\bullet }$ , that is, sequences of the form

$$ \begin{align*} g(n_1, \ldots, n_D) = \prod_{i=0}^s\prod_{i_1+\cdots+i_D = i} {g_{i_1, \ldots, i_D}}^{{{n_1}\choose{i_1}}\cdots{{n_D}\choose{i_D}}} \end{align*} $$

for $g_{i_1, \ldots , i_D}\in G_{i_1+\cdots + i_D}$ .

2.4 Infinitary equidistribution theory on nilmanifolds

For the rest of §2 we assume that G is connected. For $D\in \mathbb {N}_+$ , a polynomial sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}^D,G_{\bullet })$ is equidistributed on $G/\Gamma $ if

$$ \begin{align*}\mathop{\mathbb{E}}\limits_{n\in[N]^D}F(g(n)\Gamma) \to \int_{G/\Gamma} F \,d\nu \end{align*} $$

for any continuous function $F:G/\Gamma \to \mathbb {C}$ . The following notion is useful when discussing equidistribution.

Definition 2.4. (Horizontal characters)

A horizontal character on G is a continuous group homomorphism $\eta : G\to \mathbb {R}$ for which $\eta (\Gamma )\leqslant \mathbb {Z}$ .

In particular, each horizontal character vanishes on $[G,G]$ .

Equidistribution on nilmanifolds was studied by Leibman, who provided a useful criterion for when a polynomial sequence is equidistributed on a nilmanifold. We only need the version of the statement in the case when G is connected, as we will be able to reduce to this case.

Theorem 2.5. (Leibman’s equidistribution theorem [Lei05b])

Let $D\in \mathbb {N}_+$ and $g\in {{\mathrm{poly}}}(\mathbb {Z}^D, G_{\bullet })$ . The following are equivalent:

  (i) g is equidistributed in $G/\Gamma $ ;

  (ii) the projection of g onto $G/[G,G]$ is equidistributed in $G/[G,G]\Gamma $ ;

  (iii) if $\eta : G\to \mathbb {R}$ is a horizontal character for which $\eta \circ g$ is constant, then $\eta $ is trivial.

We shall also need a stronger notion of equidistribution, that of irrational sequences.

Definition 2.6. Suppose that $G_{\bullet }$ is a filtration on G and $i\in \mathbb {N}_+$ , and let

$$ \begin{align*} G_i^\nabla = \langle G_{i+1}, [G_j, G_{i-j}], 1\leqslant j < i \rangle. \end{align*} $$

An ith-level character is a continuous group homomorphism $\eta _i:G_i\to \mathbb {R}$ that vanishes on $G_i^\nabla $ and satisfies $\eta _i(\Gamma _i)\leqslant \mathbb {Z}$ . An element $g_i$ of $G_i$ is irrational if $\eta _i(g_i)\notin \mathbb {Z}$ for any non-trivial ith-level character $\eta _i$ . A sequence $g(n) = \prod \nolimits _{i=0}^s g_i^{{n}\choose {i}}$ is irrational if $g_i$ is irrational for all $i\in \mathbb {N}_+$ .

All irrational sequences are equidistributed, but not vice versa. For instance, let $g(n) = a_1 n + \cdots + a_s n^s$ be a real-valued polynomial. It is a polynomial sequence in $\mathbb {R}$ adapted to the filtration $G_1 = \cdots = G_s = \mathbb {R}$ , $G_{s+1} = 0$ . Thus, g is irrational if and only if $a_s\notin \mathbb {Q}$ , and g is equidistributed if and only if there exists $1\leqslant i\leqslant s$ with $a_i\notin \mathbb {Q}$ . It is clear in this case that irrational implies equidistributed, but not vice versa.
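The gap between the two notions can be probed numerically. The sketch below is a floating-point sanity check rather than a proof, and the function `weyl_average` and the two sample polynomials are our own illustrative choices. It averages $e^{2\pi i k g(n)}$ over $n\leqslant N$ : the polynomial $\sqrt {2}\,n + \tfrac 12 n^2$ has a rational leading coefficient, so it is not irrational in the above sense, yet its averages are small, as expected for an equidistributed sequence; the polynomial $\tfrac 12 n + \tfrac 13 n^2$ has only rational coefficients, and its average for $k=6$ is essentially 1.

```python
import cmath


def weyl_average(coeffs, k, N):
    """Average of exp(2*pi*i*k*g(n)) over n = 1, ..., N, where
    g(n) = coeffs[0]*n + coeffs[1]*n**2 + ... ."""
    total = 0j
    for n in range(1, N + 1):
        g = sum(a * n ** (i + 1) for i, a in enumerate(coeffs))
        # Reduce mod 1 before exponentiating to limit rounding error.
        total += cmath.exp(2j * cmath.pi * ((k * g) % 1.0))
    return total / N


# a_1 irrational, a_2 rational: equidistributed but not irrational.
mixed = [abs(weyl_average([2 ** 0.5, 0.5], k, 2000)) for k in (1, 2)]
# All coefficients rational: 6*g(n) is an integer, so the average is near 1.
rational = weyl_average([0.5, 1 / 3], 6, 2000)
```

Only two frequencies are tested for the first polynomial, so the output is evidence consistent with equidistribution and nothing more.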

We want to emphasize that whether a sequence is irrational or not depends on what filtration we are using, whereas the notion of equidistribution does not depend on the filtration.

3 Reducing to the case of connected groups

Equation (14) indicates that to understand Host–Kra complexity of a polynomial progression $\vec {P}$ , we have to understand the distribution of orbits

(17) $$ \begin{align} (b\Gamma,\; a^{P_1(n)}b\Gamma,\ldots,\; a^{P_t(n)}b\Gamma) \end{align} $$

inside a connected nilmanifold $G^{t+1}/\Gamma ^{t+1}$ . The point of this section is to show that we can replace linear orbits $(a^n b\Gamma )_{n\in \mathbb {N}}$ on $G/\Gamma $ by polynomial orbits $(g_b(n)\Gamma ^0)_{n\in \mathbb {N}}$ on $G^0/\Gamma ^0$ for some irrational polynomial sequence $g_b:\mathbb {Z}\to G^0$ with respect to a certain naturally defined filtration $G_{\bullet }^0$ on $G^0$ . This way, we want to reduce the question of finding the closure for (17) inside $(G/\Gamma )^{t+1}$ to finding the closure for

(18) $$ \begin{align} (g_b(m)\Gamma^0,\; g_b(m+ P_1(n))\Gamma^0,\ldots,\; g_b(m+P_t(n))\Gamma^0) \end{align} $$

inside $(G^0/\Gamma ^0)^{t+1}$ . The connectedness of $G^0$ allows us to use tools from §2.4.

Lemma 3.1. Let $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ be a totally ergodic nilsystem and $F:(G/\Gamma )^{t+1}\to \mathbb {R}$ be essentially bounded. Then

$$ \begin{align*} &\mathop{\mathbb{E}}\limits_{n\in[N]} \int_{G/\Gamma} F(b\Gamma, a^{P_1(n)}b\Gamma, \ldots, a^{P_t(n)}b\Gamma)\,d\nu(b\Gamma)\\[3pt] &\qquad=\mathop{\mathbb{E}}\limits_{m, n\in[N]} \int_{G/\Gamma} F(a^m b\Gamma, a^{m+P_1(n)}b\Gamma, \ldots, a^{m+P_t(n)}b\Gamma)\,d\nu(b\Gamma). \end{align*} $$

Proof. Since $T_a$ is measure-preserving, we have

$$ \begin{align*} &\int_{G/\Gamma} F(b\Gamma, a^{P_1(n)}b\Gamma, \ldots, a^{P_t(n)}b\Gamma)\,d\nu(b\Gamma)\\[3pt] &\quad= \int_{G/\Gamma} F(a^m b\Gamma, a^{m+P_1(n)}b\Gamma, \ldots, a^{m+P_t(n)}b\Gamma)\,d\nu(b\Gamma) \end{align*} $$

for any $m,n\in \mathbb {N}$ . Consequently,

$$ \begin{align*} &\int_{G/\Gamma} F(b\Gamma, a^{P_1(n)}b\Gamma, \ldots, a^{P_t(n)}b\Gamma)\,d\nu(b\Gamma)\\[3pt] &\quad=\mathop{\mathbb{E}}\limits_{m\in[N]} \int_{G/\Gamma} F(a^m b\Gamma, a^{m+P_1(n)}b\Gamma, \ldots, a^{m+P_t(n)}b\Gamma)\,d\nu(b\Gamma), \end{align*} $$

from which the lemma follows.

The main result of this section is the following proposition.

Proposition 3.2. Let $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ be a totally ergodic nilsystem and $b\in G^0$ . Suppose that $G_{\bullet }$ is the lower central series filtration on G and $G^0_{\bullet } = G_{\bullet }\cap G^0$ . Then there exists an irrational sequence $g_b\in {{\mathrm{poly}}}(\mathbb {Z}, G^0_{\bullet })$ such that $g_b(n)\Gamma = a^n b\Gamma $ .

We observe that with this filtration on $G^0$ , we have $G^0_k = G_k$ for $k\geqslant 2$ . This follows from the fact that the groups $G_k$ are connected for $k\geqslant 2$ [HK18, Lemma 5], and hence are contained in $G^0$ .

We lose no generality in assuming that $b\in G^0$ ; Proposition 2.2 and the connectedness of $G/\Gamma $ imply that for all $b\in G$ there exists $b'\in G^0$ such that $b\Gamma = b'\Gamma $ .

Proof. The connectedness of $G/\Gamma $ implies that $G=G^0\Gamma $ , and so there exist $\alpha \in G^0$ and $\gamma \in \Gamma $ such that $a = \alpha \gamma ^{-1}$ . Then

$$ \begin{align*} a^n b\Gamma = (\alpha\gamma^{-1})^n b\Gamma = (\alpha\gamma^{-1})^n b \gamma^{n}\Gamma. \end{align*} $$

It follows from normality of $G^0$ and the fact that $\alpha $ and b are elements of $G^0$ that the sequence $g_b(n) = (\alpha \gamma ^{-1})^n b \gamma ^{n}$ takes values in $G^0$ . Since the sequences $h_1(n) = a^n b$ and $h_2(n) = \gamma ^{n}$ are adapted to $G_{\bullet }$ , and the set ${{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is a group, we deduce that $g_b = h_1 h_2$ is adapted to $G^0_{\bullet } = G_{\bullet }\cap G^0$ .

We want a more precise description of $g_b$ , and for this we shall use some results from [Lei09, §§ 11–13]. Let $g=g_b$ for the identity $b=1$ ; that is, $g(n) = (\alpha \gamma ^{-1})^n \gamma ^n$ . Leibman showed in [Lei09, § 11.2] that

(19) $$ \begin{align} g(n) &= \prod_{1\leqslant k_1 \leqslant s} (A^{k_1-1}\alpha)^{q_{k_1}(n)} \prod_{1\leqslant k_2 < k_1 < s}[A^{k_1-1}\alpha, A^{k_2-1}\alpha]^{q_{k_1, k_2}(n)}\nonumber\\[6pt] &\quad\prod_{1\leqslant k_3 < k_2 < k_1 < s}[[A^{k_1-1}\alpha, A^{k_2-1}\alpha], A^{k_3-1}\alpha]^{q_{k_1, k_2, k_3}(n)} \ldots, \end{align} $$

where $Ax = [x,\gamma ]$ and $q_{k_1, \ldots , k_r}$ are integral polynomials with $\deg q_{k_1, \ldots , k_r} \leqslant k_1 + \cdots + k_r$ . More explicitly, we have

(20) $$ \begin{align} g(n) &= \alpha^n (A\alpha)^{{n}\choose{2}} (A^2\alpha)^{{n}\choose{3}} \cdots [A\alpha, \alpha]^{{n}\choose{3}}[A^2\alpha,\alpha]^{{n}\choose{4}} \cdots\nonumber\\ &\quad[A^2\alpha,A\alpha]^{4{{n+1}\choose{5}}}[A^3\alpha,A\alpha]^{5{{n+1}\choose{6}}}\cdots. \end{align} $$

The coefficients of g can be analysed using a family of subgroups of $G^0$ introduced in [Lei09, § 12]. For $k_1, \ldots , k_l\in \mathbb {N}_+$ , we let $G^0_{(k_1,\ldots ,k_l)}$ be the subgroup of $G^0$ generated by all l-fold commutators of elements of the form $A^{k_1-1}h_1$ , …, $A^{k_l-1}h_l$ for $h_1, \ldots , h_l\in G^0$ . (A one-fold commutator is any element $h\in G$ . For $l>1$ , an l-fold commutator is an element of the form $[h_i, h_j]$ , where $h_i$ is an i-fold commutator, $h_j$ is a j-fold commutator and $i+j = l$ .) We then define

$$ \begin{align*} G^0_{k,l} = \langle G^0_{(k_1, \ldots, k_i)}: i\geqslant l,\; k_1+\cdots+k_i\geqslant k\rangle \end{align*} $$

for integers $1\leqslant l\leqslant k$ and set $G^0_{k,l} = G^0_{l,l}$ whenever $l>k$ .

The following lemma lists some basic properties of the groups $G^0_{k,l}$ that we shall use.

Lemma 3.3. For any integers $1\leqslant l\leqslant k$ :

  (i) $G^0_{k,l}$ is normal in G;

  (ii) $[G^0_{k,l}, G^0_{i,j}]\leqslant G^0_{k+i, l+j}$ for any integers $1\leqslant j\leqslant i$ ;

  (iii) $A^j G^0_{k,l}\leqslant G^0_{k+j,l}$ for any $j\in \mathbb {N}$ ;

  (iv) $G^0_{k+1,l}$ and $G^0_{k,l+1}$ are subgroups of $G^0_{k,l}$ , and the quotient groups $G^0_{k,l}/G^0_{k+1,l}$ and $G^0_{k,l}/G^0_{k,l+1}$ are abelian;

  (v) for $k\geqslant 2$ , $G_k = G_k^0 = G^0_{k,1} = \langle A^{k-1}G^0, G^0_{k,2}\rangle = \langle A G^0_{k-1}, G^0_{k,2}\rangle $ ;

  (vi) $(G^0)^\nabla _k = \langle G^0_{k,2}, G^0_{k+1}\rangle $ .

Proof. Properties (i)–(iv) are proved in [Lei09, Lemma 12.2]. For $k\geqslant 2$ , the statement $G_k = G^0_k$ in (v) is true by definition, and the statement $G_k = G^0_{k,1}$ is proved in [Lei09, Lemma 12.3]. To finish the proof of (v), it remains to show that $G^0_{k,1} = \langle A^{k-1}G^0, G^0_{k,2}\rangle = \langle A G^0_{k-1}, G^0_{k,2}\rangle $ for $k\geqslant 2$ . For $k = 2$ , this is true by definition of $G^0_{k,1}$ and the fact that $G^0_{k,2}\geqslant G^0_{k,3}\geqslant \cdots \,$ , which follows from part (iv). We assume that the statement is true for some $k\geqslant 2$ . That $G_{k+1}^0$ contains $\langle A G^0_k, G^0_{k+1,2}\rangle $ follows from the fact that both $A G^0_k$ and $G^0_{k+1,2}$ are contained in the $(k+1)$ th element of the lower central series of G, which is precisely $G^0_{k+1}$ . For the other direction, we observe that

$$ \begin{align*} G^0_{k+1} &= [G_k, G] = [G^0_k, \langle G^0, \gamma \rangle] \leqslant \langle [G^0_k, G^0], [G^0_k,\gamma] \rangle \\[3pt] &\leqslant \langle [A^{k-1} G^0, G^0], [G^0_{k,2}, G^0], AG^0_k\rangle \leqslant \langle G^0_{k+1,2}, AG_k^0\rangle. \end{align*} $$

A similar argument shows that $G^0_{k+1} = \langle A^{k}G^0, G^0_{k+1,2}\rangle $ .

Before we prove property (vi), we recall that ${(G^0)^\nabla _k = \langle G_{k+1}, [G_j, G_{k-j}]: 1\leqslant j < k\rangle }$ . That (vi) holds for $k=1$ can be verified by inspection. For $k\geqslant 2$ , we observe that $[A^{j-1} G^0, A^{k-j-1}G^0]\leqslant [G_j^0, G_{k-j}^0]$ , and so

$$ \begin{align*} G^0_{k,2} \leqslant \langle [G^0_j, G^0_{k-j}]: 1\leqslant j < k\rangle; \end{align*} $$

when coupled with property (v), this implies that $(G^0)^\nabla _k \geqslant \langle G^0_{k,2}, G^0_{k+1}\rangle $ . For the converse, we have

$$ \begin{align*} [G_j^0, G^0_{k-j}] = [\langle A^{j-1}G^0, G^0_{j,2}\rangle, \langle A^{k-j-1}G^0, G^0_{k-j,2}\rangle] \leqslant \langle G^0_{k,2}, G^0_{k,3}, G^0_{k,4}\rangle \leqslant G^0_{k,2}, \end{align*} $$

for each $1\leqslant j < k$ , from which it follows that $(G^0)^\nabla _k \leqslant \langle G^0_{k,2}, G^0_{k+1}\rangle $ .

Letting $g(n) = \prod _{i=1}^s g_{i}^{{n}\choose {i}}$ , we observe from (19), (20) as well as parts (v) and (vi) of Lemma 3.3 that

(21) $$ \begin{align} g_i = A^{i-1}\alpha \operatorname{mod} \; (G^0)_{i}^\nabla. \end{align} $$

For an arbitrary $b\in G^0$ , we have $g_b(n) = a^n b \gamma ^n = b (\alpha _b \gamma ^{-1})^n\gamma ^n$ , where $\alpha _b = \alpha [\alpha ,b] Ab$ , as observed in [Lei09, § 11.3]. Letting $g_b(n) = \prod _{i=0}^s g_{b,i}^{{n}\choose {i}}$ , it is therefore true that

(22) $$ \begin{align} g_{b,i} = A^{i-1}\alpha_b = A^{i-1}\alpha \operatorname{mod} \; (G^0)^\nabla_i \end{align} $$

for all $i\in \mathbb {N}_+$ .

For $i=1$ , we have $g_{b,1} = \alpha $ mod $G^0_2$ , and we claim that $g_{b,1}$ is irrational. The ergodicity of a implies that for almost every b, the sequence $n\mapsto a^n b$ is equidistributed in $G/\Gamma $ , and so the same is true for the sequence $g_b$ in $G^0/\Gamma ^0$ . Consequently, the projection $\pi (g_b): \mathbb {Z}\to G^0/(G^0_2\Gamma ^0)$ is equidistributed as well. Since $\pi (g_b(n)) = \pi (b) + \pi (\alpha ) n$ , it follows that $\pi (\alpha )$ is an irrational element of $G^0/G^0_2$ , and so $g_{b,1}$ is an irrational element of $G^0$ .

Before proving that $g_{b,i}$ are irrational for $i>1$ , we discuss some properties of the map $A: G\to G$ . From the definition of the filtration $G^0_{\bullet }$ we observe that $AG^0_i\leqslant G^0_{i+1}$ for all $i\geqslant 1$ (this is also a consequence of parts (iii) and (v) of Lemma 3.3). Therefore the map $A_i := A\mid _{G^0_i}$ takes values in $G^0_{i+1}$ , and moreover $A_i(\Gamma _i)\leqslant \Gamma _{i+1}$ . We also observe that the projection $\overline {A}_i: G^0_i\to G^0_{i+1}/(G^0)^\nabla _{i+1}$ is a (continuous) group homomorphism because

$$ \begin{align*} A(xy) = [xy, \gamma] = [x,\gamma][[x,\gamma],y][y,\gamma] = Ax [Ax,y] Ay = Ax Ay \operatorname{mod} G^0_{2i+1, 2} \end{align*} $$

for any $x,y\in G^0_i$ and $G^0_{2i+1,2}\leqslant G^0_{i+1,2}\leqslant (G^0)^\nabla _{i+1}$ by parts (iv) and (vi) of Lemma 3.3. From part (v) of Lemma 3.3 it follows that $\overline {A}_i$ is surjective. Finally, we note, using parts (iii) and (v) of Lemma 3.3, that $A_i((G^0)^\nabla _i)\leqslant (G^0)^\nabla _{i+1}$ .

Suppose that $g_{b,i}$ is irrational but $g_{b,i+1}$ is not for some $1\leqslant i < s$ . Then there exists a non-trivial $(i+1)$ th-level character $\eta _{i+1}:G^0_{i+1}\to \mathbb {R}$ such that $\eta _{i+1}(g_{b,i+1})\in \mathbb {Z}$ . From (22) and the fact that $\eta _{i+1}$ vanishes on $(G^0)^\nabla _{i+1}$ , we deduce that $\eta _{i+1}(g_{b,i+1})=\eta _{i+1}(A^i \alpha )$ . We also let $\overline {\eta }_{i+1}:G^0_{i+1}/(G^0)^\nabla _{i+1}\to \mathbb {R}$ be the induced map.

Let $\eta _i := \eta _{i+1}\circ A_i: G^0_i\to \mathbb {R}$ . It is an ith-level character as a consequence of four facts: the vanishing of $\eta _{i+1}$ on $(G^0)_{i+1}^\nabla $ , the inclusion $(G^0_{i+1,2})\leqslant (G^0)_{i+1}^\nabla $ (both of which imply that $\eta _i = \overline {\eta }_{i+1}\circ \overline {A}_i$ is a continuous group homomorphism), the inclusion $A_i((G^0)^\nabla _i)\leqslant (G^0)^\nabla _{i+1}$ , and the fact that $\eta _i(\Gamma _i)\leqslant \mathbb {Z}$ . Moreover, it satisfies

$$ \begin{align*}\eta_i(g_{b,i}) = \eta_i(A^{i-1}\alpha) = \eta_{i+1}(A^i \alpha) = \eta_{i+1}(g_{b,i+1}),\end{align*} $$

implying that $\eta _i(g_{b,i})\in \mathbb {Z}$ . The non-triviality of $\eta _{i+1}$ implies that $\overline {\eta }_{i+1}$ and $\overline {A}_i$ are surjective maps onto non-trivial groups; hence $\eta _i$ is non-trivial. This contradicts the irrationality of $g_{b,i}$ . By induction, $g_{b,1}$ , …, $g_{b,s}$ are all irrational, implying that $g_b$ is irrational.

Proposition 3.2 is vaguely reminiscent of [FK05, Proposition 3.1] in that we replace a linear sequence by a polynomial object on a simpler space. These two results are not equivalent, however, in that in Proposition 3.2 we end up with a polynomial sequence on a nilmanifold of a connected group, whereas in [FK05, Proposition 3.1] one obtains a unipotent affine transformation on a torus.

Lemma 3.4. Let $G_{\bullet }$ and $G^0_{\bullet }$ be as given in Proposition 3.2. Then $Z_i(G/\Gamma )=Z_i(G^0/\Gamma ^0)=G^0/(G^0_{i+1}\Gamma ^0)$ for each $i\in \mathbb {N}$ .

Proof. We take the cases $i=0$ and $i>0$ separately. For $i>0$ , we recall from (12) that $Z_i(G/\Gamma ) = G/G_{i+1}\Gamma $ . Since $G/\Gamma = G^0/\Gamma ^0$ by connectedness of $G/\Gamma $ , and $G_j = G^0_j$ for $j\geqslant 2$ , it follows that

$$ \begin{align*}Z_i(G^0/\Gamma^0) = Z_i(G/\Gamma) = G/G_{i+1}\Gamma = G^0/G^0_{i+1}\Gamma^0.\end{align*} $$

For $i=0$ , we have $Z_i(G/\Gamma ) = G/G^0\Gamma = 1 = G^0/G^0\Gamma ^0 = Z_i(G^0/\Gamma ^0)$ .

4 Homogeneous and inhomogeneous polynomial progressions

The central message of this paper is that homogeneous polynomial progressions satisfy certain linear algebraic properties that make them pliable for our analysis. In this section we explicitly describe these properties.

Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Let $V_k$ be the subspace of $\mathbb {R}[x,y]$ given by

$$ \begin{align*} V_k &= {{\mathrm{Span}}}_{\mathbb{R}}\{ (x+P_i(y))^j:\; 0\leqslant i\leqslant t,\; 1\leqslant j\leqslant k \}\\[6pt] &={{\mathrm{Span}}}_{\mathbb{R}}\bigg\{ {{x+P_i(y)}\choose{j}}:\; 0\leqslant i\leqslant t,\; 1\leqslant j\leqslant k \bigg\}, \end{align*} $$

and similarly let

$$ \begin{align*} W_k ={{\mathrm{Span}}}_{\mathbb{R}} \bigg\{ {{x+P_i(y)}\choose{k}}:\; 0\leqslant i\leqslant t \bigg\}. \end{align*} $$

Thus, the space $V_k$ consists of all the polynomials in $x,\; x+P_1(y),\ldots , x+P_t(y)$ of degree up to k while the space $W_k$ is the span of ‘Taylor monomials’ ${{x}\choose {k}}, {{x+P_1(y)}\choose {k}}, \ldots , {{x+P_t(y)}\choose {k}}$ of degree k. We also set

$$ \begin{align*} V^* &= {{\mathrm{Span}}}_{\mathbb{R}}\{(Q_0, \ldots, Q_{t})\in\mathbb{R}[u]^{t+1}: Q_0(x)\\[3pt] &\quad+ Q_1(x+P_1(y)) + \cdots + Q_{t}(x+P_{t}(y)) = 0 \} \end{align*} $$

to be the space of all algebraic relations satisfied by $\vec {P}$ . We recall that an algebraic relation $(Q_0, \ldots , Q_{t})$ is homogeneous if there exist $d\in \mathbb {N}$ and $a_0, \ldots , a_t\in \mathbb {R}$ not all zero such that $Q_i(u) = a_i u^d$ for each $0\leqslant i\leqslant t$ . We call $\vec {P}$ homogeneous if $V^*$ is spanned by homogeneous algebraic relations, and inhomogeneous otherwise.

The concepts of integral polynomial progression and homogeneity, as well as our results in this paper, could likely be extended to multiparameter polynomial progressions of the form

$$ \begin{align*} (x,\; x+P_1(y_1, \ldots, y_r),\ldots,\; x+P_t(y_1, \ldots, y_r)); \end{align*} $$

however, we do not pursue this generalization so as not to obfuscate the notation.

Some important examples of homogeneous progressions include:

  (i) linear progressions $(x,\; x+a_1 y,\ldots ,\; x+ a_t y)$ for distinct non-zero integers $a_1, \ldots , a_t$ , and more generally linear progressions of the form $(x,\; x + \psi _1(y_1, \ldots , y_r),\ldots ,\; x + \psi _t(y_1, \ldots , y_r))$ for some linear forms $\psi _1, \ldots , \psi _t:\mathbb {Z}^r\to \mathbb {Z}$ ;

  (ii) progressions of algebraic complexity 0, that is, progressions where the polynomials $P_1, \ldots , P_t$ are integral and linearly independent;

  (iii) progressions of algebraic complexity 1, such as $(x,\; x+y,\; x+y^2,\; x+y+y^2)$ , which satisfy no quadratic or higher-order algebraic relation.

Another, less obvious example of a homogeneous progression is $(x, \; x+y,\; x+2y, x+y^3)$ , already mentioned in the introduction, which only satisfies the homogeneous relation

(23) $$ \begin{align} x - 2(x+y) + (x+2y) = 0. \end{align} $$

This progression should be contrasted with $(x, \; x+y,\; x+2y,\; x+y^2)$ , which is inhomogeneous because it satisfies both (23) and the inhomogeneous relation

(24) $$ \begin{align} x^2 + 2x - 2(x+y)^2 + (x+2y)^2 - 2(x+y^2) = 0 \end{align} $$

which cannot be written down as a sum of homogeneous relations. More generally, progressions of the form

$$ \begin{align*} (x,\; x+y,\ldots,\; x+(t-1)y,\; x+ P_t(y)) \end{align*} $$

are all inhomogeneous whenever $1<\deg P_t < t$ because there exist polynomials $Q_0, \ldots , Q_{t-1}$ of degree $\deg P_t$ for which

$$ \begin{align*} Q_0(x) + Q_1(x+y) + \cdots + Q_{t-1}(x+(t-1)y) + (x+P_t(y)) = 0. \end{align*} $$
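Relations (23) and (24) can be checked by direct expansion. The sketch below does this with a minimal polynomial arithmetic in two variables, where a polynomial is stored as a dictionary from exponent pairs to integer coefficients; the helper names (`poly_add`, `poly_mul`, `poly_pow`, `combine`) are hypothetical and written only for this illustration. It also expands the degree-2 and degree-1 parts of (24) separately, showing that neither vanishes on its own, which is why (24) is not a sum of homogeneous relations.

```python
from collections import defaultdict


def poly_add(p, q):
    """Add two polynomials stored as {(deg_x, deg_y): coefficient}."""
    out = defaultdict(int)
    for poly in (p, q):
        for mono, c in poly.items():
            out[mono] += c
    return {m: c for m, c in out.items() if c != 0}


def poly_mul(p, q):
    """Multiply two polynomials, dropping zero coefficients."""
    out = defaultdict(int)
    for (i1, j1), c1 in p.items():
        for (i2, j2), c2 in q.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def poly_pow(p, e):
    """Raise a polynomial to a non-negative integer power."""
    out = {(0, 0): 1}
    for _ in range(e):
        out = poly_mul(out, p)
    return out


def combine(terms):
    """Sum of c * p**e over triples (c, p, e)."""
    total = {}
    for c, p, e in terms:
        total = poly_add(total, poly_mul({(0, 0): c}, poly_pow(p, e)))
    return total


X = {(1, 0): 1}
X_PLUS_Y = {(1, 0): 1, (0, 1): 1}
X_PLUS_2Y = {(1, 0): 1, (0, 1): 2}
X_PLUS_Y2 = {(1, 0): 1, (0, 2): 1}

# Relation (23): x - 2(x+y) + (x+2y).
relation_23 = combine([(1, X, 1), (-2, X_PLUS_Y, 1), (1, X_PLUS_2Y, 1)])
# Degree-2 part of (24): x^2 - 2(x+y)^2 + (x+2y)^2.
quadratic_part = combine([(1, X, 2), (-2, X_PLUS_Y, 2), (1, X_PLUS_2Y, 2)])
# Degree-1 part of (24): 2x - 2(x+y^2).
linear_part = combine([(2, X, 1), (-2, X_PLUS_Y2, 1)])
relation_24 = poly_add(quadratic_part, linear_part)
```

Here an empty dictionary represents the zero polynomial. The quadratic part expands to $2y^2$ and the linear part to $-2y^2$ , so the two only cancel when combined.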

When discussing algebraic relations $(Q_0, \ldots , Q_t)$ , we want to move freely between expressing the polynomials $Q_i$ in terms of the standard basis $\{u^k: k\in \mathbb {N}\}$ on the one hand and the Taylor basis $\{{{u}\choose {k}}: k\in \mathbb {N}\}$ on the other hand. The next lemma allows us to make this transition for homogeneous polynomials.

Lemma 4.1. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Let $(Q_0, \ldots , Q_t)$ be an algebraic relation of degree d satisfied by $\vec {P}$ , and set $Q_i(u) = \sum _{k=0}^d b_{ik}{{u}\choose {k}}$ . Then the following conditions are equivalent.

  (i) The relation $(Q_0, \ldots , Q_t)$ is a sum of homogeneous algebraic relations.

  (ii) For every $0\leq k\leq d$ and $0\leq j\leq k$ , we have

    (25) $$ \begin{align} b_{0k} {{x}\choose{j}} + b_{1k} {{x+P_1(y)}\choose{j}} + \cdots + b_{tk} {{x+P_t(y)}\choose{j}} = 0. \end{align} $$
  (iii) For every $0\leq k\leq d$ and $0\leq j\leq k$ , we have

    (26) $$ \begin{align} b_{0k} x^j + b_{1k} (x+P_1(y))^j + \cdots + b_{tk} (x+P_t(y))^j = 0. \end{align} $$

In particular, condition (ii) implies that homogeneous relations can equivalently be defined as relations of the form $(Q_0, \ldots , Q_t) = (a_0 {{u}\choose {d}}, \ldots , a_t {{u}\choose {d}})$ .

Proof. We first show the equivalence of (ii) and (iii), followed by the equivalence of (i) and (iii).

The implication (iii) $\implies $ (ii) follows from the fact that the polynomial ${{u}\choose {j}}$ is a linear combination of the polynomials $1, u, \ldots , u^j$ . For the converse, we similarly note that $u^j$ is a linear combination of the polynomials $1, u, \ldots , {{u}\choose {j}}$ .
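The two changes of basis used in this step can be computed explicitly. The sketch below is an illustration with hypothetical helper names (`binom_to_monomial`, `monomial_to_binom`): the first expands ${{u}\choose {k}}$ in powers of u with exact rational arithmetic, and the second expresses $u^j$ in the Taylor basis via finite differences at 0, which is a standard identity and not a method taken from the text.

```python
from fractions import Fraction
from math import comb, factorial


def binom_to_monomial(k):
    """Coefficients [c_0, ..., c_k] with C(u, k) = sum_j c_j * u**j."""
    coeffs = [Fraction(1)]            # the constant polynomial 1
    for m in range(k):                # multiply by (u - m)
        shifted = [Fraction(0)] + coeffs
        scaled = [-m * c for c in coeffs] + [Fraction(0)]
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return [c / factorial(k) for c in coeffs]


def monomial_to_binom(j):
    """Coefficients [d_0, ..., d_j] with u**j = sum_k d_k * C(u, k),
    computed as the k-th finite difference of u**j at u = 0."""
    return [sum((-1) ** (k - i) * comb(k, i) * i ** j for i in range(k + 1))
            for k in range(j + 1)]


# C(u, 2) = -u/2 + u^2/2, matching the example after Lemma 2.3.
binom_2 = binom_to_monomial(2)
# u^3 = C(u, 1) + 6*C(u, 2) + 6*C(u, 3).
cube = monomial_to_binom(3)
```

Both transformations are triangular with a non-zero leading coefficient, which is what makes the passage between conditions (ii) and (iii) possible degree by degree.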

To prove the equivalence of (i) and (iii), we set ${{u}\choose {k}} = \sum _{j=0}^k c_{jk} u^j$ for each $k\in \mathbb {N}$ , so that $Q_i(u) = \sum _{k=0}^d b_{ik}\sum _{j=0}^k c_{jk} u^j$ . Importantly, $c_{kk} = 1/k!\neq 0$ for every $k\in \mathbb {N}$ (more generally, $c_{jk}\neq 0$ whenever $1\leq j\leq k$ , while $c_{0k} = 0$ for $k\geq 1$ ). This allows us to rewrite

$$ \begin{align*} 0 &= Q_0(x) + Q_1(x+P_1(y)) + \cdots + Q_t(x+P_t(y))\\ &= \sum_{i=0}^t \sum_{k=0}^d b_{ik}\sum_{j=0}^k c_{jk} (x+P_i(y))^j\\ &= \sum_{j=0}^d \sum_{i=0}^t \sum_{k=j}^d b_{ik} c_{jk} (x+P_i(y))^j. \end{align*} $$

The relation $(Q_0, \ldots , Q_t)$ is a sum of homogeneous algebraic relations if and only if for every $0\leq j\leq d$ , we have

(27) $$ \begin{align} \sum_{k=j}^d c_{jk}\sum_{i=0}^t b_{ik} (x+P_i(y))^j = 0, \end{align} $$

and it is immediate from this expression that (iii) implies (i). To prove the implication (i) $\implies $ (iii), we assume that the relation $(Q_0, \ldots , Q_t)$ is indeed a sum of homogeneous algebraic relations, and so (27) holds for $0\leq j\leq d$ . Taking $j = d$ implies (26) for $j=d$ , $k=d$ , that is,

$$ \begin{align*} b_{0d} x^d + b_{1d} (x+P_1(y))^d + \cdots + b_{td} (x+P_t(y))^d = 0. \end{align*} $$

Taking partial derivatives with respect to x of the expression above $d-j$ times implies (26) for $k = d$ and $0\leq j \leq d$ .

Thus, (27) reduces to

(28) $$ \begin{align} \sum_{k=j}^{d-1} c_{jk}\sum_{i=0}^t b_{ik} (x+P_i(y))^j = 0. \end{align} $$

Running the same argument as above, we prove (26) for $k = d-1$ and $0\leq j\leq d-1$ . Downward induction on k proves (iii) for all required values of k and j.

We observe that the argument in Lemma 4.1 relied on the fact that the polynomial progression takes the form

(29) $$ \begin{align} (x,\; x+P_1(y),\ldots,\; x+P_t(y)) \end{align} $$

(taking $P_1, \ldots , P_t$ to be polynomials of several variables would also do) rather than the more general form

(30) $$ \begin{align} (P_1(x, y),\ldots,\; P_t(x, y)). \end{align} $$

This is because when the progression takes the form (29), we can use partial differentiation with respect to x to lower the degree of algebraic relations. Without this, Lemma 4.1 need not hold, and so we would not have the same correspondence between relations of the form $(Q_0, \ldots , Q_t) = (a_0 u^d, \ldots , a_t u^d)$ and $(b_0 {{u}\choose {d}}, \ldots , b_t {{u}\choose {d}})$ .

The special form (29) of our progression also ensures that we do not encounter issues similar to those discovered by Altman with regard to the original proof and statement of Theorem 1.13 from [GT10] (see [Tao20] for an explanation of the problem). Similar issues would quite plausibly have appeared, however, if we had dealt with more general progressions like (30).

We define several more families of polynomial vector spaces. For $k\in \mathbb {N}_+$ , we let

$$ \begin{align*} W_k^c = W_k\cap\sum_{j\neq k}W_j \quad {{\mathrm{and}}}\quad W^c = \sum_k W_k^c, \end{align*} $$

as well as the family of quotient spaces

$$ \begin{align*} W^{\prime}_k = W_k/W_k^c = W_k/{\bigg(W_k\cap\sum_{j\neq k}W_j\bigg)}. \end{align*} $$

The space $W_k^c$ captures all the polynomials in $W_k$ that ‘participate’ in inhomogeneous algebraic relations, an intuition made more precise by the result below and the examples discussed below Proposition 4.3. The notation $W_k^c$ is supposed to signify the fact that $W_k^c$ is a complement of the subspace $W_k'$ inside $W_k$ .

Proposition 4.2. (Equivalent conditions for homogeneity)

Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. The following are equivalent:

  1. (i) $\vec {P}$ is homogeneous;

  2. (ii) $W_k^c$ is trivial for each $k\in \mathbb {N}_+$ ;

  3. (iii) $W^{\prime }_k = W_k$ for each $k\in \mathbb {N}_+$ .

Intuitively, Proposition 4.2 states that homogeneity is equivalent to the fact that for every $k\in \mathbb {N}_+$ , there are no polynomials in $W_k$ that could be used in constructing an inhomogeneous algebraic relation (condition (ii)). When proving Theorem 6.7, our key equidistribution result on nilmanifolds, we will use condition (iii) of Proposition 4.2.

Proof. The equivalence of (ii) and (iii) follows trivially from the definition of $W^{\prime }_k$ , and we focus on showing the equivalence of (i) and (ii) instead. The inhomogeneity of $\vec {P}$ implies the existence of a non-trivial algebraic relation

$$ \begin{align*} (Q_0(u), \ldots, Q_t(u)) = \bigg(\sum_k b_{0k} {{u}\choose{k}}, \ldots, \sum_k b_{tk} {{u}\choose{k}} \bigg) \end{align*} $$

that is not a sum of homogeneous algebraic relations. By Lemma 4.1, this means that there exist $k\in \mathbb {N}_+$ and $0\leq j \leq k$ for which

(31) $$ \begin{align} b_{0k} {{x}\choose{j}} + b_{1k} {{x+P_1(y)}\choose{j}} + \cdots + b_{tk} {{x+P_t(y)}\choose{j}} \neq 0. \end{align} $$

We claim that in fact we can take $j=k$ . We define the discrete derivative of $Q\in \mathbb {R}[u]$ to be $\partial Q(u) = Q(u+1) - Q(u)$ , and the partial discrete derivative of $R\in \mathbb {R}[x,y]$ with respect to x to be $\partial _x R(x,y) = R(x+1, y) - R(x,y)$ . Observing that $\partial {{u}\choose {k}} = {{u+1}\choose {k}}-{{u}\choose {k}} = {{u}\choose {k-1}}$ , we deduce that $\partial _x {{x+P_i(y)}\choose {k}} = {{x+P_i(y)}\choose {k-1}}$ . It follows that if

$$ \begin{align*} b_{0k} {{x}\choose{k}} + b_{1k} {{x+P_1(y)}\choose{k}} + \cdots + b_{tk} {{x+P_t(y)}\choose{k}} = 0, \end{align*} $$

then applying the partial discrete derivative with respect to x to the expression above $k-j$ times would imply

$$ \begin{align*} b_{0k} {{x}\choose{j}} + b_{1k} {{x+P_1(y)}\choose{j}} + \cdots + b_{tk} {{x+P_t(y)}\choose{j}} = 0, \end{align*} $$

contradicting (31). We can thus assume that $j=k$ in (31).
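The identity $\partial _x {{x+P_i(y)}\choose {k}} = {{x+P_i(y)}\choose {k-1}}$ that drives this step can be checked numerically. The sketch below is only an illustration: the polynomial `P` and the degree `k` are hypothetical choices, and the check uses exact rational arithmetic at integer points.

```python
from fractions import Fraction

def binom(u, k):
    """Generalised binomial coefficient u(u-1)...(u-k+1)/k! for an integer u."""
    result = Fraction(1)
    for i in range(k):
        result = result * (u - i) / (i + 1)
    return result

def dx(R):
    """Partial discrete derivative in x: (dx R)(x, y) = R(x + 1, y) - R(x, y)."""
    return lambda x, y: R(x + 1, y) - R(x, y)

P = lambda y: y**3 - 2 * y      # hypothetical integral polynomial
k = 4
R = lambda x, y: binom(x + P(y), k)

points = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
# One application lowers the lower index by one ...
assert all(dx(R)(x, y) == binom(x + P(y), k - 1) for x, y in points)
# ... and two applications lower it by two.
assert all(dx(dx(R))(x, y) == binom(x + P(y), k - 2) for x, y in points)
```

Iterating `dx` exactly $k-j$ times therefore turns a relation among the ${{x+P_i(y)}\choose {k}}$ into the same relation among the ${{x+P_i(y)}\choose {j}}$ , which is how the contradiction with (31) arises.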

Since

$$ \begin{align*} Q_0(x) + Q_1(x+P_1(y)) + \cdots + Q_t(x+P_t(y)) = 0, \end{align*} $$

we have

$$ \begin{align*} R(x,y) := \sum_{i=0}^t b_{ik} {{x+P_i(y)}\choose{k}} = -\sum_{j\neq k} \sum_{i=0}^t b_{ij} {{x+P_i(y)}\choose{j}} \in \sum_{j\neq k} W_j, \end{align*} $$

and so $W^c_k = W_k\cap \sum _{j\neq k} W_j$ is non-trivial. Thus (ii) implies (i) by contrapositive. The argument can be reversed, and so (i) and (ii) are in fact equivalent.

For homogeneous progressions, it is quite straightforward to obtain an upper bound on algebraic complexity.

Proposition 4.3. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a homogeneous polynomial progression. Then $\mathcal {A}_i(\vec {P})\leqslant t-1$ for each $0\leqslant i\leqslant t$ .

This bound is sharp, as evidenced by the example of arithmetic progressions.

Proof. Suppose that $\mathcal {A}_i(\vec {P}) \geq t$ , and let

$$ \begin{align*} Q_0(x) + Q_1(x+P_1(y)) + \cdots + Q_t(x+P_t(y)) = 0 \end{align*} $$

be an algebraic relation of degree $d\geq t$ . Taking the partial derivative with respect to x of the expression above $d-t$ times, we can assume that the relation has degree t. The homogeneity of $\vec {P}$ and Lemma 4.1 imply the existence of non-trivial algebraic relations of the form

(32) $$ \begin{align} a_0 {{x}\choose{t}} + a_1 {{x+P_1(y)}\choose{t}} + \cdots + a_t {{x+P_t(y)}\choose{t}} = 0. \end{align} $$

Relation (32) and the formula

$$ \begin{align*} {{x+P_i(y)}\choose{t}}={{x}\choose{t}}+{{x}\choose{t-1}}P_i(y) + {{x}\choose{t-2}}{{P_i(y)}\choose{2}}+ \cdots +{{P_i(y)}\choose{t}}, \end{align*} $$

imply

$$ \begin{align*} a_1 {{P_1(y)}\choose{k}} + \cdots + a_t {{P_t(y)}\choose{k}} = 0 \end{align*} $$

for $1\leq k\leq t$ . This gives us t equations

$$\begin{align*} a_1 P_1(y) + \cdots + &a_t P_t(y) = 0, \nonumber \\[3pt] a_1 P_1(y)^2 + \cdots + &a_t P_t(y)^2 = 0, \nonumber \\[3pt] \vdots \qquad\qquad\qquad\quad &\vdots \nonumber \\[3pt] a_1 P_1(y)^t + \cdots + &a_t P_t(y)^t = 0. \end{align*}$$

The invertibility of the Vandermonde matrix and the distinctness of the polynomials $P_1, \ldots , P_t$ imply that these t equations can only be satisfied if $a_1 = \cdots = a_t = 0$ , which also implies $a_0 = 0$ . This contradicts the non-triviality of (32).
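To make the Vandermonde step concrete, the following sketch evaluates the system at a single point $y$ and checks that its matrix is invertible whenever the values $P_1(y), \ldots , P_t(y)$ are distinct and non-zero. The sample values correspond to $P_i(y) = iy$ at $y=1$ , a hypothetical choice made purely for illustration; the helper names are ours.

```python
from fractions import Fraction
from math import prod

def det(rows):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d                      # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return d

def power_system(values):
    """Matrix of the t equations sum_i a_i * p_i**k = 0 for k = 1, ..., t."""
    t = len(values)
    return [[p**k for p in values] for k in range(1, t + 1)]

def vandermonde_formula(values):
    """Closed form: prod_i p_i * prod_{i<j} (p_j - p_i)."""
    t = len(values)
    return prod(values) * prod(
        values[j] - values[i] for i in range(t) for j in range(i + 1, t)
    )

values = [1, 2, 3]                      # P_1(1), P_2(1), P_3(1) for P_i(y) = i*y
assert det(power_system(values)) == vandermonde_formula(values) != 0
# Repeated values make the system degenerate.
assert det(power_system([2, 2, 3])) == 0
```

Since the determinant is non-zero, the only solution of the system at this point is $a_1 = \cdots = a_t = 0$ , as in the proof.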

Proposition 4.2 implies that homogeneous progressions satisfy

(33) $$ \begin{align} V_k = \bigoplus_{i=1}^k W_i = \bigoplus_{i=1}^k W^{\prime}_i. \end{align} $$

In the inhomogeneous case, we instead have

(34) $$ \begin{align} V_k = \sum_{i=1}^k W_i = \bigg(\!\bigoplus_{i=1}^k W^{\prime}_i\bigg)\oplus(W^c\cap V_k) \end{align} $$

for some non-trivial subspace $W^c\cap V_k$ . The non-triviality of this subspace is the main source of difficulty preventing us from generalizing Theorem 1.11 to inhomogeneous progressions.

Given the rather abstract nature of the spaces $W_k, W^{\prime }_k$ and $W^c_k$ , we illustrate their definitions with concrete examples. For the homogeneous progression $(x, x+y, x+2y, x + y^3)$ , we have

$$ \begin{align*} & W_1' = W_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{x, y, y^3\}\quad {{\mathrm{and}}} \\[6pt] & W^{\prime}_2 = W_2 = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{x}\choose{2}}, xy+ {{y}\choose{2}}, y^2, xy^3 + {{y^3}\choose{2}}\bigg\}, \end{align*} $$

while for the inhomogeneous progression $(x, \; x+y,\; x+2y, x+y^2)$ , we have

$$ \begin{align*} W_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{x, y, y^2\}\quad{{\mathrm{and}}}\quad W_2 = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{x}\choose{2}}, xy+ {{y}\choose{2}}, y^2, xy^2 + {{y^2}\choose{2}}\bigg\} \end{align*} $$

but

$$ \begin{align*} & W_1' = {{\mathrm{Span}}}_{\mathbb{R}}\{x, y\},\quad W_2' = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{x}\choose{2}}, xy+ {{y}\choose{2}}, xy^2 + {{y^2}\choose{2}}\bigg\}\quad {{\mathrm{and}}} \\ & W^c = {{\mathrm{Span}}}_{\mathbb{R}}\{y^2\}. \end{align*} $$

The non-triviality of $W^c$ for the latter progression is intrinsically related to the algebraic relation (24).
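For the inhomogeneous progression $(x,\; x+y,\; x+2y,\; x+y^2)$ , the membership of $y^2$ in both $W_1$ and $W_2$ can be verified directly. The sketch below does this at integer points; the function names are our own and the two representations of $y^2$ are the ones suggested by the spanning sets listed above.

```python
def c2(u):
    """Binomial coefficient C(u, 2) for an integer u; u*(u-1) is always even."""
    return u * (u - 1) // 2

def y2_via_W1(x, y):
    """y^2 as a combination of degree-1 terms: (x + y^2) - x."""
    return (x + y**2) - x

def y2_via_W2(x, y):
    """y^2 as a combination of degree-2 terms of x, x + y, x + 2y."""
    return c2(x) - 2 * c2(x + y) + c2(x + 2 * y)

grid = [(x, y) for x in range(-4, 5) for y in range(-4, 5)]
assert all(y2_via_W1(x, y) == y**2 == y2_via_W2(x, y) for x, y in grid)
```

Equating the two representations gives an algebraic relation that mixes terms of degree $1$ and degree $2$ , which is exactly what homogeneity forbids.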

The spaces $V_k$ and $W_k$ are subspaces of $\mathbb {R}[x,y]$ . We also need an analogous family of subspaces of $\mathbb {R}^{t+1}$ . For a polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ , we let

$$ \begin{align*} \vec{P}^k(x,y) &= (x^k, (x+P_1(y))^k, \ldots, (x+P_{t}(y))^k) \quad {{\mathrm{and}}} \\[3pt] {{\vec{P}(x,y)}\choose{k}} &= \bigg({{x}\choose{k}}, {{x+P_1(y)}\choose{k}}, \ldots, {{x+P_{t}(y)}\choose{k}}\bigg). \end{align*} $$

We then define

$$ \begin{align*} \mathcal{P}_k &={{\mathrm{Span}}}_{\mathbb{R}}\{\vec{P}^k(x,y): x, y\in\mathbb{R}\}\\ &={{\mathrm{Span}}}_{\mathbb{R}}\{(x^k, (x+P_1(y))^k, \ldots, (x+P_{t}(y))^k): x,y\in\mathbb{R}\} \end{align*} $$

for each $k\in \mathbb {N}_+$ . The following lemma gives equivalent formulas for the spaces $\mathcal {P}_k$ .

Lemma 4.4. Let $t, k\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Then

$$ \begin{align*} \mathcal{P}_k &= {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{P}^k(x,y): x, y\in\mathbb{R}\} = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{P}^j(x,y): x, y\in\mathbb{R},\; 1\leq j\leq k\}\\[3pt] &= {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{\vec{P}(x,y)}\choose{k}}: x, y\in\mathbb{R}\bigg\} = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{\vec{P}(x,y)}\choose{j}}: x, y\in\mathbb{R},\; 1\leq j\leq k\bigg\}. \end{align*} $$

Proof. We fix $k\in \mathbb {N}_+$ and denote

$$ \begin{align*} A_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{P}^k(x,y): x, y\in\mathbb{R}\},\ \ &A_2 = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{P}^j(x,y): x, y\in\mathbb{R},\; 1\leq j\leq k\},\\[3pt] A_3 = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{\vec{P}(x,y)}\choose{k}}: x, y\in\mathbb{R}\bigg\},\ \ &A_4 = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{\vec{P}(x,y)}\choose{j}}: x, y\in\mathbb{R},\; 1\leq j\leq k\bigg\} \end{align*} $$

for the four spaces mentioned in the statement of the lemma. It is clear that $A_1\subseteq A_2$ and $A_3 \subseteq A_4$ . To prove the converse inclusions, we note that $({\partial }/{\partial _x})\vec {P}(x,y)^k = k \vec {P}(x,y)^{k-1}$ and $\partial _x{{\vec {P}(x,y)}\choose {k}} = {{\vec {P}(x,y)}\choose {k-1}}$ , where ${\partial }/{\partial _x}$ is the usual partial derivative with respect to x and $\partial _x$ is the partial discrete derivative with respect to x defined in the proof of Proposition 4.2. For every $R(x,y)\in A_1$ and $h\neq 0$ , the expression

$$ \begin{align*}\cfrac{R(x+h, y)-R(x, y)}{h}\end{align*} $$

is still in $A_1$ , and so $({\partial }/{\partial _x})R(x,y)\in A_1$ by the closedness of $A_1$ . Applying ${\partial }/{\partial _x}$ to $\vec {P}(x,y)^k$ exactly $k-j$ times with the observations above, we deduce that $\vec {P}(x,y)^j$ is still in $A_1$ . Hence $A_2\subseteq A_1$ . Analogously, applying $\partial _x$ exactly $k-j$ times to ${{\vec {P}(x,y)}\choose {k}}$ , we deduce that ${{\vec {P}(x,y)}\choose {j}}\in A_3$ , and so $A_4\subseteq A_3$ .

It remains to show that $A_2 = A_4$ . For this, we note that $\vec {P}(x,y)^k$ is a linear combination of ${{\vec {P}(x,y)}\choose {1}}, \ldots , {{\vec {P}(x,y)}\choose {k}}$ , and conversely ${{\vec {P}(x,y)}\choose {k}}$ is a linear combination of $\vec {P}(x,y), \ldots , \vec {P}(x,y)^k$ , from which the equality $A_2 = A_4$ follows.
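Lemma 4.4 can be sanity-checked on a small example. The sketch below takes the three-term arithmetic progression $(x,\; x+y,\; x+2y)$ with $k=2$ , samples the generating vectors of $A_1, \ldots , A_4$ on a finite integer grid, and compares the dimensions of their spans. Sampling a grid instead of all of $\mathbb {R}^2$ is an assumption that is harmless here because the polynomials involved have degree at most $2$ ; all names are illustrative.

```python
from fractions import Fraction

def binom(u, k):
    """Generalised binomial coefficient u(u-1)...(u-k+1)/k!."""
    result = Fraction(1)
    for i in range(k):
        result = result * (u - i) / (i + 1)
    return result

def rank(vectors):
    """Rank of a list of vectors, by row reduction over the rationals."""
    rows = [[Fraction(v) for v in vec] for vec in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Offsets 0, P_1(y), P_2(y) of the progression (x, x + y, x + 2y).
offsets = [lambda y: 0, lambda y: y, lambda y: 2 * y]
points = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]

def power_vec(x, y, j):
    return [(x + p(y)) ** j for p in offsets]

def binom_vec(x, y, j):
    return [binom(x + p(y), j) for p in offsets]

k = 2
A1 = [power_vec(x, y, k) for x, y in points]
A2 = [power_vec(x, y, j) for x, y in points for j in range(1, k + 1)]
A3 = [binom_vec(x, y, k) for x, y in points]
A4 = [binom_vec(x, y, j) for x, y in points for j in range(1, k + 1)]

ranks = [rank(A) for A in (A1, A2, A3, A4)]
# Equal ranks, and the same rank for the union, mean the four spans coincide.
assert ranks == [3, 3, 3, 3]
assert rank(A1 + A2 + A3 + A4) == 3
# By contrast, the degree-1 vectors span only a plane: x - 2(x+y) + (x+2y) = 0.
assert rank([power_vec(x, y, 1) for x, y in points]) == 2
```

In this example $\mathcal {P}_2 = \mathbb {R}^3$ while $\mathcal {P}_1$ is two-dimensional, reflecting the single linear relation satisfied by the progression.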

Henceforth, we treat $\mathbb {R}^{t+1}$ as an $\mathbb {R}$ -algebra with coordinatewise multiplication $\vec {v}\cdot \vec {w}=(v(0) w(0), \ldots , v(t) w(t))$ for $\vec {v}=(v(0), \ldots , v(t))$ and $\vec {w}=(w(0), \ldots , w(t))$ . We similarly let $A\cdot B = \{\vec {a}\cdot \vec {b}: \vec {a}\in A, \vec {b}\in B\}$ be the product set of A and B for any ${A,B\subseteq \mathbb {R}^{t+1}}$ . With these definitions, we observe that $\mathcal {P}_{i+j}\leqslant \mathcal {P}_i\cdot \mathcal {P}_j$ , but the converse is in general not true. We also set $\vec {e}_i$ to be the coordinate vector with 1 in the ith place and 0 elsewhere.

We conclude this section by relating the spaces $W_k$ and $W^{\prime }_k$ to $\mathcal {P}_k$ . Let $t_k = \dim W_k$ and $t^{\prime }_k = \dim W^{\prime }_k$ for each $k\in \mathbb {N}$ . The spaces $W_k$ and $\mathcal {P}_k$ are connected as follows. Let $\{Q_{k,1}, \ldots , Q_{k,t_k}\}$ be a basis for $W_k$ . Then

$$ \begin{align*} \bigg({{x}\choose{k}}, {{x+P_1(y)}\choose{k}}, \ldots, {{x+P_t(y)}\choose{k}}\bigg) = \sum_{j=1}^{t_k} \vec{v}_{k,j} Q_{k,j}(x,y) \end{align*} $$

for some linearly independent vectors $\vec {v}_{k,1}, \ldots , \vec {v}_{k,t_k}\in \mathbb {R}^{t+1}$ . We let $\tau _k(Q_{k,j})=\vec {v}_{k,j}$ , and extend this map to all of $W_k$ by linearity. This map depends on the choice of the basis for $W_k$ . It is surjective by the definition of $\mathcal {P}_k$ and injective by the linear independence of $\vec {v}_{k,1}, \ldots , \vec {v}_{k,t_k}$ . Hence it is a vector space isomorphism. In particular, Proposition 4.2 implies that $W^{\prime }_k\cong \mathcal {P}_k$ whenever $\vec {P}$ is homogeneous, a fact that we shall use a lot in the proof of Theorem 6.7.

To illustrate the aforementioned correspondence between $W_k$ and $\mathcal {P}_k$ , consider the progression $(x,\; x+y,\; x+2y,\; x+y^3)$ . The isomorphisms $\tau _1$ and $\tau _2$ are given by

$$ \begin{align*} \tau_1(x) = (1,1,1,1),\quad \tau_1(y) = (0,1,2,0),\quad \tau_1(y^3) = (0,0,0,1) \end{align*} $$

and

$$ \begin{align*} \tau_2\bigg({{x}\choose{2}}\bigg) = (1,1,1,1),\quad &\tau_2\bigg(xy+ {{y}\choose{2}}\bigg) = (0,1,2,0),\\[3pt] \tau_2(y^2) = (0,0,1,0),\quad &\tau_2\bigg(xy^3 + {{y^3}\choose{2}}\bigg) = (0,0,0,1). \end{align*} $$
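The values of $\tau _1$ and $\tau _2$ listed above can be confirmed by expanding the vectors $\vec {P}(x,y)$ and ${{\vec {P}(x,y)}\choose {2}}$ in the stated bases. The following sketch does so at integer points; the function names are our own.

```python
def c2(u):
    """Binomial coefficient C(u, 2) for an integer u."""
    return u * (u - 1) // 2

def combine(vectors, scalars):
    """Linear combination sum_j scalars[j] * vectors[j] in Z^4."""
    return [sum(s * v[i] for v, s in zip(vectors, scalars)) for i in range(4)]

# Images of the basis elements under tau_1 and tau_2, as listed in the text.
tau1 = [(1, 1, 1, 1), (0, 1, 2, 0), (0, 0, 0, 1)]
tau2 = [(1, 1, 1, 1), (0, 1, 2, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

def check(x, y):
    terms = [x, x + y, x + 2 * y, x + y**3]
    basis1 = [x, y, y**3]
    basis2 = [c2(x), x * y + c2(y), y**2, x * y**3 + c2(y**3)]
    ok1 = terms == combine(tau1, basis1)
    ok2 = [c2(u) for u in terms] == combine(tau2, basis2)
    return ok1 and ok2

assert all(check(x, y) for x in range(-4, 5) for y in range(-4, 5))
```

The third coordinate is the only one in which $y^2$ appears, since ${{x+2y}\choose {2}} = {{x}\choose {2}} + 2\big (xy + {{y}\choose {2}}\big ) + y^2$ .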

5 Relating Host–Kra complexity to algebraic complexity

Having introduced the notation for the spaces $\mathcal {P}_i$ , we are ready to show precisely how determining Host–Kra complexity for homogeneous progressions can be reduced to a certain equidistribution problem on nilmanifolds. We start by defining a group which contains the orbit (18). Groups of this form have previously been defined in [CS12, GT10, Kuc21b, Lei09], among others.

Definition 5.1. (Leibman group)

Let $t\in \mathbb {N}_+$ and G be a connected group with a filtration $G_{\bullet }$ of degree s. For an integral polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ , we define the associated Leibman group to be

$$ \begin{align*} G^P = \langle g_i^{\vec{v}_i}: g_i\in G_i, \vec{v}_i\in\mathcal{P}_i, 1\leqslant i\leqslant s \rangle, \end{align*} $$

where $h^{\vec {v}}=(h^{v(0)}, \ldots ,h^{v(t)})$ for any $h\in G$ and $\vec {v} = (v(0), \ldots , v(t))\in \mathbb {R}^{t+1}$ . We also set $\Gamma ^P=G^P\cap \Gamma ^{t+1}$ . If $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ , then we denote

$$ \begin{align*} g^P(x,y) = (g(x), g(x+P_1(y)), \ldots, g(x+P_t(y))) \end{align*} $$

and observe that $g^P$ takes values in $G^P$ .

Lemma 5.2. Let $t\in \mathbb {N}_+$ and G be a connected group with a filtration $G_{\bullet }$ of degree s. Suppose that $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is an integral polynomial progression with $\mathcal {A}_i(\vec {P}) = s'$ for some $s'\in \mathbb {N}$ and some $0\leq i\leq t$ . Then $G^P$ contains $1^{i}\times G_{s'+1} \times 1^{t-i}$ .

Proof. The assumption $\mathcal {A}_i(\vec {P}) = s'$ implies that $(x+P_i(y))^{s'+1}$ is linearly independent of $(x+P_k(y))^{s'+1}$ for $k\neq i$ , hence $\mathcal {P}_{s'+1}$ contains $\vec {e}_i$ . The lemma then follows by the definition of $G^P$ .

We are now ready to state an infinitary version of the main technical result in the paper. This result constitutes the first part of Theorem 1.17.

Theorem 5.3. Let $t\in \mathbb {N}_+$ and G be a connected group with filtration $G_{\bullet }$ . Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is irrational and that $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is a homogeneous polynomial progression. Then $g^P$ is equidistributed on the nilmanifold $G^P/\Gamma ^P$ .

Importantly, Theorem 5.3 fails for inhomogeneous progressions in that for each inhomogeneous progression $\vec {P}$ , we can find a nilmanifold $G/\Gamma $ , a filtration $G_{\bullet }$ , and an irrational sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ for which the orbit of $g^P$ is contained in a proper subnilmanifold of $G^P/\Gamma ^P$ . An example of this is given in §9.

We now have all the tools to prove Theorem 5.3. However, we will later need a finitary version of Theorem 5.3, and so instead of proving twice what is essentially the same result, we shall only give the finitary proof later on and deduce Theorem 5.3 from it. For now, however, we can show how the $\mathcal {HK}_i(\vec {P}) \leqslant \mathcal {A}_i(\vec {P})$ part of Theorem 1.11 follows from Theorem 5.3.

Corollary 5.4. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a homogeneous polynomial progression. For any $0\leqslant i\leqslant t$ , we have

$$ \begin{align*} \mathcal{HK}_i(\vec{P}) \leqslant \mathcal{A}_i(\vec{P}). \end{align*} $$

The converse inequality will follow from showing that algebraic complexity equals Weyl complexity, and that Weyl complexity is less than or equal to Host–Kra complexity, both of which are done in §11.

Proof of Corollary 5.4 using Theorem 5.3

Let $\mathcal {A}_i(\vec {P}) = s$ . Let $(X,\mathcal {X},\mu ,T)$ be a totally ergodic system, $f_0, \ldots , f_t\in L^\infty (\mu )$ , and suppose that $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_s)=0$ . By Theorem 1.4, the expression

(35) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]}\int_X f_0 \cdot T^{P_1(n)} f_1 \cdots T^{P_{t}(n)} f_{t} \,d\mu \end{align} $$

remains unchanged if we project the functions $f_0, \ldots , f_t$ onto the factor $\mathcal {Z}_{s_0}$ for some $s_0\in \mathbb {N}$ . If $s_0<s$ , then $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_{s_0}) = 0$ and the limit (35) is 0, so we can assume that $s_0\geqslant s$ . Since the factor $\mathcal {Z}_{s_0}$ is an inverse limit of $s_0$ -step nilsystems, we can approximate X by totally ergodic nilsystems.

Let $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ be a totally ergodic nilsystem, and $G_{\bullet }$ be the lower central series filtration on G. Using (12), it suffices to show that if $f_0, \ldots , f_t\in L^\infty (\nu )$ and $f_i$ vanishes on each coset of $G_{s+1}\Gamma $ , then

$$ \begin{align*} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in[N]} \int_{G/\Gamma} f_0(b\Gamma)\cdot f_1(a^{P_1(n)}b\Gamma) \cdots f_t(a^{P_t(n)} b\Gamma) \,d\nu(b\Gamma) = 0. \end{align*} $$

Let $G^0_{\bullet }$ be the filtration on $G^0$ given by $G^0_{\bullet } = G_{\bullet }\cap G^0$ , and let $g_b\in {{\mathrm{poly}}}(\mathbb {Z}, G^0_{\bullet })$ be the irrational sequence defined in Proposition 3.2 for which $a^n b\Gamma = g_b(n)\Gamma $ . The irrationality of $g_b$ , Lemma 3.1 and Theorem 5.3 imply that

$$ \begin{align*} &\lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in[N]} \int_{G/\Gamma} f_0(b\Gamma)\cdot f_1(a^{P_1(n)}b\Gamma) \cdots f_t(a^{P_t(n)} b\Gamma) \,d\nu(b\Gamma)\\[6pt] &\quad= \int_{G^0/\Gamma^0} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{m,n\in[N]} f_0(g_b(m)\Gamma^0)\\[6pt] &\qquad\cdot f_1(g_b(m+P_1(n))\Gamma^0) \cdots f_t(g_b(m+P_t(n))\Gamma^0) \,d\nu(b\Gamma^0)\\[6pt] &\quad = \int_{(G^0)^P/(\Gamma^0)^P} f_0 \otimes \cdots \otimes f_t \,d\nu^P, \end{align*} $$

where $(G^0)^P$ is the Leibman group for $\vec {P}$ and $\nu ^P$ is the Haar measure on $(G^0)^P/(\Gamma ^0)^P$ .

The assumption that $f_i$ vanishes on each coset of $G_{s+1}\Gamma $ in $G/\Gamma $ and Lemma 3.4 together imply that $f_i$ vanishes on each coset of $G^0_{s+1}\Gamma ^0$ inside $G^0/\Gamma ^0$ . By Lemma 5.2, the group $(G^0)^P$ contains $H=1^{i}\times G^0_{s+1} \times 1^{t-i}$ ; therefore

$$ \begin{align*} &\bigg|\int_{(G^0)^P/(\Gamma^0)^P} f_0 \otimes \cdots \otimes f_t \bigg| \leqslant \int_{(G^0)^P/H(\Gamma^0)^P} \bigg|\int_{x H(\Gamma^0)^P} f_0 \otimes \cdots \otimes f_t \bigg|\\[6pt] &\quad\leqslant\bigg(\prod_{j\neq i}\|f_j\|_\infty\bigg)\int_{(G^0)^P/H(\Gamma^0)^P} \bigg|\int_{x_i G^0_{s+1}\Gamma^0} f_i\bigg| = 0, \end{align*} $$

implying that $\mathcal {Z}_s$ is characteristic for the weak convergence of $\vec {P}$ at i.

Corollary 5.4 implies that if a progression $\vec {P}$ satisfies $\mathcal {A}_i(\vec {P})=s$ , then $\mathcal {Z}_s$ is characteristic for the weak or $L^2$ convergence of $\vec {P}$ at i for any totally ergodic system. We now prove Corollary 1.13, which extends this result to ergodic systems for eligible progressions, with a slight modification in the $s=0$ case. The proof is almost identical to the proof of Proposition 4.1 in [Fra08].

Proof of Corollary 1.13

Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an eligible homogeneous progression with $\mathcal {A}_i(\vec {P}) = s$ and let $(X,\mathcal {X},\mu ,T)$ be ergodic. By Theorem 1.4, there exists a Host–Kra factor that is characteristic for the weak and $L^2$ convergence of $\vec {P}$ . Since each Host–Kra factor is an inverse limit of nilsystems, we can approximate X by an ergodic nilsystem $(G/\Gamma , {\mathcal {G}}/\Gamma , \nu , T_a)$ . The compactness of $G/\Gamma $ and the assumption that G is generated by the connected component $G^0$ and a imply that $a^r \in G^0$ for some $r\in \mathbb {N}_+$ , and hence

(36) $$ \begin{align} \mathop{\mathbb{E}}\limits_{n\in[rN]}\prod_{i=1}^t T_a^{P_i(n)}f_i &= \mathop{\mathbb{E}}\limits_{j\in[r]} \mathop{\mathbb{E}}\limits_{n\in[N]} \prod_{i=1}^t T_a^{P_i(r(n-1)+j)} f_i \nonumber\\[3pt] &= \mathop{\mathbb{E}}\limits_{j\in[r]} \mathop{\mathbb{E}}\limits_{n\in[N]} \prod_{i=1}^t (T_a^r)^{\tilde{P}_{i,j}(n)} (T_a^{P_i(j)} f_i), \end{align} $$

where $\tilde {P}_{i,j}(n) = ({P_i(r(n-1)+j) - P_i(j)})/{r}$ . This is where we use the fact that $\vec {P}$ is eligible. The definition of eligibility implies that for any $0\leqslant j < r$ , the progression

$$ \begin{align*} \vec{\tilde{P}}_j(x,y) = (x,\; x+\tilde{P}_{1,j}(y),\ldots,\; x+\tilde{P}_{t,j}(y)) \end{align*} $$

is homogeneous and that $\mathcal {A}_i(\vec {\tilde {P}}_j) = \mathcal {A}_i(\vec {P})$ for every $0\leq i \leq t$ .
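The splitting into residue classes in (36) rests on two elementary facts: the set $\{r(n-1)+j: j\in [r],\, n\in [N]\}$ is exactly $[rN]$ , and $P_i(r(n-1)+j) = r\tilde {P}_{i,j}(n) + P_i(j)$ with $\tilde {P}_{i,j}(n)$ an integer. The sketch below checks both for one hypothetical polynomial, under the assumption that it has integer coefficients; the names `P` and `P_tilde` are illustrative.

```python
def P(n):
    """Hypothetical polynomial with integer coefficients."""
    return n**3 + 2 * n

def P_tilde(n, r, j):
    """(P(r(n-1)+j) - P(j)) / r, which is an integer for integer coefficients."""
    numerator = P(r * (n - 1) + j) - P(j)
    assert numerator % r == 0   # P(a) - P(b) is divisible by a - b = r(n-1)
    return numerator // r

r, N = 4, 5

# The exponent of T_a splits as r * P_tilde + P(j), matching the second line of (36).
for j in range(1, r + 1):
    for n in range(1, N + 1):
        assert P(r * (n - 1) + j) == r * P_tilde(n, r, j) + P(j)

# The reindexing n -> r(n-1)+j covers [rN] exactly once.
reindexed = sorted(r * (n - 1) + j for j in range(1, r + 1) for n in range(1, N + 1))
assert reindexed == list(range(1, r * N + 1))
```

The divisibility assertion inside `P_tilde` is the point at which integer coefficients are used.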

If $s>0$ , suppose that $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_s(T_a)) = 0$ . Then the equality $\mathcal {Z}_s(T_a) = \mathcal {Z}_s(T_a^r)$ and the $T_a$ -invariance of $\mathcal {Z}_s$ imply that $\operatorname {\mathrm {\mathbb {E}}}(T_a^{P_i(j)} f_i\mid \mathcal {Z}_s(T_a^r)) = 0$ . We deduce from Corollary 5.4 and the total ergodicity of $T_a^r$ on each connected component of $G/\Gamma $ that the expression in (36) converges to 0 as $N\to \infty $ .

If $s=0$ , suppose that $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {K}_{\mathrm {rat}}(T_a)) = 0$ . The total ergodicity of $T_a^r$ implies that $\mathcal {K}_{\mathrm {rat}}(T_a) = \mathcal {Z}_0(T_a^r)$ , and so $\operatorname {\mathrm {\mathbb {E}}}(T_a^{P_i(j)} f_i\mid \mathcal {Z}_0(T_a^r)) = 0$ . Again, it follows from Corollary 5.4 and the total ergodicity of $T_a^r$ on each connected component of $G/\Gamma $ that the expression in (36) converges to 0 as $N\to \infty $ .

We now show that progressions of algebraic complexity at most $1$ are eligible, which together with Corollary 1.13 immediately implies Corollary 1.14.

Lemma 5.5. Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression with $\max _i \mathcal {A}_i(\vec {P})\leq 1$ . Then $\vec {P}$ is homogeneous and eligible.

Proof. From the definition of inhomogeneous relations it follows that each inhomogeneous relation must have degree at least $2$ . Thus, the fact that $\vec {P}$ has algebraic complexity at most $1$ immediately implies that it is homogeneous.

To prove that $\vec {P}$ is eligible, we fix $r\in \mathbb {N}_+$ and $0\leq j < r$ . We show that the progression

$$ \begin{align*} \vec{\tilde{P}}(x,y) = (x,\; x+\tilde{P}_{1,j}(y),\ldots,\; x+\tilde{P}_{t,j}(y)), \end{align*} $$

where $\tilde {P}_{i,j}(y) = ({P_i(r(y-1)+j) - P_i(j)})/{r}$ , also has algebraic complexity at most $1$ , from which the eligibility of $\vec {P}$ will follow easily. Indeed, suppose first that $\vec {\tilde {P}}$ satisfies an algebraic relation of degree $2$ :

$$ \begin{align*} \sum_{i=0}^t \bigg[a_{i2}\bigg(x+\frac{P_i(r(y-1) + j)-P_i(j)}{r}\bigg)^2 + a_{i1}\bigg(x+\frac{P_i(r(y-1) + j)-P_i(j)}{r}\bigg)\bigg] = 0. \end{align*} $$

Setting $y' = r(y-1) + j,\; x' = x r,\; a_{i2}' = a_{i2}/r^2,\; a_{i1}' = a_{i1}/r$ for brevity and rearranging, we deduce that

$$ \begin{align*} \sum_{i=0}^t(a_{i2}'(x'+P_i(y'))^2 + (a^{\prime}_{i1} - 2a^{\prime}_{i2}P_i(j)) (x'+P_i(y')) + a^{\prime}_{i2}P_i(j)^2 - a^{\prime}_{i1} P_i(j)) = 0. \end{align*} $$
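The rearrangement is a termwise identity, which the following sketch checks in exact arithmetic for a single summand. Here `Py` stands for the value $P_i(y')$ and `c` for the constant $P_i(j)$ ; both are treated as free integer parameters, an assumption that suffices because the identity does not depend on the specific polynomial.

```python
from fractions import Fraction

def lhs(a2, a1, x, Py, c, r):
    """Summand before the substitution: a2*z^2 + a1*z with z = x + (Py - c)/r."""
    z = x + Fraction(Py - c, r)
    return a2 * z**2 + a1 * z

def rhs(a2, a1, x, Py, c, r):
    """Summand after setting x' = x*r, a2' = a2/r^2, a1' = a1/r."""
    a2p, a1p = Fraction(a2, r**2), Fraction(a1, r)
    X = x * r + Py                      # X = x' + P_i(y')
    return a2p * X**2 + (a1p - 2 * a2p * c) * X + a2p * c**2 - a1p * c

samples = [
    (a2, a1, x, Py, c, r)
    for a2 in (-2, 1, 3)
    for a1 in (-1, 0, 4)
    for x in (-3, 0, 2)
    for Py in (-5, 1, 7)
    for c in (-2, 0, 3)
    for r in (1, 2, 5)
]
assert all(lhs(*s) == rhs(*s) for s in samples)
```

Only the leading coefficients $a^{\prime }_{i2}$ multiply $(x'+P_i(y'))^2$ , which is why the homogeneity of $\vec {P}$ isolates the degree-2 part in the next step.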

The homogeneity of $\vec {P}$ implies that

$$ \begin{align*} \sum_{i=0}^t a^{\prime}_{i2}(x'+P_i(y'))^2 = 0, \end{align*} $$

and the fact that $\vec {P}$ has algebraic complexity at most $1$ further implies that $a^{\prime }_{02} = \cdots = a_{t2}' = 0$ . The claim $a_{02} = \cdots = a_{t2} = 0$ follows by rescaling. Thus, $\vec {\tilde {P}}$ satisfies no algebraic relation of degree 2. It follows by induction that $\vec {\tilde {P}}$ satisfies no algebraic relation of degree $d> 2$ since each such relation

(37) $$ \begin{align} Q_0(x) + Q_1(x+\tilde{P}_{1,j}(y)) + \cdots + Q_t(x+\tilde{P}_{t,j}(y)) = 0 \end{align} $$

would induce an algebraic relation of degree $d-1$ by partially differentiating (37) with respect to x. This establishes the claim that $\vec {\tilde {P}}$ has algebraic complexity at most $1$ . Thus, every algebraic relation satisfied by $\vec {\tilde {P}}$ is of the form

$$ \begin{align*} a_0 x &+ a_1 \bigg(x+\frac{P_1(r(y-1)+j) - P_1(j)}{r}\bigg) \\[3pt] & + \cdots + a_t \bigg(x+\frac{P_t(r(y-1)+j) - P_t(j)}{r}\bigg) = 0 \end{align*} $$

and corresponds to an algebraic relation

$$ \begin{align*} a_0 x + a_1 (x+P_1(y)) + \cdots + a_t (x+P_t(y)) = 0 \end{align*} $$

satisfied by $\vec {P}$ . This one-to-one correspondence between the algebraic relations satisfied by $\vec {\tilde {P}}$ and $\vec {P}$ implies the eligibility of $\vec {P}$ .

Theorem 5.3 also allows us to prove the second part of Corollary 1.15.

Proof of Corollary 1.15(ii)

Let $(X, \mathcal {X}, \mu , T)$ be a totally ergodic system, and suppose that $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is an integral progression with algebraic complexity at most $1$ . This implies that $\vec {P}$ is homogeneous since each inhomogeneous algebraic relation must have degree at least 2. For each $0\leqslant i\leqslant t$ , let $P_i(y) = \sum _{j=1}^d a_{i,j} Q_j(y)$ and $L_i(y_1, \ldots y_d) = \sum _{j=1}^d a_{i,j} y_j$ for some $a_{i,j}\in \mathbb {Z}$ and integral polynomials $Q_1, \ldots , Q_d$ . Letting

$$ \begin{align*} \vec{L}(x,y_1, \ldots, y_d) = (x,\; x+L_1(y_1, \ldots, y_d),\ldots, x+L_t(y_1, \ldots, y_d)), \end{align*} $$

we observe that $\vec {P}(x,y) = \vec {L}(x, Q_1(y), \ldots , Q_d(y))$ . It follows that $\vec {L}$ also has algebraic complexity at most $1$ , since each algebraic relation of degree $(j_0, \ldots , j_t)$ between terms of $\vec {L}$ would immediately imply an algebraic relation of the same degree between terms of $\vec {P}$ after substituting $y_i = Q_i(y)$ .

Using the same argument as in the proof of Corollary 5.4, we reduce the question of understanding

(38) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{n\in [N]} \int_X \prod_{i=0}^t T^{P_i(n)} f_i \,d\mu \end{align} $$

to understanding

(39) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{x,y\in[N]} F(g^P(x,y)) \end{align} $$

for each essentially bounded function $F: (G/\Gamma )^{t+1}\to \mathbb {C}$ and an irrational sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ for some filtration $G_{\bullet }$ on G. Following the same method to analyse

(40) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{y_1, \ldots, y_d\in [N]} \int_X \prod_{i=0}^t T^{L_i(y_1, \ldots, y_d)} f_i \,d\mu, \end{align} $$

we deduce that understanding (40) comes down to estimating

(41) $$ \begin{align} \lim_{N\to\infty}\mathop{\mathbb{E}}\limits_{x,y_1, \ldots, y_d\in[N]} F(g^L(x,y_1, \ldots, y_d)), \end{align} $$

where

$$ \begin{align*}g^L(x,y_1, \ldots, y_d) = (g(x), g(x+L_1(y_1, \ldots, y_d)), \ldots, g(x+L_t(y_1, \ldots, y_d))).\end{align*} $$

By Theorem 5.3, the limit in (39) equals $\int _{G^P/\Gamma ^P} F$ ; by [GT10, Theorem 11] (while Theorem 11 of [GT10] has been shown to fail in general, its corrected version, to be found in the arXiv version of the paper at https://arxiv.org/abs/1002.2028, still holds in this case, as the system under consideration is translation-invariant), the limit in (41) is $\int _{G^L/\Gamma ^L} F$ for some subgroup $G^L\leqslant G^{t+1}$ . From the fact that $\max _i \mathcal {A}_i(\vec {P})\leqslant 1$ we deduce that $G^P = \langle h_1^{\vec {v}_1}, G_2^{t+1}: h_1\in G_1, \vec {v}_1\in \mathcal {P}_1 \rangle $ ; similarly, the construction of the group $G^L$ in [GT10] and the fact that $\vec {L}$ has algebraic complexity at most $1$ reveal that $G^L = \langle h_1^{\vec {v}_1}, G_2^{t+1}: h_1\in G_1, \vec {v}_1\in \mathcal {L}_1 \rangle $ , where

$$ \begin{align*} \mathcal{L}_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{(x, x+L_1(y_1, \ldots, y_d), \ldots, x+L_t(y_1, \ldots, y_d)): x,y_1, \ldots, y_d\in\mathbb{R}\}. \end{align*} $$

We observe that $\mathcal {P}_1 = \mathcal {L}_1$ ; from this it follows that $G^P = G^L$ , and so the limits in (39) and (41) are equal. This implies that (38) and (40) are equal as well.

6 Finitary nilmanifold theory

Before we can prove a finitary version of Theorem 5.3, we need to introduce the necessary finitary concepts. Most concepts and definitions in this and the next section are taken from [CS12, GT10, GT12]. Throughout this section we assume that G is connected, and that each nilmanifold $G/\Gamma $ comes with a filtration $G_{\bullet }$ and a Mal’cev basis $\chi $ adapted to $G_{\bullet }$ . We call a nilmanifold endowed with a filtration and a Mal’cev basis filtered. A Mal’cev basis is a basis for the Lie algebra of G with some special properties; since we do not explicitly work with the notion of Mal’cev basis or its rationality in this paper, we refer the reader to [GT12] for definitions of these concepts. What matters for us is that each Mal’cev basis induces a diffeomorphism $\psi : G\to \mathbb {R}^m$ , called the Mal’cev coordinate map, which satisfies the following properties:

  1. (i) $\psi (\Gamma ) = \mathbb {Z}^m$ ;

  2. (ii) $\psi (G_i) = \{0\}^{m-m_i}\times \mathbb {R}^{m_i}$ , where $m_i = \dim G_i$ .

Thus, $\psi $ provides a natural coordinate system on G that respects the filtration $G_{\bullet }$ and the lattice $\Gamma $ . Similarly to $\psi $ , we define maps $\psi _i:G_i\to \mathbb {R}^{m_i-m_{i+1}}$ by assigning to each element of $G_i$ its Mal’cev coordinates indexed by $m-m_i + 1$ , …, $m-m_{i+1}$ . With this definition, we have $\psi _i(x)=0$ if and only if $x\in G_{i+1}$ , and $\psi _i(x)\in \mathbb {Z}^{m_i-m_{i+1}}$ if and only if $x\in \Gamma _i$ .

Definition 6.1. (Complexity of nilmanifolds)

A filtered nilmanifold $G/\Gamma $ has complexity M if the degree s of the filtration $G_{\bullet }$ , the dimension m of the group G and the rationality of the Mal’cev basis $\chi $ are all bounded by M.

We remark that complexity of nilmanifolds has nothing to do with the four notions of complexity of polynomial progressions that we examine. Neither does complexity of nilsequences defined below.

Definition 6.2. (Nilsequences)

A function $f:\mathbb {Z}\to \mathbb {C}$ is a nilsequence of degree s and complexity M if $f(n)=F(g(n)\Gamma )$ , where $F:G/\Gamma \to \mathbb {R}$ is an M-Lipschitz function on a filtered nilmanifold $G/\Gamma $ of degree s and complexity M, and $g\in {{\mathrm{poly}}}(\mathbb {Z},G_{\bullet })$ .

Definition 6.3. (Quantitative equidistribution)

Let $D\in \mathbb {N}_+$ and $\delta>0$ . A sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}^D,G)$ is $(\delta ,N)$ -equidistributed on $G/\Gamma $ if

$$ \begin{align*} \bigg|\mathop{\mathbb{E}}\limits_{n\in[N]^D}F(g(n)\Gamma)-\int_{G/\Gamma}F \bigg|\leqslant\delta\Vert F\Vert_{{\mathrm{Lip}}} \end{align*} $$

for all Lipschitz functions $F:G/\Gamma \to \mathbb {C}$ , where $\Vert F\Vert _{{\mathrm{Lip}}}$ is the Lipschitz norm of F with respect to a metric defined in [GT12].
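As a sanity check of Definition 6.3 in the simplest abelian setting, the following Python sketch takes $G=\mathbb {R}$ , $\Gamma =\mathbb {Z}$ , $D=1$ , $g(n)=\alpha n$ and the single test function $F(x)=e^{2\pi i x}$ , whose integral over $\mathbb {R}/\mathbb {Z}$ is 0. The name `discrepancy` and the choice of F are illustrative assumptions and do not come from the paper; testing one function only gives a necessary condition for equidistribution.

```python
import cmath
import math

def discrepancy(alpha: float, N: int) -> float:
    """Return |E_{n in [N]} F(alpha * n) - integral of F| for F(x) = exp(2*pi*i*x).

    The integral of F over R/Z is 0, so this is the modulus of the average.
    Toy model only (assumption): G = R, Gamma = Z, g(n) = alpha * n.
    """
    total = sum(cmath.exp(2j * math.pi * alpha * n) for n in range(1, N + 1))
    return abs(total / N)

# Irrational rotation: the geometric sum is bounded, so the average is O(1/N).
print(discrepancy(math.sqrt(2), 1000))
# Integer alpha: g(n) lies in Gamma for every n, so the average stays close to 1.
print(discrepancy(1.0, 1000))
```

In this toy model the obstruction in the second call is exactly a horizontal character (the identity map) taking an integer value on $\alpha $ .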

It has been shown in Theorem 2.5 that equidistribution is related to horizontal characters. Given the Mal’cev coordinate map $\psi : G\to \mathbb {R}^m$ , each horizontal character can be written in the form $\eta (x) = k\cdot \psi (x)$ for some $k\in \mathbb {Z}^m$ . We call $|\eta |=|k|=|k_1|+\cdots +|k_m|$ the modulus of $\eta $ . Similarly, each ith-level character $\eta _i:G_i\to \mathbb {R}$ is of the form $\eta _i(x) = k\cdot \psi _i(x)$ for some $k\in \mathbb {Z}^{m_i-m_{i+1}}$ , and we define its modulus to be $|\eta _i|=|k| = |k_1|+\cdots +|k_{m_i-m_{i+1}}|$ .

We shall also need to quantify the notion of polynomials that are ‘almost constant’ mod $\mathbb {Z}$ , using a definition from [GT12]. In what follows, $\|x\|_{\mathbb {R}/\mathbb {Z}} = \min \{|x-n|: n\in \mathbb {Z}\}$ is the circle norm of $x\in \mathbb {R}$ .

Definition 6.4. (Smoothness norm)

Let

$$ \begin{align*}Q(n_1, \ldots, n_D) = \sum_{i=0}^d \sum_{i_1+\cdots+i_D = i} a_{i_1, \ldots, i_D} {{n_1}\choose{i_1}}\cdots{{n_D}\choose{i_D}}\end{align*} $$

be a polynomial in $\mathbb {R}[n_1, \ldots , n_D]$ . For $N\in \mathbb {N}_+$ , we define the smoothness norm of Q to be

$$ \begin{align*} \|Q\|_{C^\infty[N]} = \max\{N^{i_1+\cdots+i_D} \|a_{i_1, \ldots, i_D}\|_{\mathbb{R}/\mathbb{Z}}: i_1, \ldots, i_D\in\mathbb{N},\; 1\leq i_1+\cdots + i_D \leq d\}. \end{align*} $$

In particular, $\|Q\|_{C^\infty [N]}$ is bounded from above as $N\to \infty $ if and only if Q is constant mod $\mathbb {Z}$ .
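The smoothness norm of Definition 6.4 is straightforward to evaluate once the binomial-basis coefficients are known. The following Python sketch does this with exact rational arithmetic; the dictionary format for the coefficients and the function names are hypothetical conventions chosen for illustration.

```python
from fractions import Fraction
from math import floor

def circle_norm(x: Fraction) -> Fraction:
    """Distance from x to the nearest integer."""
    frac = x - floor(x)
    return min(frac, 1 - frac)

def smoothness_norm(coeffs: dict, N: int) -> Fraction:
    """Smoothness norm of Q = sum of a_idx * binom(n_1, i_1) * ... * binom(n_D, i_D).

    `coeffs` maps multi-indices (i_1, ..., i_D) to the coefficients a_idx
    (assumed format). The constant term does not enter the norm.
    """
    return max(
        (Fraction(N) ** sum(idx) * circle_norm(Fraction(a))
         for idx, a in coeffs.items() if sum(idx) >= 1),
        default=Fraction(0),
    )

# Q(n) = (1/3) n + (1/100) binom(n, 2) with N = 10: the linear term dominates.
print(smoothness_norm({(1,): Fraction(1, 3), (2,): Fraction(1, 100)}, 10))  # 10/3
# Integer non-constant coefficients: Q is constant mod Z and the norm is 0 for every N.
print(smoothness_norm({(0,): Fraction(1, 2), (1,): 5, (2,): 2}, 10**6))  # 0
```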

With these definitions, we are ready to state a quantitative version of Theorem 2.5.

Theorem 6.5. (Quantitative Leibman equidistribution theorem [GT12, Theorem 2.9])

Let $\delta>0$ , $M\geqslant 2$ and $D, N\in \mathbb {N}_+$ with $D\leqslant M$ . Let $G/\Gamma $ be a filtered nilmanifold of complexity M and $g\in {{\mathrm{poly}}}(\mathbb {Z}^D, G_{\bullet })$ . Then there exists $C_M>0$ such that at least one of the following is true:

  1. (i) g is $(\delta ,N)$ -equidistributed in $G/\Gamma $ ;

  2. (ii) there exists a non-trivial horizontal character $\eta $ of modulus $|\eta |\ll \delta ^{-C_{M}}$ for which $\|\eta \circ g\|_{C^\infty [N]}\ll \delta ^{-C_{M}}$ .

We now need to quantify the notion of irrationality.

Definition 6.6. (Quantitative irrationality)

Let $G/\Gamma $ be a filtered nilmanifold of degree s, and suppose $A,N>0$ . An element $g_i\in G_i$ is $(A,N)$ -irrational if for every non-trivial ith-level character $\eta :G_i\to \mathbb {R}$ of modulus $|\eta |\leqslant A$ , we have $\|\eta (g_i)\|_{\mathbb {R}/\mathbb {Z}}\geqslant A/N^i$ . It is A-irrational if for every non-trivial ith-level character $\eta :G_i\to \mathbb {R}$ of modulus $|\eta |\leqslant A$ , we have $\eta (g_i)\notin \mathbb {Z}$ . We say that a sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is $(A,N)$ -irrational (respectively, A-irrational) if $g_i$ is $(A,N)$ -irrational (respectively, A-irrational) for each $1\leqslant i\leqslant s$ . Similarly, we say that the nilsequence $n\mapsto F(g(n)\Gamma )$ is $(A,N)$ - or A-irrational if the polynomial sequence g is.

Clearly, $(A,N)$ -irrationality is stronger than A-irrationality, but for some of our applications the latter notion will be sufficient.
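To illustrate the difference between the two notions, the following Python sketch works in a one-dimensional toy model, which is an assumption made purely for illustration: the element $g_i$ is identified with a single rational Mal’cev coordinate $\alpha $ , and the ith-level characters are the maps $x\mapsto kx$ with $k\in \mathbb {Z}$ , of modulus $|k|$ . The function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def circle_norm(x: Fraction) -> Fraction:
    """Distance from x to the nearest integer."""
    frac = x - floor(x)
    return min(frac, 1 - frac)

def is_AN_irrational(alpha: Fraction, A: int, N: int, i: int) -> bool:
    """(A, N)-irrationality in the toy model: ||k * alpha|| >= A / N^i for 1 <= |k| <= A.

    Only k > 0 is tested since k and -k give the same circle norm.
    """
    threshold = Fraction(A, N ** i)
    return all(circle_norm(k * alpha) >= threshold for k in range(1, A + 1))

def is_A_irrational(alpha: Fraction, A: int) -> bool:
    """A-irrationality in the toy model: k * alpha is not an integer for 1 <= |k| <= A."""
    return all(circle_norm(k * alpha) != 0 for k in range(1, A + 1))

alpha = Fraction(1, 7)
print(is_A_irrational(alpha, 3))           # True: 1/7, 2/7, 3/7 are not integers
print(is_AN_irrational(alpha, 3, 100, 1))  # True: each circle norm is at least 3/100
print(is_AN_irrational(alpha, 3, 10, 1))   # False: ||1/7|| < 3/10
print(is_A_irrational(alpha, 7))           # False: 7 * (1/7) is an integer
```

The third call shows an element that is A-irrational without being $(A,N)$ -irrational for a small value of N.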

We are now ready to state the finitary version of Theorem 5.3, which is the main technical result of this paper, and derive Theorem 5.3 from it.

Theorem 6.7. Let $t\in \mathbb {N}_+$ and $A,M,N\geqslant 2$ . Let $G/\Gamma $ be a filtered nilmanifold of complexity M. Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is $(A,N)$ -irrational, $F:(G/\Gamma )^{t+1}\to \mathbb {C}$ is M-Lipschitz, and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is a homogeneous polynomial progression. Then

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in[N]} F(g^P(x,y)\Gamma^{t+1}) = \int_{G^P/\Gamma^P} F + O_M(A^{-c_M}) \end{align*} $$

for some $c_M>0$ .

Proof of Theorem 5.3 using Theorem 6.7

Let $F:(G/\Gamma )^{t+1}\to \mathbb {R}$ be a continuous function. By the Stone–Weierstrass theorem, Lipschitz functions on a compact set form a dense subset of the algebra of continuous functions. Approximating F by a sequence of Lipschitz functions if necessary, we can assume without loss of generality that F is Lipschitz. We let M be the maximum of the complexity of $G/\Gamma $ and the Lipschitz norm of F.

Let $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ be an irrational sequence. For each $N\in \mathbb {N}_+$ , we let $A_N$ be the maximal real number A for which g is $(A, N)$ -irrational. We claim that $A_N\to \infty $ as $N\to \infty $ . If not, then there exist some number $A>0$ and an index $i\in \mathbb {N}_+$ with the property that $g_i$ fails to be $(A,N)$ -irrational for every $N\in \mathbb {N}_+$ . We fix this i. It follows that there exists a sequence of non-trivial ith-level characters $\eta _{N}: G_i\to \mathbb {R}$ of modulus at most A such that $\|\eta _N(g_i)\|_{\mathbb {R}/\mathbb {Z}}<A/N^i$ . Since there are only finitely many ith-level characters of modulus bounded by A, we conclude that there exists a non-trivial ith-level character $\eta $ of modulus at most A such that $\|\eta (g_i)\|_{\mathbb {R}/\mathbb {Z}} < A/N^i$ for all $N\in \mathbb {N}_+$ . Taking $N\to \infty $ , we see that $\eta (g_i)\in \mathbb {Z}$ , contradicting the irrationality of $g_i$ .

It therefore follows from Theorem 6.7 that

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in[N]} F(g^P(x,y)\Gamma^{t+1}) = \int_{G^P/\Gamma^P} F + O_M(A_N^{-c_M}). \end{align*} $$

Since M is constant, letting $N\to \infty $ sends the error term to 0, implying that $g^P$ is equidistributed on $G^P/\Gamma ^P$ as claimed.

7 Reducing true complexity to an equidistribution question

In §§3–6 we have shown how the question of determining Host–Kra complexity for homogeneous progressions can be reduced to showing that $g^P$ is equidistributed on $G^P/\Gamma ^P$ . Determining true complexity for homogeneous progressions comes down to exactly the same equidistribution question. All the arguments in this section can be viewed as finitary analogues of arguments in previous sections.

Since we are now primarily concerned with functions from $\mathbb {Z}/N\mathbb {Z}$ to $\mathbb {C}$ , we shall need an N-periodic version of certain previously defined concepts. In this section N is always a prime, and the group G is connected. A function $f:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ is called $1$ -bounded whenever $\Vert f\Vert _\infty \leqslant 1$ .

Definition 7.1. (Periodic sequences)

Let $G_{\bullet }$ be a filtration on G. A sequence $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is N-periodic if $g(n+N)g(n)^{-1}\in \Gamma $ for each $n\in \mathbb {Z}$ , and it is periodic if it is N-periodic for some $N>0$ . A nilsequence $n\mapsto F(g(n)\Gamma )$ is N-periodic (respectively, periodic) if g is.
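For intuition, Definition 7.1 can be tested in an abelian toy model, assumed here only for illustration, with $G=\mathbb {R}$ written additively, $\Gamma =\mathbb {Z}$ and $g(n) = (a/N)n + (b/N){{n}\choose {2}}$ . The condition $g(n+N)g(n)^{-1}\in \Gamma $ then reads $g(n+N)-g(n)\in \mathbb {Z}$ . The helper names are hypothetical, and the check only inspects a finite window of values of n.

```python
from fractions import Fraction

def binom2(n: int) -> int:
    """binom(n, 2), valid for every integer n."""
    return n * (n - 1) // 2

def g(n: int, a: int, b: int, N: int) -> Fraction:
    """Toy polynomial sequence g(n) = (a/N) n + (b/N) binom(n, 2) in G = R."""
    return Fraction(a, N) * n + Fraction(b, N) * binom2(n)

def looks_N_periodic(a: int, b: int, N: int, window=range(-20, 21)) -> bool:
    """Check g(n + N) - g(n) in Z for n in a finite window (not a proof)."""
    return all((g(n + N, a, b, N) - g(n, a, b, N)).denominator == 1 for n in window)

# g(n + N) - g(n) = a + b * (n + (N - 1)/2), which is an integer when N is odd.
print(looks_N_periodic(3, 5, 7))  # True
# For N = 2 the quadratic term contributes a half-integer, so periodicity fails.
print(looks_N_periodic(1, 1, 2))  # False
```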

Given a homogeneous polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ , we want to show that $\mathcal {A}_i(\vec {P}) = \mathcal {T}_i(\vec {P})$ for each $0\leqslant i\leqslant t$ . The forward inequality $\mathcal {A}_i(\vec {P}) \leq \mathcal {T}_i(\vec {P})$ is straightforward to derive (see Theorem 1.13 in [Kuc21b]); it is the reverse inequality that poses a challenge. We thus want to prove the following theorem.

Theorem 7.2. Let $t\in \mathbb {N}_+$ , $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a homogeneous polynomial progression, $0\leqslant i\leqslant t$ , and suppose that $\mathcal {A}_i(\vec {P}) = s$ . For every $\epsilon>0$ , there exist $\delta>0$ and $N_0\in \mathbb {N}$ such that for all primes $N>N_0$ and all $1$ -bounded functions $f_0, \ldots , f_t:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ , we have

$$ \begin{align*} \bigg|\mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}}f_0(x) f_1(x+P_1(y))\cdots f_t(x+P_t(y))\bigg| < \epsilon \end{align*} $$

whenever $\Vert f_i\Vert _{U^{s+1}}<\delta $ .

We know that each progression is controlled by some Gowers norm. The result below plays the same role in deriving Theorem 7.2 as Theorem 1.4 plays in the proof of Corollary 5.4.

Proposition 7.3. [Pel19, Proposition 2.2]

Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. There exists $s\in \mathbb {N}_+$ with the following property: for every $\epsilon>0$ , there exist $\delta>0$ and $N_0\in \mathbb {N}$ such that for all primes $N>N_0$ and all $1$ -bounded functions $f_0, \ldots , f_t:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ , we have

$$ \begin{align*} \bigg|\mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}}f_0(x) f_1(x+P_1(y))\cdots f_t(x+P_t(y))\bigg| < \epsilon \end{align*} $$

whenever $\Vert f_i\Vert _{U^{s+1}}<\delta $ for some $0\leqslant i\leqslant t$ .
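The two quantities compared in Theorem 7.2 and Proposition 7.3 can be computed by brute force for small primes N. The Python sketch below does this for the $U^2$ norm only, that is, for $s=1$ ; the function names and the representation of functions on $\mathbb {Z}/N\mathbb {Z}$ as lists are illustrative assumptions, and the cubic running time makes it suitable for small N only.

```python
import cmath
import math
from itertools import product

def gowers_u2(f, N):
    """U^2 norm: the fourth root of E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2)."""
    total = 0
    for x, h1, h2 in product(range(N), repeat=3):
        total += (f[x] * f[(x + h1) % N].conjugate()
                  * f[(x + h2) % N].conjugate() * f[(x + h1 + h2) % N])
    return abs(total / N**3) ** 0.25

def counting_operator(fs, polys, N):
    """E_{x,y in Z/NZ} f_0(x) f_1(x + P_1(y)) ... f_t(x + P_t(y)).

    `fs` is the list [f_0, ..., f_t] and `polys` the list [P_1, ..., P_t] of callables.
    """
    total = 0
    for x, y in product(range(N), repeat=2):
        term = fs[0][x]
        for f, P in zip(fs[1:], polys):
            term *= f[(x + P(y)) % N]
        total += term
    return total / N**2

N = 7
ones = [1.0 + 0j] * N
quadratic_phase = [cmath.exp(2j * math.pi * x * x / N) for x in range(N)]
print(gowers_u2(ones, N))                  # constants have U^2 norm 1
print(gowers_u2(quadratic_phase, N) ** 4)  # approximately 1/N for a quadratic phase
print(counting_operator([ones, ones, ones], [lambda y: y, lambda y: y * y], N))
```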

Next, we want to perform a finitary analogue of the approximation-by-nilsystems argument. This can be achieved with the help of a periodic version of a celebrated arithmetic regularity lemma from [GT10] in which the same polynomial sequence g is used in the decomposition of several functions.

Lemma 7.4. [Kuc21b, Lemma 2.13]

Let $s, t\in \mathbb {N}_+ $ , $\epsilon>0$ , and $\mathcal {F}:\mathbb {R}_+\to \mathbb {R}_+$ be a growth function. There exist $M=O_{\epsilon ,\mathcal {F}}(1)$ , a filtered nilmanifold $G/\Gamma $ of degree s and complexity at most M, and an N-periodic, $\mathcal {F}(M)$ -irrational sequence $g\in {{\mathrm{poly}}}(\mathbb {Z},G_{\bullet })$ satisfying $g(0)=1$ such that for all $1$ -bounded functions $f_0, \ldots , f_t:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ , there exist decompositions

$$ \begin{align*} f_i = f_{i, \mathrm{nil}} + f_{i,\mathrm{sml}} + f_{i,\mathrm{unf}} \end{align*} $$

where

  1. (i) $f_{i,\mathrm {nil}}(n)=F_i(g(n)\Gamma )$ for an M-Lipschitz function $F_i: G/\Gamma \to \mathbb {C}$ ,

  2. (ii) $\|f_{i,\mathrm {sml}}\|_2\leqslant \epsilon $ ,

  3. (iii) $\|f_{i,\mathrm {unf}}\|_{U^{s+1}}\leqslant {1}/{\mathcal {F}(M)}$ , and

  4. (iv) the functions $f_{i,\mathrm {nil}}$ , $f_{i,\mathrm {sml}}$ and $f_{i,\mathrm {unf}}$ are $4$ -bounded.

The last piece that we need is a finitary, periodic version of Theorem 6.7.

Proposition 7.5. Let $t\in \mathbb {N}_+$ and $A,M,N\geqslant 2$ . Let $G/\Gamma $ be a filtered nilmanifold of complexity M. Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is an A-irrational, N-periodic polynomial sequence, $F:(G/\Gamma )^{t+1}\to \mathbb {C}$ is M-Lipschitz and $1$ -bounded, and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is a homogeneous polynomial progression. Then

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} F(g^P(x,y)\Gamma^{t+1}) = \int_{G^P/\Gamma^P} F + O_M(A^{-c_M}) \end{align*} $$

for some $c_M>0$ .

Proof of Proposition 7.5 using Theorem 6.7

Let $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ be A-irrational and N-periodic. We claim that g is $(A,Nk)$ -irrational for all sufficiently large $k\in \mathbb {N}_+$ . If not, then there exists $1\leqslant i\leqslant s$ such that for each $k\in \mathbb {N}_+$ there exists a non-trivial ith-level character $\eta _{i,k}:G_i\to \mathbb {R}$ of modulus at most A satisfying $\|\eta _{i,k}(g_i)\|_{\mathbb {R}/\mathbb {Z}}< A/(Nk)^i$ . The N-periodicity of $g_i$ implies that $g_i^{N^i}\in \Gamma _i$ mod $G_{i+1}^\nabla $ [CS12, Lemma 5.3]; hence $\eta _{i,k}(g_i)\in ({1}/{N^i})\mathbb {Z}$ . Thus, $\eta _{i,k}(g_i)\in \mathbb {Z}$ whenever $k^i>A$ , since in that case $\|\eta _{i,k}(g_i)\|_{\mathbb {R}/\mathbb {Z}}< 1/N^i$ , whereas every non-integer element of $({1}/{N^i})\mathbb {Z}$ has circle norm at least $1/N^i$ . In particular, since we can take k arbitrarily large, there exists a non-trivial ith-level character $\eta _{i,k}$ of modulus at most A for which $\eta _{i,k}(g_i)\in \mathbb {Z}$ , contradicting the A-irrationality of g. Hence g is $(A,Nk)$ -irrational for all sufficiently large $k\in \mathbb {N}_+$ .

Applying Theorem 6.7, we deduce that

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} F(g^P(x,y)\Gamma^{t+1}) & = \mathop{\mathbb{E}}\limits_{x,y\in[Nk]} F(g^P(x,y)\Gamma^{t+1}) + O(1/k)\\ & = \int_{G^P/\Gamma^P} F + O_M(A^{-c_M}) + O(1/k) \end{align*} $$

for all sufficiently large $k\in \mathbb {N}_+$ . Taking $k\to \infty $ finishes the proof.

Theorem 7.2 is a special case of [Kuc21b, Theorem 8.1], the proof of which is analogous to the derivation of Corollary 5.4 from Theorem 5.3. Here, we only sketch the steps taken in the derivation of [Kuc21b, Theorem 8.1], and we refer the reader to [Kuc21b] for all the details. First, we use Proposition 7.3 and Lemma 7.4 to replace the functions $f_0, \ldots , f_t$ by irrational, periodic nilsequences. Second, we use Proposition 7.5 to approximate the sum by an integral of some Lipschitz function F over $G^P/\Gamma ^P$ . Third, we use the fact that $\mathcal {A}_i(\vec {P})=s$ to conclude that $1^i \times G_{s+1}\times 1^{t-i}$ is a subgroup of $G^P$ . Fourth, we use the disintegration theorem to bound $\int _{G^P/\Gamma ^P} F$ by averages of some Lipschitz function $F_i$ over cosets of $G_{s+1}\Gamma $ . Fifth, we use the assumption that $f_i$ has a small $U^{s+1}$ norm to conclude that averages of $F_i$ over cosets of $G_{s+1}\Gamma $ are small. From this follows the smallness of

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}}f_0(x) f_1(x+P_1(y))\cdots f_t(x+P_t(y)). \end{align*} $$

The proof of [Kuc21b, Theorem 8.1] makes this argument precise and illustrates how all the error quantities are taken care of.

Finally, Proposition 7.5 together with [Kuc21b, Theorem 9.1] implies part (i) of Corollary 1.15.

8 The proof of Theorem 6.7

To complete the proofs of Corollary 5.4 and Theorem 7.2, it remains to derive Theorem 6.7. Before we prove Theorem 6.7 for an arbitrary homogeneous progression, we want to deduce the theorem in the special case of $\vec {P} = (x, \; x+y,\; x+2y,\; x+y^3)$ . This will help illustrate the method, and we will later compare this progression with $(x, \; x+y,\; x+2y,\; x+y^2)$ to see what fails in the inhomogeneous case. The method is an adaptation of the proof of Theorem 1.11 from [GT10]; however, the linear algebraic component coming from the fact that we are dealing with polynomial progressions is much more involved. The method used here is somewhat similar to the methods used in [Kuc21b]; here, however, we perform downward induction on the degree of subgroups $G_i$ , whereas in [Kuc21b] we perform downward induction on the degree of monomials in $\eta \circ g^P$ .

Proposition 8.1. Let $A,M,N\geqslant 2$ . Let $G/\Gamma $ be a filtered nilmanifold of degree $2$ and complexity M. Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is an $(A,N)$ -irrational sequence satisfying $g(0)=1$ , $F:(G/\Gamma )^{4}\to \mathbb {C}$ is M-Lipschitz and $\vec {P} = (x, \; x+y,\; x+2y,\; x+y^3)$ . Then

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x,y\in[N]} F(g^P(x,y)\Gamma^4) = \int_{G^P/\Gamma^P} F + O_M(A^{-c_M}) \end{align*} $$

for some $c_M>0$ .

The assumption that G has a filtration of degree 2 is made to simplify the exposition, and because all the difficulties that emerge in higher-step cases are already present here.

We shall need the following lemma.

Lemma 8.2. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a homogeneous polynomial progression, $\epsilon>0$ , and $s, N\in \mathbb {N}_+$ . Let $W_i\leqslant \mathbb {R}[x,y]$ be as defined in §4, and for each $1\leqslant i\leqslant s$ , let $Q_{i,1}, \ldots , Q_{i, t_i}$ be a basis for $W_i$ composed of integral polynomials. Suppose that $a_{ij}$ are real numbers such that the polynomial

$$ \begin{align*} Q(x,y) = \sum_{i=1}^s \sum_{j=1}^{t_i} a_{ij} Q_{i,j}(x,y) \end{align*} $$

satisfies $\|Q\|_{C^\infty [N]}\leqslant \epsilon $ . Then there exists a positive integer $q=O(1)$ with the property that $\|q a_{sj} \|_{\mathbb {R}/\mathbb {Z}}\ll \epsilon N^{-s}$ for all $1\leqslant j\leqslant t_s$ .

Proof. For $s\in \mathbb {N}_+$ , we let $W_s, V_s$ be as in §4. We also define

$$ \begin{align*} \tilde{W}_s = {{\mathrm{Span}}}_{\mathbb{R}}\{(x+P_i(y))^s: 0\leqslant i\leqslant t\}\quad{{\mathrm{and}}}\quad U_s = {{\mathrm{Span}}}_{\mathbb{R}}\bigg\{{{x}\choose{i}}{{y}\choose{j}}: i+j < s\bigg\}. \end{align*} $$

We want to show first that $\dim W_s/U_s = \dim W_s = t_s$ , that is, that the polynomials $Q_{s, 1}, \ldots , Q_{s, t_s}$ remain linearly independent when we subtract from them the monomials in the Taylor basis of degree less than s. While this claim may plausibly hold for any polynomial progression, we prove it for homogeneous progressions since this is the only case in which we need this result. The homogeneity of $\vec {P}$ implies that $W_s\cong V_s/V_{s-1}\cong \tilde {W}_s$ . Therefore $W_s/U_s\cong V_s/U_s V_{s-1} \cong \tilde {W}_s/ U_s\cong \tilde {W}_s,$ where the last isomorphism follows from the fact that no polynomial in $\tilde {W}_s$ has a non-zero monomial of degree less than s. The claim $\dim W_s/U_s = t_s$ follows.

Let $Q(x,y) = \sum \nolimits _{k,l} c_{kl} {{x}\choose {k}}{{y}\choose {l}}$ and $\tilde {Q}(x,y) = \sum \nolimits _{k+l\geqslant s} c_{kl} {{x}\choose {k}}{{y}\choose {l}}$ . Thus, $\tilde {Q} = Q$ mod $U_s$ , and it satisfies $\|\tilde {Q}\|_{C^\infty [N]}\leqslant \epsilon $ . Setting $Q_{i,j}(x,y) = \sum \nolimits _{k,l} b_{klij} {{x}\choose {k}}{{y}\choose {l}}$ , we similarly let $\tilde {Q}_{i,j}(x,y) = \sum \nolimits _{k+l\geqslant s} b_{klij} {{x}\choose {k}}{{y}\choose {l}}$ . We deduce from $\dim W_s/U_s = t_s = \dim W_s$ that $\tilde {Q}_{s,1}$ , …, $\tilde {Q}_{s,t_s}$ are linearly independent.

From the definitions of Q and $b_{klij}$ it follows that $c_{kl} = \sum \nolimits _{i,j} b_{klij} a_{ij} $ , and that $\|c_{kl}\|_{\mathbb {R}/\mathbb {Z}}\leqslant \epsilon N^{-(k+l)}\leqslant \epsilon N^{-s}$ whenever $k+l\geqslant s$ .

Let u be the number of pairs $(k,l)$ with $k+l\geqslant s$ for which $c_{kl}\neq 0$ . The fact that $\dim W_s/U_s = t_s$ implies that $u\geqslant t_s$ . Indexing these pairs as $(k_1, l_1), \ldots , (k_u, l_u)$ in some arbitrary fashion, we obtain a $u\times t_s$ matrix $B = (b_{k_r l_r s j})_{r,j}$ as well as a $t_s$ -dimensional column vector $a = (a_{sj})_{j}$ and a u-dimensional column vector $c = (c_{k_r l_r})_{r}$ such that $Ba = c$ . The linear independence of $\tilde {Q}_{s,1}, \ldots , \tilde {Q}_{s,t_s}$ implies that there exist an invertible $t_s\times t_s$ submatrix $\tilde {B}$ of B and a $t_s$ -dimensional column vector $\tilde {c}$ such that $\tilde {B}a = \tilde {c}$ . Since the entries of $\tilde {B}$ are integers of size $O(1)$ , the entries of $\tilde {B}^{-1}$ are rational numbers of height $O(1)$ . Therefore, there exists a positive integer $q=O(1)$ for which the entries of the matrix $q\tilde {B}^{-1}$ are integers of size $O(1)$ . The equality $a = \tilde {B}^{-1}\tilde {c}$ and the condition $\|c_{kl}\|_{\mathbb {R}/\mathbb {Z}}\leqslant \epsilon N^{-s}$ whenever $k+l\geqslant s$ imply that $\|q a_{sj}\|_{\mathbb {R}/\mathbb {Z}}\ll \epsilon N^{-s}$ for $1\leqslant j\leqslant t_s$ , as claimed.
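The denominator-clearing step at the end of the proof can be made concrete. The Python sketch below inverts a small invertible integer matrix over $\mathbb {Q}$ and returns the least positive integer q for which $q\tilde {B}^{-1}$ has integer entries; the matrix used is a made-up example and the function names are hypothetical. Since $qa = (q\tilde {B}^{-1})\tilde {c}$ , the circle norm of each $q a_{sj}$ is at most the sum of the absolute values of a row of $q\tilde {B}^{-1}$ times the largest circle norm among the entries of $\tilde {c}$ .

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9 and later

def inverse(B):
    """Invert a square integer matrix over Q by Gauss-Jordan elimination.

    Assumes B is invertible; a singular input raises StopIteration.
    """
    n = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(int(r == c)) for c in range(n)]
         for r, row in enumerate(B)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

def clearing_denominator(B):
    """Return (q, q * B^{-1}) with q the least positive integer making q * B^{-1} integral."""
    inv = inverse(B)
    q = lcm(*(entry.denominator for row in inv for entry in row))
    return q, [[int(q * entry) for entry in row] for row in inv]

print(clearing_denominator([[1, 1], [1, -1]]))  # (2, [[1, 1], [1, -1]])
```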

Proof of Proposition 8.1

Let $\vec {P} = (x, \; x+y,\; x+2y,\; x+y^3)$ . We set

$$ \begin{align*} \vec{v}_1 = (1,1,1,1),\quad \vec{v}_2 = (0,1,2,0),\quad \vec{v}_3 = (0,0,0,1)\quad\text{and}\quad \vec{v}_4 = (0,0,1,0) \end{align*} $$

and observe that

$$ \begin{align*} \vec{P}(x,y) &= \vec{v}_1 x + \vec{v}_2 y + \vec{v}_3 y^3,\\[3pt] {{\vec{P}(x,y)}\choose{2}} &= \vec{v}_1 {{x}\choose{2}}+ \vec{v}_2 \bigg(xy+{{y}\choose{2}}\bigg) + \vec{v}_3 \bigg(xy^3 + {{y^3}\choose{2}}\bigg) + \vec{v}_4 y^2. \end{align*} $$

Thus, we have

$$ \begin{align*} \mathcal{P}_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_1, \vec{v}_2, \vec{v}_3\} \quad {{\mathrm{and}}}\quad \mathcal{P}_2 = \mathcal{P}_3 = \cdots = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4\} = \mathbb{R}^4 \end{align*} $$

as well as

$$ \begin{align*} G^P = G^{\vec{v}_1} G^{\vec{v}_2} G^{\vec{v}_3} G_2^4, \end{align*} $$

where $H^{\vec {w}} = \langle h^{\vec {w}}: h\in H\rangle $ for any subgroup $H\leq G$ .
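The expansions of $\vec {P}(x,y)$ and ${{\vec {P}(x,y)}\choose {2}}$ above can be confirmed by evaluating both sides on a grid of integer points. Both sides are polynomials of degree at most 2 in x and at most 6 in y, so agreement on an $11\times 11$ grid establishes the identity. The Python sketch below is only a verification aid, and its helper names are hypothetical.

```python
def binom2(n: int) -> int:
    """binom(n, 2), valid for every integer n."""
    return n * (n - 1) // 2

V1, V2, V3, V4 = (1, 1, 1, 1), (0, 1, 2, 0), (0, 0, 0, 1), (0, 0, 1, 0)

def combo(*terms):
    """Integer linear combination of 4-vectors, given as (coefficient, vector) pairs."""
    return tuple(sum(c * v[k] for c, v in terms) for k in range(4))

def check(x: int, y: int) -> bool:
    """Compare P(x, y) and binom(P(x, y), 2) with their stated decompositions."""
    P = (x, x + y, x + 2 * y, x + y**3)
    level1 = combo((x, V1), (y, V2), (y**3, V3))
    level2 = combo((binom2(x), V1), (x * y + binom2(y), V2),
                   (x * y**3 + binom2(y**3), V3), (y**2, V4))
    return P == level1 and tuple(binom2(p) for p in P) == level2

print(all(check(x, y) for x in range(-5, 6) for y in range(-5, 6)))  # True
```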

We shall prove Proposition 8.1 by applying Theorem 6.5. Suppose that $g^P$ is not $(c_M A^{-C_M},N)$ -equidistributed on $G^P/\Gamma ^P$ for some constants $0<c_M<1<C_M$ . By Theorem 6.5, there exists a non-trivial horizontal character ${\eta : G^P\to \mathbb {R}}$ of modulus at most $cA$ , for which $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ for some constant $c>0$ that depends on $c_M$ and $C_M$ . The constant $C_M$ is chosen in such a way as to match the exponents in case (ii) of Theorem 6.5. However, we have control over how we choose the constant $c_M$ , and we shall pick it small enough to show that $g^P$ not being $(c_M A^{-C_M},N)$ -equidistributed contradicts the $(A,N)$ -irrationality of g.

Rewriting the expression for $\eta \circ g^P$ , we see that

$$ \begin{align*} \eta\circ g^P(x,y) &= \eta(g_1^{\vec{v}_1}) x + \eta(g_1^{\vec{v}_2}) y + \eta(g_1^{\vec{v}_3}) y^3\\[3pt] &\quad+ \eta(g_2^{\vec{v}_1}) {{x}\choose{2}}+ \eta(g_2^{\vec{v}_2}) \bigg(xy+{{y}\choose{2}}\bigg) + \eta(g_2^{\vec{v}_3}) \bigg(xy^3 + {{y^3}\choose{2}}\bigg) + \eta(g_2^{\vec{v}_4}) y^2. \end{align*} $$

Applying Lemma 8.2 and the assumption $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ , and choosing $c_M$ in such a way that $c>0$ is sufficiently small, we deduce that there exists a positive integer $q=O(1)$ such that $\|q \eta (g_i^{\vec {v}_{j}})\|_{\mathbb {R}/\mathbb {Z}} < A N^{-i}$ for all pairs

$$ \begin{align*} (i,j)\in\{(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (2,4)\}. \end{align*} $$

We aim to show that $\eta $ is trivial by showing that it vanishes on all of $G^P$ . First, we want to show that $\eta $ vanishes on $G_2^4$ . Suppose that $\eta |_{G_2^4}\neq 0$ , and define $\xi _{2,1}:G_2\to \mathbb {R}$ by $\xi _{2,1}(h_2) =q \eta (h_2^{(1,1,1,1)})$ . We claim that $\xi _{2,1}$ is a second-level character. To prove this, we need to show that $\xi _{2,1}$ is a continuous group homomorphism, vanishes on $G_3$ , sends $\Gamma _2$ to $\mathbb {Z}$ , and vanishes on $[G_1,G_1]$ . The first statement follows from the fact that $\eta $ is a continuous group homomorphism, the second is true since $G_3$ is trivial, and the third follows from the fact that $q\in \mathbb {Z}$ , $\eta (\Gamma ^P)\leqslant \mathbb {Z}$ and $(1,1,1,1)\in \mathbb {Z}^4$ . To see the last statement, we note from $\vec {v}_1 \cdot \vec {v}_1 = \vec {v}_1$ , formula (C.2) in [GT10], and the two-step nilpotence of G that for any $h_1, h^{\prime }_1\in G_1$ ,

$$ \begin{align*} [h_1^{\vec{v}_1}, {h_1'}^{\vec{v}_1}] = [h_1, h_1']^{\vec{v}_1}. \end{align*} $$

Since $h_1^{\vec {v}_1}, {h_1'}^{\vec {v}_1}$ are both elements of $G^P$ , we have

$$ \begin{align*} \xi_{2,1}([h_1, h_1']) = q\eta([h_1, h_1']^{\vec{v}_1}) = q\eta([h_1^{\vec{v}_1}, {h_1'}^{\vec{v}_1}]) = 0, \end{align*} $$

implying that $\xi _{2,1}$ vanishes on $[G_1, G_1]$ . Thus, $\xi _{2,1}$ is a second-level character.

Performing a similar analysis while looking at the coefficients of ${{x}\choose {2}}, xy+{{y}\choose {2}}, xy^3 + {{y^3}\choose {2}}$ and $y^2$ respectively, we conclude that for all $1\leqslant j\leqslant 4$ , the maps $\xi _{2,j}(h_2) = q \eta (h_2^{\vec {v}_j})$ from $G_2$ to $\mathbb {R}$ are second-level characters. The non-triviality of $\eta $ on $G_2^4$ and the fact that $\vec {v}_1$ , $\vec {v}_2$ , $\vec {v}_3$ and $\vec {v}_4$ span $\mathcal {P}_2 = \mathbb {R}^4$ imply that for at least one value $1\leqslant j\leqslant 4$ , the character $\eta $ does not vanish on $G_2^{\vec {v}_j}$ . We fix this j. From $\|\xi _{2,j}(g_2)\|_{\mathbb {R}/\mathbb {Z}} = \|q \eta (g_2^{\vec {v}_{j}})\|_{\mathbb {R}/\mathbb {Z}} < A N^{-2}$ and the $(A,N)$ -irrationality of $g_2$ we deduce that $|\xi _{2,j}|>A$ . Together with the bounds $q=O(1)$ and $|\vec {v}_j|=O(1)$ , this implies that $|\eta |> c' A$ for some constant $c'>0$ . Choosing $c_M$ in such a way that $c<c'$ gives the desired contradiction. Hence $\eta $ vanishes on $G_2^4$ .

This leaves us with

$$ \begin{align*} \eta\circ g^P(x,y) &= \eta(g_1^{\vec{v}_1})x + \eta(g_1^{\vec{v}_2})y + \eta(g_1^{\vec{v}_3})y^3. \end{align*} $$

By analysing the coefficients of $x, y$ and $y^3$ as above, we see that $\eta $ vanishes on elements of the form $h_1^{\vec {v}_i}$ with $h_1\in G_1$ and $1\leqslant i\leqslant 3$ . Thus, $\eta $ vanishes on all of $G^P$ . This contradicts the non-triviality of $\eta $ , and so $g^P$ is $(c_M A^{-C_M},N)$ -equidistributed on $G^P/\Gamma ^P$ .

We now prove Theorem 6.7 in full generality.

Proof of Theorem 6.7

Let $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be a homogeneous integral polynomial progression, $G_{\bullet }$ be a filtration of degree s and $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ . By (33), we can find a family $\{Q_{i,j}:\; 1\leqslant i\leqslant s,\; 1\leqslant j\leqslant t_i\}$ of linearly independent integral polynomials such that $Q_{i,1}$ , …, $Q_{i, t_i}$ is a basis for $W_i = W^{\prime }_i$ for $1\leqslant i\leqslant s$ . It is crucial that these polynomials are linearly independent, which follows from the homogeneity of $\vec {P}$ . For each i, let $\tau _i: W_i\to \mathcal {P}_i$ be the map associated with $Q_{i,1}$ , …, $Q_{i, t_i}$ as defined in §4. We also let $\vec {v}_{i,j}\in \mathbb {Z}^{t+1}$ be the vectors such that $\tau _i(Q_{i,j}) = \vec {v}_{i,j}$ .

As in the proof of Proposition 8.1, suppose that $g^P$ is not $(c_M A^{-C_M},N)$ -equidistributed on $G^P/\Gamma ^P$ for some constants $0<c_M<1<C_M$ . We apply Theorem 6.5 again to conclude that there exists a non-trivial horizontal character ${\eta : G^P\to \mathbb {R}}$ of modulus at most $cA$ satisfying $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ for some constant $c>0$ that depends on $c_M$ and $C_M$ . The constant $C_M$ is chosen in such a way as to match the exponents in case (ii) of Theorem 6.5, but the choice of $c_M$ is up to us again. We shall pick it small enough to show that the failure of $g^P$ to be $(c_M A^{-C_M},N)$ -equidistributed contradicts the $(A,N)$ -irrationality of g.

Thus,

$$ \begin{align*} \eta\circ g^P(x,y) = \sum_{i=1}^s\sum_{j=1}^{t_i}\eta(g_i^{\vec{v}_{i,j}}) Q_{i,j}(x,y). \end{align*} $$

Using Lemma 8.2 and the assumption $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ , and choosing $c_M$ in such a way that $c>0$ is sufficiently small, we deduce that there exists a positive integer $q=O(1)$ such that $\|q \eta (g_i^{\vec {v}_{i,j}})\|_{\mathbb {R}/\mathbb {Z}} < A N^{-i}$ for all $1\leqslant i\leqslant s$ and $1\leqslant j\leqslant t_i$ .

Our goal now is to show by downward induction on i that $\eta $ vanishes on the group

$$ \begin{align*}H_i = \langle h_i^{\vec{v}_{i,j}}: h_i\in G_i, 1\leqslant j\leqslant t_i\rangle\end{align*} $$

for all $i\in \mathbb {N}_+$ . This is trivially true for $i\geqslant s+1$ . Suppose that $\eta $ vanishes on $H_{i+1}$ for some $1\leqslant i\leqslant s$ but that it does not vanish on $H_i$ . We define the maps $\xi _{i,j}:G_i\to \mathbb {R}$ by $\xi _{i,j}(h_i) = q\eta (h_i^{\vec {v}_{i,j}})$ and claim that they are ith-level characters. They are continuous group homomorphisms because $\eta $ is, and they vanish on $G_{i+1}$ by the induction hypothesis. Since $q\in \mathbb {Z}$ and $\vec {v}_{i,j}$ have integer entries, we also have $\xi _{i,j}(\Gamma _i)\subseteq \mathbb {Z}$ . It remains to show that $\xi _{i,j}$ vanishes on $[G_l, G_{i-l}]$ for all $1\leqslant l < i$ . The fact that $\mathcal {P}_i\subseteq \mathcal {P}_l\cdot \mathcal {P}_{i-l}$ implies the existence of $\vec {u}_l\in \mathcal {P}_l$ and $\vec {u}_{i-l}\in \mathcal {P}_{i-l}$ for which $\vec {v}_{i,j}=\vec {u}_l\cdot \vec {u}_{i-l}$ , and so we have

$$ \begin{align*} [G_l^{\vec{u}_l}, G_{i-l}^{\vec{u}_{i-l}}] = [G_l, G_{i-l}]^{\vec{u}_l\cdot \vec{u}_{i-l}}\; {{\mathrm{mod}}}\; G_{i+1}^{t+1}, \end{align*} $$

from which it follows that $\xi _{i,j}|_{[G_l,G_{i-l}]}=0$ . Therefore each $\xi _{i,j}$ is an ith-level character.

The non-triviality of $\eta $ on $H_i$ and the fact that $\mathcal {P}_i$ is spanned by the vectors $\vec {v}_{i,1}$ , …, $\vec {v}_{i,t_i}$ imply that for at least one value $1\leqslant j\leqslant t_i$ , the character $\eta $ does not vanish on $G_i^{\vec {v}_{i,j}}$ , and so $\xi _{i,j}$ is non-trivial. From $\|\xi _{i,j}(g_i)\|_{\mathbb {R}/\mathbb {Z}} = \|q \eta (g_i^{\vec {v}_{i,j}})\|_{\mathbb {R}/\mathbb {Z}} < A N^{-i}$ and the $(A,N)$ -irrationality of $g_i$ we deduce that $|\xi _{i,j}|>A$ . Together with the bounds $q=O(1)$ and $|\vec {v}_{i,j}|=O(1)$ , this implies that $|\eta |> c' A$ for some constant $c'>0$ . We choose $c_M$ in such a way that $c<c'$ ; this contradicts the bound $|\eta |\leqslant cA$ , and so $\eta $ must vanish on $H_i$ . This proves the inductive step; hence $\eta $ vanishes on all of $G^P$ , contradicting the non-triviality of $\eta $ . It follows that $g^P$ is $(c_M A^{-C_M},N)$ -equidistributed on $G^P/\Gamma ^P$ .

9 The failure of Theorem 6.7 in the inhomogeneous case

Having derived Theorem 6.7, we want to show why an analogous statement fails in the inhomogeneous case. We let

(42) $$ \begin{align} \vec{P}(x,y) = (x, \; x+y,\; x+2y,\; x+y^2), \end{align} $$

with a square instead of a cube in the last position. It is an inhomogeneous progression because of the inhomogeneous relation (10). Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is an irrational polynomial sequence with $g(0)=1$ on a connected group G with a filtration $G_{\bullet }$ of degree 2. We shall try to show that $g^P$ is equidistributed on $G^P/\Gamma ^P$ the same way as we argued in Proposition 8.1, and we indicate where and why the argument fails.

Once again, we let

$$ \begin{align*} \vec{v}_1 = (1,1,1,1),\quad \vec{v}_2 = (0,1,2,0),\quad \vec{v}_3 = (0,0,0,1)\quad\text{and}\quad \vec{v}_4 = (0,0,1,0), \end{align*} $$

and we observe that $\mathcal {P}_1 = {{\mathrm{Span}}}_{\mathbb {R}}\{\vec {v}_1, \vec {v}_2, \vec {v}_3\}$ and $\mathcal {P}_2 = {{\mathrm{Span}}}_{\mathbb {R}}\{\vec {v}_1, \vec {v}_2, \vec {v}_3, \vec {v}_4\}$ . Hence $G^P = G^{\vec {v}_1} G^{\vec {v}_2} G^{\vec {v}_3} G_2^4.$ Suppose that $g^P$ is not $(c_M A^{-C_M},N)$ -equidistributed on $G^P/\Gamma ^P$ for some constants $0<c_M<1<C_M$ . Theorem 6.5 once again implies the existence of a non-trivial horizontal character ${\eta : G^P\to \mathbb {R}}$ of modulus at most $cA$ , for which $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ for some constant $c>0$ that depends on $c_M$ and $C_M$ .

Rewriting the expression for $\eta \circ g^P$ , we see that

$$ \begin{align*} \eta\circ g^P(x,y) &= \eta(g_1^{\vec{v}_1}) x + \eta(g_1^{\vec{v}_2}) y + \eta(g_1^{\vec{v}_3}) y^2\\[4pt] &\quad+ \eta(g_2^{\vec{v}_1}) {{x}\choose{2}}+ \eta(g_2^{\vec{v}_2}) \bigg(xy+{{y}\choose{2}}\bigg) + \eta(g_2^{\vec{v}_3}) \bigg(xy^2 + {{y^2}\choose{2}}\bigg) + \eta(g_2^{\vec{v}_4}) y^2\\[4pt] &= \eta(g_1^{\vec{v}_1}) x + \eta(g_1^{\vec{v}_2}) y + (\eta(g_1^{\vec{v}_3}) + \eta(g_2^{\vec{v}_4}))y^2\\[4pt] &\quad+ \eta(g_2^{\vec{v}_1}) {{x}\choose{2}}+ \eta(g_2^{\vec{v}_2}) \bigg(xy+{{y}\choose{2}}\bigg) + \eta(g_2^{\vec{v}_3}) \bigg(xy^2 + {{y^2}\choose{2}}\bigg). \end{align*} $$
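The merging of the two $y^2$ terms reflects a linear dependence among the polynomials that multiply the coefficients $\eta (g_i^{\vec {v}_j})$ . The Python sketch below compares the seven polynomials arising for $(x,\; x+y,\; x+2y,\; x+y^e)$ with $e=3$ and $e=2$ by computing the rank of their monomial coefficient vectors over $\mathbb {Q}$ . It is an illustration for these two examples only, with hypothetical helper names, and is not part of the argument.

```python
from fractions import Fraction

def family(e: int):
    """The seven polynomials in x, y for the progression (x, x+y, x+2y, x+y^e).

    Each polynomial is a dict mapping exponent pairs (deg_x, deg_y) to coefficients.
    """
    h = Fraction(1, 2)
    return [
        {(1, 0): 1},                              # x
        {(0, 1): 1},                              # y
        {(0, e): 1},                              # y^e
        {(2, 0): h, (1, 0): -h},                  # binom(x, 2)
        {(1, 1): 1, (0, 2): h, (0, 1): -h},       # xy + binom(y, 2)
        {(1, e): 1, (0, 2 * e): h, (0, e): -h},   # x y^e + binom(y^e, 2)
        {(0, 2): 1},                              # y^2
    ]

def rank(rows):
    """Rank of a matrix with Fraction entries, by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk

def rank_of_family(e: int) -> int:
    polys = family(e)
    monomials = sorted({m for p in polys for m in p})
    return rank([[Fraction(p.get(m, 0)) for m in monomials] for p in polys])

print(rank_of_family(3))  # 7: all seven polynomials are linearly independent
print(rank_of_family(2))  # 6: y^2 occurs at both levels, so one dependence appears
```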

Applying Lemma 8.2 and the assumption $\|\eta \circ g^P\|_{C^\infty [N]}\leqslant cA$ , and choosing $c_M$ in such a way that $c>0$ is sufficiently small, we deduce that there exists a positive integer $q=O(1)$ such that

(43) $$ \begin{align} \|q \eta(g_i^{\vec{v}_{j}})\|_{\mathbb{R}/\mathbb{Z}} < A N^{-i} \end{align} $$

for all pairs

$$ \begin{align*} (i,j)\in\{(1,1), (1,2), (2,1), (2,2), (2,3)\}. \end{align*} $$

By looking at the coefficients of ${{x}\choose {2}}$ , $xy+{{y}\choose {2}}$ and $xy^2 + {{y^2}\choose {2}}$ , we deduce that the maps

$$ \begin{align*} h_2\mapsto q\eta(h_2^{\vec{v}_1}),\; q\eta(h_2^{\vec{v}_2}),\; q\eta(h_2^{\vec{v}_3}) \end{align*} $$

are trivial second-level characters; the argument goes exactly the same way as in the proof of Proposition 8.1. Thus, $\eta $ vanishes on all elements of the form $h_2^{\vec {w}_2}$ with $h_2\in G_2$ and

$$ \begin{align*} \vec{w}_2 \in \mathcal{P}^{\prime}_2 = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}. \end{align*} $$

By looking at the coefficients of x and y, we similarly show that $\eta $ vanishes on all elements of the form $h_1^{\vec {w}_1}$ with $h_1\in G_1$ and

$$ \begin{align*} \vec{w}_1 \in \mathcal{P}^{\prime}_1 = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_1, \vec{v}_2\}. \end{align*} $$

We are left with

$$ \begin{align*} \eta\circ g^P(x,y) = (\eta(g_1^{\vec{v}_3})+ \eta(g_2^{\vec{v}_4}))y^2. \end{align*} $$

We would like to be able to say that $\eta $ vanishes on all elements of the form $h_1^{\vec {w}_1}$ and $h_2^{\vec {w}_2}$ with $h_i \in G_i$ and $\vec {w}_i \in \mathcal {P}_i$ ; this would imply that $\eta $ is trivial. For this to be the case, it would suffice to show that both $\eta (g_1^{\vec {v}_3})$ and $\eta (g_2^{\vec {v}_4})$ satisfy estimate (43), and then use the $(A,N)$ -irrationality of $g_1$ and $g_2$ to conclude that the characters $h_1\mapsto q \eta (h_1^{\vec {v}_3})$ and $h_2\mapsto q\eta (h_2^{\vec {v}_4})$ are trivial. Alas, this need not be true. In Proposition 8.1, the number $\eta (g_1^{\vec {v}_3})$ was the coefficient of $y^3$ while $\eta (g_2^{\vec {v}_4})$ was the coefficient of $y^2$ , from which it followed that they both satisfied (43). Now, however, all we can show is that

(44) $$ \begin{align} \|q (\eta(g_1^{\vec{v}_3}) + \eta(g_2^{\vec{v}_4}))\|_{\mathbb{R}/\mathbb{Z}} < A N^{-1} \end{align} $$

because $\eta (g_1^{\vec {v}_3}) + \eta (g_2^{\vec {v}_4})$ is the coefficient of $y^2$ . But it need not follow that either of $\eta (g_1^{\vec {v}_3})$ and $\eta (g_2^{\vec {v}_4})$ satisfies (43); in particular, $g^P$ may take values in a proper rational subgroup of $G^P$ .

We illustrate this with a specific example, akin to the example in [Reference KucaKuc21b, § 11]. Suppose that $G = G_1 = \mathbb {R}^2$ , $G_2 = 0\times \mathbb {R}$ , $G_3 = 0\times 0$ . The sequence $g(n) = (a n, b {{n}\choose {2}})$ is adapted to the filtration $G_{\bullet }$ , and it is irrational if and only if a and b are irrational. We identify $G^4$ with $\mathbb {R}^8$ via the map

$$ \begin{align*} G^4 &\to\mathbb{R}^8\\ ((x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)) &\mapsto (x_1, x_2, x_3, x_4, y_1, y_2, y_3, y_4). \end{align*} $$

Setting

$$ \begin{align*} \vec{v}_{11} &= \vec{e}_1 + \vec{e}_2 + \vec{e}_3 + \vec{e}_4, \quad \vec{v}_{12} = \vec{e}_2 + 2\vec{e}_3, \quad \vec{v}_{13} = \vec{e}_4,\\ \vec{v}_{21} &= \vec{e}_5+\vec{e}_6+\vec{e}_7+\vec{e}_8,\quad \vec{v}_{22} = \vec{e}_6 + 2\vec{e}_7,\quad \vec{v}_{23} = \vec{e}_8,\quad \vec{v}_{24} = \vec{e}_7, \end{align*} $$

we observe that $G^P = {{\mathrm{Span}}}_{\mathbb {R}}\{\vec {v}_{11}, \vec {v}_{12}, \vec {v}_{13}, \vec {v}_{21}, \vec {v}_{22}, \vec {v}_{23}, \vec {v}_{24}\}$ .

With these definitions, the coefficient of $y^2$ in $g^P$ becomes $ a\vec {v}_{13} + b\vec {v}_{24} = a\vec {e}_4 + b\vec {e}_7$ . If $a, b, 1$ are rationally independent, then the closure of $g^P$ is the image of the seven-dimensional subspace $G^P$ in $(\mathbb {R}/\mathbb {Z})^8$ . If a and b are rationally dependent, then the closure of $g^P$ is the image in $(\mathbb {R}/\mathbb {Z})^8$ of the six-dimensional subspace

$$ \begin{align*}\tilde{G} = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_{11}, \vec{v}_{12}, a\vec{v}_{13} + b\vec{v}_{24}, \vec{v}_{21}, \vec{v}_{22}, \vec{v}_{23}\}.\end{align*} $$

Finally, if some rational linear combination of a and b is a rational number $q/r$ in its lowest terms with $r>1$ , then the closure of $g^P$ is a union of at most r translates of a six-dimensional subtorus of $G^P/\Gamma ^P$ . For instance, if $a = \sqrt {2}$ and $b = \sqrt {2}+\tfrac 13$ , then we define

(45) $$ \begin{align} \tilde{G} = {{\mathrm{Span}}}_{\mathbb{R}}\{\vec{v}_{11}, \vec{v}_{12}, \vec{v}_{13} + \vec{v}_{24}, \vec{v}_{21}, \vec{v}_{22}, \vec{v}_{23}\}, \end{align} $$

and observe that the sequences $g^P_0, g^P_1, g^P_2$ defined by $g^P_i(x,y) = g^P(x, 3y + i)$ are equidistributed on $\tilde {G}/\tilde {\Gamma }$ , $\tfrac 13\vec {v}_{24} + \tilde {G}/\tilde {\Gamma }$ and $\tfrac 13\vec {v}_{24} + \tilde {G}/\tilde {\Gamma }$ , respectively. In particular, for inhomogeneous progressions it is not true that the group $\tilde {G}$ depends only on the filtration $G_{\bullet }$ and the progression $\vec {P}$ .
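The three translates can be read off from a short computation, sketched here using the observation that the terms of $g^P$ attached to $\vec {v}_{11}, \vec {v}_{12}, \vec {v}_{21}, \vec {v}_{22}, \vec {v}_{23}$ already lie in $\tilde {G}$ . Take $a = \sqrt {2}$ and $b = \sqrt {2}+\tfrac 13$ , and write $y = 3m+i$ with $i\in \{0,1,2\}$ (the letter m is introduced only for this check). Then

$$ \begin{align*} (a\vec{v}_{13} + b\vec{v}_{24})y^2 &= \sqrt{2}\,(\vec{v}_{13}+\vec{v}_{24})y^2 + \tfrac13 y^2\,\vec{v}_{24},\\ \tfrac13 y^2 &= 3m^2 + 2mi + \tfrac13 i^2 \equiv \tfrac13 i^2 \pmod 1,\\ \tfrac13 i^2 &\equiv 0,\; \tfrac13,\; \tfrac13 \pmod 1 \quad\text{for } i = 0, 1, 2. \end{align*} $$

The first summand stays in $\tilde {G}$ , while the second is congruent modulo $\mathbb {Z}^8$ to the constant $\tfrac 13 i^2\,\vec {v}_{24}$ . The residues $i=1$ and $i=2$ therefore produce the same translate, which is consistent with the bound of at most r translates for $r=3$ .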

While annihilating the coefficients of $\eta \circ g^P$ , we were able to deal with the coefficients of x and y as well as ${{x}\choose {2}}$ , $xy+{{y}\choose {2}}$ and $xy^2 + {{y^2}\choose {2}}$ , which span the spaces $W_1'$ and $W_2'$ , respectively. The problematic coefficient was that of $y^2$ , belonging to the space $W^c$ . We have remarked below (34) in §4 that the non-triviality of the subspace $W^c$ prevents us from running the same argument as in Proposition 8.1 and Theorem 6.7 for inhomogeneous progressions; the problem with the coefficient of $y^2$ that we have encountered here illustrates this point. The reader should see from here how to generalize the aforementioned example to other inhomogeneous progressions; this generalized construction proves part (ii) of Theorem 1.17.

10 Finding closure in the inhomogeneous case

Section 9 shows that we cannot always hope for the sequence $g^P$ to equidistribute in $G^P/\Gamma ^P$ for an inhomogeneous progression $\vec {P}$ . Here, we provide an inductive recipe for finding the closure of $g^P$ in the case of $\vec {P}(x,y) = (x,\; x+y,\; x+2y,\; x+y^2)$ . We believe that this argument could be generalized to arbitrary inhomogeneous progressions; while trying to do so, however, we have encountered significant technical issues of a linear-algebraic nature that we have not been able to overcome.

Since the argument that we present here is already complicated enough, we prove it in an infinitary setting so as to avoid confusion arising from various quantitative parameters. In effect, we show the following proposition.

Proposition 10.1. Let G be a connected group with filtration $G_{\bullet }$ of degree s, and $\vec {P}(x,y) = (x,\; x+y,\; x+2y,\; x+y^2)$ . Suppose that $g\in {{\mathrm{poly}}}(\mathbb {Z}, G_{\bullet })$ is irrational. There exist a subgroup $\tilde {G}\leqslant G^P$ and a decomposition $g^P = \tilde {g}\gamma $ , where $\tilde {g}$ takes values in $\tilde {G}$ and is equidistributed on $\tilde {G}/\tilde {\Gamma }$ whereas $\gamma $ is periodic. Moreover, the group $\tilde {G}$ contains the subgroup

$$ \begin{align*} K = \langle h_i^{\vec{w}_i}: h_i\in G_i, \vec{w}_i\in \mathcal{P}^{\prime}_i, 1\leqslant i\leqslant s\rangle, \end{align*} $$

where

$$ \begin{align*} \mathcal{P}_1' &= {{\mathrm{Span}}}_{\mathbb{R}}\{(1,1,1,1), (0,1,2,0)\}, \\ \mathcal{P}_2' &= {{\mathrm{Span}}}_{\mathbb{R}}\{(1,1,1,1), (0,1,2,0), (0,0,0,1)\},\\ \mathcal{P}_3' &= \mathcal{P}_4' = \cdots = \mathbb{R}^4. \end{align*} $$

We will need the following lemma, which is similar in spirit to Lemma 8.2.

Lemma 10.2. Let $a_1, \ldots , a_s$ be non-zero real numbers. Let $Q_1, \ldots , Q_s\in \mathbb {Q}[x,y]$ be linearly independent polynomials, and suppose that $Q = a_1 Q_1 + \cdots + a_s Q_s$ takes values in $\mathbb {Q}$ . Then $a_i\in \mathbb {Q}$ for all $1\leqslant i\leqslant s$ .

Proof. Let $b_{kl i}$ be the coefficient of ${{x}\choose {k}}{{y}\choose {l}}$ in $Q_i$ . Then

$$ \begin{align*}Q(x, y) = \sum_{k, l} \bigg(\sum_{i=1}^s a_i b_{kli}\bigg) {{x}\choose{k}}{{y}\choose{l}}.\end{align*} $$

The coefficient

$$ \begin{align*} c_{kl} = a_1 b_{kl1} + \cdots + a_s b_{kls} \end{align*} $$

of ${{x}\choose {k}}{{y}\choose {l}}$ in Q is rational, which can be seen as follows: there exists an integer $q>0$ such that $q Q\in \mathbb {Z}[x,y]$ , and hence $q c_{kl}\in \mathbb {Z}$ by the classical fact that each integral polynomial is an integral linear combination of the Taylor monomials ${{x}\choose {k}}{{y}\choose {l}}$ . Indexing the pairs $(k_1, l_1), \ldots , (k_u, l_u)$ in some arbitrary fashion, we obtain a $u\times s$ matrix $B = (b_{k_r l_r i})_{r i}$ as well as an s-dimensional column vector $a = (a_{i})_{i}$ and a u-dimensional column vector $c = (c_{k_r l_r})_{r}$ such that $Ba = c$ . The linear independence of $Q_1, \ldots , Q_s$ implies that B has full rank, and so there exist an invertible $s\times s$ submatrix $\tilde {B}$ of B and an s-dimensional column vector $\tilde {c}$ , consisting of the corresponding entries of c, such that $\tilde {B}a = \tilde {c}$ . Since the entries of $\tilde {B}$ are rational numbers, so are the entries of $\tilde {B}^{-1}$ . The equality $a = \tilde {B}^{-1}\tilde {c}$ then implies that $a_i\in \mathbb {Q}$ for each $1\leqslant i\leqslant s$ .
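As a small illustration of this argument, with polynomials chosen by us purely for the example, take $s=2$ , $Q_1 = x+2y$ and $Q_2 = x-y$ . The coefficients of ${{x}\choose {1}}$ and ${{y}\choose {1}}$ in $Q = a_1 Q_1 + a_2 Q_2$ are $c_{10} = a_1 + a_2$ and $c_{01} = 2a_1 - a_2$ , so that

$$ \begin{align*} B = \begin{pmatrix} 1 & 1\\ 2 & -1 \end{pmatrix},\quad B^{-1} = \frac{1}{3}\begin{pmatrix} 1 & 1\\ 2 & -1 \end{pmatrix},\quad a_1 = \frac{c_{10}+c_{01}}{3},\quad a_2 = \frac{2c_{10}-c_{01}}{3}. \end{align*} $$

Rational values of $c_{10}$ and $c_{01}$ thus force $a_1$ and $a_2$ to be rational, even though $B^{-1}$ is not an integer matrix.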

Proof of Proposition 10.1

For each $i\geqslant 3$ , we find a basis $\{Q_{i,1}, Q_{i,2}, Q_{i,3}, Q_{i,4}\}$ for $W_i$ . The absence of an inhomogeneous algebraic relation of degree 3 or higher implies that

$$ \begin{align*}\sum_{i=3}^s W_i = \bigoplus_{i=3}^s W_i,\end{align*} $$

from which it follows that the set $\{Q_{i,j}: 3\leqslant i\leqslant s,\; 1\leqslant j\leqslant 4\}$ is linearly independent. For $3\leqslant i\leqslant s$ and $1\leqslant j\leqslant 4$ , we let $\vec {v}_{i,j}=\tau _i(Q_{i,j})$ . We also set

$$ \begin{align*} \vec{v}_1 = (1,1,1,1),\quad \vec{v}_2 = (0,1,2,0),\quad \vec{v}_3 = (0,0,0,1) \quad {{\mathrm{and}}}\quad \vec{v}_4 = (0,0,1,0). \end{align*} $$

We want to find a subgroup $\tilde{G}$ of $G^P$ on which we can guarantee equidistribution. Starting with

$$ \begin{align*}H^{(1)} = \langle h_1^{\vec{v}_3},\; h_2^{\vec{v}_4}: h_1\in G_1, h_2\in G_2\rangle,\end{align*} $$

we inductively define a chain of subgroups

$$ \begin{align*} H^{(1)}\geqslant H^{(2)}\geqslant H^{(3)}\geqslant \cdots \end{align*} $$

as well as groups $ G^{(k)} = \langle K, H^{(k)}\rangle $ and $\Gamma ^{(k)}=\Gamma ^P\cap G^{(k)}$ . We note that $G^{(1)}=G^P$ .

We also inductively define sequences $g^{(k)}$ and $h^{(k)}$ , starting with $h^{(1)}(y) = {g_1^{\vec {v}_3}}^{y^2}{g_2^{\vec {v}_4}}^{y^2}$ and $g^{(1)} = g^P$ . If $g^{(k)}$ is equidistributed in $G^{(k)}/\Gamma ^{(k)}$ , then we terminate the procedure. Otherwise Theorem 2.5 implies the existence of a non-trivial horizontal character $\eta ^{(k)}:G^{(k)}\to \mathbb {R}$ that vanishes on all of $G^{(k)}$ except $H^{(k)}$ , and for which $\eta ^{(k)}\circ g^{(k)} =\eta ^{(k)}\circ h^{(k)}$ takes values in $\mathbb {Z}$ . We then take $G^{(k+1)}=\ker \eta ^{(k)}$ and $H^{(k+1)}=\ker \eta ^{(k)}|_{H^{(k)}}$ , and we factorize $h^{(k)} = h^{(k+1)}\gamma ^{(k+1)}$ using an infinitary version of [Reference Green and TaoGT12, Proposition 9.2], where $\eta ^{(k)}\circ h^{(k+1)} = 0$ and $\gamma ^{(k+1)}$ is periodic. We define

$$ \begin{align*} g^{(k+1)}(x,y) = g^{(k)}(x,y)(\gamma^{(k+1)}(y))^{-1} \end{align*} $$

and observe that

$$ \begin{align*} &g^{(k+1)}(x,y)\\ &\quad= g_1^{{\vec{v}_1}x+\vec{v}_2 y} h^{(k+1)}(y) g_2^{{\vec{v}_1}{{x}\choose{2}} + \vec{v}_2(xy+{{y}\choose{2}}) + \vec{v}_3(xy^2 +{{y^2}\choose{2}})} \prod_{i=3}^s \prod_{j=1}^4 g_i^{\vec{v}_{i,j} Q_{i,j}} \; {{\mathrm{mod}}} \; [G_1, G_2]^4. \end{align*} $$

The sequence $g^{(k+1)}$ takes values in $G^{(k+1)}$ . We also write

$$ \begin{align*} h^{(k)}(y) = a^{(k)}(y)^{\vec{v}_4}b^{(k)}(y)^{\vec{v}_3}, \end{align*} $$

with $a^{(k)}$ being $G_2$ -valued and $b^{(k)}$ being $G_1$ -valued. Letting $a^{(k)}(y) = \prod \nolimits _{i=1}^s {a^{(k)}_i}^{{y}\choose {i}}$ , and similarly for $b^{(k)}$ , we claim that $a^{(k)}_2$ and $b^{(k)}_2$ are irrational elements of $G_2$ and $G_1$ respectively with regard to the filtration $G_{\bullet }$ on G. Finally, we claim that

$$ \begin{align*} H^{(k)} = G_2^{\vec{v}_4} \operatorname{mod} G_1^{\vec{v}_3}\quad {{\mathrm{and}}}\quad H^{(k)} = G_1^{\vec{v}_3} \operatorname{mod} G_2^{\vec{v}_4}. \end{align*} $$

First, we observe that all these properties hold at $k=1$ . We assume that they hold for some $k\geqslant 1$ , from which we aim to deduce that they also hold at the $(k+1)$ th level.

If $g^{(k)}$ is equidistributed in $G^{(k)}/\Gamma ^{(k)}$ , then we are done. Otherwise there exists a non-trivial horizontal character $\eta ^{(k)}:G^{(k)}\to \mathbb {R}$ for which $\eta ^{(k)}\circ g^{(k)}$ is $\mathbb {Z}$ -valued. We have

$$ \begin{align*} \eta^{(k)}\circ g^{(k)}(x,y) &= \eta^{(k)}(g_1^{{\vec{v}_1}})x + \eta^{(k)}(g_1^{\vec{v}_2})y + \eta^{(k)}(h^{(k)}(y)) \\[3pt] &\quad + \eta^{(k)}(g_2^{{\vec{v}_1}}){{x}\choose{2}} + \eta^{(k)}(g_2^{\vec{v}_2}) \bigg(xy+{{y}\choose{2}}\bigg) + \eta^{(k)}(g_2^{\vec{v}_3})\bigg(xy^2 +{{y^2}\choose{2}}\bigg)\\[3pt] &\quad+\sum_{i=3}^s \sum_{j=1}^4 \eta^{(k)}(g_i^{\vec{v}_{i,j}}) Q_{i,j}(x,y). \end{align*} $$

By looking at the coefficients of $Q_{i,j}$ for $3\leqslant i\leqslant s$ , applying Lemma 10.2, and following the same method as in the proof of Theorem 6.7, we see that $\eta ^{(k)}$ vanishes on elements of the form $h_i^{\vec {v}_{i,j}}$ for $h_i\in G_i$ , $3\leqslant i\leqslant s$ and $1\leqslant j\leqslant 4$ , and so $\eta ^{(k)}$ vanishes on all of $G_3\times G_3\times G_3\times G_3$ . This leaves us with

$$ \begin{align*} \eta^{(k)}\circ g^{(k)}(x,y) &= \eta^{(k)}(g_1^{{\vec{v}_1}})x + \eta^{(k)}(g_1^{\vec{v}_2})y + \eta^{(k)}(h^{(k)}(y)) \\[3pt] &\quad + \eta^{(k)}(g_2^{{\vec{v}_1}}){{x}\choose{2}} + \eta^{(k)}(g_2^{\vec{v}_2}) \bigg(xy+{{y}\choose{2}}\bigg) + \eta^{(k)}(g_2^{\vec{v}_3})\bigg(xy^2 +{{y^2}\choose{2}}\bigg). \end{align*} $$

We now carry on. By looking at the coefficients of ${{x}\choose {2}}$ and $xy+{{y}\choose {2}}$ , we see that $\eta ^{(k)}(g_2^{{\vec {v}_1}})$ and $\eta ^{(k)}(g_2^{\vec {v}_2})$ are both integers, and so $\eta ^{(k)}$ vanishes on all elements of the form $h_2^{\vec {v}_1}$ and $h_2^{\vec {v}_2}$ with $h_2\in G_2$ . By looking at the coefficients of x and y, we can similarly show that $\eta ^{(k)}$ vanishes on all elements of the form $h_1^{\vec {v}_1}$ and $h_1^{\vec {v}_2}$ with $h_1\in G_1$ . We are thus left with

$$ \begin{align*} \eta^{(k)}\circ g^{(k)}(x,y) = \eta^{(k)}(h^{(k)}(y)) + \eta^{(k)}(g_2^{\vec{v}_3})\bigg(xy^2 +{{y^2}\choose{2}}\bigg). \end{align*} $$

We first deal with the last term. Since $H^{(k)} = G_1^{\vec {v}_3}$ mod $G_2^{\vec {v}_4}$ , we have $[H^{(k)}, H^{(k)}] = [G_1^{\vec {v}_3}, G_1^{\vec {v}_3}]$ mod $G_3^4$ . Using the fact that $\eta ^{(k)}$ vanishes on both $G_3^4$ and $[H^{(k)}, H^{(k)}]$ , we deduce that it also vanishes on $[G_1^{\vec {v}_3}, G_1^{\vec {v}_3}]$ . Hence the function ${\xi _{2,3}: G_2\to \mathbb {R}}$ given by $\xi _{2,3}(h) = \eta ^{(k)}(h^{\vec {v}_3})$ is a second-level character. By irrationality of $g_2$ , it follows that $\xi _{2,3}$ is trivial, and so $\eta ^{(k)}$ vanishes on $G_2^{\vec {v}_3}$ . We have thus proved that $\eta ^{(k)}$ vanishes on all of $G^{(k)}$ except $H^{(k)}$ , and consequently that $\eta ^{(k)}\circ g^{(k)} = \eta ^{(k)}\circ h^{(k)}$ .

We now show that

(46) $$ \begin{align} H^{(k+1)} = G_2^{\vec{v}_4} \operatorname{mod} G_1^{\vec{v}_3}. \end{align} $$

Suppose not; let U be a proper rational subgroup of $G_2^{\vec {v}_4}$ such that

$$ \begin{align*} H^{(k+1)} = U \operatorname{mod} G_1^{\vec{v}_3}. \end{align*} $$

Then

$$ \begin{align*} H^{(k+1)}\leqslant UG_1^{\vec{v}_3} \cap H^{(k)} \leqslant H^{(k)}. \end{align*} $$

We know from the rank-nullity theorem that $\dim H^{(k+1)} = \dim H^{(k)} - 1$ , and we have $H^{(k)} = G_2^{\vec {v}_4} \operatorname {mod} G_1^{\vec {v}_3}$ from the inductive hypothesis. These two facts, together with the assumption that U is a proper rational subgroup of $G_2^{(0,0,1,0)}$ , imply that $H^{(k+1)} = UG_1^{\vec {v}_3} \cap H^{(k)}$ . It follows that

$$ \begin{align*} \eta^{(k)}\circ g^{(k)}(x,y) = \eta^{(k)}(a^{(k)}(y)^{\vec{v}_4})+\eta^{(k)}(b^{(k)}(y)^{\vec{v}_3})= \eta^{(k)}(a^{(k)}(y)^{\vec{v}_4}). \end{align*} $$

We have already shown that $\eta ^{(k)}$ vanishes on $G_3^4$ . From the fact that $a^{(k)}(y) = \prod _{i=1}^s {a^{(k)}_i}^{{{y}\choose {i}}}$ with $a^{(k)}_i\in G_i$ , we deduce that $\eta ^{(k)}({a^{(k)}(y)}^{\vec {v}_4}) = \eta ^{(k)}({a^{(k)}_1}^{\vec {v}_4}) y + \eta ^{(k)}({a^{(k)}_2}^{\vec {v}_4}){{y}\choose {2}}$ . The map $\xi _{2,4}(h_2) = \eta ^{(k)}(h_2^{\vec {v}_4})$ is a continuous group homomorphism on $G_2$ that vanishes on $G_3$ and sends $\Gamma _2$ to $\mathbb {Z}$ . Since $\vec {v}_4 = (\vec {v}_2\cdot \vec {v}_2 - \vec {v}_2)/2$ , where the product is taken coordinatewise, we also have

$$ \begin{align*} \xi_{2,4}([h_1, h_1']) = \tfrac{1}{2}\eta^{(k)}([h_1^{\vec{v}_2},{h^{\prime}_1}^{\vec{v}_2}]) - \tfrac{1}{2}\eta^{(k)}([h_1, h^{\prime}_1]^{\vec{v}_2}), \end{align*} $$

for any $h_1, h^{\prime }_1\in G_1$ , and so $\xi _{2,4}$ vanishes on $[G_1, G_1]$ . Thus $\xi _{2,4}$ is a second-level character on $G_2$ with respect to the filtration $G_{\bullet }$ on G, and since $a^{(k)}_2$ is an irrational element of $G_2$ with respect to this filtration, it follows that $\eta ^{(k)}$ is trivial, a contradiction; hence (46) holds. The argument that

$$ \begin{align*} H^{(k+1)} = G_1^{\vec{v}_3} \operatorname{mod} G_2^{\vec{v}_4} \end{align*} $$

is similar.

Finally, we factorize $h^{(k)} = h^{(k+1)}\gamma ^{(k+1)}$ , where $\gamma ^{(k+1)}$ is periodic and $h^{(k+1)}$ takes values in $H^{(k+1)} = \ker \eta ^{(k)}|_{H^{(k)}}$ . It remains to show that $a^{(k+1)}_2$ and $b^{(k+1)}_2$ are irrational elements of $G_2$ and $G_1$ with respect to the filtration $G_{\bullet }$ on G. We observe that

$$ \begin{align*} a^{(k)} = a^{(k+1)} \gamma_a^{(k+1)} \quad {{\mathrm{and}}} \quad b^{(k)} = b^{(k+1)} \gamma_b^{(k+1)} \end{align*} $$

for some periodic sequences $\gamma _a^{(k+1)}$ and $\gamma _b^{(k+1)}$ taking values in $G_2$ and $G_1$ , respectively. Suppose that $\xi : G_2\to \mathbb {R}$ is a second-level character with respect to the filtration $G_{\bullet }$ , for which $\xi (a^{(k+1)}_2)\in \mathbb {Z}$ . The sequence $\gamma _a^{(k+1)}$ is periodic, hence $\xi \circ \gamma _a^{(k+1)}$ is $\mathbb {Q}$ -valued, and so it follows that $\xi (a^{(k)}_2)\in \mathbb {Q}$ as well. Therefore there exists an integer $l>0$ such that $l\xi (a^{(k)}_2)\in \mathbb {Z}$ . Since $\xi ':= l\cdot \xi $ is also a second-level character, it follows from the irrationality of $a^{(k)}_2$ that $\xi '$ is trivial. This implies that $\xi $ is trivial as well, and hence $a^{(k+1)}_2$ is irrational. The argument showing that $b^{(k+1)}_2$ is irrational is identical.

We have thus shown inductively that $g^{(k)}$ , $h^{(k)}$ , $G^{(k)}$ and $H^{(k)}$ satisfy all the required properties for all $k\geqslant 1$ . Since $0\leqslant \dim G^{(k+1)}<\dim G^{(k)}$ , the procedure eventually terminates, at which point the sequence $g^{(k)}$ takes values in $G^{(k)}$ and is equidistributed on $G^{(k)}/\Gamma ^{(k)}$ . Letting $\tilde {G} = G^{(k)}$ and $\tilde {g} = g^{(k)}$ for this value of k and $\gamma = \gamma ^{(k)} \cdots \gamma ^{(2)}$ (the empty product if $k=1$ ), and observing that a product of periodic sequences is periodic, we finish the proof.

11 The equivalence of Weyl and algebraic complexity

While we are not able to show that Host–Kra and true complexities equal algebraic complexity for inhomogeneous progressions, we can show the equivalence of Weyl and algebraic complexities for all integral progressions.

Definition 11.1. (Weyl system)

A Weyl system is an ergodic system $(X, \mathcal {X}, \mu , T)$ , where X is a compact abelian Lie group and T is a unipotent affine transformation on X, that is, $Tx = \phi (x) + a$ for $a\in X$ and an automorphism $\phi $ of X satisfying $(\phi - \mathrm {Id}_X)^s = 0$ for some $s\in \mathbb {N}_+$ .

We recall that an integral polynomial progression $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ has Weyl complexity s at $0\leqslant i\leqslant t$ if s is the smallest natural number for which the factor $\mathcal {Z}_s$ is characteristic for the weak convergence of $\vec {P}$ at i for any Weyl system.

Every disconnected Weyl system can be written as a finite union of isomorphic tori that are cyclically permuted by the transformation T, much the same way as each disconnected nilsystem is a union of connected nilsystems (cf. Proposition 2.2 and the remark below [Reference Bergelson, Leibman and LesigneBLL07, Theorem 3.5]). Therefore we can restrict our attention to connected Weyl systems. These can in turn be reduced to standard Weyl systems, which are totally ergodic by Proposition 2.2. Throughout this section we let $\mathbb {T} = \mathbb {R}/\mathbb {Z}$ .

Definition 11.2. (Standard Weyl system of order s)

Let $s\in \mathbb {N}_+$ and $X = \mathbb {T}^s$ . A standard Weyl system of order s is a system $(X,\mathcal {X},\mu ,T)$ , where $\mathcal {X}$ is the Borel $\sigma $ -algebra on X, $\mu $ is the Lebesgue measure, and

$$ \begin{align*} T(a_1, \ldots, a_s) = (a_1 + a_0, a_2 + a_1, \ldots, a_s + a_{s-1}) \end{align*} $$

for some irrational $a_0$ .

Proposition 11.3. [Reference Frantzikinakis and KraFK05, Lemma 4.1]

Each connected Weyl system is a factor of a product of several standard Weyl systems.

Determining Weyl complexity therefore amounts to analysing standard Weyl systems. Since each standard Weyl system is totally ergodic, we immediately deduce the following proposition.

Proposition 11.4. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Then ${\mathcal {W}_i(\vec {P})\leqslant \mathcal {HK}_i(\vec {P})}$ for all $0\leqslant i\leqslant t$ .

We now fix a standard Weyl system $(X,\mathcal {X},\mu ,T)$ of order s with some irrational $a_0$ . Then

(47) $$ \begin{align} T^n(a_1, \ldots, a_s) &= \bigg(a_1 + n a_0, a_2 + n a_1 + {{n}\choose{2}}a_0, \ldots, a_s + n a_{s-1} + \cdots + {{n}\choose{s}}a_0\bigg)\nonumber\\ &= g_0 + g_1 n + \cdots + g_s{{n}\choose{s}}, \end{align} $$

where $g_i = (a_{1-i}, \ldots , a_{s-i})$ and $a_{-k}=0$ for $k>0$ . For almost all points $a=(a_1, \ldots , a_s)\in \mathbb {R}^s$ , the numbers $1, a_0, \ldots , a_s$ are rationally independent, and we fix a point $a\in \mathbb {R}^s$ for which this is the case. The sequence $g(n) = T^n a$ is adapted to the filtration $G_i = \{0\}^{i-1} \times \mathbb {R}^{s-i+1}$ for $1\leqslant i\leqslant s$ and $G_i = 0$ for $i>s$ on $G = G_0 = \mathbb {R}^s$ , and it is irrational due to the irrationality of $a_0$ . Since the $\mathcal {Z}_i$ factor of X consists of all the functions whose values depend only on the first i coordinates, we have $Z_i = G/G_{i+1}\Gamma = \mathbb {T}^i \times \{0\}^{s-i}$ , where $\Gamma = \mathbb {Z}^s$ .
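For orientation, here is formula (47) written out for $s=2$ ; this is a direct check added for the reader and not part of the original argument.

$$ \begin{align*} T(a_1, a_2) &= (a_1 + a_0,\; a_2 + a_1),\\ T^2(a_1, a_2) &= (a_1 + 2a_0,\; a_2 + 2a_1 + a_0),\\ T^3(a_1, a_2) &= (a_1 + 3a_0,\; a_2 + 3a_1 + 3a_0),\\ T^n(a_1, a_2) &= (a_1, a_2) + n\,(a_0, a_1) + {{n}\choose{2}}(0, a_0). \end{align*} $$

Thus $g_0 = (a_1, a_2)$ , $g_1 = (a_0, a_1)\in G_1 = \mathbb {R}^2$ and $g_2 = (0, a_0)\in G_2 = \{0\}\times \mathbb {R}$ , in agreement with the filtration described above. The inductive step from n to $n+1$ uses only the identity ${{n}\choose {2}} + n = {{n+1}\choose {2}}$ .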

What we therefore aim to show is the following proposition.

Proposition 11.5. Let $t\in \mathbb {N}_+$ , $(X, \mathcal {X}, \mu , T)$ be a standard Weyl system of order s and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Fix $0\leqslant i\leqslant t$ and suppose that $\mathcal {A}_i(\vec {P}) = s'$ . Then the image of the group $\{0\}^{i}\times G_{s'+1}\times \{0\}^{t-i}$ is contained in the closure of $g^P$ inside $(G/\Gamma )^{t+1}$ .

If $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ is a homogeneous progression, then the sequence $g^P$ is equidistributed in $G^P/\Gamma ^P$ by Theorem 5.3, and Proposition 11.5 follows immediately; we want to say something about the closure of $g^P$ in the general case. We fix an integral progression $\vec {P}$ for the rest of this section. For each $1\leqslant i\leqslant s$ , we pick linearly independent integral polynomials $Q_{i,1}, \ldots , Q_{i, t^{\prime }_i}$ that form a basis for $W^{\prime }_i$ . We also let $\{R_1, \ldots , R_r\}$ be a basis for $W^c$ consisting of integral polynomials. Thus,

$$ \begin{align*} {{\vec{P}}\choose{i}} = \sum_{j=1}^{t^{\prime}_i} \vec{v}_{i,j} Q_{i,j} + \sum_{j=1}^r \vec{w}_{i,j} R_j \end{align*} $$

for some vectors $\vec {v}_{i,j}, \vec {w}_{i,j}\in \mathbb {Z}^{t+1}$ , which follows from (34). Consequently,

(48) $$ \begin{align} g^P = g_0 \vec{1} + \sum_{i=1}^s g_i \sum_{j=1}^{t^{\prime}_i} \vec{v}_{i,j} Q_{i,j} + \sum_{j=1}^r \bigg(\sum_{i=1}^s g_i \vec{w}_{i,j}\bigg) R_j. \end{align} $$

We should explain the notation used in (48). For $h\in G$ and $\vec {v}\in \mathbb {R}^{t+1}$ , we interpret $h \vec {v}$ as the element of $(\mathbb {R}^s)^{t+1}$ of the form $(h {v}(0), \ldots , h{v}(t))$ , where $h {v}(i) = (h_1 {v}(i), \ldots , h_s {v}(i))$ is an element of $\mathbb {R}^s$ for each $h=(h_1, \ldots , h_s)\in \mathbb {R}^s$ and $\vec {v} = (v(0), \ldots , v(t))$ . Thus, $h \vec {v}$ is the same as what we previously called $h^{\vec {v}}$ . We use the additive notation $h \vec {v}$ now since we are working in an abelian setting. We also denote $\vec {1} = (1, \ldots , 1)$ .

We let $A_{i,j} = {{\mathrm{Span}}}_{\mathbb {R}}\{ g_i \vec {v}_{i,j} \}$ and $B_{j} = {{\mathrm{Span}}}_{\mathbb {R}}\{\sum _{i=1}^s g_i \vec {w}_{i,j}\}$ , and we denote the closure of their images in $(G/\Gamma )^{t+1}$ by $\overline {A}_{i,j}$ and $\overline {B}_{j}$ , respectively. From the rational independence of $a_i$ and the rationality of the entries of $\vec {v}_{i,j}$ and $\vec {w}_{i,j}$ , we deduce that non-zero entries of $g_i \vec {v}_{i,j}$ and $\sum _{i=1}^s g_i \vec {w}_{i,j}$ are irrational; therefore the sequences $(x,y)\mapsto g_i \vec {v}_{i,j} Q_{i,j}(x,y)$ and $(x,y)\mapsto \sum _{i=1}^s g_i \vec {w}_{i,j} R_j(x,y)$ are equidistributed on $\overline {A}_{i,j}$ and $\overline {B}_{j}$ , respectively. The linear independence of $Q_{i,j}, R_j$ then implies the following proposition.

Proposition 11.6. The closure of $g^P$ is the image of $g_0 \vec {1} + \tilde {G}$ inside $(G/\Gamma )^{t+1}$ , where

$$ \begin{align*}\tilde{G} = \sum_{i=1}^s \sum_{j=1}^{t^{\prime}_i} {A}_{i,j} + \sum_{j=1}^r {B}_j.\end{align*} $$

In particular, the group $\tilde {G}$ contains

$$ \begin{align*} K = \sum_{i=1}^s \sum_{j=1}^{t^{\prime}_i} {A}_{i,j} = {{\mathrm{Span}}}_{\mathbb{R}}\{h_i \vec{v}_{i,j}: h_i\in G_i, 1\leqslant i\leqslant s, 1\leqslant j\leqslant t^{\prime}_i\}. \end{align*} $$

We observe that $K = \tilde {G} = G^P$ whenever $\vec {P}$ is homogeneous.

Corollary 11.7. Fix $0\leqslant i\leqslant t$ and let $\mathcal {A}_i(\vec {P}) < s$ . For $k\leqslant s$ , we have $\{0\}^{i}\times G_{k}\times \{0\}^{t-i}\leqslant K$ if and only if $k> \mathcal {A}_i(\vec {P})$ .

Proof. For each $1\leqslant k\leqslant s$ , we let $\mathcal {P}^{\prime }_k = {{\mathrm{Span}}}_{\mathbb {R}}\{\vec {v}_{k,1}, \ldots , \vec {v}_{k, t^{\prime }_k}\}$ . Thus

$$ \begin{align*} K = {{\mathrm{Span}}}_{\mathbb{R}}\{h_k \vec{u}_k:\; 1\leqslant k \leqslant s,\; h_k\in G_k,\; \vec{u}_k\in\mathcal{P}^{\prime}_k\}, \end{align*} $$

and so for $k\leqslant s$ we have the inclusion $\{0\}^{i}\times G_{k}\times \{0\}^{t-i}\leqslant K$ if and only if the vector $\vec {e}_i$ with 1 in the ith position and 0 elsewhere is contained in $\mathcal {P}^{\prime }_k$ . The statement $\vec {e}_i\in \mathcal {P}^{\prime }_k$ is equivalent to the inclusion ${{x+P_i(y)}\choose {k}}\in W^{\prime }_k$ . This is in turn equivalent to the statement that there are no algebraic relations of the form (8) with $\deg Q_i = k$ , which is precisely the condition that $k> \mathcal {A}_i(\vec {P})$ .

Corollary 11.8. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression. Then $\mathcal {W}_i(\vec {P})\leqslant \mathcal {A}_i(\vec {P})$ for each $0\leqslant i\leqslant t$ .

We finish this section by showing the converse.

Proposition 11.9. Let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral polynomial progression for which $\mathcal {A}_i(\vec {P}) = s$ for some $0\leqslant i\leqslant t$ . Then for any standard Weyl system $(X,\mathcal {X},\mu ,T)$ of order s there exist smooth functions $f_0, \ldots , f_t: X\to \mathbb {C}$ such that $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_{s-1}) = 0$ but the expression (35) is $1$ . In particular, $\mathcal {W}_i(\vec {P})\geqslant s$ .

Before we prove Proposition 11.9, we define $\partial Q(x) = Q(x+1) - Q(x)$ for $Q\in \mathbb {R}[x]$ . From the identity $\partial {{x}\choose {k}} = {{x+1}\choose {k}} - {{x}\choose {k}} = {{x}\choose {k-1}}$ we deduce that

$$ \begin{align*} \partial \bigg(a_0 + a_1 {{x}\choose{1}} + \cdots + a_d {{x}\choose{d}}\bigg) = a_1 + a_2 {{x}\choose{1}} + \cdots + a_d {{x}\choose{d-1}}. \end{align*} $$
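As a quick sanity check of this formula on an example of our choosing, write $x^2$ in the binomial basis and apply $\partial $ :

$$ \begin{align*} x^2 = {{x}\choose{1}} + 2{{x}\choose{2}}, \qquad \partial (x^2) = 1 + 2{{x}\choose{1}} = 2x+1 = (x+1)^2 - x^2. \end{align*} $$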

Proof of Proposition 11.9

Let T be as in (47) for some irrational $a_0$ . From $\mathcal {A}_i(\vec {P}) = s$ it follows that $\vec {P}$ satisfies an algebraic relation (8) with $\deg Q_i = s$ . For each $0\leqslant k\leqslant t$ , we let $Q_k(u) = b_{k,1} u + \cdots + b_{k,s}{{u}\choose {s}}$ . We define $\xi (u) = e(\alpha u)$ for some irrational $\alpha $ , and we let

$$ \begin{align*}f_k(a_1, \ldots, a_s) = \xi(b_{k,1} a_1 + \cdots + b_{k,s} a_s).\end{align*} $$

Thus, we have

$$ \begin{align*} f_k(T^{x+P_k(y)}a) &= \xi(a_0 Q_k(x+P_k(y)) + a_1 \partial Q_k(x+P_k(y)) \\ &\quad + \cdots + a_s \partial^s Q_k(x+P_k(y))), \end{align*} $$

and so

$$ \begin{align*} \prod_{i=0}^t f_i(T^{x+P_i(y)} a) = \xi \bigg(\sum_{j=0}^s a_j \partial^j \sum_{k=0}^t Q_k(x+P_k(y)) \bigg) = 1. \end{align*} $$

On the other hand, we have

$$ \begin{align*}|\operatorname{\mathrm{\mathbb{E}}}(f_i|\mathcal{Z}_{s-1})(a_1, \ldots, a_s)| = \bigg|\int_{\mathbb{T}} f_i(a_1, \ldots, a_s) \,da_s \bigg| = \bigg|\int_{\mathbb{T}} \xi(b_{i,s} a_s) \,da_s \bigg| = 0\end{align*} $$

for almost every $a_s$ .
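To see the construction in the simplest case, consider as an illustration of ours the progression $(x,\; x+y,\; x+2y)$ with the linear relation $x - 2(x+y) + (x+2y) = 0$ , so that $s=1$ , $Q_0(u) = u$ , $Q_1(u) = -2u$ and $Q_2(u) = u$ . On the standard Weyl system $T a_1 = a_1 + a_0$ of order 1, the functions are $f_0(a_1) = \xi (a_1)$ , $f_1(a_1) = \xi (-2a_1)$ and $f_2(a_1) = \xi (a_1)$ . Using only the multiplicativity $\xi (u)\xi (v) = \xi (u+v)$ , we obtain

$$ \begin{align*} f_0(T^{x}a)\, f_1(T^{x+y}a)\, f_2(T^{x+2y}a) &= \xi\big((a_1 + x a_0) - 2(a_1 + (x+y)a_0) + (a_1 + (x+2y)a_0)\big)\\ &= \xi(0) = 1. \end{align*} $$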

12 The proof of Theorem 1.16

We conclude this paper with the proof of Theorem 1.16. Throughout this section, we let $t\in \mathbb {N}_+$ and $\vec {P}\in \mathbb {R}[x,y]^{t+1}$ be an integral progression of algebraic complexity at most $1$ . We also let $Q_1, \ldots , Q_k$ be integral polynomials as in the statement of Theorem 1.16. Thus, $P_i = \sum _j a_{ij} Q_j$ and $Q_i = \sum _j a^{\prime }_{ij} P_j$ for $a_{ij}, a^{\prime }_{ij}\in \mathbb {Z}$ . The second part of the theorem follows from the first part and the Furstenberg correspondence principle. We therefore proceed to prove part (i), followed by part (iii). Our argument for part (i) follows closely the proof of [Reference FrantzikinakisFra08, Theorem C].

Proof of Theorem 1.16(i)

We first prove part (i) of Theorem 1.16 in the totally ergodic case. Suppose that $(X, \mathcal {X}, \mu , T)$ is a totally ergodic system with the Kronecker factor $(Z_1, \mathcal {Z}_1, \nu , S)$ . The space $Z_1$ can be assumed to be a connected compact abelian group with an ergodic translation $Sx = x+b$ . For each $\delta>0$ , let $B_\delta $ be the $\delta $ -neighbourhood of the identity in $Z_1$ , and let

$$ \begin{align*} \tilde{B}_\delta = \{n\in\mathbb{N}: Q_1(n) b, \ldots, Q_{k}(n) b\in B_\delta\}. \end{align*} $$

It follows from the ergodicity of S and linear independence of $Q_1, \ldots , Q_k$ that

$$ \begin{align*}\lim_{N-M\to\infty}\frac{|\tilde{B}_\delta\cap [M,N)|}{N-M} = \nu(B_\delta)^k>0\end{align*} $$

for any $\delta>0$ . In particular, $\tilde {B}_\delta $ is syndetic for any $\delta>0$ , otherwise we would have $\liminf _{N-M\to \infty }({|\tilde {B}_\delta \cap [M,N)|}/({N-M})) = 0$ .

We aim to show that for any $A\in \mathcal {X}$ with $\mu (A)>0$ and any $\epsilon>0$ , we have

(49) $$ \begin{align} \lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in\tilde{B}_\delta\cap [M,N)} \mu(A\cap T^{P_1(n)}A \cap \cdots \cap T^{P_t(n)}A) \geqslant \mu(A)^{t+1}-\epsilon \end{align} $$

for all sufficiently small $\delta> 0$ . This implies part (i) of Theorem 1.16 as follows: if there is a sequence $K_N$ of intervals in $\mathbb {N}$ of length converging to infinity, with the property that

(50) $$ \begin{align} \mu(A\cap T^{P_1(n)}A \cap \cdots \cap T^{P_t(n)}A) < \mu(A)^{t+1}-\epsilon \end{align} $$

for all $n\in \bigcup _{N\in \mathbb {N}}K_N$, then the sets $\tilde {K}_N = K_N\cap \tilde {B}_\delta $ are non-empty for all sufficiently large N due to the syndeticity of $\tilde {B}_\delta $ (in fact, their cardinalities also converge to infinity). Since (50) holds for all $n\in \bigcup _{N\in \mathbb {N}}\tilde {K}_N$, inequality (49) fails, leading to a contradiction.

We first show that if $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_1) = 0$ for some i, then

(51) $$ \begin{align} \lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in[M,N)} 1_{\tilde{B}_\delta}(n) \prod_{i=1}^t T^{P_i(n)}f_i = 0 \end{align} $$

in $L^2$ for any $f_1, \ldots , f_t\in L^\infty (\mu )$ . From the measurability of $B_\delta $ it follows that we can approximate $1_{\tilde {B}_\delta }(n) = \prod _{i=1}^k 1_{{B}_\delta }(Q_i(n)b)$ arbitrarily well by linear combinations of $\prod _{i=1}^k \xi _i(Q_i(n)b)$ for some characters $\xi _1, \ldots , \xi _k$ on $Z_1$ . Using the fact that each $Q_i$ is an integral linear combination of $P_1, \ldots , P_t$ , we can rewrite $\prod _{i=1}^k \xi _i(Q_i(n)b) = \prod _{i=1}^t \tilde {\xi }_i(P_i(n)b)$ for some characters $\tilde {\xi }_1, \ldots , \tilde {\xi }_t$ .
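The last rewriting can be made explicit. The following is a sketch that assumes the relations $Q_i = \sum _{j=1}^t a^{\prime }_{ij} P_j$ fixed at the start of the section, with j ranging over $1, \ldots , t$; the formula for $\tilde {\xi }_j$ is one natural choice and is not stated in this form in the text. Since each $\xi _i$ is a character and each $a^{\prime }_{ij}$ is an integer,

$$ \begin{align*} \prod_{i=1}^k \xi_i(Q_i(n)b) = \prod_{i=1}^k \xi_i\bigg(\sum_{j=1}^t a^{\prime}_{ij} P_j(n) b\bigg) = \prod_{i=1}^k\prod_{j=1}^t \xi_i(P_j(n)b)^{a^{\prime}_{ij}} = \prod_{j=1}^t \tilde{\xi}_j(P_j(n)b), \qquad \tilde{\xi}_j = \prod_{i=1}^k \xi_i^{a^{\prime}_{ij}}. \end{align*} $$

Each $\tilde {\xi }_j$ is again a character on $Z_1$, being a product of integer powers of characters.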

In effect, it suffices to show that

(52) $$ \begin{align} \lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in[M,N)} \prod_{i=1}^t \tilde{\xi}_i(P_i(n)b)\prod_{i=1}^t T^{P_i(n)}f_i = 0. \end{align} $$

We can rephrase the limit in (52) as

(53) $$ \begin{align} \lim_{N-M\to\infty} \prod_{i=1}^t \tilde{\xi}_i(-y) \mathop{\mathbb{E}}\limits_{n\in[M,N)} \prod_{i=1}^t R^{P_i(n)}(f_i(x)\tilde{\xi}_i(y)), \end{align} $$

where $R = T\times S$. Let $(R_t)_t$ be the ergodic components of R and $(f_i \otimes \tilde {\xi }_i) (x,y) = f_i(x)\tilde {\xi }_i(y)$; then, for almost every t, we have $\operatorname {\mathrm {\mathbb {E}}}(f_i \otimes \tilde {\xi }_i\mid \mathcal {Z}_1(R_t)) = 0$ whenever $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_1(T)) = 0$. It thus follows from Corollary 1.13 that if $\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_1) = 0$ for some i, then the limit in (53) is 0, which proves the claim.
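The passage from (52) to (53) can be checked directly. The following is a sketch, assuming that $R = T\times S$ acts on $X\times Z_1$ by $R(x,y) = (Tx, y+b)$ and that $T^m f$ denotes $f\circ T^m$. For each i and n,

$$ \begin{align*} R^{P_i(n)}(f_i\otimes\tilde{\xi}_i)(x,y) = f_i(T^{P_i(n)}x)\,\tilde{\xi}_i(y + P_i(n)b) = \tilde{\xi}_i(y)\,\tilde{\xi}_i(P_i(n)b)\,T^{P_i(n)}f_i(x). \end{align*} $$

Multiplying by $\tilde {\xi }_i(-y) = \overline {\tilde {\xi }_i(y)}$ and taking the product over i recovers the expression averaged in (52).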

We therefore deduce that

(54) $$ \begin{align} &\lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in\tilde{B}_\delta\cap[M,N)}\int_X \prod_{i=0}^t T^{P_i(n)}1_A\, d\mu\nonumber\\ &\qquad= \lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in\tilde{B}_\delta\cap[M,N)} \int_{Z_1} \prod_{i=0}^t S^{P_i(n)}\tilde{1}_A \,d\nu\nonumber\\ &\qquad= \lim_{N-M\to\infty}\mathop{\mathbb{E}}\limits_{n\in\tilde{B}_\delta\cap[M,N)} \int_{Z_1} \prod_{i=0}^t S^{\sum_j a_{ij} Q_j(n)}\tilde{1}_A \,d\nu, \end{align} $$

where $\tilde {1}_A = \operatorname {\mathrm {\mathbb {E}}}(1_A\mid \mathcal {Z}_1)$ . Due to the ergodicity of S and the linear independence of $Q_1, \ldots , Q_k$ , the limit in (54) equals

(55) $$ \begin{align} \frac{1}{\nu({B}_\delta)^k}\int_{B_\delta^k}\int_{Z_1} \prod_{i=0}^t \tilde{1}_A\left(x+\sum_j a_{ij} y_j\right) \,d\nu(x) \,d\nu^k(y). \end{align} $$

In the limit $\delta \to 0$ , the expression in (55) converges to $\int _{Z_1}(\tilde {1}_A)^{t+1}$ ; hence for every $\epsilon>0$ and sufficiently small $\delta>0$ , we have

(56) $$ \begin{align} \frac{1}{\nu({B}_\delta)^k}\int_{B_\delta^k}\int_{Z_1} \prod_{i=0}^t \tilde{1}_A\left(x+\sum_j a_{ij} y_j\right) \,d\nu(x) \,d\nu^k(y)\geqslant \int_{Z_1}(\tilde{1}_A)^{t+1} - \epsilon. \end{align} $$

Using the Hölder inequality, we obtain that $\int _{Z_1}(\tilde {1}_A)^{t+1}\geqslant (\int _{Z_1}\tilde {1}_A)^{t+1}=\mu (A)^{t+1}$, which implies (49). This finishes the totally ergodic case; the derivation of the ergodic case from the totally ergodic case proceeds in the same way as in the proof of [Fra08, Theorem C].
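For completeness, the two facts used in the last step can be written out; this is a standard sketch rather than part of the cited argument. Since $\nu $ is a probability measure and $\tilde {1}_A\geqslant 0$, the Hölder inequality with exponents $t+1$ and $(t+1)/t$ gives

$$ \begin{align*} \int_{Z_1}\tilde{1}_A \,d\nu = \int_{Z_1}\tilde{1}_A\cdot 1 \,d\nu \leqslant \bigg(\int_{Z_1}(\tilde{1}_A)^{t+1} \,d\nu\bigg)^{1/(t+1)}, \qquad \int_{Z_1}\tilde{1}_A \,d\nu = \int_X \operatorname{\mathrm{\mathbb{E}}}(1_A\mid \mathcal{Z}_1) \,d\mu = \mu(A). \end{align*} $$

Raising the first inequality to the power $t+1$ and inserting the second identity yields the stated bound.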

We now proceed to the proof of part (iii) of Theorem 1.16. The argument can be seen as a finitary version of the argument above, with all the necessary modifications coming from working in the finitary setting. It follows the proof of the three-term arithmetic progression case in [GT10, Theorem 1.12].

Proof of Theorem 1.16(iii)

Let $\alpha , \epsilon> 0$, and suppose that $A\subset \mathbb {Z}/N\mathbb {Z}$ has size $|A|\geqslant \alpha N$ for a prime $N>N_0(\alpha ,\epsilon )$. Let $\mathcal {F}:\mathbb {R}_+\to \mathbb {R}_+$ be a growth function to be specified later. By [CS12, Theorem 5.1], the irrational and periodic version of the celebrated arithmetic regularity lemma of Green and Tao [GT10, Theorem 1.2], there exist a positive number $M = O_{\epsilon , \mathcal {F}}(1)$ and a decomposition

(57) $$ \begin{align} 1_A = f_{\mathrm{nil}} + f_{\mathrm{sml}} + f_{\mathrm{unf}} \end{align} $$

into $1$ -bounded functions such that:

  (i) $f_{\mathrm {nil}} = F(g(n)\Gamma )$ is an $\mathcal {F}(M)$-irrational, N-periodic nilsequence of degree 1 and complexity M;

  (ii) $\Vert f_{\mathrm {sml}}\Vert _1\leqslant \epsilon $;

  (iii) $\Vert f_{\mathrm {unf}}\Vert _{U^2}\leqslant {1}/{\mathcal {F}(M)}$.

Moreover, $f_{\mathrm {nil}}$ takes values in $[0,1]$ . Unpacking the definition of $f_{\mathrm {nil}}$ , we see that $F:(\mathbb {R}/\mathbb {Z})^m\to [0,1]$ is M-Lipschitz, $1\leqslant m\leqslant M$ , and $g(n) = b n$ for some $\mathcal {F}(M)$ -irrational element $b\in (({1}/{N})\mathbb {Z}/\mathbb {Z})^m$ .

Our strategy is as follows. We shall define a weight $\tilde {\mu }:\mathbb {Z}/N\mathbb {Z}\to \mathbb {R}_{\geqslant 0}$ which satisfies

(58) $$ \begin{align} \mathop{\mathbb{E}}\limits_{y\in \mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y) = 1 + O(\epsilon) \end{align} $$

and

(59) $$ \begin{align} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y)\prod_{i=0}^t 1_A(x + P_i (y)) \geqslant \alpha^{t+1} - O(\epsilon). \end{align} $$

Using the pigeonhole principle and (58), it can be deduced from (59) that for $\Omega _{\alpha ,\epsilon }(N)$ values of y, we have

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}}\prod_{i=0}^t 1_A(x + P_i (y)) \geqslant \alpha^{t+1} - O(\epsilon), \end{align*} $$

which proves part (iii) of Theorem 1.16.
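The pigeonhole deduction can be made explicit as follows. This is a sketch under assumptions that are not spelled out in the text: $C\geqslant 1$ denotes a hypothetical common implied constant in (58) and (59), and $\Vert \tilde {\mu }\Vert _\infty $ is assumed to be bounded in terms of $\alpha $ and $\epsilon $ once all parameters have been chosen (for the weight constructed later in the proof, $\psi \leqslant 1$ gives $\Vert \tilde {\mu }\Vert _\infty \leqslant c^{-k}$). Write $\Lambda (y) = \mathbb {E}_{x\in \mathbb {Z}/N\mathbb {Z}}\prod _{i=0}^t 1_A(x+P_i(y))\in [0,1]$ and $Y = \{y: \Lambda (y)\geqslant \alpha ^{t+1}-3C\epsilon \}$. If $\alpha ^{t+1}\leqslant 3C\epsilon $, then Y is all of $\mathbb {Z}/N\mathbb {Z}$. Otherwise, bounding $\Lambda $ by 1 on Y and by $\alpha ^{t+1}-3C\epsilon $ off Y, we get

$$ \begin{align*} \alpha^{t+1} - C\epsilon &\leqslant \mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\Lambda(y) \leqslant \Vert\tilde{\mu}\Vert_\infty\frac{|Y|}{N} + (\alpha^{t+1}-3C\epsilon)\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\\ &\leqslant \Vert\tilde{\mu}\Vert_\infty\frac{|Y|}{N} + (\alpha^{t+1}-3C\epsilon)(1+C\epsilon) \leqslant \Vert\tilde{\mu}\Vert_\infty\frac{|Y|}{N} + \alpha^{t+1} - 2C\epsilon, \end{align*} $$

where the last step uses $\alpha \leqslant 1$. Hence $|Y|\geqslant C\epsilon N/\Vert \tilde {\mu }\Vert _\infty $.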

We shall prove (59) by splitting each $1_A$ using (57) and showing that terms involving $f_{\mathrm {sml}}$ or $f_{\mathrm {unf}}$ have contributions at most $O(\epsilon )$ , while the term

(60) $$ \begin{align} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y)\prod_{i=0}^t f_{\mathrm{nil}}(x + P_i (y)) \end{align} $$

has size at least $\alpha ^{t+1}-O(\epsilon )$ . Showing that the terms involving $f_{\mathrm {sml}}$ or $f_{\mathrm {unf}}$ make negligible contributions to (59) is akin to showing (51) for all functions with ${\operatorname {\mathrm {\mathbb {E}}}(f_i\mid \mathcal {Z}_1) = 0}$ in the proof of part (i) of Theorem 1.16. In doing so, we shall use the idea that while we fix $\epsilon>0$ , we have control over how fast we choose $\mathcal {F}$ to grow; and we choose it to grow fast enough depending on $\alpha $ and $\epsilon $ to ensure that all the estimates work.

Let $\delta>0$ be fixed later. We define $\psi :(\mathbb {R}/\mathbb {Z})^m\to \mathbb {R}_+$ to be a non-negative, $1$-bounded, $O_M(\delta ^{-1})$-Lipschitz function that is 1 on $[-\tfrac 14\delta , \tfrac 14\delta ]^m$ and 0 outside $[-\tfrac 12\delta , \tfrac 12\delta ]^m$. We let $c = \int _{(\mathbb {R}/\mathbb {Z})^m}\psi $; thus $(\tfrac 12\delta )^m\leqslant c\leqslant \delta ^m$. We then let $\mu (y) = {\psi (b y)}/{c}$. Since b can be picked without loss of generality from $[0,1]^m$, the function $\mu $ is $O_M(\delta ^{-M-1})$-Lipschitz.

We let $\tilde {\mu }(y) = \mu (Q_1(y))\cdots \mu (Q_{k}(y))$. It is a weight that picks out the values y for which $Q_1(y) b, \ldots , Q_k(y) b$ are all close to zero in $(\mathbb {R}/\mathbb {Z})^m$, and it plays a similar role to the function $1_{\tilde {B}_\delta }$ in the proof of part (i) of Theorem 1.16, except that it is constructed using a Lipschitz function rather than an indicator function. To show (58), we observe that

(61) $$ \begin{align} \mathop{\mathbb{E}}\limits_{y\in \mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y) = \frac{1}{c^{k}}\mathop{\mathbb{E}}\limits_{y\in[N]}\prod_{i=1}^{k}\psi(b Q_i(y)). \end{align} $$

Using the $\mathcal {F}(M)$ -irrationality of g, linear independence of $Q_1$ , …, $Q_{k}$ as well as Theorem 2.5, we deduce that (61) equals

$$ \begin{align*} \frac{1}{c^{k}}\bigg(\bigg(\int \psi\bigg)^{k} + O_M(\delta^{-1} \mathcal{F}(M)^{-c_M})\bigg) = 1 + O_M(\delta^{-M-2} \mathcal{F}(M)^{-c_M}) \end{align*} $$

for some $c_M>0$ . The estimate (58) follows from choosing $\mathcal {F}$ growing fast enough depending on $\delta $ and picking $\delta = c^{\prime }_M \epsilon $ for an appropriately chosen $c^{\prime }_M>0$ .

We decompose each $1_A$ in (59) using (57) and split (59) into $3^{t+1}$ terms accordingly using multilinearity. We first estimate (60), and subsequently we bound the contributions of $f_{\mathrm {sml}}$ and $f_{\mathrm {unf}}$.

Taking $\mathcal {F}$ growing fast enough, we assume that $\Vert f_{\mathrm {unf}}\Vert _{U^2}\leqslant \epsilon $ , and thus $|{\mathbb{E}}_{x\in \mathbb {Z}/N\mathbb {Z}} f_{\mathrm {unf}} (x)|=\Vert f_{\mathrm {unf}}\Vert _{U^1}\leqslant \Vert f_{\mathrm {unf}}\Vert _{U^2}\leqslant \epsilon $ . From the Hölder inequality and the bound on the $L^1$ norm of $f_{\mathrm {sml}}$ , we obtain a bound $|{\mathbb{E}}_{x\in \mathbb {Z}/N\mathbb {Z}} f_{\mathrm {sml}}|\leqslant \epsilon $ . From these bounds and (57) we deduce that ${\mathbb{E}}_{x\in \mathbb {Z}/N\mathbb {Z}} f_{\mathrm {nil}}(x) \geqslant \alpha - 2\epsilon $ .
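The bounds in this paragraph amount to a short computation. It is sketched here assuming the normalised Fourier transform $\widehat {f}(\xi ) = \mathbb {E}_{x\in \mathbb {Z}/N\mathbb {Z}} f(x) e^{-2\pi i x\xi /N}$ (notation introduced only for this remark), for which $\Vert f\Vert _{U^2}^4 = \sum _{\xi }|\widehat {f}(\xi )|^4$:

$$ \begin{align*} \Big|\mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} f_{\mathrm{unf}}(x)\Big| = |\widehat{f_{\mathrm{unf}}}(0)| \leqslant \bigg(\sum_{\xi}|\widehat{f_{\mathrm{unf}}}(\xi)|^4\bigg)^{1/4} = \Vert f_{\mathrm{unf}}\Vert_{U^2}\leqslant\epsilon, \qquad \Big|\mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} f_{\mathrm{sml}}(x)\Big| \leqslant \Vert f_{\mathrm{sml}}\Vert_1\leqslant\epsilon. \end{align*} $$

Since $\mathbb {E}_{x} 1_A(x) = |A|/N\geqslant \alpha $, taking averages in (57) gives

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} f_{\mathrm{nil}}(x) = \mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} 1_A(x) - \mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} f_{\mathrm{sml}}(x) - \mathop{\mathbb{E}}\limits_{x\in\mathbb{Z}/N\mathbb{Z}} f_{\mathrm{unf}}(x) \geqslant \alpha - 2\epsilon. \end{align*} $$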

We observe that by M-Lipschitzness of F and the definitions of $\mu $, $\tilde {\mu }$ and $Q_j$, we have $f_{\mathrm {nil}}(x+P_i(y)) = f_{\mathrm {nil}}(x + \sum _j a_{ij} Q_j (y)) = f_{\mathrm {nil}}(x) + O_M(\delta ) = f_{\mathrm {nil}}(x) + O(\epsilon )$ whenever $\tilde {\mu }(y)>0$. It follows from this that

(62) $$ \begin{align} \mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y)\prod_{i=0}^t f_{\mathrm{nil}}\bigg(x + \sum_j a_{ij} Q_j (y)\bigg) = \bigg(\mathop{\mathbb{E}}\limits_{x\in \mathbb{Z}/N\mathbb{Z}} f_{\mathrm{nil}}(x)^{t+1} + O(\epsilon)\bigg) \mathop{\mathbb{E}}\limits_{y\in \mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y). \end{align} $$

Using the estimate for (58) and the Hölder inequality, we deduce that (62) is bounded from below by

$$ \begin{align*} \bigg(\mathop{\mathbb{E}}\limits_{x\in \mathbb{Z}/N\mathbb{Z}} f_{\mathrm{nil}}(x)\bigg)^{t+1} - O(\epsilon)\geqslant \alpha^{t+1} - O(\epsilon), \end{align*} $$

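where the last inequality follows from the bound ${\mathbb{E}}_{x\in \mathbb {Z}/N\mathbb {Z}} f_{\mathrm {nil}}(x) \geqslant \alpha - 2\epsilon $ obtained above.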
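The chain of inequalities behind this lower bound can be spelled out. The following is a sketch assuming $2\epsilon \leqslant \alpha \leqslant 1$ and using that $f_{\mathrm {nil}}$ takes values in $[0,1]$; the explicit constant $2(t+1)$ is one admissible choice for the implied constant:

$$ \begin{align*} \mathop{\mathbb{E}}\limits_{x\in \mathbb{Z}/N\mathbb{Z}} f_{\mathrm{nil}}(x)^{t+1} \geqslant \bigg(\mathop{\mathbb{E}}\limits_{x\in \mathbb{Z}/N\mathbb{Z}} f_{\mathrm{nil}}(x)\bigg)^{t+1} \geqslant (\alpha-2\epsilon)^{t+1} \geqslant \alpha^{t+1} - 2(t+1)\epsilon. \end{align*} $$

The first step is the Hölder inequality on the probability space $\mathbb {Z}/N\mathbb {Z}$, the second uses $\mathbb {E}_{x} f_{\mathrm {nil}}(x)\geqslant \alpha -2\epsilon \geqslant 0$, and the third follows from the mean value bound $u^{t+1}-v^{t+1}\leqslant (t+1)(u-v)$ for $0\leqslant v\leqslant u\leqslant 1$.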

We now bound terms involving $f_{\mathrm {sml}}$ . Suppose without loss of generality that $f_{\mathrm {sml}}$ is in the $i=0$ position, and let ${f}_1, \ldots , {f}_t\in \{f_{\mathrm {nil}}, f_{\mathrm {sml}}, f_{\mathrm {unf}}\}$ . Then

(63) $$ \begin{align} \bigg|\mathop{\mathbb{E}}\limits_{x,y\in\mathbb{Z}/N\mathbb{Z}} \tilde{\mu}(y) f_{\mathrm{sml}}(x)\prod_{i=1}^t {f}_i(x + P_i (y))\bigg|\leqslant \Vert f_{\mathrm{sml}}\Vert_1 \mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\leqslant\epsilon, \end{align} $$

where the first inequality follows from the Hölder inequality, non-negativity of $\tilde {\mu }$ and $1$-boundedness of ${f}_1, \ldots , {f}_t$.

It remains to bound the contributions of $f_{\mathrm {unf}}$. Using a standard argument (see, for example, the proof of [GT12, Proposition 3.1]), we want to approximate $\mu $ by a trigonometric polynomial, which allows us to essentially replace the weight $\tilde {\mu }$ by additive characters. Let $K\in \mathbb {N}_+$ be fixed later. Since $\mu $ is an $O_M(\epsilon ^{-M})$-Lipschitz function, there exists a trigonometric polynomial $\mu _1:\mathbb {Z}/N\mathbb {Z}\to \mathbb {C}$ such that $\|\mu -\mu _1\|_\infty \ll _M \epsilon ^{-C^{(1)}_M} K^{-c}$ for some $c, C^{(1)}_M>0$. Moreover, $\mu _1$ has degree at most $K^M$ and its coefficients satisfy $\|\widehat {\mu _1}\|_\infty \leqslant \|\mu \|_\infty \ll _M \epsilon ^{-M}$.

Let $f_0, \ldots , f_t\in \{f_{\mathrm {nil}}, f_{\mathrm {sml}}, f_{\mathrm {unf}}\}$ , with at least one of them being $f_{\mathrm {unf}}$ . We then bound

(64) $$ \begin{align} \bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\prod_{i=0}^t f_i(x+P_i(y))\bigg|&= \bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\prod_{i=1}^k\mu(Q_i(y))\prod_{i=0}^t f_i(x+P_i(y))\bigg|\nonumber\\[3pt] &\leqslant k \max(\Vert \mu\Vert_\infty, \Vert \mu_1\Vert_\infty)^{k-1}\Vert \mu-\mu_1\Vert_\infty\nonumber\\[3pt] &\quad+\bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\prod_{i=1}^k\mu_1(Q_i(y))\prod_{i=0}^t f_i(x+P_i(y))\bigg|. \end{align} $$

The first term has size at most $C^{(2)}_M \epsilon ^{-C^{(2)}_M} K^{-c}$ for some $C^{(2)}_M>0$ . The second term is bounded by

(65) $$ \begin{align} K^M \Vert \widehat{\mu_1}\Vert_\infty \bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\prod_{i=1}^k\xi_i(Q_i(y))\prod_{i=1}^t f_i(x+P_i(y))\bigg| \end{align} $$

for some characters $\xi _i$ on $\mathbb {Z}/N\mathbb {Z}$ . Since each $Q_i$ is an integral linear combination of the $P_i$ , we can rewrite $\prod _{i=1}^k\xi _i(Q_i(y)) = \prod _{i=1}^t \tilde {\xi }_i(x+P_i(y))$ . We let $\tilde {f}_i = f_i \tilde {\xi }_i$ . Since each $\tilde {\xi }_i$ is a linear character, we have $\Vert f_i\Vert _{U^2} = \|\tilde {f}_i\|_{U^2}$ for each i.
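Returning to the first term on the right-hand side of (64), it comes from a telescoping identity. As a sketch, fix y and write $u_i = \mu (Q_i(y))$ and $v_i = \mu _1(Q_i(y))$ (shorthand used only here). Then

$$ \begin{align*} \prod_{i=1}^k u_i - \prod_{i=1}^k v_i = \sum_{j=1}^k \bigg(\prod_{i<j} v_i\bigg)(u_j - v_j)\bigg(\prod_{i>j} u_i\bigg), \qquad\text{so}\qquad \bigg|\prod_{i=1}^k u_i - \prod_{i=1}^k v_i\bigg| \leqslant k\max(\Vert\mu\Vert_\infty, \Vert\mu_1\Vert_\infty)^{k-1}\Vert\mu-\mu_1\Vert_\infty. \end{align*} $$

Multiplying by $\prod _{i=0}^t f_i(x+P_i(y))$, which is bounded by 1 in absolute value, and averaging over y gives the error term in (64).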

We recall from Theorem 1.11 that $\vec {P}$ has true complexity 1. Combining this fact with (64), (65) and the bound $\|\tilde {f}_i\|_{U^2}\leqslant 1/\mathcal {F}(M)$ for some i, we deduce that there is some decreasing function $\omega :\mathbb {R}_+\to \mathbb {R}_+$ , depending only on $\vec {P}$ , such that

(66) $$ \begin{align} \bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\prod_{i=0}^t f_i(x+P_i(y))\bigg|&\leqslant C^{(2)}_M \epsilon^{-C^{(2)}_M} K^{-c} + C^{(2)}_M \epsilon^{-M} K^M \omega(1/\mathcal{F}(M)), \end{align} $$

increasing the constant $C^{(2)}_M$ if necessary. We note that the existence of $\omega $ is equivalent to the statement that $\vec {P}$ is controlled by $U^2$ at i. We now show that we can choose K large enough and $\mathcal {F}$ growing fast enough so that the right-hand side of (66) is bounded by $O(\epsilon )$ .

For any given M, we find a constant $C^{(3)}_M$ such that $(C^{(3)}_M)^c \geqslant C^{(2)}_M$ and ${c C^{(3)}_M - C^{(2)}_M \geqslant 1}$. We then let $K_M = C^{(3)}_M \epsilon ^{-C^{(3)}_M}$, so that

$$ \begin{align*} C^{(2)}_M \epsilon^{-C^{(2)}_M} K_M^{-c} = C^{(2)}_M {C^{(3)}_M}^{-c} \epsilon^{c {{C^{(3)}_M} {-C^{(2)}_M}}}\leqslant \epsilon. \end{align*} $$
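The last inequality can be verified in two steps; this sketch assumes $0<\epsilon \leqslant 1$, which is not stated explicitly at this point. The two conditions imposed on $C^{(3)}_M$ give

$$ \begin{align*} C^{(2)}_M \big(C^{(3)}_M\big)^{-c} \leqslant 1 \qquad\text{and}\qquad \epsilon^{c C^{(3)}_M - C^{(2)}_M} \leqslant \epsilon^{1} = \epsilon, \end{align*} $$

the first because $(C^{(3)}_M)^c\geqslant C^{(2)}_M$ and the second because the exponent is at least 1 and $\epsilon \leqslant 1$.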

Picking $\mathcal {F}$ growing sufficiently fast depending on $\epsilon $ , we can ensure that $C^{(2)}_M \epsilon ^{-M} K_M^M \omega (1/\mathcal {F}(M))\leqslant \epsilon $ . We thus set $K = K_M$ for the value of M induced by $\epsilon $ and $\mathcal {F}$ , and so

$$ \begin{align*} \bigg|\mathop{\mathbb{E}}\limits_{y\in\mathbb{Z}/N\mathbb{Z}}\tilde{\mu}(y)\prod_{i=0}^t f_i(x+P_i(y))\bigg|\leqslant 2\epsilon. \end{align*} $$

Acknowledgments

We are indebted to Donald Robertson for his comments on earlier versions of the paper and fruitful conversations on the project while it was being carried out, and to Faustin Adiceam and Julia Wolf for pointing out an error in the original statement and proof of Corollary 1.13. We would also like to thank Sean Prendiville for introducing us to the topic of complexity, Tuomas Sahlsten for hosting a reading group on the dynamical proof of Szemerédi's theorem, and Jonathan Chapman for useful discussions on algebraic relations between terms of polynomial progressions. Finally, we would like to thank the anonymous referee for suggestions on how to improve the paper.

References

Altman, D. On a conjecture of Gowers and Wolf. Preprint, 2021, arXiv:2106.15437.
Bergelson, V., Host, B. and Kra, B. Multiple recurrence and nilsequences. Invent. Math. 160 (2005), 261–303. With an appendix by I. Ruzsa.
Bergelson, V. and Leibman, A. Polynomial extensions of van der Waerden’s and Szemerédi’s theorems. J. Amer. Math. Soc. 9 (1996), 725–753.
Bergelson, V., Leibman, A. and Lesigne, E. Complexities of finite families of polynomials, Weyl systems, and constructions in combinatorial number theory. J. Anal. Math. 103 (2007), 47–92.
Candela, P. and Sisask, O. Convergence results for systems of linear forms on cyclic groups and periodic nilsequences. SIAM J. Discrete Math. 28 (2012), 786–810.
Frantzikinakis, N. and Kra, B. Polynomial averages converge to the product of integrals. Israel J. Math. 148(1) (2005), 267–276.
Frantzikinakis, N. and Kra, B. Ergodic averages for independent polynomials and applications. J. Lond. Math. Soc. 74 (2006), 131–142.
Frantzikinakis, N. Multiple ergodic averages for three polynomials and applications. Trans. Amer. Math. Soc. 360(10) (2008), 5435–5475.
Frantzikinakis, N. Some open problems on multiple ergodic averages. Bull. Hellenic Math. Soc. 60 (2016), 41–90.
Green, B. and Tao, T. An arithmetic regularity lemma, an associated counting lemma, and applications. An Irregular Mind (Bolyai Society Mathematical Studies, 21). Eds. Bárány, I., Solymosi, J. and Sági, G. Springer, Berlin, 2010, pp. 261–334.
Green, B. and Tao, T. The quantitative behaviour of polynomial orbits on nilmanifolds. Ann. of Math. 175 (2012), 465–540.
Gowers, W. T. and Wolf, J. The true complexity of a system of linear equations. Proc. Lond. Math. Soc. 100(1) (2010), 155–176.
Gowers, W. T. and Wolf, J. Linear forms and higher-degree uniformity for functions on ${\mathbb{F}}_p^n$. Geom. Funct. Anal. 21 (2011), 36–69.
Gowers, W. T. and Wolf, J. Linear forms and quadratic uniformity for functions on ${\mathbb{F}}_p^n$. Mathematika 57 (2011), 215–237.
Gowers, W. T. and Wolf, J. Linear forms and quadratic uniformity for functions on ${\mathbb{Z}}_N$. J. Anal. Math. 115(1) (2011), 121–186.
Host, B. and Kra, B. Convergence of polynomial ergodic averages. Israel J. Math. 149(1) (2005), 1–19.
Host, B. and Kra, B. Nonconventional ergodic averages and nilmanifolds. Ann. of Math. 161(1) (2005), 397–488.
Host, B. and Kra, B. Nilpotent Structures in Ergodic Theory. American Mathematical Society, Providence, RI, 2018.
Kuca, B. Further bounds in the polynomial Szemerédi theorem over finite fields. Acta Arith. 198 (2021), 77–108.
Kuca, B. True complexity of polynomial progressions in finite fields. Proc. Edinb. Math. Soc. 64 (2021), 1–53.
Leibman, A. Convergence of multiple ergodic averages along polynomials of several variables. Israel J. Math. 146 (2005), 303–315.
Leibman, A. Pointwise convergence of ergodic averages for polynomial sequences of translations on a nilmanifold. Ergod. Th. & Dynam. Sys. 25(1) (2005), 201–213.
Leibman, A. Orbit of the diagonal in the power of a nilmanifold. Trans. Amer. Math. Soc. 362(3) (2009), 1619–1658.
Manners, F. Good bounds in certain systems of true complexity 1. Discrete Anal. 21 (2018), 40 pp.
Manners, F. True complexity and iterated Cauchy–Schwarz. Preprint, 2021, arXiv:2109.05731.
Peluse, S. On the polynomial Szemerédi theorem in finite fields. Duke Math. J. 168(5) (2019), 749–774.
Tao, T. A correction to ‘An arithmetic regularity lemma, an associated counting lemma, and applications’, 2020. Available at https://terrytao.wordpress.com/2020/11/26/a-correction-to-an-arithmetic-regularity-lemma-an-associated-counting-lemma-and-applications/.