1. Introduction
Let $h(\mathbf{Y})=-\int_{\Bbb{R}^{m}}{f_{\mathbf{Y}}(\mathbf{y};\ t)\log f_{\mathbf{Y}}(\mathbf{y};\ t)\,\mathrm{d}\mathbf{y}}$ denote the differential entropy of a random vector $\mathbf{Y}$ with probability density function (PDF) $f_{\mathbf{Y}}(\mathbf{y};\ t)$ depending on a real parameter t. The entropy power of an m-variate random vector $\mathbf{Y}$ is defined by
$$N(\mathbf{Y})=\frac{1}{2\pi\mathrm{e}}\exp\bigg\{\frac{2}{m}h(\mathbf{Y})\bigg\},$$
which was first introduced by Shannon [Reference Shannon13]. One of the most important inequalities in information theory is the entropy power inequality (EPI), which gives a lower bound for the entropy power of the sum of the independent random vectors $\mathbf{X}$ and $\mathbf{Y}$ as $N(\mathbf{X}+\mathbf{Y})\geq N(\mathbf{X})+N(\mathbf{Y})$ . The first complete proof of the EPI was given in [Reference Stam15]; in its development, [Reference Stam15] proved an equality called de Bruijn’s identity. This identity links Fisher information with Shannon’s differential entropy (see [Reference Blachman5]). Consider the additive Gaussian noise channel model
$$\mathbf{Y}=\mathbf{X}+\mathbf{W}_{t},\tag{1}$$
in which the input signal $\mathbf{X}=(X_{1},\ldots,X_{m})^\top$ and the additive noise $\mathbf{W}_{t}=(W_{t,1},\ldots,W_{t,m})^\top$ are two m-variate random vectors and $\mathbf{W}_{t}$ is normally distributed with mean vector $\mathbf{0}$ and covariance matrix
where the $\sigma_{ij}$ , $i,j=1,2,\ldots ,m$ , are real numbers. De Bruijn’s identity, generalized by Costa [Reference Costa7] to multivariate random variables, is given by
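in which $\mathbf{X}$ and $\mathbf{W}_{t}$ are independent random vectors and $J(\mathbf{Y})$ stands for the Fisher information of $f_{\mathbf{Y}}(\mathbf{y};\ t)$ , defined by
$$J(\mathbf{Y})=\int_{\Bbb{R}^{m}}\frac{\|\nabla f_{\mathbf{Y}}(\mathbf{y};\ t)\|^{2}}{f_{\mathbf{Y}}(\mathbf{y};\ t)}\,\mathrm{d}\mathbf{y}.\tag{4}$$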
There are several applications of the EPI, such as in bounding the capacity of certain kinds of channels and proving converses of channel or source coding theorems; see, e.g., [Reference Bergmans6, Reference Weingarten, Steinberg and Shamai18]. Considering the channel model (1), [Reference Costa7] presented an extension of the EPI for the case in which $\mathbf{W}_{t}$ is independent of $\mathbf{X}$ with $\Sigma_{\mathbf{W}_{t}}=t\mathbf{I}_{m}$ , where $\mathbf{I}_{m}$ is the $m\times m$ identity matrix. That is,
or, equivalently, $N(\mathbf{X}+\mathbf{W}_{t})$ is concave in t, i.e.
$$\frac{\partial^{2}}{\partial t^{2}}N(\mathbf{X}+\mathbf{W}_{t})\leq 0.\tag{5}$$
Later, [Reference Dembo8] provided another simple proof of Costa’s concavity inequality (5) via Stam’s Fisher information inequality [Reference Stam15], given by
$$\frac{1}{J(X+W)}\geq\frac{1}{J(X)}+\frac{1}{J(W)},$$
where X and W are independent random variables. Also, [Reference Villani17] used some advanced methods to simplify Costa’s proof of the inequality (5).
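As a numerical illustration of the concavity statement (5) in the classical independent setting, the following sketch computes the entropy power of $X+W_{t}$ in one dimension and inspects its second differences in t. The two-component Gaussian mixture input, the function name `mixture_entropy_power`, and the grid parameters are assumptions made only for this illustration; they do not come from the cited works.

```python
import math

def mixture_entropy_power(t, a=1.5, s2=0.25, half_width=20.0, n=4001):
    """Entropy power of Y = X + W_t in one dimension (m = 1).

    Illustrative assumption: X ~ 0.5 N(-a, s2) + 0.5 N(a, s2) is independent
    of W_t ~ N(0, t), so Y is a two-component mixture with variance s2 + t
    in each component.
    """
    v = s2 + t
    dy = 2.0 * half_width / (n - 1)
    log_norm = -0.5 * math.log(2.0 * math.pi * v)
    h = 0.0
    for i in range(n):
        y = -half_width + i * dy
        # Log-densities of the two Gaussian components of Y.
        l1 = log_norm - (y + a) ** 2 / (2.0 * v)
        l2 = log_norm - (y - a) ** 2 / (2.0 * v)
        # Log of the mixture density via a stable log-sum-exp.
        top = max(l1, l2)
        log_f = top + math.log(0.5 * math.exp(l1 - top) + 0.5 * math.exp(l2 - top))
        # Riemann sum for h(Y) = -int f log f dy (the tails contribute ~0).
        h -= math.exp(log_f) * log_f * dy
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

# Entropy power on a uniform grid of noise variances t.
ts = [0.1 * k for k in range(1, 31)]
powers = [mixture_entropy_power(t) for t in ts]

# Discrete second differences; concavity in t means they are non-positive.
second_diffs = [
    powers[k - 1] - 2.0 * powers[k] + powers[k + 1]
    for k in range(1, len(powers) - 1)
]
print(max(second_diffs))
```

Setting `a=0.0` makes the input Gaussian, in which case the entropy power reduces to the variance `s2 + t` and is linear in t.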
As mentioned before, all of the above results require the input signal $\mathbf{X}$ and the additive noise $\mathbf{W}_{t}$ to be independent. However, there are several practical situations, such as radar and sonar systems, in which the noise is highly dependent on the transmitted signal [Reference Kay11]. It was illustrated in [Reference Takano, Watanabe, Fukushima, Prohorov and Shiryaev16] that, under some assumptions, Shannon’s EPI can hold for weakly dependent random variables; [Reference Asgari and Alamatsaz3] extended the EPI to dependent random variables with arbitrary distributions; and [Reference Johnson10] provided certain conditions under which the conditional EPI can hold for dependent summands as well.
Copula functions are among the most effective tools for describing the dependency structure among random variables. Copula theory was first introduced in [Reference Sklar14] in order to connect a joint PDF with its marginals. In [Reference Asgari, Alamatsaz and Khoolenjani4], the authors extended two inequalities based on the Fisher information when the input signal and noise components are dependent and their dependence structure is modeled by several well-known copulas. There are several families of copulas with different dependence structures. The Gaussian copula is one of the most widely used, and can describe different levels of dependence between marginal components. In the present paper, we consider the additive Gaussian noise channel model (1) where the input signal $\mathbf{X}$ and noise $\mathbf{W}_{t}$ are dependent random vectors obeying the multivariate Gaussian copula. First, an extension of de Bruijn’s identity (3) is derived, and then Costa’s concavity inequality (5) is proved under some mild conditions.
The rest of the paper is organized as follows. In Section 2 we recall the copula theory concept and the basic definition of the multivariate Gaussian copula function, along with one of its particular cases. In Section 3 we provide a generalization of the first-order derivatives of the differential entropy and Fisher information, provided that the input signal and noise components are dependent variables. Then, based on these derivatives, we extend Costa’s concavity inequality for the case that the random vector $ \mathbf{X}$ is composed of independent coordinates. Finally, we illustrate the one-dimensional versions of our results in Section 4.
Let us first establish the fundamental definitions and notation used in this paper. Let $\phi(\mathbf{y})$ and $\psi(\mathbf{y})$ be twice continuously differentiable functions on $\Bbb{R}^{m}$ , and V be any closed and simply connected m-dimensional region in $\Bbb{R}^{m}$ bounded by a piecewise smooth, closed, and oriented surface S. We recall Green’s identity [Reference Amazigo and Rubenfeld1], which is stated as
$$\int_{V}\phi\,\nabla^{2}\psi\,\mathrm{d}V+\int_{V}\nabla\phi\cdot\nabla\psi\,\mathrm{d}V=\oint_{S}\phi\,\nabla\psi\cdot\mathbf{n}_{S}\,\mathrm{d}S,\tag{6}$$
in which $\nabla\phi$ and $\nabla\psi$ are the gradients of $\phi$ and $\psi$ , respectively, $\mathbf{n}_{S}$ denotes the unit vector normal to the surface S, and $\nabla\psi\cdot\mathbf{n}_{S}$ is the inner product of the two vectors. Now, the m-dimensional Stokes’ theorem is recalled: it states that if $\mathbf{F}\colon\Bbb{R}^{m}\rightarrow\Bbb{R}^{m}$ is a vector field over $\Bbb{R}^{m}$ , then
$$\int_{V}\nabla\cdot\mathbf{F}\,\mathrm{d}V=\oint_{\partial V}\mathbf{F}\cdot\mathbf{n}_{S}\,\mathrm{d}S,\tag{7}$$
where $\partial V=S$ is the boundary of V.
We denote the PDF and cumulative distribution function (CDF) of a random variable X by $f_{X}(x)$ and $F_{X}(x)$ , respectively.
2. Copula background
Copula theory is popular in multivariate distribution analysis as copulas allow easy modeling of the distribution of a random vector by its marginals. A copula is a multivariate CDF with standard uniform marginal distributions which couples univariate distribution functions to generate a multivariate CDF and indicates the dependency structure of the random variables. Copulas are important parts of the study of dependency between variables since they allow us to separate the effect of dependency from the effects of the marginal distributions [Reference Joe9]. In recent years, there has been a revival of copulas in applications where the matter of dependency between random variables is of great importance [Reference Arias-Nicolás, Fernández-Ponce, Luque-Calvo and Suárez-Llorens2].
The fundamental theorem for copulas was introduced by Sklar [Reference Sklar14] and illustrates the role that copulas play in the relationship between multivariate CDFs and their univariate marginals. In an n-dimensional multivariate case, Sklar’s theorem states that if $F_{T_1,T_2,\ldots,T_n}$ is an n-dimensional CDF with marginals $F_{T_1},F_{T_2},\ldots,F_{T_n}$ , then there exists an n-copula $C\colon I^{n}\longrightarrow I$ such that
$$F_{T_1,T_2,\ldots,T_n}(t_{1},t_{2},\ldots,t_{n})=C\big(F_{T_1}(t_{1}),F_{T_2}(t_{2}),\ldots,F_{T_n}(t_{n})\big),\tag{8}$$
where $I=[0,1]$ . If $F_{T_1},F_{T_2},\ldots,F_{T_n}$ are continuous, the n-copula C is unique; otherwise, C is uniquely determined on the range of $F_{T_1}$ $\times$ the range of $F_{T_2}$ $\times\cdots\times$ the range of $F_{T_n}$ . Conversely, if C is an n-copula and $F_{T_1},F_{T_2},\ldots,F_{T_n}$ are univariate distribution functions, then $F_{T_1,T_2,\ldots,T_n}$ is a joint CDF with marginals $F_{T_1},F_{T_2},\ldots,F_{T_n}$ .
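For any n-copula function C, there exists a corresponding copula density function c:
$$c(u_{1},u_{2},\ldots,u_{n})=\frac{\partial^{n}C(u_{1},u_{2},\ldots,u_{n})}{\partial u_{1}\,\partial u_{2}\cdots\partial u_{n}}.\tag{9}$$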
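Therefore, if $f_{T_1,T_2,\ldots,T_n}$ , $f_{T_1},f_{T_2},\ldots,f_{T_n}$ , and c are the density functions of $F_{T_1,T_2,\ldots,T_n}$ , $F_{T_1},F_{T_2},\ldots,F_{T_n}$ , and C, respectively, the relation in (8) yields
$$f_{T_1,T_2,\ldots,T_n}(t_{1},t_{2},\ldots,t_{n})=c(u_{1},u_{2},\ldots,u_{n})\prod_{i=1}^{n}f_{T_i}(t_{i}),\tag{10}$$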
where $u_{1},u_{2},\ldots ,u_{n}$ are related to $t_{1},t_{2},\ldots ,t_{n}$ through the marginal distribution functions $u_{1}=F_{T_1}(t_{1})$ , $u_{2}=F_{T_2}(t_{2})$ , …, $u_{n}=F_{T_n}(t_{n})$ .
Let us recall the definition of one of the most popular copulas, the multivariate Gaussian copula, which we consider here.
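Definition 1. The n-dimensional Gaussian copula with covariance matrix $\boldsymbol{\Sigma}$ is defined by
$$C_{\boldsymbol{\Sigma}}(u_{1},u_{2},\ldots,u_{n})=\Phi_{\boldsymbol{\Sigma}}\big(\Phi^{-1}(u_{1}),\Phi^{-1}(u_{2}),\ldots,\Phi^{-1}(u_{n})\big),\tag{11}$$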
where $\Phi_{\boldsymbol{\Sigma}}$ denotes the CDF of the n-variate normal random vector with mean vector $\mathbf{0}$ and covariance matrix $\boldsymbol{\Sigma}$ , $\Phi^{-1}$ is the inverse of the univariate standard Gaussian CDF, and $0\leq u_{1},u_{2},\ldots ,u_{n}\leq 1$ .
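To make Definition 1 concrete, the sketch below draws samples from a Gaussian copula in the bivariate case ($n=2$) by correlating two standard normal variables and mapping each through the standard Gaussian CDF. The helper names (`sample_gaussian_copula`, `pearson`, `to_exponential`), the value $\rho=0.7$, and the exponential marginal are illustrative assumptions and are not part of the paper.

```python
import math
import random
from statistics import NormalDist

def sample_gaussian_copula(n, rho, seed=0):
    """Draw n pairs (u1, u2) from the bivariate Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    pairs = []
    for _ in range(n):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (z1, z2) is standard bivariate normal with correlation rho.
        z1 = g1
        z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        # The probability integral transform yields uniform marginals.
        pairs.append((std_normal.cdf(z1), std_normal.cdf(z2)))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs)
    va = sum((a - ma) ** 2 for a, _ in pairs)
    vb = sum((b - mb) ** 2 for _, b in pairs)
    return cov / math.sqrt(va * vb)

def to_exponential(u, rate=1.0):
    """Inverse CDF of an exponential marginal (a hypothetical choice of F_X)."""
    return -math.log(1.0 - u) / rate

pairs = sample_gaussian_copula(20000, rho=0.7)
# Attach arbitrary marginals: exponential for the first coordinate,
# standard Gaussian for the second; the dependence stays Gaussian-copula.
xs = [(to_exponential(u1), NormalDist().inv_cdf(u2)) for u1, u2 in pairs]
print(pearson(pairs))
```

The marginals can be swapped freely without touching the dependence structure, which is the separation that Sklar’s theorem provides.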
In this paper we consider the special version of the n-dimensional Gaussian copula with
$$\boldsymbol{\Sigma}=(1-\rho)\mathbf{I}_{n}+\rho\mathbf{1}_{n}\mathbf{1}_{n}^\top$$
and $-1/(n-1)<\rho<1$ in which $\mathbf{1}_{n}=(1,1,\ldots,1)_{1\times n}^\top$ . Thus, from (9), the n-dimensional Gaussian copula density is given by
where $\phi_{\boldsymbol{\Sigma}}$ is the PDF of the n-variate Gaussian distribution, and $z_{i}=\Phi^{-1}(u_{i})$ , $i=1,2,\ldots,n$ . Since
we have
Now, due to the fact that $\big(\sum_{i=1}^{n}z_{i}\big)^{2}=\sum_{i=1}^{n}z_{i}^{2}+\sum_{i\neq j}z_{i}z_{j}$ , substituting (13) into (12) yields
where
Remark 1. Note that setting $\boldsymbol{\Sigma}=\mathbf{I}_{n}$ , i.e. $\rho=0$ , in (11) leads to the independent copula $C_{\mathbf{I}_{n}}(u_{1},u_{2},\ldots ,u_{n}) = u_{1}u_{2}\cdots u_{n}$ , which is equivalent to the random variables $T_{1},T_{2},\ldots,T_{n}$ being independent.
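The admissible range $-1/(n-1)<\rho<1$ can be checked numerically under the assumption that the special covariance matrix has the equicorrelation form $(1-\rho)\mathbf{I}_{n}+\rho\mathbf{1}_{n}\mathbf{1}_{n}^\top$ ; this form is inferred here from the stated range of $\rho$ and the vector $\mathbf{1}_{n}$ . The sketch below builds such a matrix, tests positive definiteness through its two distinct eigenvalues, and verifies the closed-form inverse given by the Sherman-Morrison formula. All function names are hypothetical.

```python
def equicorrelation(n, rho):
    """Assumed form (1 - rho) I_n + rho 1_n 1_n^T: ones on the diagonal, rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def is_positive_definite(n, rho):
    """The eigenvalues are 1 - rho (multiplicity n - 1) and 1 + (n - 1) rho."""
    return (1.0 - rho) > 0.0 and (1.0 + (n - 1) * rho) > 0.0

def equicorrelation_inverse(n, rho):
    """Closed-form inverse obtained from the Sherman-Morrison formula."""
    a = 1.0 / (1.0 - rho)
    b = rho / ((1.0 - rho) * (1.0 + (n - 1) * rho))
    return [[a - b if i == j else -b for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Plain matrix product for small square matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

n, rho = 4, 0.3
product = matmul(equicorrelation(n, rho), equicorrelation_inverse(n, rho))
# Largest deviation of the product from the identity matrix.
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(n) for j in range(n))
print(max_error, is_positive_definite(n, rho))
```

With $\rho=0$ the matrix is the identity, matching the independent copula of Remark 1.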
A particular case of the n-dimensional Gaussian copula is the bivariate Gaussian copula. If we put $n=2$ and
with $-1<\rho<1$ , then the bivariate Gaussian copula is defined by
where $\rho\in(\!-1,1)$ is the Gaussian copula parameter and $\Phi_{2}$ is the bivariate standard Gaussian CDF. The Gaussian copula density for $-1<\rho<1$ is obtained as
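For numerical work, the bivariate Gaussian copula density can be evaluated from the standard closed form $c(u_{1},u_{2})=(1-\rho^{2})^{-1/2}\exp\{-[\rho^{2}(z_{1}^{2}+z_{2}^{2})-2\rho z_{1}z_{2}]/[2(1-\rho^{2})]\}$ with $z_{i}=\Phi^{-1}(u_{i})$ , which is the ratio of the bivariate standard Gaussian PDF to the product of its marginals. The following sketch implements this textbook expression; the function names are hypothetical and the notation may differ from the paper's own display.

```python
import math
from statistics import NormalDist

def copula_density_z(z1, z2, rho):
    """Bivariate Gaussian copula density as a function of the normal scores z_i."""
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)

def copula_density(u1, u2, rho):
    """Same density evaluated at uniform arguments u_i in (0, 1)."""
    std_normal = NormalDist()
    return copula_density_z(std_normal.inv_cdf(u1), std_normal.inv_cdf(u2), rho)

# With rho = 0 the density is identically 1 (independence copula).
print(copula_density(0.3, 0.8, 0.0))

# Uniform-marginal check for rho = 0.6: integrating the density over the
# second argument gives 1. The integral is taken in z-space, where
# du2 = phi(z2) dz2, to avoid the endpoint singularities at u2 = 0 and 1.
std_normal = NormalDist()
dz = 0.01
marginal = sum(
    copula_density_z(0.7, -8.0 + k * dz, 0.6) * std_normal.pdf(-8.0 + k * dz) * dz
    for k in range(1601)
)
print(marginal)
```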
3. The general case
Consider the additive Gaussian noise channel model (1). Let $\mathbf{X}$ and $\mathbf{W}_{t}$ be two dependent random vectors with a differentiable joint PDF $f_{\mathbf{X},\mathbf{W}_{t}}(\mathbf{x},\mathbf{w}_{t})$ . Then, for the PDF of $\mathbf{Y}$ , we obtain
where
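First, recall that assuming $\mathbf{X}$ and $\mathbf{W}_{t}$ are independent random vectors and $\Sigma_{\mathbf{W}_{t}}=t\mathbf{I}_{m}$ , [Reference Costa7, Reference Villani17] used the heat equation given by
$$\frac{\partial}{\partial t}f_{\mathbf{Y}}(\mathbf{y};\ t)=\frac{1}{2}\nabla^{2}f_{\mathbf{Y}}(\mathbf{y};\ t)$$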
in their proofs. We now need to generalize this heat equation to the case of multivariate random vectors, as below.
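In the independent case the heat equation takes the standard form $(\partial/\partial t)f_{\mathbf{Y}}=\frac{1}{2}\nabla^{2}f_{\mathbf{Y}}$ . As a quick sanity check of that form, the sketch below evaluates both sides by central finite differences in one dimension, assuming a Gaussian input $X\sim N(0,s_{2})$ so that $Y\sim N(0,s_{2}+t)$ . The function names and step sizes are illustrative choices.

```python
import math

def output_pdf(y, t, s2=1.0):
    """PDF of Y = X + W_t for independent X ~ N(0, s2) and W_t ~ N(0, t)."""
    v = s2 + t
    return math.exp(-y * y / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

def heat_residual(y, t, dt=1e-4, dy=1e-3):
    """Finite-difference estimate of df/dt - (1/2) d^2 f / dy^2 at (y, t)."""
    df_dt = (output_pdf(y, t + dt) - output_pdf(y, t - dt)) / (2.0 * dt)
    d2f_dy2 = (
        output_pdf(y + dy, t) - 2.0 * output_pdf(y, t) + output_pdf(y - dy, t)
    ) / (dy * dy)
    return df_dt - 0.5 * d2f_dy2

# The residual should be close to zero at every point.
for y in (-2.0, -0.5, 0.0, 1.0, 2.5):
    print(y, heat_residual(y, t=0.8))
```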
Lemma 1. Suppose that $\mathbf{W}_{t}$ in channel model (1) has the covariance matrix (2), and let $\mathbf{X}$ and $\mathbf{W}_{t}$ be two dependent random vectors whose dependence structure is modeled by the multivariate Gaussian copula (14). Then, we have
where
Proof. Using (10) and (14), by setting $\mathbf{T}=(\mathbf{X},\mathbf{W}_{t})$ and $n=2m$ , we have
where
in which
because $W_{t,k}$ , $k=1,2,\ldots,m$ , are normally distributed with zero mean and variance t. Thus,
By some easy calculations, this expression can be rewritten as (17).
Lemma 2. Based on the same assumptions as in Lemma 1, we have
in which $q(\mathbf{y};\ t) = (q_{1}(\mathbf{y};\ t),q_{2}(\mathbf{y};\ t),\ldots,q_{m}(\mathbf{y};\ t))$ and
where $p_{j}(\mathbf{y};\ t) = \mathbf{E}_{\mathbf{X}\mid \mathbf{Y}}\big[ \sum_{i=1}^{m}\Phi^{-1}(F_{X_{i}}(X_{i})) + ({1}/{\sqrt{t}})\sum_{k\neq j}(Y_{k}-X_{k}) \mid \mathbf{Y} = \mathbf{y}\big]$ .
Proof. According to Lemma 1, differentiating (17) with respect to t and $y_{j}$ yields
respectively. Thus, for the second-order derivative of (17) with respect to $y_{j}$ , we obtain
Now, according to (16) and (19), we have
Thus, due to (19), by combining (22) with (23), we obtain
where
Therefore, using (20), the proof is complete.
Now, we need to derive the first- and second-order derivatives of the differential entropy $h(\mathbf{Y})$ that are key instruments in establishing our main result.
Theorem 1. Based on Lemma 2, the first-order derivative of the entropy $h(\mathbf{Y})$ is derived as
where
Proof. Using (18), we obtain
To apply Green’s identity (6) to the second term in (25), we assume that $V_{r}$ is the m-dimensional ball of radius r centered at the origin with boundary $S_{r}=\partial V_{r}$ . We then apply Green’s identity with $\phi(\mathbf{y})=\log f_{\mathbf{Y}}(\mathbf{y};\ t)$ and $\psi(\mathbf{y})=f_{\mathbf{Y}}(\mathbf{y};\ t)$ , and take the limit on both sides as $r\rightarrow +\infty$ . Thus,
where $\mathbf{n}_{S_{r}}$ is the unit vector normal to the surface $S_{r}$ . Consider the identity
$$\nabla\cdot(\phi\mathbf{F})=\phi\,\nabla\cdot\mathbf{F}+\nabla\phi\cdot\mathbf{F},\tag{27}$$
where $\mathbf{F}\colon\Bbb{R}^{m}\rightarrow \Bbb{R}^{m}$ . We set $\mathbf{F}(\mathbf{y})=q(\mathbf{y};\ t)$ and $\phi(\mathbf{y})=\log f_{\mathbf{Y}}(\mathbf{y};\ t)$ , and then, using Stokes’ theorem (7) and taking limits on both sides as $r\rightarrow +\infty$ , we get
In Appendix A, the surface integrals in (26) and (28) over the surface $S_{r}$ are shown to vanish as r approaches $+\infty$ . Therefore, by substituting (26) and (28) into (25), the theorem is proved.
Remark 2. Note that, in Theorem 1, from (24) with $\rho=0$ , we obtain
That is, the first-order derivative of the entropy $h(\mathbf{Y})$ reduces to the case when $\mathbf{X}$ and $\mathbf{W}_{t}$ are independent random vectors with $\Sigma_{\mathbf{W}_{t}}=t\mathbf{I}_{m}$ as in [Reference Costa7].
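In the independent one-dimensional case, de Bruijn’s identity takes the classical form $(\partial/\partial t)h(Y)=\frac{1}{2}J(Y)$ . The sketch below checks this form numerically, computing $h$ and $J$ by quadrature and the derivative by a central difference. The Gaussian mixture input, the helper names, and the grid sizes are assumptions chosen only for illustration.

```python
import math

def mixture_pdf_and_derivative(y, t, a=1.5, s2=0.25):
    """PDF of Y = X + W_t and its y-derivative for an assumed mixture input.

    X ~ 0.5 N(-a, s2) + 0.5 N(a, s2) is independent of W_t ~ N(0, t).
    """
    v = s2 + t
    norm = 1.0 / math.sqrt(2.0 * math.pi * v)
    p1 = norm * math.exp(-(y + a) ** 2 / (2.0 * v))
    p2 = norm * math.exp(-(y - a) ** 2 / (2.0 * v))
    f = 0.5 * (p1 + p2)
    df = 0.5 * (-(y + a) / v * p1 - (y - a) / v * p2)
    return f, df

def entropy_and_fisher(t, a=1.5, s2=0.25, half_width=12.0, n=4801):
    """Differential entropy h(Y) and Fisher information J(Y) by a Riemann sum."""
    dy = 2.0 * half_width / (n - 1)
    h = 0.0
    fisher = 0.0
    for i in range(n):
        y = -half_width + i * dy
        f, df = mixture_pdf_and_derivative(y, t, a, s2)
        if f > 0.0:
            h -= f * math.log(f) * dy
            fisher += df * df / f * dy
    return h, fisher

t, dt = 1.0, 1e-3
h_plus, _ = entropy_and_fisher(t + dt)
h_minus, _ = entropy_and_fisher(t - dt)
_, fisher = entropy_and_fisher(t)
dh_dt = (h_plus - h_minus) / (2.0 * dt)
# The two printed numbers should agree up to discretisation error.
print(dh_dt, 0.5 * fisher)
```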
According to Theorem 1, to provide the second-order derivative of $h(\mathbf{Y})$ , it is sufficient to derive the first-order derivative of the Fisher information $J(\mathbf{Y})$ . First, we need the following lemma.
Lemma 3. According to Lemma 2, the following two equations hold:
where
and
Proof. Simply, we know that
Also, from (18), we can write
which implies (29). To prove (30), we have
Also, since $\nabla\cdot q(\mathbf{y};\ t)=\sum_{j=1}^{m}(\partial/\partial y_{j})q_{j}(\mathbf{y};\ t)$ , (31) is obtained. Now, to prove (32), we obtain
where
together with (33), this completes the proof.
Theorem 2. Under the conditions of Lemma 2, the first-order derivative of the Fisher information $J(\mathbf{Y})$ is as follows:
where
Proof. According to the Fisher information (4), we know that
Based on Lemma 2, the first term in (35) is expressed as
By applying Green’s identity (6) to the first term in (36) and taking the limit as r tends to $+\infty$ , we obtain
Similarly, using Green’s identity for the second term in (37) and taking the limit, we have
The first terms in (37) and (38) can be shown to vanish (see Appendix B), and therefore, by comparing (37) with (38), we can write
Substituting this into (36) yields
Also, by using (29) in Lemma 3, the second term in (35) can be rewritten as
Now, according to (30), we obtain
Therefore, from this and (38), we have
Thanks to the identity (31), for the second term in (40) we obtain
Using Green’s identity, we arrive at
whose first term becomes zero (see Appendix B). Using the identity
the second term in (43) is rewritten as
By combining this with (42) and (43), we get
Also, we have
whose first term vanishes (see Appendix B), and $p(\mathbf{y};\ t)=(p_{1}(\mathbf{y};\ t),p_{2}(\mathbf{y};\ t),\ldots,p_{m}(\mathbf{y};\ t))$ . From (45), combining (35), (39), (40), (41), and (44), we obtain
Hence, based on the relation (32) in Lemma 3, the proof is complete.
Remark 3. It is interesting to see that, if we put $\rho=0$ in (34), it reduces to
That is, Theorem 2 recovers, as a special case, the setting in which $\mathbf{X}$ and $\mathbf{W}_{t}$ are independent random vectors. Hence, Theorem 2 encompasses the result of [Reference Villani17] as a corollary.
We can now establish the main result of this paper.
Theorem 3. Let $\mathbf{X}$ and $\mathbf{W}_{t}$ in channel model (1) be two dependent random vectors whose dependence structure is modeled by the multivariate Gaussian copula. For any $\rho>-1/(2m-1)$ , under the conditions
the entropy power $N(\mathbf{X}+\mathbf{W}_{t})$ is concave in t, i.e.
Proof. Simply, we have
Since the entropy power is nonnegative, to show that $(\partial^{2}/\partial t^{2})N(\mathbf{Y})\leq 0$ , it is sufficient to prove that
Based on Theorem 1, this is equivalent to
Thus, since $\rho>-1/(2m-1)$ and $\delta(\rho ,m)>0$ , due to the condition (46a), we must prove that
According to the proof of the proposition in [17, p. 3], we have
Hence, according to Theorem 2, (47), and assumption (46b), the proof is complete.
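The reduction at the start of this proof, from concavity of the entropy power to a condition on the derivatives of the entropy, can be spelled out as follows. This is a sketch that assumes the standard definition of the entropy power; it uses only the chain rule.

```latex
\begin{align*}
N(\mathbf{Y}) &= \frac{1}{2\pi\mathrm{e}}\exp\Big\{\frac{2}{m}h(\mathbf{Y})\Big\},\\
\frac{\partial}{\partial t}N(\mathbf{Y}) &= \frac{2}{m}\,N(\mathbf{Y})\,\frac{\partial}{\partial t}h(\mathbf{Y}),\\
\frac{\partial^{2}}{\partial t^{2}}N(\mathbf{Y}) &= \frac{2}{m}\,N(\mathbf{Y})
\bigg[\frac{\partial^{2}}{\partial t^{2}}h(\mathbf{Y})
+\frac{2}{m}\Big(\frac{\partial}{\partial t}h(\mathbf{Y})\Big)^{2}\bigg].
\end{align*}
```

Since $N(\mathbf{Y})>0$ whenever $h(\mathbf{Y})$ is finite, the sign of the second derivative of $N(\mathbf{Y})$ is the sign of the bracketed term, so concavity in t holds exactly when the bracket is non-positive.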
4. The one-dimensional case
In this section, by considering the channel model (1) with $m=1$ , we describe special versions of our main results.
Corollary 1. Let X and $W_{t}$ in the channel model $Y=X+W_{t}$ be dependent one-dimensional random variables, and let $W_{t}$ be normally distributed with mean zero and variance t. If their dependence structure is modeled by the bivariate Gaussian copula (15), then
where
in which $p'(y;\ t)=\mathbf{E}_{X\mid Y}[\Phi^{-1}(F_{X}(X))\mid Y=y]$ .
Proof. Since $W_{t}$ is normally distributed with mean zero and variance t, from (15),
Thus, by some simple calculations, we obtain
Now, by comparing (49) with (50), we obtain
in which
where $p'(y;\ t)=\mathbf{E}_{X\mid Y}[\Phi^{-1}(F_{X}(X))\mid Y=y]$ . Hence, $q_{j}(\mathbf{y};\ t)$ and $p_{j}(\mathbf{y};\ t)$ in Lemma 2 reduce to $q'(y;\ t)$ and $p'(y;\ t)$ , respectively. Now, since X and $W_{t}$ are one-dimensional, it is sufficient to set $m=1$ and $p_{j}(\mathbf{y};\ t)=p'(y;\ t)$ in (24). Therefore, the proof is complete.
Remark 4. Corollary 1 is equivalent to a result in [Reference Khoolenjani and Alamatsaz12].
Now, under the same conditions as in Corollary 1, according to the relations (51) and (52), the first-order derivative of the Fisher information,
simply follows by setting $m=1$ and $p_{j}(\mathbf{y};\ t)=p'(y;\ t)$ in (34). This coincides with the result in [Reference Asgari, Alamatsaz and Khoolenjani4], where a direct proof of (53) is provided.
Using the first-order derivatives of the entropy and Fisher information of the output signal Y, in what follows the concavity of Shannon’s entropy power for the special one-dimensional case is obtained.
Corollary 2. Given the channel model (1), assume that X and $W_{t}$ are dependent random variables modeled by the bivariate Gaussian copula (15). Based on the assumptions
the entropy power $N(X+W_{t})$ is concave in t.
Example 1. Consider the channel model $Y=X+W_{t}$ with $W_{t}=\sqrt{t}W$ . Let X be standard Gaussian and suppose that X and $W_{t}$ are jointly distributed according to the bivariate Gaussian copula, i.e. X and W are two dependent random variables distributed according to a bivariate standard Gaussian distribution with the PDF
We know that Y is normally distributed with mean zero and variance $1+t+2\sqrt{t}\rho$ . Thus, since $(X,Y)\sim N_{2}(\mathbf{0},\Sigma_{X,Y})$ with
we have
Further, we observe that
Thus, by (48), we can write
As we can see, both conditions (54a) and (54b) are satisfied when $\rho>0$ . Thus, based on Corollary 2, $N(X+W_{t})$ is concave in t.
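Because Y is Gaussian in this example, the conclusion can also be checked directly from the stated variance $1+t+2\sqrt{t}\rho$ , without appealing to Corollary 2. The short derivation below uses only the fact that the entropy power of a one-dimensional Gaussian variable equals its variance.

```latex
\begin{align*}
N(Y) &= \frac{1}{2\pi\mathrm{e}}\exp\{2h(Y)\} = \operatorname{Var}(Y) = 1+t+2\rho\sqrt{t},\\
\frac{\partial}{\partial t}N(Y) &= 1+\frac{\rho}{\sqrt{t}},\\
\frac{\partial^{2}}{\partial t^{2}}N(Y) &= -\frac{\rho}{2t^{3/2}}.
\end{align*}
```

For $t>0$ the second derivative is non-positive exactly when $\rho\geq 0$ , which agrees with the range of $\rho$ identified above.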
5. Conclusions
In this paper, based on the multivariate Gaussian copula dependence structure, we have derived the first- and second-order derivatives of differential entropy of the output signal in the m-dimensional additive Gaussian noise channel model. Then, by using these derivatives, we have generalized Costa’s concavity inequality for the particular case where the coordinates of the input signal and noise are dependent according to a multivariate Gaussian copula model. In particular, we have studied our results in the one-dimensional case and have provided an illustrative example.
Appendix A. Vanishing surface integrals of Theorem 1
We need to prove that
We first assume that $h(\mathbf{Y})$ is finite. Next, we integrate the surface integral in (55) over $r\geq 0$ and then, by applying the identity (27) and Stokes’ theorem, we obtain
Since the limit in the first part of (56) exists, due to
the first term in (56) vanishes. Now, since
for the second term in (56) we can write
Further, we know that
On the other hand, from (20), we have
Now, since for all $j=1,2,\ldots,m$ , $\vert E(W_{t,j}\mid \mathbf{Y}=\mathbf{y})\vert<+\infty$ , the first and second terms in (59) must be finite too. Therefore, we have
and, due to (58), the right-hand side of inequality (57) is finite. Hence, the integral in (56) is finite and, since the limit in (55) exists, the desired result (55) is proved.
Now, we need to prove that
in which the integral is taken from $r=0$ to $r=+\infty$ on the surface integral. Thus, we have
Since $f_{\mathbf{Y}}(\mathbf{y};\ t)$ converges to zero as $\mathbf{y}$ approaches $\pm\infty$ , we have $f_{\mathbf{Y}}(\mathbf{y};\ t)\log f_{\mathbf{Y}}(\mathbf{y};\ t)\rightarrow 0$ as $\mathbf{y}\rightarrow\pm\infty$ . Therefore, $\log f_{\mathbf{Y}}(\mathbf{y};\ t)$ is finite and, due to (59), the right-hand side of (62) becomes finite. Hence, since the limit in (61) exists, we can conclude the relation in (61).
Appendix B. Vanishing surface integrals of Theorem 2
We intend to prove that
First, we consider the integral of the surface integral in (63) over $r\geq 0$ :
Simply, based on (58) and (60), the right-hand side of (65) becomes finite and, since the limit $u_{1}$ exists, this proves that $u_{1}=0$ .
To show that $u_{2}=0$ , we write
Because $\big\vert\int_{0}^{+\infty}\int_{S_{r}}{f_{\mathbf{Y}}(\mathbf{y};\ t) \|\nabla\log f_{\mathbf{Y}}(\mathbf{y};\ t)\|^{2}\,\mathrm{d} S_{r}}\,\mathrm{d} r\big\vert =\vert J(\mathbf{Y})\vert<+\infty$ and
the first term in (66) becomes zero and the absolute value of the second term is finite. Thus, since the limit $u_{2}$ exists, we have $u_{2}=0$ .
In a similar way, we consider the integral from $r=0$ to $r=+\infty$ of the surface integral in (64):
Using (21), we have
Also, from (20), we obtain
Since, for all $j=1,2,\ldots,m$ , ${\bf E}(W_{t,j}^{2}\mid \mathbf{Y}=\mathbf{y})<+\infty$ , the first, third, and fourth terms in (68) are finite too and, due to (69), $(\partial/\partial y_{j})q_{j}(\mathbf{y};\ t)$ is finite as well. Therefore, from (59), the right-hand side of (67) is finite and, together with the fact that the limit $u_{3}$ exists, it follows that $u_{3}=0$ .
Similarly, to show that $u_{4}=0$ , we find the sequence of relations
Using similar steps, we can see that $u_{4}=0$ .
Acknowledgements
We express our gratitude to the associate editor and the anonymous reviewers whose comments had a noticeable impact on improving the manuscript.
Funding information
There are no funding bodies to thank relating to the creation of this article.
Competing interests
There were no competing interests to declare which arose during the preparation or publication process of this article.