The Bivariate Normal Distribution is an often used multivariable distribution because it provides a simple model of correlated random variables. Here it is derived by application of a linear transformation and a multivariate change of variables to the distribution of two independent unit normal, Normal(0, 1)\textbf{Normal}(0,\ 1), random variables. To provide background a general expression for change of variables of a bivariate integral is discussed and then used to obtain the Bivariate Normal Distribution. The Marginal and Conditional distributions are next computed and used to evaluate the first and seconds moments, correlation coefficient and conditional expectation and conditional variance. Finally, the variation in the shape of the distribution and transformation as the distribution parameters are varied is discussed.

Bivariate Change of Variables

Consider the PDF of a single variable, f(x)f(x), and the transformation, x=x(y)x=x(y) which is assumed monotonically increasing. The PDF of the transformed variable is given by,

g(y)=f(x(y))dxdy. g(y) = f(x(y)) \frac{dx}{dy}.

This result follows by performing a change of variables to the CDF,

P(X  x)=xf(w)dw=yf(x(w))dxdwdw=P(Y  y) \begin{aligned} P(X\ \leq\ x) &= \int_{-\infty}^{x} f(w) dw \\ &= \int_{-\infty}^{y} f(x(w)) \frac{dx}{dw} dw \\ &= P(Y\ \leq\ y) \end{aligned}

where use was made of,

dx=dxdydy. dx = \frac{dx}{dy} dy.

The dx/dydx/dy term scales dydy appropriately to conserve the differential length. In two dimensions a similar but more complicated thing happens.

Consider the bivariate PDF, f(x,y)f(x,y), with CDF,

P({X,Y}A)=Af(x,y)dA,     (1) P(\{X,Y\} \in A) = \int_{A} f(x, y) dA,\ \ \ \ \ (1)

which defines an integration over a region computing the probability that XX and YY are both in the region AA. The figure below provides an illustration for a Cartesian Coordinate System where dA=dxdydA=dxdy.

To go further a geometric result relating the Cross Product of two vectors to the area of the parallelogram enclosed by the vectors is needed. This topic is discussed in the following section.

Cross Product as Area

Consider two vectors A\textbf{A} and B\textbf{B} separated by an angle θ\theta and rotated by and angle ϕ\phi as shown in the following figure.

The vector components projected along the x\textbf{x} and y\textbf{y} unit vectors is given by,

A=Axx+AyyB=Bxx+Byy, \begin{aligned} \textbf{A} &= A_{x}\textbf{x} + A_{y}\textbf{y} \\ \textbf{B} &= B_{x}\textbf{x} + B_{y}\textbf{y}, \end{aligned}

where,

Ax=AcosϕAy=AsinϕBx=Bcos(ϕ+θ)By=Bsin(ϕ+θ).     (2) \begin{aligned} A_{x} &= \mid \textbf{A} \mid \cos{\phi} \\ A_{y} &= \mid \textbf{A} \mid \sin{\phi} \\ B_{x} &= \mid \textbf{B} \mid \cos{\left(\phi + \theta\right)} \\ B_{y} &= \mid \textbf{B} \mid \sin{\left(\phi + \theta\right)}. \\ \end{aligned}\ \ \ \ \ (2)

The cross product of two vectors is another vector perpendicular to both vectors. For the figure above, that direction is perpendicular to the plane of the page, call it z\textbf{z}. The cross product is then defined by the determinate computed from the components of the two vectors and projected along z\textbf{z},

A×B=AxAyBxByz=(AxByAyBx)z.     (3) \begin{aligned} \textbf{A} \times \textbf{B} &= \begin{vmatrix} A_{x} & A_{y} \\ B_{x} & B_{y} \end{vmatrix}\textbf{z} \\ &= \left( A_{x}B_{y} - A_{y} B_{x} \right)\textbf{z}. \end{aligned}\ \ \ \ \ (3)

Substituting equation (2)(2) into (3)(3) and making use of,

sin(θ+ϕ)=sinθ cosϕ + cosθ sinϕcos(θ+ϕ)=cosθ cosϕ  sinθ sinϕsin2ϕ + cos2ϕ=1, \begin{gathered} \sin{\left(\theta + \phi\right)} = \sin{\theta}\ \cos{\phi}\ +\ \cos{\theta}\ \sin{\phi} \\ \cos{\left(\theta + \phi\right)} = \cos{\theta}\ \cos{\phi}\ -\ \sin{\theta}\ \sin{\phi} \\ \sin^2{\phi}\ +\ \cos^2{\phi} = 1, \end{gathered}

results in,

A×B=AxByBxAy=AB[cosϕ sin(ϕ+θ)sinϕ cos(ϕ+θ)]=AB(cos2ϕ sinθ+sinϕ cosϕ cosθsinϕ cosϕ cosθ+sin2ϕ sinθ)=ABsinθ(cos2ϕ +sin2ϕ )=ABsinθ, \begin{aligned} \mid \textbf{A} \times \textbf{B} \mid &= \mid A_{x} B_{y} - B_{x} A_{y} \mid \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\left[\cos{\phi}\ \sin{\left(\phi + \theta\right)} - \sin{\phi}\ \cos{\left(\phi + \theta\right)}\right] \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\left(\cos^{2}{\phi}\ \sin{\theta} + \sin{\phi}\ \cos{\phi}\ \cos{\theta} - \sin{\phi}\ \cos{\phi}\ \cos{\theta} + \sin^2{\phi}\ \sin{\theta}\right) \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\sin{\theta}\left(\cos^{2}{\phi}\ + \sin^2{\phi}\ \right) \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid \sin{\theta}, \end{aligned}

which is the area of the parallelogram indicated by orange in the figure above. sinθ\sin\theta can be assumed positive since θ\theta can always be chosen such that 0  θ  π0\ \leq\ \theta\ \leq\ \pi.

Bivariate Jacobian

Consider the following coordinate transformation between two coordinate systems defined by the variables (x,y)\left(x,y\right) and (u,v)\left(u,v\right),

x=x(u,v)y=y(u,v).     (4) \begin{aligned} x &= x(u,v) \\ y &= y(u,v). \end{aligned}\ \ \ \ \ (4)

applied to the integral,

Af(x,y)dxdy, \int_{A} f(x, y) dxdy,

where AA is an arbitrary area in (x,y)\left(\textit{x}, \textit{y}\right) coordinates. The following figure shows how area elements transform between (u,v)\left(\textit{u}, \textit{v} \right) coordinates and (x,y)\left(\textit{x}, \textit{y} \right) coordinates when equation (4)(4) is applied. The left side of the figure shows the (u,v)\left(\textit{u}, \textit{v} \right) Cartesian Coordinate System. In this system vertical lines of constant u\textit{u} are colored orange and horizontal lines of constant v\textit{v} blue. The right side of the figure illustrates how equation (4)(4) maps lines of constant uu and vv onto (x,y)\left(x,y\right) coordinates. The lines of constant u\textit{u} and v\textit{v} can be curved and not aligned with the (x,y)\left(\textit{x}, \textit{y}\right) Cartesian coordinates as shown in the figure. The transformed (u,v)\left(\textit{u}, \textit{v} \right) coordinates in this case are Curvilinear.

Consider the differential area elements indicated by orange in the figure. In the (u,v)\left(\textit{u}, \textit{v} \right) Cartesian coordinates the differential area element is given by dA=dudvdA=dudv but in (x,y)\left(\textit{x}, \textit{y}\right) coordinates differentials defining the area are not orthogonal so the element is distorted. In the infinitesimal limit it will become a parallelogram with area given by the cross product of dXd\textbf{X} and dYd\textbf{Y} which are tangent vectors to the curves of constant v\textit{v} and u\textit{u} respectively at the origin point of the vectors. To compute the cross product expressions for the differentials are required. The components of the differentials are computed from transformation (4)(4) by assuming v\textit{v} is constant for dXd\textbf{X} and u\textit{u} is constant for dYd\textbf{Y}. It follows that,

dX=xudu x+yudu ydY=xvdv x+yvdv y. \begin{aligned} d\textbf{X} &= \frac{\partial x}{\partial u}du\ \textbf{x} + \frac{\partial y}{\partial u}du\ \textbf{y} \\ d\textbf{Y} &= \frac{\partial x}{\partial v}dv\ \textbf{x} + \frac{\partial y}{\partial v}dv\ \textbf{y}. \end{aligned}

The cross product of the differentials above is given by,

dX×dY=xuduyuduxvdvyvdv z=xuyuxvyvdudv z=(xuyvyuxv)dudv z. \begin{aligned} d\textbf{X} \times d\textbf{Y} &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}}\small{du} & \frac{\small{\partial y}}{\small{\partial u}}\small{du} \\ \frac{\small{\partial x}}{\small{\partial v}}\small{dv} & \frac{\small{\partial y}}{\small{\partial v}}\small{dv} \end{vmatrix}\ \textbf{z} \\ &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}} & \frac{\small{\partial y}}{\small{\partial u}} \\ \frac{\small{\partial x}}{\small{\partial v}} & \frac{\small{\partial y}}{\small{\partial v}} \end{vmatrix}dudv\ \textbf{z} \\ &= \left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v} \right)dudv\ \textbf{z}. \end{aligned}

The bivariate Jacobian is defined by,

J=(x,y)(u,v)=xuxvyuyv=(xuyvyuxv). \begin{aligned} \mid J \mid &= \left| \frac{\partial\left(x, y\right)}{\partial\left(u, v\right)} \right| \\ &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}} & \frac{\small{\partial x}}{\small{\partial v}} \\ \frac{\small{\partial y}}{\small{\partial u}} & \frac{\small{\partial y}}{\small{\partial v}} \end{vmatrix} \\ &= \left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v} \right). \end{aligned}

In general the determinate of a matrix equals the determinate of the transpose of the matrix. It follows that the area element in (u,v)\left(\textit{u}, \textit{v} \right) coordinates is given by,

dA=dX×dY=Jdudv. dA = \mid d\textbf{X} \times d\textbf{Y} \mid = \mid J \mid dudv.

J\mid J \mid will scale the differential area dudvdudv by the amount required by the transform to conserve the area in a manner similar to that seen for a single variable length is conserved.

Finally the transform of the bivariate integral in equation (1)(1) is given by,

Af(x,y)dxdy=Af(x(u,v),y(u,v))Jdudv, \int_{A} f(x, y) dxdy = \int_{A'} f(x(u,v), y(u, v)) \mid J \mid dudv,

where AA' is the region obtained by applying (4)(4) to AA.

Bivariate Normal Distribution

The Bivariate Normal Distribution with random variables UU and VV is obtained by applying the following linear transformation to two independent Normal(0,1)\textbf{Normal}(0, 1) random variables XX and Y,Y,

(UμuVμv)=(σu0σvγσv1γ2)(XY),     (5) \begin{pmatrix} U - \mu_{u} \\ V - \mu_{v} \end{pmatrix} = \begin{pmatrix} \sigma_{u} & 0 \\ \sigma_{v}\gamma & \sigma_{v}\sqrt{1 - \gamma^{2}} \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix},\ \ \ \ \ (5)

where μu\mu_{u}, μv\mu_{v}, σu\sigma_{u}, σv\sigma_{v} and γ\gamma are scalar constants named in anticipation of being the mean, standard deviation and correlation coefficient of the distribution.

An equation for XX and YY in terms UU and VV is obtained by inverting the transformation matrix,

(XY)=1σuσv1γ2(σv1γ20σvγσu)(UμuVμv),    (6) \begin{pmatrix} X \\ Y \end{pmatrix} = \frac{1}{\sigma_{u}\sigma_{v}\sqrt{1 - \gamma^2}} \begin{pmatrix} \sigma_{v}\sqrt{1 - \gamma^{2}} & 0 \\ -\sigma_{v}\gamma & \sigma_{u} \end{pmatrix} \begin{pmatrix} U - \mu_{u} \\ V - \mu_{v} \end{pmatrix},\ \ \ \ (6)

where it is assumed that γ21.\gamma^{2} \ne 1. The distribution for XX and YY is found by making use of the assumption that they are independent Normal(0,1)\textbf{Normal}(0, 1), namely,

f(x,y)=f(x)f(y)=12πe(x2+y2)/2.     (7) f(x,y) = f(x)f(y) = \frac{1}{2\pi}e^{-(x^2 + y^2) / 2}.\ \ \ \ \ (7)

In the previous section it was shown that for transformation ,

x=x(u,v)y=y(u,v), \begin{aligned} x &= x(u, v) \\ y &= y(u, v), \end{aligned}

the transformed PDF is given by,

g(u,v)=Jf(x(u,v),y(u,v)),     (8) g(u, v) = \mid J \mid f(x(u,v), y(u,v)),\ \ \ \ \ (8)

where J\mid J \mid is the Jacobian. The transformations x(u,v)x(u,v) and y(u,v)y(u,v) can be determined from equation (6)(6),

x(u,v)=(uμu)σuy(u,v)=11γ2[1σv(vμv)γσu(uμu)].     (9) \begin{aligned} x(u,v) & = \frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \\ y(u,v) & = \frac{1}{\sqrt{1-\gamma^2}} \left[ \frac{1}{\sigma_{v}}\left(v-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(u-\mu_{u}\right) \right]. \end{aligned}\ \ \ \ \ (9)

It follows that the Jacobian is given by,

J=1σuσv1γ2.     (10) \mid J \mid = \frac{1}{\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}.\ \ \ \ \ (10)

With the goal of keeping things simple in the evaluation of equation (8)(8) the exponential argument term of equation (7)(7), x2+y2,x^2 + y^2, will first be considered. Use of the expressions for x(u,v)x(u,v) and y(u,v)y(u,v) from equation (9)(9) gives,

x2+y2=[x(u,v)]2+[y(u,v)]2=(uμu)2σu2+11γ2[1σv(vμv)γσu(uμu)]2=(uμu)2σu2+γ21γ2(uμu)2σu2+11γ2(vμv)2σv22γ1γ2(uμu)(vμv)σuσv=11γ2[(uμu)2σu2(1γ2+γ2)(vμv)2σv22γ(vμv)σv(uμu)σu]=11γ2[(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu].     (11) \begin{aligned} x^2 + y^2 &= \left[x(u,v)\right]^{^2} + \left[y(u,v)\right]^{^2} \\ &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{1}{1-\gamma^2}\left[\frac{1}{\sigma_{v}}\left(v-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(u-\mu_{u}\right)\right]^2 \\ &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\gamma^2}{1-\gamma^2}\frac{\left(u - \mu_u \right)^2}{\sigma_u^2} + \frac{1}{1-\gamma^2}\frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - \frac{2\gamma}{1-\gamma^2}\frac{\left(u - \mu_u \right)\left(v-\mu_v\right)}{\sigma_u\sigma_v} \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2}\left(1-\gamma^2+\gamma^2\right) \frac{\left(v-\mu_v\right)^2}{\sigma_{v}^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right] \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right]. \end{aligned}\ \ \ \ \ (11)

The first step expands the right most squared term and the last two steps aggregate (uμu)\left(u-\mu_{u} \right) and (vμv)\left(v-\mu_v\right) terms. The Bivariate Normal PDF follows by inserting equations (10)(10) and (11)(11) into equation (8)(8),

g(u,v)=Jf(x(u,v),y(u,v))=12πσuσv1γ2e12(1γ2)[(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu]     (12) \begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} + \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] } \end{aligned}\ \ \ \ \ (12)

Extension of transform (5)(5) to more than two dimensions is not clear. The following section will describe a form of the transformation that makes this more apparent.

Matrix Form of Bivariate Normal Distribution

A matrix form for the Bivariate Normal Distribution PDF can be constructed that scales to higher dimensions. Here the version for two variables is compared with the results of the previous section and followed by a discussion of extension to an arbitrary number of dimensions. To begin consider the column vectors,

Y=(y1y2)μ=(μ1μ2)P=(σ12γσ1σ2γσ1σ2σ22).     (13) \begin{gathered} Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \\ P = \begin{pmatrix} \sigma_{1}^{2} & \gamma\sigma_{1}\sigma_{2} \\ \gamma\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{pmatrix}. \\ \end{gathered}\ \ \ \ \ (13)

YY is the column vector of Bivariate Normal Random variables, μ\mu the column vector of mean values and PP is called the Covariance Matrix. To continue the inverse of the covariance matrix and its determinate are required. The inverse is given by,

P1=1σ12σ22(1γ2)(σ22γσ1σ2γσ1σ2σ12), P^{-1} = \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2} & -\gamma\sigma_{1}\sigma_{2} \\ -\gamma\sigma_{1}\sigma_{2} & \sigma_{1}^{2} \end{pmatrix},

and the determinate is,

P=σ12σ22(1γ2). \mid P \mid = \sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right).

Next, consider the product, where (Yμ)T\left( Y - \mu\right)^{T} is the transpose of (Yμ)\left( Y - \mu\right),

(Yμ)TP1(Yμ). \left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right).

The two rightmost terms evaluate to,

P1(Yμ)=1σ12σ22(1γ2)(σ22γσ1σ2γσ1σ2σ12)(y1μ1y2μ2)=1σ12σ22(1γ2)(σ22(y1μ1)γσ1σ2(y2μ2)γσ1σ2(y1μ1)+σ12(y2μ2)). \begin{aligned} P^{-1}\left(Y-\mu\right) &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2} & -\gamma\sigma_{1}\sigma_{2} \\ -\gamma\sigma_{1}\sigma_{2} & \sigma_{1}^{2} \end{pmatrix} \begin{pmatrix} y_1 -\mu_1 \\ y_2 -\mu_2 \end{pmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2}\left(y_1 -\mu_1\right) - \gamma\sigma_{1}\sigma_{2}\left(y_2 -\mu_2\right) \\ -\gamma\sigma_{1}\sigma_{2}\left(y_1 -\mu_1\right) + \sigma_{1}^{2}\left(y_2 -\mu_2\right) \end{pmatrix}. \end{aligned}

Continuing the evaluation by including the left most term gives,

(Yμ)TP1(Yμ)=1σ12σ22(1γ2)(y1μ1y2μ2)(σ22(y1μ1)γσ1σ2(y2μ2)γσ1σ2(y1μ1)+σ12(y2μ2))=1σ12σ22(1γ2){[σ22(y1μ1)γσ1σ2(y2μ2)](y1μ1)+[σ12(y2μ2)γσ1σ2(y1μ1)](y2μ2)}=1σ12σ22(1γ2)[σ22(y1μ1)2+σ12(y2μ2)22γσ1σ2(y2μ2)(y1μ1)]=11γ2[(y1μ1)2σ12+(y2μ2)2σ222γ(y1μ1)σ1(y2μ2)σ2]. \begin{aligned} \left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right) &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} y_1 -\mu_1 & y_2 -\mu_2 \end{pmatrix} \begin{pmatrix} \sigma_{2}^{2}\left(y_1 -\mu_1\right) - \gamma\sigma_{1}\sigma_{2}\left(y_2 -\mu_2\right) \\ -\gamma\sigma_{1}\sigma_{2}\left(y_1 -\mu_1\right) + \sigma_{1}^{2}\left(y_2 -\mu_2\right) \end{pmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \left\{ \left[\sigma_{2}^{2}\left(y_1 -\mu_{1}\right) - \gamma\sigma_{1}\sigma_{2}\left(y_{2} -\mu_{2}\right)\right] \left(y_{1}-\mu_{1} \right) + \left[\sigma_{1}^{2}\left(y_{2} -\mu_{2}\right) -\gamma\sigma_{1}\sigma_{2}\left(y_{1} -\mu_{1}\right)\right] \left(y_{2}-\mu_{2} \right) \right\} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \left[ \sigma_{2}^{2}\left(y_{1} -\mu_{1}\right)^{2} + \sigma_{1}^{2}\left(y_{2} -\mu_{2}\right)^{2} - 2\gamma\sigma_{1}\sigma_{2}\left(y_{2} -\mu_{2}\right)\left(y_{1}-\mu_{1} \right) \right] \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(y_{1}-\mu_{1} \right)^2}{\sigma_{1}^2} + \frac{\left(y_{2}-\mu_{2}\right)^2}{\sigma_{2}^2} - 2\gamma\frac{\left(y_{1}-\mu_{1}\right)}{\sigma_{1}}\frac{\left(y_{2}-\mu_{2}\right)}{\sigma_{2}} \right]. \end{aligned}

Which is the same as obtained using transform (9)(9) in the previous section. Comparing the expression for P\mid P\mid with equation (12)(12) gives,

J=1P. \mid J \mid = \frac{1}{\sqrt{\mid P \mid}}.

Putting things together produces the desired result,

g(u,v)=Jf(x(u,v),y(u,v))=12πPe[12(Yμ)TP1(Yμ)]. \begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{2\pi\sqrt{\mid P \mid}} e^{ \left[-\frac{1}{2}\footnotesize{\left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right)}\right] }. \end{aligned}

To extend to an arbitrary number of dimensions, nn, the covariance matrix defined in equation (13)(13) needs to written in a more general form,

Pij=Cov(Yi,Yj). P_{ij} = \text{Cov}(Y_{i}, Y_{j}).

After updating the normalization factor to account for the product of nn Normal(0,1)\textbf{Normal(0,1)} distributions the Multivariate Normal PDF becomes,

g(u,v)=Jf(x(u,v),y(u,v))=1(2π)n/2Pe[12(Yμ)TP1(Yμ)]. \begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{(2\pi)^{n/2}\sqrt{\mid P \mid}} e^{ \left[-\frac{1}{2}\footnotesize{\left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right)}\right] } \end{aligned}.

To determine the linear transformation, analogous to equation (9)(9), PijP_{ij} is factored using Cholesky Decomposition.

In a following section it will be shown that for two variables,

Cov(Yi,Yj)={γσiσji  jσi2i=j. \text{Cov}(Y_{i}, Y_{j}) = \begin{cases} \gamma\sigma_{i}\sigma_{j} & i\ \neq\ j \\ \sigma_{i}^{2} & i = j \end{cases}.

which is equivalent to equation (13)(13).

Bivariate Normal Distribution Properties

The previous sections discussed the derivation of the Bivariate Normal random variables as linear combinations of independent unit normal random variables. Linear combinations were constructed by application of a linear transformation that includes five independent parameters. In this section interpretations of the parameters are provided and variation in the distribution as the parameters are varied is described. First, the Marginal Distributions are calculated and it is shown that four of the distribution parameters correspond to marginal distribution means and standard deviations. Next, the Correlation Coefficient of the distribution is computed and shown to correspond to the remaining parameter. In remaining sections how changes in the parameters affect the distribution shape are considered. This includes an analysis of PDF contours and the linear transformation used to construct the distribution.

Marginal Distributions

The Marginal Distributions for uu and vv are defined by,

g(u)=g(u,v)dvg(v)=g(u,v)du, \begin{aligned} g(u) &= \int_{-\infty}^{\infty} g(u, v) dv \\ g(v) &= \int_{-\infty}^{\infty} g(u, v) du, \end{aligned}

where g(u,v)g(u,v) is defined by equation (12).(12). First, consider evaluation of the integral for g(u),g(u),

g(u)=g(u,v)dv=12πσuσv1γ2e12(1γ2)[(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu]dv=12πσuσv1γ2e12(1γ2)(uμu)2σu2e12(1γ2)[(vμv)2σv22γ(vμv)σv(uμu)σu]dv=12πσuσv1γ2e12(1γ2)(uμu)2σu2e12(1γ2){[(vμv)σvγ(uμu)σu]2γ2(uμu)2σu2}dv=12πσuσv1γ2e12(1γ2)(uμu)2σu2eγ22(1γ2)(uμu)2σu2e12(1γ2)[(vμv)σvγ(uμu)σu]2dv=12πσuσv1γ2e(uμu)22σu22πσv2(1γ2)=12πσu2e(uμu)22σu2. \begin{aligned} g(u) &= \int_{-\infty}^{\infty} g(u, v) dv \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}\int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} + \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{\sigma_{v}^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left\{ \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)}}{\footnotesize{\sigma_{v}}} - \gamma\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right]^{2} -\gamma^{2}\frac{\footnotesize{\left(u-\mu_u\right)^2}}{\footnotesize{\sigma_u^2}} \right\} }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } e^{ \frac{\footnotesize{\gamma^2}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)}}{\footnotesize{\sigma_{v}}} - \gamma\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right]^{2} }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ -\frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{2\sigma_{u}^2}} } \sqrt{\footnotesize{2\pi\sigma_{v}^{2}(1-\gamma^2)}}\\ &= \frac{1}{\sqrt{2\pi\sigma_{u}^{2}}}e^{ -\frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{2\sigma_{u}^2}} }. \end{aligned}

In the first step the uu dependence is factored out for evaluation of the integral over vv. The next step completes the square of the exponential followed by also factoring the introduced uu term from the vv integral. This is followed by simplification of the uu exponential argument. Finally, the integral over vv is evaluated yielding a Normal(μu,σu)\textbf{Normal}(\mu_{u}, \sigma_{u}) distribution. Similarly, the marginal distribution g(v)g(v) is given by,

g(v)=g(u,v)du=12πσv2e(vμv)22σv2,     (14) g(v) = \int_{-\infty}^{\infty} g(u, v) du = \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}}e^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} },\ \ \ \ \ (14)

which is a Normal(μv,σv)\textbf{Normal}(\mu_{v}, \sigma_{v}) distribution. The plot below shows examples of Normal(μ,σ)\textbf{Normal}(\mu, \sigma) as μ\mu and σ\sigma are varied. The effect of μ\mu is to translate the distribution along the uu axis and σ\sigma scales the distribution width.

The mean and variance are now readily determined for both uu and vv,

E[U]=ug(u,v)dvdu=ug(u,v)dvdu=ug(u)dvdu=μu, \begin{aligned} \text{E}[U] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} ug(u, v) dvdu \\ &= \int_{-\infty}^{\infty}u\int_{-\infty}^{\infty} g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}ug(u) dvdu \\ &= \mu_{u}, \end{aligned}

and,

Var[U]=(uμu)2g(u,v)dvdu=(uμu)2g(u,v)dvdu=(uμu)2g(u)dvdu=σu2. \begin{aligned} \text{Var}[U] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (u -\mu_{u})^2 g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}(u -\mu_{u})^2 \int_{-\infty}^{\infty} g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}(u -\mu_{u})^2g(u) dvdu \\ &= \sigma_{u}^{2}. \end{aligned}

Similarly for vv,

E[V]=vg(u,v)dudv=μvVar[V]=(vμv)2g(u,v)dudv=σv2. \begin{aligned} \text{E}[V] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} vg(u, v) dudv = \mu_{v} \\ \text{Var}[V] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (v -\mu_{v})^2 g(u, v) dudv = \sigma_{v}^{2}. \end{aligned}

Conditional Distribution

The Conditional Distribution of a random variable is the distribution obtained by assuming the values of one or more correlated random variables are known. If the variables are uncorrelated, independent, the conditional distribution is equivalent to the marginal distribution. The conditional distribution is useful in the of the calculation of Correlation Coefficient performed in the following section and in simulation methods such as Gibbs Sampling.

For the bivariate case the conditional distributions are defined by,

g(uv)=g(u,v)g(v)g(vu)=g(u,v)g(u) \begin{aligned} g(u|v) &= \frac{g(u, v)}{g(v)} \\ g(v|u) &= \frac{g(u, v)}{g(u)} \end{aligned}

Using equations (12)(12) and (14),(14), g(uv)g(u|v) is evaluated as follows,

g(uv)=g(u,v)g(v)=12πσuσv1γ2e12(1γ2)[(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu]2πσv2e(vμv)22σv2=1σu2π(1γ2)e12(1γ2)[(uμu)2σu22γ(vμv)σv(uμu)σu]e{(vμv)22σv2+12(1γ2)[(vμv)2σv2]}=1σu2π(1γ2)e12(1γ2){1σu[u(μu+γσuσv(vμv))]2γ2σv2σv2(vμv))2}e{(vμv)22σv212(1γ2)[(vμv)σv2]}=1σu2π(1γ2)e12(1γ2){1σu[u(μu+γσuσv(vμv))]2}e{(vμv)22σv2[11γ21γ2]}=1σu2π(1γ2)e12σu2(1γ2){u[μu+γσuσv(vμv)]}2.     (15) \begin{aligned} g(u|v) &= \frac{g(u, v)}{g(v)} \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} + \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] } \sqrt{2\pi\sigma_{v}^{2}}e^{ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} }\\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] } e^{ \left\{ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} + \frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} \right] \right\} }\\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left\{ \frac{1}{\footnotesize{\sigma_{u}}} \left[ u - \left(\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right) \right]^2 - \frac{\footnotesize{\gamma^2\sigma_v^2}}{\footnotesize{\sigma_v^2}} \left(\footnotesize{v-\mu_{v}})\right)^2 \right\} } e^{ \left\{ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} - \frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_v^2}} \right] \right\} }\\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left\{ \frac{1}{\footnotesize{\sigma_{u}}} \left[ u - \left(\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right) \right]^2 \right\} } e^{ \left\{ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} \left[ \footnotesize{1} - \frac{\footnotesize{1-\gamma^2}}{\footnotesize{1-\gamma^2}} \right] \right\} }\\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ u - \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] \right\}^2 }. \end{aligned}\ \ \ \ \ (15)

In the first step the (vμv)2\left(v-\mu_{v}\right)^2 terms are collected and then the square is completed for the remaining terms. Once again the (vμv)2\left(v-\mu_{v}\right)^2 terms are collected and the finial result obtained, a Normal\textbf{Normal} distribution.

Similarly, g(vu)g(v|u) is given by,

g(vu)=g(u,v)g(u)=1σv2π(1γ2)e12σu2(1γ2){v[μv+γσvσu(uμu)]}2. \begin{aligned} g(v|u) &= \frac{g(u, v)}{g(u)} \\ &= \frac{1}{\sigma_{v}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ v - \left[\mu_v + \frac{\footnotesize{\gamma\sigma_v}}{\footnotesize{\sigma_u}} \left(\footnotesize{u-\mu_{u}}\right)\right] \right\}^2 }. \end{aligned}

The conditional mean and variance can now be easily evaluated,

E[UV]=ug(uv)du=[μu+γσuσv(vμv)], \begin{aligned} \text{E}[U|V] &= \int_{-\infty}^{\infty} ug(u|v) du \\ &=\left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right], \end{aligned}

E[VU]=vg(vu)dv=[μv+γσvσu(uμu)], \begin{aligned} \text{E}[V|U] &= \int_{-\infty}^{\infty} vg(v|u) dv \\ &= \left[\mu_v + \frac{\footnotesize{\gamma\sigma_v}}{\footnotesize{\sigma_u}} \left(\footnotesize{u-\mu_{u}}\right)\right], \end{aligned}

Var[UV]=E[(UE[UV])2]=(uE[UV])2g(uv)du=σu2(1γ2), \begin{aligned} \text{Var}[U|V] &= \text{E}[\left(U - \text{E}[U|V]\right)^{2}] \\ &= \int_{-\infty}^{\infty} \left(u - \text{E}[U|V]\right)^{2}g(u|v) du \\ &= \sigma_{u}^{2}(1-\gamma^2), \end{aligned}

Var[VU]=E[(VE[VU])2]=(vE[VU])2g(vu)dv=σv2(1γ2). \begin{aligned} \text{Var}[V|U] &= \text{E}[\left(V - \text{E}[V|U]\right)^{2}] \\ &= \int_{-\infty}^{\infty} \left(v - \text{E}[V|U]\right)^{2}g(v|u) dv \\ &=\sigma_{v}^{2}(1-\gamma^2). \end{aligned}

For both uu and vv the conditional expectation is a linear function of the conditioned variable. This is illustrated in the following plot where g(uv)g(u|v) is plotted for several values of vv. It is seen that vv translates the distribution along the uu axis.

Consider the variation of E[UV]\text{E}[U|V], E[VU]\text{E}[V|U], Var[UV]\text{Var}[U|V] and Var[UV]\text{Var}[U|V] with γ\gamma. For γ=0\gamma = 0 each reduces to the values corresponding to its marginal distribution discussed in the previous section. For both conditional distributions increasing γ\gamma leads to decreasing variance resulting in a sharper peak for the distribution. Additionally, changing the sign of γ\gamma causes reflection of the distribution about the mean. Each of these properties is illustrated on the plot below where g(uv)g(u|v) is plotted for values of γ\gamma ranging from 1-1 to 1.1.

Correlation Coefficient

The Cross Correlation of the two random variables UU and VV is defined by,

E[UV]=uvg(u,v)dvdu=uvg(uv)g(v)dvdu. \begin{aligned} \text{E}[UV] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u,v) dvdu \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u|v)g(v) dvdu. \end{aligned}

The last step follows from the definition of conditional probability. Now, substituting equations (15)(15) and (14)(14) into the equation above yields,

E[UV]=uvg(uv)g(v)dvdu=uv1σu2π(1γ2)e12σu2(1γ2){u[μu+γσuσv(vμv)]}212πσv2e(vμv)22σv2dvdu=1σu2π(1γ2)12πσv2ve(vμv)22σv2ue12σu2(1γ2){u+[μuγσuσv(vμv)]}2dudv=12πσv2ve(vμv)22σv2[μu+γσuσv(vμv)]dv=μuμv+γσuσvσv2=μuμv+γσuσv. \begin{aligned} \text{E}[UV] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u|v)g(v) dvdu \\ &=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}uv \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ u - \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] \right\}^2 } \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}}e^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} }dvdu \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}} \int_{-\infty}^{\infty}ve^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} } \int_{-\infty}^{\infty}ue^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ u + \left[\mu_u - \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] \right\}^2 }dudv\\ &= \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}} \int_{-\infty}^{\infty}ve^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} } \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] dv\\ &=\mu_{u}\mu_{v} + \frac{\gamma\sigma_{u}}{\sigma_{v}}\sigma_{v}^{2} \\ &=\mu_{u}\mu_{v} + \gamma\sigma_{u}\sigma_{v}. \end{aligned}

It follows that the Covariance is given by,

Cov(U,V)=E[UV]E[U]E[V]=γσvσu, \text{Cov}(U,V) = \text{E}[UV] - \text{E}[U]E[V] = \gamma\sigma_{v}\sigma_{u},

and Correlation Coefficient by,

γ=Cov(U,V)σvσu. \gamma = \frac{\text{Cov}(U,V)}{\sigma_{v}\sigma_{u}}.

Distribution Parameter Limits

In previous sections it was shown that the free parameters used in the definition of the Bivariate Normal distribution, equation (12)(12), are its the mean, variance and correlation coefficient. Here a sketch of the change in shape of the distribution as these parameters are varied is discussed by evaluating limits of the parameters for the equation defining the PDF contours. The following section will describe the convergence to the limits using a numerical analysis.

The equation satisfied by the contours is obtained by setting the argument of the exponential in equation (12)(12) to a constant, C2C^{2},

C2=(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu, C^{2} = \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}},

whereC2C^{2} is related to the value of the contour. To develop an expectation of the behavior the following limits of this equation are considered, γ0\gamma\to 0, γ±1\gamma\to \pm 1, σv/σu1\sigma_{v}/\sigma_{u}\to 1, σv/σu0\sigma_{v}/\sigma_{u}\to 0 and σv/σu\sigma_{v}/\sigma_{u}\to \infty. The variation with μu\mu_{u} and μv\mu_{v} produces a translation which is not as interesting.

First, consider the limit γ0\gamma\to 0 which is easily evaluated using the equation above,

C2=(uμu)2σu2+(vμv)2σv2. C^{2} = \frac{\left(u-\mu_{u} \right)^{2}}{\sigma_{u}^{2}} + \frac{\left(v-\mu_v\right)^2}{\sigma_{v}^{2}}.

This is the equation of an ellipse with axes of length 2Cσu2C\sigma_{u} and 2Cσv2C\sigma_{v}. The aspect ratio of the ellipse is given by σv/σu\sigma_{v}/\sigma_{u}. It follows that in the limit σv/σu1\sigma_{v}/\sigma_{u}\to 1 the contour approaches a circle with radius CC and in the limits σv/σu0\sigma_{v}/\sigma_{u}\to 0 or σv/σu\sigma_{v}/\sigma_{u}\to \infty the contour approaches a line along the uu or vv axes respectively.

To evaluate the limit γ±1\gamma\to \pm 1 it must be noted that γ21\gamma^{2}\ne 1 was assumed in the derivation of inverse of the Bivariate Transformation, equation (6)(6). To evaluate the limit it should be evaluated before inverting the transformation defining the distribution, equation (5)(5). Taking the limit results in a transformation that is valid for γ2=1\gamma^{2}= 1,

(UμuVμv)=(σu0±σv0)(XY). \begin{pmatrix} U - \mu_{u} \\ V - \mu_{v} \end{pmatrix} = \begin{pmatrix} \sigma_{u} & 0 \\ \pm\sigma_{v} & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix}.

Evaluation of the equation above gives,

(Vμv)=±σvσu(Uμu). \left(V-\mu_{v}\right) = \pm \frac{\sigma_{v}}{\sigma_{u}} \left(U-\mu_{u}\right).

It follows that as the limit is approached contours will approach a line with slope ±σv/σu\pm \sigma_{v} / \sigma_{u}. The slope is positive for positive correlation and negative for negative correlation. In the limit σv/σu1\sigma_{v} / \sigma_{u}\to 1 the contour slope approaches a line with slope ±1\pm 1 and in the limit σv/σu0\sigma_{v}/\sigma_{u}\to 0 or σv/σu\sigma_{v}/\sigma_{u}\to \infty the contour approaches a line along the uu or vv axes respectively which is the same as obtained in the limit γ0.\gamma\to 0.

The following two plots show the surface an contour plots for a distribution with σv/σu=1\sigma_{v}/\sigma_{u}=1 and γ=0,\gamma=0, which is the case where uu and vv are uncorrelated with equal variances. The contours are circles are previously described. The surface plot is included to show how poor it is at giving a sense of the distribution shape though it does assist in imagining the how the contour plot would be projected into the third dimension.

The next plot also has γ=0\gamma=0 but σv/σu=2\sigma_{v}/\sigma_{u}=2. The contours become ellipses with the axis aligned with the vv axis.

The final plot has γ=0.5\gamma=0.5 and σv/σu=1\sigma_{v}/\sigma_{u}=1. The contours are symmetric about the line v=uv=u as expected for the limit γ±1\gamma\to \pm 1. If the correlation were negative the contours would be reflected about the vv axis and symmetric about the line v=uv=-u.

Probability Density Contours

This section will describe a more detailed analysis of parameter limits of the Bivariate Normal PDF contours consisting of plots of a larger range of values for the parameters γ\gamma and σv/σu\sigma_{v}/\sigma_{u}.

The equation satisfied by the contours is obtained by setting the argument of the exponential in equation (12)(12) to a constant C2C^{2},

C2=(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu=(uμu)2σu2+(vμv)2σv22γ(vμv)σv(uμu)σu+γ2σv2(vμv)2γ2σv2(vμv)2=[(uμu)σuγσv(vμv)]2+(vμv)2σv2(1γ2). \begin{aligned} C^{2} &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \\ &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} + \frac{\gamma^{2}}{\sigma_{v}^{2}}\left(v-\mu_v\right)^{2} - \frac{\gamma^{2}}{\sigma_{v}^{2}}\left(v-\mu_v\right)^{2} \\ &= \left[\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} - \frac{\gamma}{\sigma_{v}}\left(v-\mu_v\right)\right]^{2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2}(1-\gamma^2). \end{aligned}

Here the equation is put into a form that is more easily evaluated by completing the square of the original equation. If both sides are divided by C2C^{2} the equation below is obtained,

1C2[(uμu)σuγσv(vμv)]2+(vμv)2C2σv2(1γ2)=1. \frac{1}{C^2}\left[\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} - \frac{\gamma}{\sigma_{v}}\left(v-\mu_v\right)\right]^{2} + \frac{\left(v-\mu_v\right)^2}{C^2\sigma_v^2}(1-\gamma^2) = 1.

This equation is equivalent the equation of a unit circle which satisfies the equation,

sin2θ+cos2θ=1. \sin^{2}{\theta} + \cos^2{\theta} = 1.

If θ\theta is assumed a parametric parameter satisfying 0θ2π0\geq \theta \geq 2\pi a parametric equation for the contours is obtained by making the following change of variables,

sinθ=1C[(uμu)σuγσv(vμv)]cosθ=(vμv)Cσv(1γ2). \begin{aligned} \sin{\theta} &= \frac{1}{C}\left[\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} - \frac{\gamma}{\sigma_{v}}\left(v-\mu_v\right)\right] \\ \cos{\theta} &= \frac{\left(v-\mu_v\right)}{C\sigma_v}\sqrt{(1-\gamma^2)}. \end{aligned}

Solving these equations for uu and vv yields the parametric equations,

u=Cσu[sinθ+γ1γ2cosθ]+μuv=Cσv1γ2cosθ+μv.     (16) \begin{aligned} u &= C\sigma_{u}\left[\sin{\theta} + \frac{\gamma}{\sqrt{1-\gamma^2}}\cos{\theta}\right] + \mu_{u} \\ v &= \frac{C\sigma_v}{\sqrt{1-\gamma^2}}\cos{\theta} + \mu_{v}. \end{aligned}\ \ \ \ \ (16)

To plot the actual contours a relation between the constant CC and the the value of the PDF along the contour is required. This relation is obtained by replacing the argument of the exponential in equation (12)(12) with C2C^{2},

K=12πσuσv1γ2eC22(1γ2), K=\frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{\frac{\footnotesize{-C^2}}{\footnotesize{2(1-\gamma^2)}}},

where KK is the PDF value along the contour. Solving this equation for CC gives,

C=[2(1γ2)ln(2Kπσuσv1γ2)]12. C = \left[-2\left(1-\gamma^2\right)\ln{\left(2K\pi\sigma_{u}\sigma_{v}\sqrt{1-\gamma^{2}}\right)}\right]^{\frac{\footnotesize{1}}{\footnotesize{2}}}.

If CC is assumed to be real the following constraint must be satisfied,

K < 12πσuσv1γ2, K\ <\ \frac{1}{2\pi\sigma_{u}\sigma_{v}\sqrt{1-\gamma^{2}}},

which places an upper bound on the value of the peak of the distribution. The following two plots of the parametric equations defined by (16)(16) should be compared to the contour plots from the previous section fo validation.

To get a sense of how the contour shape varies with the distribution parameters the following two plots scan a range of σv/σu\sigma_{v}/\sigma_{u} with γ=0\gamma = 0 to illustrate the limits σv/σu0\sigma_{v}/\sigma_{u}\to 0 and σv/σu\sigma_{v}/\sigma_{u}\to\infty. This result agrees with the expectation obtained in the previous analysis. The σv/σu0\sigma_{v}/\sigma_{u}\to 0 series of contours approaches the uu axis and the σv/σu\sigma_{v}/\sigma_{u}\to\infty contours approach the vv axis.

The next two plots illustrate the limit γ1\gamma\to 1 and γ1\gamma\to -1 with σv/σu=1\sigma_{v}/\sigma_{u}=1. The γ1\gamma\to 1 plot is converging to the line v=uv=u and the γ1\gamma\to -1 plot to the line v=uv=-u as described in the previous section. Note that as γ\gamma approaches the limit the semi-major axis of the contour increases along the appropriate limiting line without any rotation of the contour.

The final two plots also look at the γ1\gamma\to 1 limit but this time the first plot has σv/σu=0.5\sigma_{v}/\sigma_{u}=0.5 and the second σv/σu=2\sigma_{v}/\sigma_{u}=2. The first is converging to the line v=u/2v=u/2 and the second converges to the line v=2uv=2u in agreement with the previous analysis. The behavior of the contour as the limit is approached is more interesting since the ellipse has to rotate to reach the limit. This is caused by the γ=0\gamma=0 contour also being an ellipse which introduces and asymmetry that must be eliminated in the limit. If the γ=0\gamma=0 contour were a symmetric circle no initial asymmetry need to be erased.

Coordinate Transformation

In this section contours of constant uu and vv are plotted in the (x,y)(x, y) coordinate system using equation (6)(6) to understand how the transform distorts area elements as the distribution parameters are varied.

Contours of constant u=Cuu = C_{u} in (x,y)(x, y) coordinates are defined by,

x=1σu(Cuμu)y=11γ2[1σv(vμv)γσu(Cuμu))]. \begin{aligned} x &= \frac{1}{\sigma_{u}}\left(C_{u}-\mu_{u}\right) \\ y &= \frac{1}{\sqrt{1-\gamma^{2}}}\left[\frac{1}{\sigma_{v}}\left(v-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(C_{u}-\mu_{u)}\right)\right]. \end{aligned}

This set of equations defines contours which are lines of constant xx since the equation for xx is constant.

The contours of constant v=Cvv=C_{v} are defined by,

x=1σu(uμu)y=11γ2[1σv(Cvμv)γσu(uμu))]. \begin{aligned} x &= \frac{1}{\sigma_{u}}\left(u-\mu_{u}\right) \\ y &= \frac{1}{\sqrt{1-\gamma^{2}}}\left[\frac{1}{\sigma_{v}}\left(C_{v}-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(u-\mu_{u)}\right)\right]. \end{aligned}

Substituting the expression for xx into the expression for yy gives,

y=γ1γ2x+1σv1γ2(Cvμv), y = \frac{-\gamma}{\sqrt{1-\gamma^{2}}}x + \frac{1}{\sigma_{v}\sqrt{1-\gamma^{2}}}\left(C_{v}-\mu_{v}\right),

which is the equation of a line with slope γ/1γ2-\gamma/\sqrt{1-\gamma^{2}}.

First, consider the case γ=0\gamma=0, μu=μv=0\mu_{u}=\mu_{v}=0 and σu=σv=1\sigma_{u}=\sigma_{v}=1 which results in the transform,

x=Cuy=Cv, \begin{aligned} x &= C_{u} \\ y &= C_{v}, \end{aligned}

and the Jacobian, equation (10)(10), is given by,

J=1σuσv1γ2=1. \mid J \mid = \frac{1}{\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}} = 1.

It follows that area elements satisfy,

dxdy=Jdudv=dudv. dxdy = |J|dudv = dudv.

Thus, for this particular choice of transform parameters area elements are preserved. Inspection of the following plot confirms that this is the case since the transform exactly maps (u,v)(u,v) coordinates onto (x,y)(x,y) coordinates.

The next plot has parameters γ=0\gamma=0 and σu=1\sigma_{u}=1 and σv=2\sigma_{v}=2. The transform associated this parameter choice is given by,

x=Cuy=12Cv, \begin{aligned} x &= C_{u} \\ y &= \frac{1}{2}C_{v}, \end{aligned}

with Jacobian,

J=1σuσv1γ2=12. \mid J \mid = \frac{1}{\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}} = \frac{1}{2}.

Once again the uu contours are aligned with lines of constant xx but the vv contours are compressed by a factor of 22 relative to lines of constant yy. If follows that the transform reduces the size of (u,v)(u,v) area elements by a factor of 1/21/2 when transformed, namely,

dxdy=Jdudv=12dudv. dxdy = |J|dudv = \frac{1}{2}dudv.

What this is saying is that for an arbitrary area element dudvdudv in the (u,v)(u,v) coordinates when transformed to the (x,y)(x,y) coordinates the dxdydxdy element has 1/21/2 the area. This is seen to be the case in the plot where the area of a (u,v)(u,v) rectangle is reduced by a factor of 22.

The final plot considers the impact of correlation on the transform by using the parameters γ=0.5\gamma=0.5 and σu=σv=1\sigma_{u}=\sigma_{v}=1. The transform corresponding to this parameter choice is given by,

x=Cuy=(13x+23Cv), \begin{aligned} x &= C_{u} \\ y &= \left(-\frac{1}{\sqrt{3}}x + \frac{2}{\sqrt{3}}C_{v}\right), \end{aligned}

with Jacobian,

J=1σuσv1γ2=23. \mid J \mid = \frac{1}{\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}} = \frac{2}{\sqrt{3}}.

The uu contours are aligned again but now since there is correlation the vv contours are lines with a negative slope so rectangular area elements in (u,v)(u,v) are transformed into parallelograms in (x,y)(x, y) coordinates. The area of one of the parallelograms is ΔyΔx\Delta y\Delta x, where Δy\Delta y is the spacing between contours of constant vv and Δx\Delta x is the spacing between contours of constant uu. Now, from the transform above it is seen that,

Δx=ΔCuΔy=23ΔCv, \begin{aligned} \Delta x &= \Delta C_{u} \\ \Delta y &= \frac{2}{\sqrt{3}}\Delta C_{v}, \end{aligned}

but ΔCu=ΔCv=1\Delta C_{u}=\Delta C_{v} = 1, so the area of the parallelogram is given by,

ΔyΔx=23, \Delta y \Delta x = \frac{2}{\sqrt{3}},

which is equal to the Jacobian. It follows that for this choice of parameters the transformation area elements are increased in size by a factor of 2/32/\sqrt{3},

dxdy=Jdudv=23dudv. dxdy = |J|dudv = \frac{2}{\sqrt{3}}dudv.

Conclusions

The Bivariate Normal distribution provides a simple model of correlated random variables. It is interesting to study because it is modeled as a linear transform of independent Normal(0,1)\textbf{Normal}(0, 1) random variables that can serve as an introduction to the concepts used in transformations of multivariate integrals. Here the background needed to understand transformations of bivariate integrals was developed by starting with a discussion of the interpretation of the vector cross product as an area. This idea was then applied to the derivation of the Jacobian Matrix for an arbitrary bivariate transformation of differential area elements. Next, the transformation used to define the Bivariate Normal distribution was introduced and applied to a distribution of two independent Normal(0,1)\textbf{Normal}(0, 1) random variables. The Jacobian matrix was then computed and the Bivariate Normal PDF derived. A matrix form of the Bivariate Normal PDF based the covariance matrix was introduced and shown to be equivalent to the linear transform version first discussed. The covariance matrix form more easily extends to higher dimensions. The linear transform used to define the Bivariate Normal PDF introduced five parameters. It was next shown that these parameters are the means, μu, μv\mu_{u},\ \mu_{v}, variance, σu, σv\sigma_{u},\ \sigma_{v} and correlation coefficient, γ\gamma, of the distribution. The conditional distributions were also discussed. The change in shape of the distribution as the parameters were varied was then investigated by evaluating limits for the PDF contours that included γ0\gamma\to 0, γ±1\gamma\to \pm 1, σv/σu1\sigma_{v}/\sigma_{u}\to 1, σv/σu0\sigma_{v}/\sigma_{u}\to 0 and σv/σu\sigma_{v}/\sigma_{u}\to \infty. This was followed by numerically investigating the convergence to these limits using a parametric form of the PDF contour equation. The last topic discussed was the variation of the linear coordinate transformation with the distribution parameters. It was shown that resulting changes in transformed area elements were accounted for by the Jacobian.