The Bivariate Normal Distribution is a widely used multivariable distribution because it provides a simple model of correlated random variables. Here it is derived by applying a linear transformation and a multivariate change of variables to the distribution of two independent unit normal, $\textbf{Normal}(0,\ 1)$, random variables. To provide background, a general expression for a change of variables in a bivariate integral is discussed and then used to obtain the Bivariate Normal Distribution. The Marginal and Conditional Distributions are next computed and used to evaluate the first and second moments, the correlation coefficient, and the conditional expectation and conditional variance. Finally, the variation in the shape of the distribution and transformation as the distribution parameters are varied is discussed.

## Bivariate Change of Variables

Consider the PDF of a single variable, $f(x)$, and the transformation $x=x(y)$, which is assumed to be monotonically increasing. The PDF of the transformed variable is given by,

$g(y) = f(x(y)) \frac{dx}{dy}.$

This result follows by performing a change of variables to the CDF,

\begin{aligned} P(X\ \leq\ x) &= \int_{-\infty}^{x} f(w) dw \\ &= \int_{-\infty}^{y} f(x(w)) \frac{dx}{dw} dw \\ &= P(Y\ \leq\ y) \end{aligned}

$dx = \frac{dx}{dy} dy.$

The $dx/dy$ term scales $dy$ appropriately to conserve the differential length. In two dimensions something similar, but more complicated, happens.
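This scaling can be checked numerically. The sketch below is an illustrative choice, not part of the derivation: it takes $f$ to be the unit normal PDF and the monotonic map $y = e^{x}$, so that $x(y) = \ln y$ and $dx/dy = 1/y$, and compares the transformed density with SciPy's lognormal PDF.

```python
import numpy as np
from scipy import stats

# f(x): unit normal PDF; transformation x(y) = ln(y), so dx/dy = 1/y.
# The change-of-variables formula gives g(y) = f(ln y) * (1/y),
# which should match the lognormal PDF with sigma = 1.
y = np.linspace(0.1, 5.0, 50)
g = stats.norm.pdf(np.log(y)) / y          # f(x(y)) * dx/dy
g_ref = stats.lognorm.pdf(y, s=1.0)        # SciPy's lognormal density

assert np.allclose(g, g_ref)
```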

Consider the bivariate PDF, $f(x,y)$, with the probability of a region given by,

$P(\{X,Y\} \in A) = \int_{A} f(x, y) dA,\ \ \ \ \ (1)$

which defines an integration over a region computing the probability that $X$ and $Y$ are both in the region $A$. The figure below provides an illustration for a Cartesian Coordinate System where $dA=dxdy$.

To go further a geometric result relating the Cross Product of two vectors to the area of the parallelogram enclosed by the vectors is needed. This topic is discussed in the following section.

### Cross Product as Area

Consider two vectors $\textbf{A}$ and $\textbf{B}$ separated by an angle $\theta$ and rotated by an angle $\phi$ as shown in the following figure.

The vector components projected along the $\textbf{x}$ and $\textbf{y}$ unit vectors are given by,

\begin{aligned} \textbf{A} &= A_{x}\textbf{x} + A_{y}\textbf{y} \\ \textbf{B} &= B_{x}\textbf{x} + B_{y}\textbf{y}, \end{aligned}

where,

\begin{aligned} A_{x} &= \mid \textbf{A} \mid \cos{\phi} \\ A_{y} &= \mid \textbf{A} \mid \sin{\phi} \\ B_{x} &= \mid \textbf{B} \mid \cos{\left(\phi + \theta\right)} \\ B_{y} &= \mid \textbf{B} \mid \sin{\left(\phi + \theta\right)}. \\ \end{aligned}\ \ \ \ \ (2)

The cross product of two vectors is another vector perpendicular to both. For the figure above, that direction is perpendicular to the plane of the page; call it $\textbf{z}$. The cross product is then defined by the determinant computed from the components of the two vectors, projected along $\textbf{z}$,

\begin{aligned} \textbf{A} \times \textbf{B} &= \begin{vmatrix} A_{x} & A_{y} \\ B_{x} & B_{y} \end{vmatrix}\textbf{z} \\ &= \left( A_{x}B_{y} - A_{y} B_{x} \right)\textbf{z}. \end{aligned}\ \ \ \ \ (3)

Substituting equation $(2)$ into $(3)$ and making use of,

$\begin{gathered} \sin{\left(\theta + \phi\right)} = \sin{\theta}\ \cos{\phi}\ +\ \cos{\theta}\ \sin{\phi} \\ \cos{\left(\theta + \phi\right)} = \cos{\theta}\ \cos{\phi}\ -\ \sin{\theta}\ \sin{\phi} \\ \sin^2{\phi}\ +\ \cos^2{\phi} = 1, \end{gathered}$

results in,

\begin{aligned} \mid \textbf{A} \times \textbf{B} \mid &= \mid A_{x} B_{y} - B_{x} A_{y} \mid \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\left[\cos{\phi}\ \sin{\left(\phi + \theta\right)} - \sin{\phi}\ \cos{\left(\phi + \theta\right)}\right] \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\left(\cos^{2}{\phi}\ \sin{\theta} + \sin{\phi}\ \cos{\phi}\ \cos{\theta} - \sin{\phi}\ \cos{\phi}\ \cos{\theta} + \sin^2{\phi}\ \sin{\theta}\right) \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid\sin{\theta}\left(\cos^{2}{\phi}\ + \sin^2{\phi}\ \right) \\ &= \mid \textbf{A} \mid \mid \textbf{B} \mid \sin{\theta}, \end{aligned}

which is the area of the parallelogram indicated by orange in the figure above. $\sin\theta$ can be assumed positive since $\theta$ can always be chosen such that $0\ \leq\ \theta\ \leq\ \pi$.
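This identity is straightforward to verify numerically; the magnitudes and angles below are arbitrary assumed values.

```python
import numpy as np

# Two vectors separated by angle theta and rotated by phi, as in equation (2)
A_mag, B_mag, phi, theta = 2.0, 3.0, 0.4, 1.1
A = A_mag * np.array([np.cos(phi), np.sin(phi)])
B = B_mag * np.array([np.cos(phi + theta), np.sin(phi + theta)])

# z-component of the cross product: the determinant in equation (3)
cross_z = A[0] * B[1] - A[1] * B[0]

# Its magnitude should equal |A||B|sin(theta), the parallelogram area
assert np.isclose(abs(cross_z), A_mag * B_mag * np.sin(theta))
```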

### Bivariate Jacobian

Consider the following coordinate transformation between two coordinate systems defined by the variables $\left(x,y\right)$ and $\left(u,v\right)$,

\begin{aligned} x &= x(u,v) \\ y &= y(u,v). \end{aligned}\ \ \ \ \ (4)

applied to the integral,

$\int_{A} f(x, y) dxdy,$

where $A$ is an arbitrary area in $\left(\textit{x}, \textit{y}\right)$ coordinates. The following figure shows how area elements transform between $\left(\textit{u}, \textit{v} \right)$ coordinates and $\left(\textit{x}, \textit{y} \right)$ coordinates when equation $(4)$ is applied. The left side of the figure shows the $\left(\textit{u}, \textit{v} \right)$ Cartesian Coordinate System. In this system vertical lines of constant $\textit{u}$ are colored orange and horizontal lines of constant $\textit{v}$ blue. The right side of the figure illustrates how equation $(4)$ maps lines of constant $u$ and $v$ onto $\left(x,y\right)$ coordinates. The lines of constant $\textit{u}$ and $\textit{v}$ can be curved and not aligned with the $\left(\textit{x}, \textit{y}\right)$ Cartesian coordinates as shown in the figure. The transformed $\left(\textit{u}, \textit{v} \right)$ coordinates in this case are Curvilinear.

Consider the differential area elements indicated by orange in the figure. In the $\left(\textit{u}, \textit{v} \right)$ Cartesian coordinates the differential area element is given by $dA=dudv$ but in $\left(\textit{x}, \textit{y}\right)$ coordinates differentials defining the area are not orthogonal so the element is distorted. In the infinitesimal limit it will become a parallelogram with area given by the cross product of $d\textbf{X}$ and $d\textbf{Y}$ which are tangent vectors to the curves of constant $\textit{v}$ and $\textit{u}$ respectively at the origin point of the vectors. To compute the cross product expressions for the differentials are required. The components of the differentials are computed from transformation $(4)$ by assuming $\textit{v}$ is constant for $d\textbf{X}$ and $\textit{u}$ is constant for $d\textbf{Y}$. It follows that,

\begin{aligned} d\textbf{X} &= \frac{\partial x}{\partial u}du\ \textbf{x} + \frac{\partial y}{\partial u}du\ \textbf{y} \\ d\textbf{Y} &= \frac{\partial x}{\partial v}dv\ \textbf{x} + \frac{\partial y}{\partial v}dv\ \textbf{y}. \end{aligned}

The cross product of the differentials above is given by,

\begin{aligned} d\textbf{X} \times d\textbf{Y} &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}}\small{du} & \frac{\small{\partial y}}{\small{\partial u}}\small{du} \\ \frac{\small{\partial x}}{\small{\partial v}}\small{dv} & \frac{\small{\partial y}}{\small{\partial v}}\small{dv} \end{vmatrix}\ \textbf{z} \\ &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}} & \frac{\small{\partial y}}{\small{\partial u}} \\ \frac{\small{\partial x}}{\small{\partial v}} & \frac{\small{\partial y}}{\small{\partial v}} \end{vmatrix}dudv\ \textbf{z} \\ &= \left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v} \right)dudv\ \textbf{z}. \end{aligned}

The bivariate Jacobian is defined by,

\begin{aligned} \mid J \mid &= \left| \frac{\partial\left(x, y\right)}{\partial\left(u, v\right)} \right| \\ &= \begin{vmatrix} \frac{\small{\partial x}}{\small{\partial u}} & \frac{\small{\partial x}}{\small{\partial v}} \\ \frac{\small{\partial y}}{\small{\partial u}} & \frac{\small{\partial y}}{\small{\partial v}} \end{vmatrix} \\ &= \left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial y}{\partial u}\frac{\partial x}{\partial v} \right). \end{aligned}

In general the determinant of a matrix equals the determinant of its transpose. It follows that the area element in $\left(\textit{u}, \textit{v} \right)$ coordinates is given by,

$dA = \mid d\textbf{X} \times d\textbf{Y} \mid = \mid J \mid dudv.$

$\mid J \mid$ scales the differential area $dudv$ by the amount required by the transformation to conserve area, in the same manner that $dx/dy$ conserves length in the single variable case.

Finally the transform of the bivariate integral in equation $(1)$ is given by,

$\int_{A} f(x, y) dxdy = \int_{A'} f(x(u,v), y(u, v)) \mid J \mid dudv,$

where $A'$ is the region obtained by applying $(4)$ to $A$.
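As a concrete check, consider the familiar polar coordinate transformation $x = r\cos\theta$, $y = r\sin\theta$, for which $\mid J \mid = r$. The sketch below, an illustrative example not taken from the text, integrates a unit bivariate normal density over a disk both directly in $(x,y)$ and in polar coordinates.

```python
import numpy as np
from scipy import integrate

# Unit bivariate normal density f(x, y) = e^{-(x^2+y^2)/2} / (2 pi)
f = lambda x, y: np.exp(-(x**2 + y**2) / 2) / (2 * np.pi)
R = 2.0

# Integrate directly in (x, y) coordinates over the disk x^2 + y^2 <= R^2
direct, _ = integrate.dblquad(
    lambda y, x: f(x, y),
    -R, R,
    lambda x: -np.sqrt(R**2 - x**2),
    lambda x: np.sqrt(R**2 - x**2))

# Integrate in polar coordinates: dx dy -> |J| dr dt with |J| = r
polar, _ = integrate.dblquad(
    lambda r, t: f(r * np.cos(t), r * np.sin(t)) * r,
    0, 2 * np.pi, 0, R)

assert np.isclose(direct, polar)
# For this density the integral has the closed form 1 - e^{-R^2/2}
assert np.isclose(polar, 1 - np.exp(-R**2 / 2))
```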

## Bivariate Normal Distribution

The Bivariate Normal Distribution with random variables $U$ and $V$ is obtained by applying the following linear transformation to two independent $\textbf{Normal}(0, 1)$ random variables $X$ and $Y,$

$\begin{pmatrix} U - \mu_{u} \\ V - \mu_{v} \end{pmatrix} = \begin{pmatrix} \sigma_{u} & 0 \\ \sigma_{v}\gamma & \sigma_{v}\sqrt{1 - \gamma^{2}} \end{pmatrix} \begin{pmatrix} X \\ Y \end{pmatrix},\ \ \ \ \ (5)$

where $\mu_{u}$, $\mu_{v}$, $\sigma_{u}$, $\sigma_{v}$ and $\gamma$ are scalar constants named in anticipation of being the means, standard deviations and correlation coefficient of the distribution.
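Transformation $(5)$ can be exercised directly by sampling. The sketch below, with arbitrary assumed parameter values, applies the transformation to independent unit normal samples and checks that the sample statistics approach the named parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7

# Independent Normal(0, 1) samples for X and Y
x, y = rng.standard_normal((2, 500_000))

# Apply the linear transformation (5)
u = mu_u + sigma_u * x
v = mu_v + sigma_v * gamma * x + sigma_v * np.sqrt(1 - gamma**2) * y

# Sample statistics should be close to the named parameters
assert np.isclose(u.mean(), mu_u, atol=0.02)
assert np.isclose(v.mean(), mu_v, atol=0.02)
assert np.isclose(u.std(), sigma_u, atol=0.02)
assert np.isclose(v.std(), sigma_v, atol=0.02)
assert np.isclose(np.corrcoef(u, v)[0, 1], gamma, atol=0.02)
```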

An equation for $X$ and $Y$ in terms of $U$ and $V$ is obtained by inverting the transformation matrix,

$\begin{pmatrix} X \\ Y \end{pmatrix} = \frac{1}{\sigma_{u}\sigma_{v}\sqrt{1 - \gamma^2}} \begin{pmatrix} \sigma_{v}\sqrt{1 - \gamma^{2}} & 0 \\ -\sigma_{v}\gamma & \sigma_{u} \end{pmatrix} \begin{pmatrix} U - \mu_{u} \\ V - \mu_{v} \end{pmatrix},\ \ \ \ (6)$

where it is assumed that $\gamma^{2} \ne 1.$ The distribution for $X$ and $Y$ is found by making use of the assumption that they are independent $\textbf{Normal}(0, 1)$, namely,

$f(x,y) = f(x)f(y) = \frac{1}{2\pi}e^{-(x^2 + y^2) / 2}.\ \ \ \ \ (7)$

In the previous section it was shown that for the transformation,

\begin{aligned} x &= x(u, v) \\ y &= y(u, v), \end{aligned}

the transformed PDF is given by,

$g(u, v) = \mid J \mid f(x(u,v), y(u,v)),\ \ \ \ \ (8)$

where $\mid J \mid$ is the Jacobian. The transformations $x(u,v)$ and $y(u,v)$ can be determined from equation $(6)$,

\begin{aligned} x(u,v) & = \frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \\ y(u,v) & = \frac{1}{\sqrt{1-\gamma^2}} \left[ \frac{1}{\sigma_{v}}\left(v-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(u-\mu_{u}\right) \right]. \end{aligned}\ \ \ \ \ (9)

It follows that the Jacobian is given by,

$\mid J \mid = \frac{1}{\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}.\ \ \ \ \ (10)$

With the goal of keeping the evaluation of equation $(8)$ simple, the exponential argument of equation $(7)$, $x^2 + y^2,$ will be considered first. Using the expressions for $x(u,v)$ and $y(u,v)$ from equation $(9)$ gives,

\begin{aligned} x^2 + y^2 &= \left[x(u,v)\right]^{2} + \left[y(u,v)\right]^{2} \\ &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{1}{1-\gamma^2}\left[\frac{1}{\sigma_{v}}\left(v-\mu_{v}\right) - \frac{\gamma}{\sigma_{u}}\left(u-\mu_{u}\right)\right]^2 \\ &= \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\gamma^2}{1-\gamma^2}\frac{\left(u - \mu_u \right)^2}{\sigma_u^2} + \frac{1}{1-\gamma^2}\frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - \frac{2\gamma}{1-\gamma^2}\frac{\left(u - \mu_u \right)\left(v-\mu_v\right)}{\sigma_u\sigma_v} \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2}\left(1-\gamma^2+\gamma^2\right) + \frac{\left(v-\mu_v\right)^2}{\sigma_{v}^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right] \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right]. \end{aligned}\ \ \ \ \ (11)

The first step expands the rightmost squared term and the last two steps aggregate the $\left(u-\mu_{u} \right)$ and $\left(v-\mu_v\right)$ terms. The Bivariate Normal PDF follows by inserting equations $(10)$ and $(11)$ into equation $(8)$,

\begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} + \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] } \end{aligned}\ \ \ \ \ (12)
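Equation $(12)$ can be validated against SciPy's multivariate normal implementation, using the covariance matrix built from the five parameters; the parameter values and test points below are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7

def g(u, v):
    """Bivariate Normal PDF of equation (12)."""
    zu = (u - mu_u) / sigma_u
    zv = (v - mu_v) / sigma_v
    norm = 2 * np.pi * sigma_u * sigma_v * np.sqrt(1 - gamma**2)
    return np.exp(-(zu**2 + zv**2 - 2 * gamma * zu * zv)
                  / (2 * (1 - gamma**2))) / norm

# Reference: SciPy's multivariate normal with the matching covariance matrix
cov = [[sigma_u**2, gamma * sigma_u * sigma_v],
       [gamma * sigma_u * sigma_v, sigma_v**2]]
ref = stats.multivariate_normal(mean=[mu_u, mu_v], cov=cov)

pts = np.array([[0.0, 0.0], [1.0, -2.0], [2.5, -1.0]])
assert np.allclose([g(u, v) for u, v in pts], ref.pdf(pts))
```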

How to extend transform $(5)$ to more than two dimensions is not obvious. The following section describes a form of the transformation that makes the extension more apparent.

### Matrix Form of Bivariate Normal Distribution

A matrix form of the Bivariate Normal PDF can be constructed that scales to higher dimensions. Here the two variable version is compared with the results of the previous section, followed by a discussion of the extension to an arbitrary number of dimensions. To begin, consider the column vectors $Y$ and $\mu$ and the matrix $P$,

$\begin{gathered} Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}\ \ \ \ \ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \\ P = \begin{pmatrix} \sigma_{1}^{2} & \gamma\sigma_{1}\sigma_{2} \\ \gamma\sigma_{1}\sigma_{2} & \sigma_{2}^{2} \end{pmatrix}. \\ \end{gathered}\ \ \ \ \ (13)$

$Y$ is the column vector of Bivariate Normal random variables, $\mu$ the column vector of mean values and $P$ is called the Covariance Matrix. To continue, the inverse of the covariance matrix and its determinant are required. The inverse is given by,

$P^{-1} = \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2} & -\gamma\sigma_{1}\sigma_{2} \\ -\gamma\sigma_{1}\sigma_{2} & \sigma_{1}^{2} \end{pmatrix},$

and the determinant is,

$\mid P \mid = \sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right).$
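Both expressions are easy to confirm numerically with NumPy; the parameter values below are arbitrary assumptions.

```python
import numpy as np

sigma_1, sigma_2, gamma = 2.0, 0.5, 0.7
P = np.array([[sigma_1**2, gamma * sigma_1 * sigma_2],
              [gamma * sigma_1 * sigma_2, sigma_2**2]])

# Closed-form determinant and inverse from the text
det_P = sigma_1**2 * sigma_2**2 * (1 - gamma**2)
P_inv = np.array([[sigma_2**2, -gamma * sigma_1 * sigma_2],
                  [-gamma * sigma_1 * sigma_2, sigma_1**2]]) / det_P

assert np.isclose(np.linalg.det(P), det_P)
assert np.allclose(np.linalg.inv(P), P_inv)
```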

Next, consider the product, where $\left( Y - \mu\right)^{T}$ is the transpose of $\left( Y - \mu\right)$,

$\left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right).$

The two rightmost terms evaluate to,

\begin{aligned} P^{-1}\left(Y-\mu\right) &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2} & -\gamma\sigma_{1}\sigma_{2} \\ -\gamma\sigma_{1}\sigma_{2} & \sigma_{1}^{2} \end{pmatrix} \begin{pmatrix} y_1 -\mu_1 \\ y_2 -\mu_2 \end{pmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} \sigma_{2}^{2}\left(y_1 -\mu_1\right) - \gamma\sigma_{1}\sigma_{2}\left(y_2 -\mu_2\right) \\ -\gamma\sigma_{1}\sigma_{2}\left(y_1 -\mu_1\right) + \sigma_{1}^{2}\left(y_2 -\mu_2\right) \end{pmatrix}. \end{aligned}

Continuing the evaluation by including the left most term gives,

\begin{aligned} \left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right) &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \begin{pmatrix} y_1 -\mu_1 & y_2 -\mu_2 \end{pmatrix} \begin{pmatrix} \sigma_{2}^{2}\left(y_1 -\mu_1\right) - \gamma\sigma_{1}\sigma_{2}\left(y_2 -\mu_2\right) \\ -\gamma\sigma_{1}\sigma_{2}\left(y_1 -\mu_1\right) + \sigma_{1}^{2}\left(y_2 -\mu_2\right) \end{pmatrix} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \left\{ \left[\sigma_{2}^{2}\left(y_1 -\mu_{1}\right) - \gamma\sigma_{1}\sigma_{2}\left(y_{2} -\mu_{2}\right)\right] \left(y_{1}-\mu_{1} \right) + \left[\sigma_{1}^{2}\left(y_{2} -\mu_{2}\right) -\gamma\sigma_{1}\sigma_{2}\left(y_{1} -\mu_{1}\right)\right] \left(y_{2}-\mu_{2} \right) \right\} \\ &= \frac{1}{\sigma_1^2\sigma_2^2\left(1 - \gamma^2 \right)} \left[ \sigma_{2}^{2}\left(y_{1} -\mu_{1}\right)^{2} + \sigma_{1}^{2}\left(y_{2} -\mu_{2}\right)^{2} - 2\gamma\sigma_{1}\sigma_{2}\left(y_{2} -\mu_{2}\right)\left(y_{1}-\mu_{1} \right) \right] \\ &= \frac{1}{1-\gamma^2} \left[ \frac{\left(y_{1}-\mu_{1} \right)^2}{\sigma_{1}^2} + \frac{\left(y_{2}-\mu_{2}\right)^2}{\sigma_{2}^2} - 2\gamma\frac{\left(y_{1}-\mu_{1}\right)}{\sigma_{1}}\frac{\left(y_{2}-\mu_{2}\right)}{\sigma_{2}} \right]. \end{aligned}

This is the same expression as obtained using transform $(9)$ in the previous section. Comparing the expression for $\mid P\mid$ with equation $(12)$ gives,

$\mid J \mid = \frac{1}{\sqrt{\mid P \mid}}.$

Putting things together produces the desired result,

\begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{2\pi\sqrt{\mid P \mid}} e^{ \left[-\frac{1}{2}\footnotesize{\left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right)}\right] }. \end{aligned}

To extend to an arbitrary number of dimensions, $n$, the covariance matrix defined in equation $(13)$ needs to be written in a more general form,

$P_{ij} = \text{Cov}(Y_{i}, Y_{j}).$

After updating the normalization factor to account for the product of $n$ $\textbf{Normal(0,1)}$ distributions the Multivariate Normal PDF becomes,

\begin{aligned} g(u, v) &= \mid J \mid f(x(u,v), y(u,v)) \\ &= \frac{1}{(2\pi)^{n/2}\sqrt{\mid P \mid}} e^{ \left[-\frac{1}{2}\footnotesize{\left( Y - \mu\right)^{T}P^{-1}\left(Y-\mu\right)}\right] } \end{aligned}.

To determine the linear transformation analogous to equation $(5)$, $P$ is factored using the Cholesky Decomposition.
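For the bivariate case the Cholesky factor can be computed with NumPy and compared against the transformation matrix of equation $(5)$; the parameter values below are arbitrary assumptions.

```python
import numpy as np

sigma_1, sigma_2, gamma = 2.0, 0.5, 0.7
P = np.array([[sigma_1**2, gamma * sigma_1 * sigma_2],
              [gamma * sigma_1 * sigma_2, sigma_2**2]])

# Lower-triangular Cholesky factor: P = L L^T
L = np.linalg.cholesky(P)

# L matches the transformation matrix of equation (5)
expected = np.array([[sigma_1, 0.0],
                     [sigma_2 * gamma, sigma_2 * np.sqrt(1 - gamma**2)]])
assert np.allclose(L, expected)
```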

In a following section it will be shown that for two variables,

$\text{Cov}(Y_{i}, Y_{j}) = \begin{cases} \gamma\sigma_{i}\sigma_{j} & i\ \neq\ j \\ \sigma_{i}^{2} & i = j, \end{cases}$

which is equivalent to equation $(13)$.

## Bivariate Normal Distribution Properties

The previous sections derived the Bivariate Normal random variables as linear combinations of independent unit normal random variables, constructed by applying a linear transformation with five independent parameters. In this section interpretations of the parameters are provided and the variation in the distribution as the parameters are varied is described. First, the Marginal Distributions are calculated and it is shown that four of the distribution parameters correspond to the marginal distribution means and standard deviations. Next, the Correlation Coefficient of the distribution is computed and shown to correspond to the remaining parameter. The remaining sections consider how changes in the parameters affect the distribution shape, including an analysis of PDF contours and of the linear transformation used to construct the distribution.

### Marginal Distributions

The Marginal Distributions for $u$ and $v$ are defined by,

\begin{aligned} g(u) &= \int_{-\infty}^{\infty} g(u, v) dv \\ g(v) &= \int_{-\infty}^{\infty} g(u, v) du, \end{aligned}

where $g(u,v)$ is defined by equation $(12).$ First, consider evaluation of the integral for $g(u),$

\begin{aligned} g(u) &= \int_{-\infty}^{\infty} g(u, v) dv \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}\int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} + \frac{\footnotesize{\left(v-\mu_v\right)^2}}{\footnotesize{\sigma_v^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{\sigma_{v}^2}} - 2\gamma\frac{\footnotesize{\left(v-\mu_v\right)}}{\footnotesize{\sigma_{v}}}\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right] }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \left\{ \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)}}{\footnotesize{\sigma_{v}}} - \gamma\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right]^{2} -\gamma^{2}\frac{\footnotesize{\left(u-\mu_u\right)^2}}{\footnotesize{\sigma_u^2}} \right\} }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ -\frac{\footnotesize{1}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } e^{ \frac{\footnotesize{\gamma^2}}{\footnotesize{2(1-\gamma^2)}} \frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{\sigma_{u}^2}} } \int_{-\infty}^{\infty}e^{ 
\frac{\footnotesize{-1}}{\footnotesize{2(1-\gamma^2)}} \left[ \frac{\footnotesize{\left(v-\mu_{v} \right)}}{\footnotesize{\sigma_{v}}} - \gamma\frac{\footnotesize{\left(u-\mu_{u} \right)}}{\footnotesize{\sigma_{u}}} \right]^{2} }dv\\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}}e^{ -\frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{2\sigma_{u}^2}} } \sqrt{\footnotesize{2\pi\sigma_{v}^{2}(1-\gamma^2)}}\\ &= \frac{1}{\sqrt{2\pi\sigma_{u}^{2}}}e^{ -\frac{\footnotesize{\left(u-\mu_{u} \right)^2}}{\footnotesize{2\sigma_{u}^2}} }. \end{aligned}

In the first step the $u$ dependence is factored out for evaluation of the integral over $v$. The next step completes the square of the exponential followed by also factoring the introduced $u$ term from the $v$ integral. This is followed by simplification of the $u$ exponential argument. Finally, the integral over $v$ is evaluated yielding a $\textbf{Normal}(\mu_{u}, \sigma_{u})$ distribution. Similarly, the marginal distribution $g(v)$ is given by,

$g(v) = \int_{-\infty}^{\infty} g(u, v) du = \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}}e^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{\footnotesize{2\sigma_{v}^2}} },\ \ \ \ \ (14)$

which is a $\textbf{Normal}(\mu_{v}, \sigma_{v})$ distribution. The plot below shows examples of $\textbf{Normal}(\mu, \sigma)$ as $\mu$ and $\sigma$ are varied. The effect of $\mu$ is to translate the distribution along the $u$ axis and $\sigma$ scales the distribution width.
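The marginalization can also be performed by numerical quadrature, as sketched below with arbitrary assumed parameter values; integrating equation $(12)$ over $v$ should reproduce the $\textbf{Normal}(\mu_{u}, \sigma_{u})$ density.

```python
import numpy as np
from scipy import integrate, stats

mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[sigma_u**2, gamma * sigma_u * sigma_v],
       [gamma * sigma_u * sigma_v, sigma_v**2]]
g = stats.multivariate_normal(mean=[mu_u, mu_v], cov=cov).pdf

# Integrate g(u, v) over v at a few fixed u values...
for u in [-1.0, 1.0, 3.0]:
    marginal, _ = integrate.quad(lambda v: g([u, v]), -np.inf, np.inf)
    # ...and compare with the Normal(mu_u, sigma_u) density
    assert np.isclose(marginal, stats.norm.pdf(u, mu_u, sigma_u))
```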

The mean and variance are now readily determined for both $u$ and $v$,

\begin{aligned} \text{E}[U] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} ug(u, v) dvdu \\ &= \int_{-\infty}^{\infty}u\int_{-\infty}^{\infty} g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}ug(u) du \\ &= \mu_{u}, \end{aligned}

and,

\begin{aligned} \text{Var}[U] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (u -\mu_{u})^2 g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}(u -\mu_{u})^2 \int_{-\infty}^{\infty} g(u, v) dvdu \\ &= \int_{-\infty}^{\infty}(u -\mu_{u})^2g(u) du \\ &= \sigma_{u}^{2}. \end{aligned}

Similarly for $v$,

\begin{aligned} \text{E}[V] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} vg(u, v) dudv = \mu_{v} \\ \text{Var}[V] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (v -\mu_{v})^2 g(u, v) dudv = \sigma_{v}^{2}. \end{aligned}

### Conditional Distribution

The Conditional Distribution of a random variable is the distribution obtained by assuming the values of one or more correlated random variables are known. If the variables are independent the conditional distribution is equivalent to the marginal distribution. The conditional distribution is useful in the calculation of the Correlation Coefficient performed in the following section and in simulation methods such as Gibbs Sampling.

For the bivariate case the conditional distributions are defined by,

\begin{aligned} g(u|v) &= \frac{g(u, v)}{g(v)} \\ g(v|u) &= \frac{g(u, v)}{g(u)}. \end{aligned}

Using equations $(12)$ and $(14),$ $g(u|v)$ is evaluated as follows,

\begin{aligned} g(u|v) &= \frac{g(u, v)}{g(v)} \\ &= \frac{1}{2\pi\sigma_{u}\sigma_{v} \sqrt{1-\gamma^{2}}} e^{ \frac{-1}{2(1-\gamma^2)} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right] } \sqrt{2\pi\sigma_{v}^{2}}\ e^{ \frac{\left(v-\mu_{v} \right)^2}{2\sigma_{v}^2} } \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} e^{ \frac{-1}{2(1-\gamma^2)} \left[ \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}} \right] } e^{ \frac{\left(v-\mu_{v} \right)^2}{2\sigma_{v}^2} - \frac{1}{2(1-\gamma^2)} \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} } \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} e^{ \frac{-1}{2(1-\gamma^2)} \left\{ \frac{1}{\sigma_{u}^{2}} \left[ u - \left(\mu_u + \frac{\gamma\sigma_u}{\sigma_v} \left(v-\mu_{v}\right)\right) \right]^2 - \frac{\gamma^2}{\sigma_v^2} \left(v-\mu_{v}\right)^2 \right\} } e^{ \frac{\left(v-\mu_{v} \right)^2}{2\sigma_{v}^2} - \frac{1}{2(1-\gamma^2)} \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} } \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} e^{ \frac{-1}{2(1-\gamma^2)} \frac{1}{\sigma_{u}^{2}} \left[ u - \left(\mu_u + \frac{\gamma\sigma_u}{\sigma_v} \left(v-\mu_{v}\right)\right) \right]^2 } e^{ \frac{\left(v-\mu_{v} \right)^2}{2\sigma_{v}^2} \left[ 1 - \frac{1-\gamma^2}{1-\gamma^2} \right] } \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} e^{ \frac{-1}{2\sigma_{u}^{2}(1-\gamma^2)} \left\{ u - \left[\mu_u + \frac{\gamma\sigma_u}{\sigma_v} \left(v-\mu_{v}\right)\right] \right\}^2 }. \end{aligned}\ \ \ \ \ (15)

In the first step the $\left(v-\mu_{v}\right)^2$ terms are collected and then the square is completed for the remaining terms. Once again the $\left(v-\mu_{v}\right)^2$ terms are collected and the final result obtained, a $\textbf{Normal}$ distribution.

Similarly, $g(v|u)$ is given by,

\begin{aligned} g(v|u) &= \frac{g(u, v)}{g(u)} \\ &= \frac{1}{\sigma_{v}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{v}^{2}(1-\gamma^2)}} \left\{ v - \left[\mu_v + \frac{\footnotesize{\gamma\sigma_v}}{\footnotesize{\sigma_u}} \left(\footnotesize{u-\mu_{u}}\right)\right] \right\}^2 }. \end{aligned}
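Both conditional densities can be checked numerically. The sketch below, with arbitrary assumed parameter values, evaluates $g(u|v)$ as the ratio $g(u,v)/g(v)$ and compares it with a normal density having mean $\mu_u + \gamma\sigma_u\left(v-\mu_v\right)/\sigma_v$ and standard deviation $\sigma_u\sqrt{1-\gamma^2}$, read off from equation $(15)$.

```python
import numpy as np
from scipy import stats

mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[sigma_u**2, gamma * sigma_u * sigma_v],
       [gamma * sigma_u * sigma_v, sigma_v**2]]
joint = stats.multivariate_normal(mean=[mu_u, mu_v], cov=cov)

v = -1.5                                  # conditioned value
u = np.linspace(-5, 7, 25)

# g(u|v) = g(u, v) / g(v)
g_cond = joint.pdf(np.column_stack([u, np.full_like(u, v)])) \
         / stats.norm.pdf(v, mu_v, sigma_v)

# Normal density with the conditional mean and variance of equation (15)
mean = mu_u + gamma * sigma_u / sigma_v * (v - mu_v)
std = sigma_u * np.sqrt(1 - gamma**2)
assert np.allclose(g_cond, stats.norm.pdf(u, mean, std))
```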

The conditional mean and variance can now be easily evaluated,

\begin{aligned} \text{E}[U|V] &= \int_{-\infty}^{\infty} ug(u|v) du \\ &=\left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right], \end{aligned}

\begin{aligned} \text{E}[V|U] &= \int_{-\infty}^{\infty} vg(v|u) dv \\ &= \left[\mu_v + \frac{\footnotesize{\gamma\sigma_v}}{\footnotesize{\sigma_u}} \left(\footnotesize{u-\mu_{u}}\right)\right], \end{aligned}

\begin{aligned} \text{Var}[U|V] &= \text{E}[\left(U - \text{E}[U|V]\right)^{2}] \\ &= \int_{-\infty}^{\infty} \left(u - \text{E}[U|V]\right)^{2}g(u|v) du \\ &= \sigma_{u}^{2}(1-\gamma^2), \end{aligned}

\begin{aligned} \text{Var}[V|U] &= \text{E}[\left(V - \text{E}[V|U]\right)^{2}] \\ &= \int_{-\infty}^{\infty} \left(v - \text{E}[V|U]\right)^{2}g(v|u) dv \\ &=\sigma_{v}^{2}(1-\gamma^2). \end{aligned}

For both $u$ and $v$ the conditional expectation is a linear function of the conditioned variable. This is illustrated in the following plot where $g(u|v)$ is plotted for several values of $v$. It is seen that $v$ translates the distribution along the $u$ axis.

Consider the variation of $\text{E}[U|V]$, $\text{E}[V|U]$, $\text{Var}[U|V]$ and $\text{Var}[V|U]$ with $\gamma$. For $\gamma = 0$ each reduces to the value corresponding to its marginal distribution discussed in the previous section. For both conditional distributions increasing $|\gamma|$ leads to decreasing variance, resulting in a sharper peak for the distribution. Additionally, changing the sign of $\gamma$ reflects the distribution about the mean. Each of these properties is illustrated on the plot below where $g(u|v)$ is plotted for values of $\gamma$ ranging from $-1$ to $1.$
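The earlier mention of Gibbs Sampling can be made concrete: because both conditionals are normal, a sampler that alternates draws from $g(u|v)$ and $g(v|u)$ converges to the Bivariate Normal Distribution. The following is a minimal sketch with arbitrary assumed parameter values.

```python
import numpy as np

# Gibbs sampler alternating draws from g(u|v) and g(v|u)
mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7
rng = np.random.default_rng(0)

n = 100_000
u, v = mu_u, mu_v                    # arbitrary starting point
samples = np.empty((n, 2))
for i in range(n):
    # Draw u from g(u|v): Normal with the conditional mean and variance
    u = rng.normal(mu_u + gamma * sigma_u / sigma_v * (v - mu_v),
                   sigma_u * np.sqrt(1 - gamma**2))
    # Draw v from g(v|u)
    v = rng.normal(mu_v + gamma * sigma_v / sigma_u * (u - mu_u),
                   sigma_v * np.sqrt(1 - gamma**2))
    samples[i] = u, v

# The chain's statistics should approach the target distribution's
assert np.allclose(samples.mean(axis=0), [mu_u, mu_v], atol=0.05)
assert np.isclose(np.corrcoef(samples.T)[0, 1], gamma, atol=0.05)
```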

### Correlation Coefficient

The Cross Correlation of the two random variables $U$ and $V$ is defined by,

\begin{aligned} \text{E}[UV] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u,v) dvdu \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u|v)g(v) dvdu. \end{aligned}

The last step follows from the definition of conditional probability. Now, substituting equations $(15)$ and $(14)$ into the equation above yields,

\begin{aligned} \text{E}[UV] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} uv g(u|v)g(v) dvdu \\ &=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}uv \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}}e^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ u - \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] \right\}^2 } \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}}e^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} }dvdu \\ &= \frac{1}{\sigma_{u}\sqrt{2\pi\left(1-\gamma^{2}\right)}} \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}} \int_{-\infty}^{\infty}ve^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} } \int_{-\infty}^{\infty}ue^{ \frac{\footnotesize{-1}}{\footnotesize{2\sigma_{u}^{2}(1-\gamma^2)}} \left\{ u - \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] \right\}^2 }dudv \\ &= \frac{1}{\sqrt{2\pi\sigma_{v}^{2}}} \int_{-\infty}^{\infty}ve^{ -\frac{\footnotesize{\left(v-\mu_{v} \right)^2}}{2\footnotesize{\sigma_{v}^2}} } \left[\mu_u + \frac{\footnotesize{\gamma\sigma_u}}{\footnotesize{\sigma_v}} \left(\footnotesize{v-\mu_{v}}\right)\right] dv \\ &=\mu_{u}\mu_{v} + \frac{\gamma\sigma_{u}}{\sigma_{v}}\sigma_{v}^{2} \\ &=\mu_{u}\mu_{v} + \gamma\sigma_{u}\sigma_{v}. \end{aligned}

It follows that the Covariance is given by,

$\text{Cov}(U,V) = \text{E}[UV] - \text{E}[U]\text{E}[V] = \gamma\sigma_{u}\sigma_{v},$

and Correlation Coefficient by,

$\gamma = \frac{\text{Cov}(U,V)}{\sigma_{v}\sigma_{u}}.$
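These identities can be confirmed by simulation, sampling through transformation $(5)$ with arbitrary assumed parameter values.

```python
import numpy as np

mu_u, mu_v, sigma_u, sigma_v, gamma = 1.0, -2.0, 2.0, 0.5, 0.7
rng = np.random.default_rng(1)

# Sample (U, V) through the linear transformation (5)
x, y = rng.standard_normal((2, 500_000))
u = mu_u + sigma_u * x
v = mu_v + sigma_v * (gamma * x + np.sqrt(1 - gamma**2) * y)

# E[UV] - E[U]E[V] should approach gamma * sigma_u * sigma_v
cov_uv = (u * v).mean() - u.mean() * v.mean()
assert np.isclose(cov_uv, gamma * sigma_u * sigma_v, atol=0.02)
# ...and Cov(U, V) / (sigma_u sigma_v) should approach gamma
assert np.isclose(cov_uv / (u.std() * v.std()), gamma, atol=0.02)
```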

### Distribution Parameter Limits

In previous sections it was shown that the free parameters used in the definition of the Bivariate Normal Distribution, equation $(12)$, are its means, variances and correlation coefficient. Here the change in shape of the distribution as these parameters are varied is sketched by evaluating parameter limits of the equation defining the PDF contours. The following section will describe convergence to these limits using a numerical analysis.

The equation satisfied by the contours is obtained by setting the argument of the exponential in equation $(12)$ to a constant, $C^{2}$,

$C^{2} = \frac{\left(u-\mu_{u} \right)^2}{\sigma_{u}^2} + \frac{\left(v-\mu_v\right)^2}{\sigma_v^2} - 2\gamma\frac{\left(v-\mu_v\right)}{\sigma_{v}}\frac{\left(u-\mu_{u} \right)}{\sigma_{u}},$

where $C^{2}$ is related to the value of the contour. To develop an expectation of the behavior, the following limits of this equation are considered: $\gamma\to 0$, $\gamma\to \pm 1$, $\sigma_{v}/\sigma_{u}\to 1$, $\sigma_{v}/\sigma_{u}\to 0$ and $\sigma_{v}/\sigma_{u}\to \infty$. Variation of $\mu_{u}$ and $\mu_{v}$ produces only a translation, which is less interesting.

First, consider the limit $\gamma\to 0$, which is easily evaluated using the equation above,

$C^{2} = \frac{\left(u-\mu_{u} \right)^{2}}{\sigma_{u}^{2}} + \frac{\left(v-\mu_v\right)^2}{\sigma_{v}^{2}}.$
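In this limit the contours are axis aligned ellipses, which can be verified numerically by parametrizing the ellipse and checking that the contour equation is constant; the parameter values below are arbitrary assumptions.

```python
import numpy as np

mu_u, mu_v, sigma_u, sigma_v = 1.0, -2.0, 2.0, 0.5
C = 1.5

# For gamma = 0 the contour is an axis-aligned ellipse; parametrize it
t = np.linspace(0, 2 * np.pi, 100)
u = mu_u + C * sigma_u * np.cos(t)
v = mu_v + C * sigma_v * np.sin(t)

# Every point should satisfy the contour equation with the same C^2
lhs = (u - mu_u)**2 / sigma_u**2 + (v - mu_v)**2 / sigma_v**2
assert np.allclose(lhs, C**2)
```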