Inverse CDF sampling is a method for obtaining samples from both discrete and continuous probability distributions that requires the CDF to be invertible. The method assumes values of the CDF are Uniform random variables on [0,1][0, 1]. CDF values are generated and used as input into the inverted CDF to obtain samples with the distribution defined by the CDF.

Discrete Distributions

A discrete probability distribution consisting of a finite set of NN probability values is defined by,

{p1, p2, , pN}     (1), \{p_1,\ p_2,\ \ldots,\ p_N\}\ \ \ \ \ (1),

with pi  0, ip_i\ \geq\ 0, \ \forall i and i=1Npi=1\sum_{i=1}^N{p_i} = 1. The CDF specifies the probability that ini \leq n and is given by,

P(i  n)=P(n)=i=1npi, P(i\ \leq\ n)=P(n)=\sum_{i=1}^n{p_i},

For a given CDF value, UU, The equation for P(n)P(n) can always be inverted by evaluating it for each nn and searching for the smallest value of nn that satisfies, P(n)  UP(n)\ \geq\ U. It follows that generated samples determined from UUniform(0,1)U \sim \textbf{Uniform}(0, 1) will have distribution (1)(1) since the intervals P(n)P(n1)=pnP(n)-P(n-1) = p_n are uniformly sampled.

Consider the distribution,

{112, 112, 16, 16, 112, 512}     (2) \left\{\frac{1}{12},\ \frac{1}{12},\ \frac{1}{6},\ \frac{1}{6},\ \frac{1}{12},\ \frac{5}{12} \right\} \ \ \ \ \ (2) It is shown in the following plot with its CDF. Note that the CDF is a monotonically increasing function.

A sampler using the Inverse CDF method with target distribution specified in (2)(2) implemented in Python is shown below.

import numpy

n = 10000
df = numpy.array([1/12, 1/12, 1/6, 1/6, 1/12, 5/12])
cdf = numpy.cumsum(df)

samples = [numpy.flatnonzero(cdf >= u)[0] for u in numpy.random.rand(n)]

The program first stores the CDF computed from the partial sums P(n)P(n) in an array. Next, CDF samples using UUniform(0,1)U \sim \textbf{Uniform}(0, 1) are generated. Finally, for each sampled CDF value, UU, the array containing P(n)P(n) is scanned for the smallest value of nn where P(n)  UP(n)\ \geq\ U. The resulting values of nn will have the target distribution (2)(2). This is shown in the figure below.

A measure of convergence of the samples to the target distribution can be obtained by comparing the cumulative moments of the distribution computed from the samples with the target value of the moment computed analytically. In general for a discrete distribution the first and second moments are given by,

μ=i=1Nipiσ2=i=1Ni2piμ2, \begin{aligned} \mu & = \sum_{i=1}^N ip_i\\ \sigma^2 & = \sum_{i=1}^N i^2p_i - \mu^2, \end{aligned}

In the following two plots the cumulative values of μ\mu and σ\sigma computed from the samples generated are compared with the target values using the equations above. The first shows the convergence of μ\mu and the second the convergence of σ\sigma. Within only 10310^3 samples both μ\mu and σ\sigma computed from samples is comparable to the target value and by 10410^4 the values are indistinguishable.

The number of operations required for generating samples using Inverse CDF sampling from a discrete distribution will scale O(NsamplesN)O(N_{samples}N) where NsamplesN_{samples} is the desired number of samples and NN is the number of terms in the discrete distribution.

It is also possible to directly sample distribution (2)(2) using the multinomial sampler from numpy,

import numpy

n = 10000
df = numpy.array([1/12, 1/12, 1/6, 1/6, 1/12, 5/12])
samples = numpy.random.multinomial(n, df, size=1)/n

Continuous Distributions

A continuous probability distribution is defined by the PDF, fX(x)f_X(x), where fX(x)0, xf_X(x) \geq 0,\ \forall x and fX(x)dx=1\int_{-\infty}^{\infty} f_X(x) dx = 1. The CDF is a monotonically increasing function that specifies the probability that X  xX\ \leq\ x, namely,

P(Xx)=FX(x)=xfX(w)dw. P(X \leq x) = F_X(x) = \int_{-\infty}^{x} f_X(w) dw.

Theory

To prove that Inverse CDF sampling works for continuous distributions it must be shown that,

P[FX1(U)  x]=FX(x)     (3), P[F_X^{-1}(U)\ \leq\ x] = F_X(x) \ \ \ \ \ (3),

where FX1(x)F_X^{-1}(x) is the inverse of FX(x)F_X(x) and UUniform(0,1)U \sim \textbf{Uniform}(0, 1).

A more general result needed to complete this proof is obtained using a change of variable on a CDF. If Y=G(X)Y=G(X) is a monotonically increasing invertible function of XX it will be shown that,

P(X  x)=P(Y  y)=P[G(X)  G(x)].     (4) P(X\ \leq\ x) = P(Y\ \leq\ y) = P[G(X)\ \leq\ G(x)]. \ \ \ \ \ (4)

To prove this note that G(x)G(x) is monotonically increasing so the ordering of values is preserved for all xx,

X  x  G(X)  G(x). X\ \leq\ x\ \implies\ G(X)\ \leq\ G(x).

Consequently, the order of the integration limits is maintained by the transformation. Further, since y=G(x)y=G(x) is invertible, x=G1(y)x = G^{-1}(y) and dx=(dG1/dy)dydx = (dG^{-1}/dy) dy, so

P(X  x)=xfX(w)dw=yfX(G1(z))dG1dzdz=yfY(z)dz=P(Y  y)=P[G(X)  G(x)], \begin{aligned} P(X\ \leq\ x) & = \int_{-\infty}^{x} f_X(w) dw \\ & = \int_{-\infty}^{y} f_X(G^{-1}(z)) \frac{dG^{-1}}{dz} dz \\ & = \int_{-\infty}^{y} f_Y(z) dz \\ & = P(Y\ \leq\ y) \\ & = P[G(X)\ \leq\ G(x)], \end{aligned}

where,

fY(y)=fX(G1(y))dG1dy f_Y(y) = f_X(G^{-1}(y)) \frac{dG^{-1}}{dy}

For completeness consider the case where Y=G(X)Y=G(X) is a monotonically decreasing invertible function of XX then,

X  x  G(X)  G(x), X\ \leq\ x\ \implies\ G(X)\ \geq\ G(x),

it follows that,

P(X x)=xfX(w)dw=yfX(G1(z))dG1dzdz=yfY(z)dz=P(Y  y)=P[G(X)  G(x)]=1P[G(X)  G(x)] \begin{aligned} P(X\ \leq x) & = \int_{-\infty}^{x} f_X(w) dw \\ & = \int_{y}^{\infty} f_X(G^{-1}(z)) \frac{dG^{-1}}{dz} dz \\ & = \int_{y}^{\infty} f_Y(z) dz \\ & = P(Y\ \geq\ y) \\ & = P[G(X)\ \geq\ G(x)] \\ & = 1 - P[G(X)\ \leq\ G(x)] \end{aligned}

The desired proof of equation (3)(3) follows from equation (4)(4) by noting that UUniform(0,1)U \sim \textbf{Uniform}(0, 1) so fU(u)=1f_U(u) = 1,

P[FX1(U)  x]=P[FX(FX1(U))  FX(x)]=P[U  FX(x)]=0FX(x)fU(w)dw=0FX(x)dw=FX(x). \begin{aligned} P[F_X^{-1}(U)\ \leq\ x] & = P[F_X(F_X^{-1}(U))\ \leq\ F_X(x)] \\ & = P[U\ \leq\ F_X(x)] \\ & = \int_{0}^{F_X(x)} f_U(w) dw \\ & = \int_{0}^{F_X(x)} dw \\ & = F_X(x). \end{aligned}

Example

Consider the Weibull Distribution. The PDF is given by,

fX(x;k,λ)={kλ(xλ)k1e(x/λ)kx  00x<0     (5) f_X(x; k, \lambda) = \begin{cases} \frac{k}{\lambda}\left(\frac{x}{\lambda} \right)^{k-1} e^{-\left(x/\lambda\right)^k} & x\ \geq\ 0 \\ 0 & x < 0 \end{cases} \ \ \ \ \ (5)

where k > 0k\ >\ 0 is the shape parameter and λ > 0\lambda\ >\ 0 the scale parameter. The CDF is given by,

FX(x;k,λ)={1e(xλ)kx  00x < 0. F_X(x; k, \lambda) = \begin{cases} 1-e^{\left(\frac{-x}{\lambda}\right)^k} & x\ \geq\ 0 \\ 0 & x\ <\ 0. \end{cases}

The CDF can be inverted to yield,

FX1(u;k,λ)={λln(11u)1k0  u 10u < 0 or u > 1. F_X^{-1}(u; k, \lambda) = \begin{cases} \lambda\ln\left(\frac{1}{1-u}\right)^{\frac{1}{k}} & 0\ \leq\ u\ \leq 1 \\ 0 & u\ <\ 0 \text{ or } u\ >\ 1. \end{cases}

In the example described here it will be assumed that k=5.0k=5.0 and λ=1.0\lambda=1.0. The following plot shows the PDF and CDF using these values.

The sampler implementation for the continuous case is simpler than for the discrete case. Just as in the discrete case CDF samples with distribution UUniform(0,1)U \sim \textbf{Uniform}(0, 1) are generated. The desired samples with the target Weibull\textbf{Weibull} distribution are then computed using the CDF inverse. Below an implementation of this procedure in Python is given.

import numpy

k = 5.0
λ = 1.0
nsamples = 100000

cdf_inv = lambda u: λ * (numpy.log(1.0/(1.0 - u)))**(1.0/k)
samples = [cdf_inv(u) for u in numpy.random.rand(nsamples)]

The following plot compares a histogram of the samples generated by the sampler above and the target distribution (5). The fit is quite good. The subtle asymmetry of the Weibull\textbf{Weibull} distribution is captured.

The first and second moments for the Weibull\textbf{Weibull} distribution are given by,

μ=λΓ(1+1k)σ2=λ2[Γ(1+2k)(Γ(1+1k))2], \begin{aligned} \mu & = \lambda\Gamma\left(1+\frac{1}{k}\right) \\ \sigma^2 & = \lambda^2\left[\Gamma\left(1+\frac{2}{k}\right)-\left(\Gamma\left(1+\frac{1}{k}\right)\right)^2\right], \end{aligned}

where Γ(x)\Gamma(x) is the Gamma function.

The following two plots compare the cumulative values of μ\mu and σ\sigma computed from sampling of the target distribution (5)(5) with the values from the equations above. The first shows the convergence of μ\mu and the second the convergence of σ\sigma. Within only 10310^3 samples both μ\mu and σ\sigma computed from samples are comparable to the target values and by 10410^4 the values are indistinguishable.

Performance

Any continuous distribution can be sampled by assuming that fX(x)f_X(x) can be approximated by the discrete distribution, {fX(xi)Δxi}N\left\{f_X(x_i)\Delta x_i \right\}_N for i=1,2,3,,Ni=1,2,3,\ldots,N, where Δxi=(xmaxxmin)/(N1)\Delta x_i=(x_{max}-x_{min})/(N-1) and xi=xmin+(i1)Δxix_i = x_{min}+(i-1)\Delta x_i. This method has disadvantages compared to using Inverse CDF sampling on the continuous distribution. First, a bounded range for the samples must be assumed when in general the range of the samples can be unbounded while the Inverse CDF method can sample an unbounded range. Second, the number of operations required for sampling a discrete distribution scales O(NsamplesN)O(N_{samples}N) but sampling the continuous distribution is O(Nsamples)O(N_{samples}).

Conclusions

Inverse CDF Sampling provides a method for obtaining samples from a known target distribution using a sampler with a Uniform(0,1)\textbf{Uniform}(0, 1) distribution that requires the target distribution be invertible. Algorithms for both the discrete and continuous cases were analytically proven to produce samples with distributions defined by the CDF. Example implementations of the algorithms for both distribution cases were developed. Samples produced by the algorithms for example target distributions were favorably compared. Mean and standard deviations computed from generated samples converged to the target distribution values in O(103)O(10^3) samples. The continuous sampling algorithm was shown to be more performant than the discrete version. The discrete version required O(NsamplesN)O(N_{samples}N) operations while the continuous version required O(Nsamples)O(N_{samples}) operations.