36 Lecture 35, April 05, 2024
Last lecture, we started introducing the last concept of the course, the moment generating function (MGF). Today, we will keep looking at the properties of the MGF, and extend the MGF to the multivariate case (i.e. the MGF of two or more random variables).
36.1 Properties of MGF
The following properties show why \(M_X\) is called a moment generating function:

- \(M_X(0)=1\).
- So long as \(M_X(t)\) is defined in a neighbourhood of \(t=0\), \[ \frac{d^k}{dt^k}M_X(t)\Big|_{t=0} = \mathbb{E}[X^k]. \]
- So long as \(M_X(t)\) is defined in a neighbourhood of \(t=0\), the Taylor expansion about \(t=0\) gives \[M_X(t) = \sum_{j=0}^{\infty} \frac{t^j\,\mathbb{E}[X^j]}{j!}. \]
Example 36.1 (MGF of Poisson) Let \(X \sim Poi(\lambda)\). Show that the MGF of \(X\) is given by \[ M_X(t)= e^{\lambda(e^t-1)},\quad t\in\mathbb{R},\] and use the MGF to show that \[ \mathbb{E}[X] = \lambda = \mathbb{V}ar(X). \]
Solution:
The MGF is computed using the exponential series as \[\begin{align*} M_X(t) &= E(e^{tX})=\sum_{x=0}^\infty e^{tx} e^{-\lambda} \frac{\lambda^x}{x!}\\ &=e^{-\lambda} \sum_{x=0}^\infty \frac{ (e^t \lambda)^x}{x!}\\ &= e^{-\lambda} e^{e^t \lambda}=e^{\lambda(e^t-1)}. \end{align*}\] Differentiating once gives \[M_X'(t)=e^{\lambda(e^t-1)} \lambda e^t \Rightarrow E(X)=M_X'(0)=\lambda,\] and differentiating again gives \[M_X''(t)=e^{\lambda(e^t-1)}(\lambda e^t)^2 + e^{\lambda(e^t-1)}\lambda e^t \Rightarrow E(X^2)=M_X''(0)=\lambda^2+\lambda,\] so that \[ Var(X) = E(X^2)-E(X)^2=\lambda^2+\lambda-\lambda^2=\lambda.\]
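As a quick sanity check, here is a small numerical sketch (assuming NumPy is available; the helper name `mgf_poisson` and the parameter values are my own choices). It estimates \(E[e^{tX}]\) by Monte Carlo and recovers the first two moments by numerically differentiating the closed-form MGF.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t, h = 2.5, 0.3, 1e-4

def mgf_poisson(t, lam):
    # Closed-form Poisson MGF derived above: exp(lambda * (e^t - 1))
    return np.exp(lam * (np.exp(t) - 1.0))

# Monte Carlo estimate of E[e^{tX}] vs the closed form
x = rng.poisson(lam, size=1_000_000)
print(np.mean(np.exp(t * x)), mgf_poisson(t, lam))

# Central-difference approximations of M'(0) and M''(0)
m1 = (mgf_poisson(h, lam) - mgf_poisson(-h, lam)) / (2 * h)
m2 = (mgf_poisson(h, lam) - 2 * mgf_poisson(0.0, lam) + mgf_poisson(-h, lam)) / h**2
print(m1, lam)             # E[X] = lambda
print(m2, lam**2 + lam)    # E[X^2] = lambda^2 + lambda
```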
The significance of the MGF is that, under certain conditions, it can be used to show that two random variables have the same distribution:
Theorem 36.1 (Continuity theorem) If \(X\) and \(Y\) have MGFs \(M_X(t)\) and \(M_Y(t)\) that are defined in neighbourhoods of the origin and satisfy \(M_X(t) = M_Y(t)\) for all \(t\) where they are defined, then \[ X \stackrel{D}{=} Y. \]
36.1.0.1 Poisson approximation to the binomial
Recall: For large \(n\) and small \(p\), with \(\mu=np\) moderate, we can approximate the \(Bin(n,p)\) distribution with a \(Poi(\mu)\) distribution.
We can prove this by showing that the MGF of \(Bin(n,p)\) converges to the MGF of \(Poi(\mu)\) as \(n\rightarrow\infty\) while \(\mu=np\) is held fixed:
\[\begin{align*} M_{Bin(n,p)}(t) &= (pe^t + 1-p)^n\\ &= (1+p(e^t-1))^n\\ &= \left(1+\frac{\mu}{n}(e^t-1)\right)^n\\ &\rightarrow e^{\mu(e^t-1)}=M_{Poi(\mu)}(t),\quad n\rightarrow\infty \end{align*}\]
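A quick numerical illustration of this convergence (a sketch assuming NumPy; the values of \(\mu\) and \(t\) are arbitrary):

```python
import numpy as np

mu, t = 3.0, 0.5
poisson_mgf = np.exp(mu * (np.exp(t) - 1.0))

for n in [10, 100, 1_000, 10_000]:
    p = mu / n
    binom_mgf = (p * np.exp(t) + 1 - p) ** n
    print(n, binom_mgf, poisson_mgf)   # binomial MGF approaches the Poisson MGF
```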
36.1.0.2 Multivariate Moment Generating Functions
Definition 36.1 The joint moment generating function of two random variables, \(X\) and \(Y\), is
\[M(s,t) = E(e^{sX+tY}).\]
In particular, if \(X\) and \(Y\) are independent, then
\[M(s,t) = E(e^{sX})E(e^{tY}).\]
Theorem 36.2 Suppose that \(X\) and \(Y\) are independent, with moment generating functions \(M_X(t)\) and \(M_Y(t)\) respectively. Then the moment generating function of \(X+Y\) is
\[ M_{X+Y}(t) = E\left(e^{t(X+Y)}\right) = E\left(e^{tX}\right)E\left(e^{tY}\right) = M_X(t) M_Y(t). \] In general, for more than two independent random variables, the MGF of the sum is the product of the individual MGFs.
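A minimal Monte Carlo sketch of this factorization (assuming NumPy; the exponential distributions and the value of \(t\) are arbitrary choices, with \(t\) small enough that both MGFs exist):

```python
import numpy as np

rng = np.random.default_rng(1)
t, n = 0.2, 1_000_000

# Independent X ~ Exponential(rate 1) and Y ~ Exponential(rate 2)
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=0.5, size=n)

lhs = np.mean(np.exp(t * (x + y)))                      # estimate of M_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))   # estimate of M_X(t) M_Y(t)
print(lhs, rhs)   # the two estimates should agree up to Monte Carlo error
```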
36.2 Chapter 999 Additional Topics (will not be on the exam unless otherwise specified)
36.2.1 Distributions that do not have moments
Cauchy distribution: All the distributions we have seen in this course have well-defined moments. However, this is not always the case.
Definition 36.2 (Cauchy distribution) The Cauchy distribution is a symmetric, bell-shaped distribution on \((-\infty,\infty)\) with PDF \[ f_X(x) = \frac{1}{\pi} \frac{1}{1+(x-\theta)^2},\quad -\infty <x<\infty ,\quad -\infty<\theta < \infty. \]
But \[\mathbb{E}[|X|]= \int_{-\infty}^\infty \frac{1}{\pi}\frac{|x|}{1+(x-\theta)^2}dx = \infty,\] whereas \(\mathbb{E}[|X|]<\infty\) is the necessary condition for the expectation \(\mathbb{E}[X]\) to exist. Also, if the first moment does not exist, the higher moments do not exist either!
Conclusion: No moments exist!!
Comment: in this case, \(\mathbb{E}[X]\) does not exist. The MGF is essentially a special case of LOTUS, \(\mathbb{E}[g(X)]\) with \(g(X)=\exp(tX)\). Does it exist in this case? Can we still use the MGF? The answers are NO (for any \(t\ne 0\))!!! So what do we do instead?
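To see the lack of a first moment empirically, here is a small simulation sketch (assuming NumPy; the sample sizes are arbitrary): the running mean of Cauchy draws never settles down, in contrast to the normal case where the law of large numbers applies.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000_000

cauchy = rng.standard_cauchy(size=n)   # no mean exists: running means keep jumping
normal = rng.standard_normal(size=n)   # mean exists: running means settle near 0
for m in [10**3, 10**5, 10**7]:
    print(m, cauchy[:m].mean(), normal[:m].mean())
```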
If this is the case, can we use other tools to characterize these kinds of distributions?
36.2.2 Other generating functions
In this course, we saw the moment generating function, but other generating functions exist as well.
36.2.2.1 Factorial Moment Generating function (FMGF)
The factorial moment generating function of a random variable \(X\) is defined as \[ G_X(t):= \mathbb{E}[ t^X ], \] provided the expectation exists for all \(t\) in some interval of the form \(1-h<t<1+h\).
The relationship between the FMGF and MGF is as follows: \[ G_X(t) = \mathbb{E}[t^X] = E[e^{X\ln(t)}] = M_X\{\ln(t)\} \]
The FMGF can be used to calculate the factorial moments: \[ \frac{d^r}{dt^r} G_X(t)\Big|_{t=1} = \mathbb{E}[X(X-1)\cdots (X-r+1)]. \] What if we want the ordinary (raw) moments, say \(\mathbb{E}[X^2]\)?
\[\begin{align*} \mathbb{E}[X^2] &= \mathbb{E}[X] + \mathbb{E}\left\{X(X-1)\right\} \\ &= G_X^{\prime}(1) + G_X^{\prime\prime}(1). \end{align*}\]
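For instance, for \(X\sim Poi(\lambda)\) the FMGF is \(G_X(t)=M_X(\ln t)=e^{\lambda(t-1)}\). A small numerical sketch (assuming NumPy; the value of \(\lambda\) is arbitrary) recovering \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) from it by finite differences:

```python
import numpy as np

lam, h = 2.5, 1e-4

def fmgf_poisson(t, lam):
    # G_X(t) = E[t^X] = exp(lambda * (t - 1)) for X ~ Poi(lambda)
    return np.exp(lam * (t - 1.0))

# Central-difference approximations of G'(1) and G''(1)
g1 = (fmgf_poisson(1 + h, lam) - fmgf_poisson(1 - h, lam)) / (2 * h)
g2 = (fmgf_poisson(1 + h, lam) - 2 * fmgf_poisson(1.0, lam) + fmgf_poisson(1 - h, lam)) / h**2

print(g1, lam)                 # E[X] = G'(1) = lambda
print(g1 + g2, lam + lam**2)   # E[X^2] = G'(1) + G''(1)
```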
36.2.2.2 Characteristic function
Characteristic function: \[ \phi_X(t) = \mathbb{E}[e^{itX}], \] where \(i=\sqrt{-1}\) is the imaginary unit. Note that this definition involves integrating a complex-valued function. The characteristic function ALWAYS EXISTS and completely determines the distribution; that is, every (C)DF has a unique characteristic function. Whenever the MGF exists, \[ \phi_X(-it) = M_X(t). \] E.g. if \(X\sim N(0,1)\), \(Y\sim U(-1,1)\) and \(Z\) follows a (standard) Cauchy distribution, then we have \(\phi_X(t) = \exp(-t^2/2)\), \(\phi_Y(t)=\sin(t)/t\) and \(\phi_Z(t)=\exp(-|t|)\).
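A quick Monte Carlo sketch (assuming NumPy, using complex exponentials; the value of \(t\) is arbitrary) checking the uniform and Cauchy characteristic functions above. Note that these estimates are finite even for the Cauchy, unlike the MGF.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 1_000_000, 1.7

y = rng.uniform(-1.0, 1.0, size=n)    # U(-1,1): phi(t) = sin(t)/t
z = rng.standard_cauchy(size=n)       # standard Cauchy: phi(t) = exp(-|t|)

print(np.mean(np.exp(1j * t * y)).real, np.sin(t) / t)
print(np.mean(np.exp(1j * t * z)).real, np.exp(-abs(t)))
```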
36.2.3 Delta Method
If we know that a random variable is asymptotically normal, can we also say something about the distribution of a transformation of it? YES!!!
Theorem 36.3 (Delta method) Let \(Y_n\) be a sequence of random variables that satisfies \(\sqrt{n}(Y_n-\theta)\to \mathcal{N}(0,\sigma^2)\) in distribution. For a given function \(g(\cdot)\) and a specific value of \(\theta\), suppose that \(g^\prime(\theta)\) exists and is not 0. Then \[ \sqrt{n}\{g(Y_n)-g(\theta)\} \to \mathcal{N}(0,\sigma^2[g^\prime(\theta)]^2). \]
The proof uses a first-order Taylor expansion of \(g\) about \(\theta\), keeping track of the remainder term.
Example 36.3 (Example of the usage of the delta method) Suppose that \(X_1,\cdots,X_N\sim F\) are i.i.d. with a common mean \(\mu\ne0\) and a finite variance.
Consider the sample mean \(\bar{X}_N := N^{-1}\sum_{i=1}^NX_i\). What is the asymptotic distribution of \(\sqrt{N}\left\{(\bar{X}_N)^{-1}-\mu^{-1}\right\}\)?
From the CLT we know that \[ \sqrt{N}(\bar{X}_N - \mu) \to \mathcal{N}(0,\mathbb{V}ar(X_1)). \] Applying the delta method with \(g(x)=1/x\), so that \(g^\prime(\mu)=-\mu^{-2}\), we have \[ \sqrt{N}(\bar{X}_N^{-1} - \mu^{-1}) \to \mathcal{N}\left\{0, \mu^{-4} \mathbb{V}ar(X_1)\right\}. \]
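A simulation sketch (assuming NumPy; exponential data with mean \(\mu=2\), the sample size and the number of replications are arbitrary choices) comparing the empirical variance of \(\sqrt{N}(\bar{X}_N^{-1}-\mu^{-1})\) with the delta-method prediction \(\mu^{-4}\mathbb{V}ar(X_1)\):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, N, reps = 2.0, 2_000, 5_000

# X_i ~ Exponential with mean mu, so Var(X_1) = mu^2
xbar = rng.exponential(scale=mu, size=(reps, N)).mean(axis=1)
stat = np.sqrt(N) * (1.0 / xbar - 1.0 / mu)

print(stat.var(), mu ** (-4) * mu**2)   # empirical vs delta-method variance
```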
36.2.4 Other interesting results
36.2.4.1 Taylor expansion
The Taylor expansion/approximation/series (a power series) is a very important tool in many fields, including statistics. We can use Taylor series to prove many results, including the CLT and the delta method, to approximate moments, and much more.
Definition 36.3 ((One dimensional) Taylor expansion) If a function \(h(x)\) has derivatives up to order \(r\) at a constant \(a\in\mathbb{R}\), that is, \(h^{(r)}(a):=\frac{d^r}{dx^r}h(x)\big|_{x=a}\) exists, then the Taylor polynomial of order \(r\) about \(a\) is given by \[ T_r(x):= \sum_{j=0}^r \frac{h^{(j)}(a)}{j!}(x-a)^j. \]
If we use the Taylor polynomial to approximate \(h(x)\), one may wonder how large the error of this approximation is.
Definition 36.4 (Remainder/Error of Taylor expansion) The remainder/error of the Taylor expansion on the function \(h(x)\) with order \(r\) is given by \[ R(x) := h(x) - T_r(x) = \int_a^x \frac{h^{(r+1)}(t)}{r!}(x-t)^r dt. \]
We have another theorem that is useful in the discussion:
Theorem 36.4 If \(h^{(r)}(a) =\frac{d^r}{dx^r}h(x)|_{x=a}\) exists, then \[ \lim_{x\to a} \frac{h(x)-T_r(x)}{(x-a)^r} =0. \]
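A tiny numerical sketch of Theorem 36.4 (assuming NumPy; \(h=\exp\), \(a=0\) and \(r=2\) are chosen purely for illustration): the remainder divided by \((x-a)^r\) vanishes as \(x\to a\).

```python
import numpy as np

# h(x) = e^x, a = 0, r = 2: T_2(x) = 1 + x + x^2/2
for x in [1.0, 0.1, 0.01, 0.001]:
    t2 = 1 + x + x**2 / 2
    print(x, (np.exp(x) - t2) / x**2)   # the ratio tends to 0 as x -> 0
```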
Most statistical applications only use the first- or second-order approximation. We now show how to use these approximations to approximate the mean and the variance.
Example 36.4 (Approximation of the mean and variance using first/second-order Taylor series) Assume \(X\) is a random variable with mean \(\mu:=\mathbb{E}[X] \ne 0\). A first-order Taylor expansion of \(h\) about \(\mu\) gives \[ h(X) \approx h(\mu) + h^\prime(\mu)(X-\mu). \] If \(h(X)\) is an estimator of \(h(\mu)\) (i.e. when the sample size is large, \(h(X)\) is approximately equal to \(h(\mu)\)), then we have \[\begin{align*} E[h(X)] &\approx E[h(\mu)] + h^\prime(\mu) \mathbb{E}[X-\mu]\\ &= h(\mu) + h^\prime (\mu) \{\mathbb{E}X - \mu\}\\ &=h(\mu) + h^\prime (\mu) \{\mu - \mu\}\\ &=h(\mu). \end{align*}\] Thus, we can approximate the expectation of \(h(X)\) using \(h(\mu)\).
Similarly, for the variance, the same first-order expansion gives \[ \mathbb{V}ar\{h(X)\} \approx [h^\prime(\mu)]^2\,\mathbb{V}ar(X). \] These two approximations lead us to the delta method we have seen above.
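A small simulation sketch (assuming NumPy; the choice \(h(x)=\ln x\) and the Gamma data are arbitrary) comparing the simulated mean and variance of \(h(X)\) with the first-order approximations \(h(\mu)\) and \([h^\prime(\mu)]^2\mathbb{V}ar(X)\):

```python
import numpy as np

rng = np.random.default_rng(5)
# X ~ Gamma(shape=50, scale=0.1): mean mu = 5, variance = 0.5 (small relative to mu)
shape, scale = 50.0, 0.1
mu, var = shape * scale, shape * scale**2

x = rng.gamma(shape, scale, size=1_000_000)
hx = np.log(x)                    # h(X) = log(X), so h'(mu) = 1/mu

print(hx.mean(), np.log(mu))      # E[h(X)] is close to h(mu)
print(hx.var(), var / mu**2)      # Var(h(X)) is close to [h'(mu)]^2 Var(X)
```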
36.2.4.2 Proof for CLT
First, we recall/restate the CLT:
Theorem 36.5 (CLT revisited) Let \(X_1,\cdots,X_n\) be a sequence of i.i.d. random variables whose MGF exists in a neighbourhood of \(0\) (i.e. \(M_{X_i}(t)\) exists for \(|t|<h\) for some \(h>0\)). Let \(\mathbb{E}X_i= \mu<\infty\) and \(\mathbb{V}ar(X_i) = \sigma^2 <\infty\). Denote the sample mean by \(\bar{X}_n=n^{-1}\sum_{i=1}^n X_i\), and let \(G_n(x)\) be the CDF of \(Z_n:= \sqrt{n}(\bar{X}_n-\mu)/\sigma\). Then for any \(x\in(-\infty,\infty)\), \[ \lim_{n\to \infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-y^2}{2}\right)dy, \] which means \(Z_n\to \mathcal{N}(0,1)\) in distribution as \(n\to \infty\).
Proof (Proof for CLT). Disclaimer: This proof is based on Casella and Berger (2002).
First, define the standardized random variables \(Y_i:= (X_i-\mu)/\sigma\), which have common MGF \(M_Y(t)\) that exists for \(|t|<\sigma h\). We can further write \[ \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma} = \frac{1}{\sqrt{n}}\sum_{i=1}^nY_i. \] Using the properties of the MGF (in particular Theorem 36.2 for independent random variables), we have \[\begin{align*} M_{Z_n}(t) &= M_{\sqrt{n}(\bar{X}_n - \mu)/\sigma}(t) = M_{\frac{1}{\sqrt{n}}\sum_{i=1}^nY_i}(t)\\ &=M_{\sum_{i=1}^n Y_i} \left(\frac{t}{\sqrt{n}}\right)\\ &=\left\{ M_Y\left(\frac{t}{\sqrt{n}}\right)\right\}^n. \end{align*}\] We then Taylor expand \(M_Y\left(\frac{t}{\sqrt{n}}\right)\) around \(0\): \[ M_Y\left(\frac{t}{\sqrt{n}}\right) = \sum_{j=0}^\infty \frac{d^j}{dt^j}M_Y(t)\Big|_{t=0}\frac{(t/\sqrt{n})^j}{j!}, \] which is valid for \(|t|< \sqrt{n}\sigma h\). Since the standardized variables \(Y_i\) have mean \(0\) and variance \(1\), and the \(0\)-th moment is \(1\), we have \(\frac{d^0}{dt^0}M_Y(t)|_{t=0} = 1\), \(\frac{d}{dt}M_Y(t)|_{t=0} = 0\) and \(\frac{d^2}{dt^2}M_Y(t)|_{t=0} = 1\). Using those facts, we can write the Taylor expansion as \[ M_Y\left(\frac{t}{\sqrt{n}}\right)= 1 + \frac{(t/\sqrt{n})^2}{2!} + R_Y\left(\frac{t}{\sqrt{n}}\right), \] where \(R_Y(\cdot)\) is the remainder/approximation error collecting the higher-order terms, \[ R_Y\left(\frac{t}{\sqrt{n}}\right) = \sum_{j=3}^\infty\frac{d^j}{dt^j}M_Y(t)\Big|_{t=0} \cdot \frac{(t/\sqrt{n})^j}{j!}. \] By Theorem 36.4, we have, for \(t\ne0\), \[ \lim_{n\to \infty} \frac{R_Y(t/\sqrt{n})}{(t/\sqrt{n})^2}=0. \] Multiplying by the fixed constant \(t^2\), we also have \[ \lim_{n\to \infty}\frac{R_Y(t/\sqrt{n})}{(1/\sqrt{n})^2} = \lim_{n\to \infty}n\,R_Y\left(\frac{t}{\sqrt{n}}\right) = 0, \] and for \(t=0\), we have \(R_Y(0/\sqrt{n})=0\), so this limit holds for every fixed \(t\). Thus, for any fixed \(t\), we can write \[\begin{align*} \lim_{n\to \infty} \left\{ M_Y\left(\frac{t}{\sqrt{n}}\right)\right\}^n &= \lim_{n\to\infty}\left\{1+\frac{(t/\sqrt{n})^2}{2!} + R_Y\left(\frac{t}{\sqrt{n}}\right)\right\}^n\\ &=\lim_{n\to\infty}\left\{1+\frac{1}{n}\left(\frac{t^2}{2} + nR_Y\left(\frac{t}{\sqrt{n}}\right)\right)\right\}^n\\ &= \exp\left(\frac{t^2}{2}\right), \end{align*}\] where the last equality follows from the standard limit \(\lim_{n\to\infty}(1+a_n/n)^n = e^a\) whenever \(a_n\to a\). Since \(\exp\left(\frac{t^2}{2}\right)\) is the MGF of the standard normal distribution, the convergence of the MGFs (together with the uniqueness of the MGF) implies that \(Z_n= \sqrt{n}(\bar{X}_n-\mu)/\sigma\) converges in distribution to \(\mathcal{N}(0,1)\). This concludes the proof.
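As an empirical illustration of the theorem (a sketch assuming NumPy; exponential data, the sample size and the evaluation points are arbitrary), the following compares the empirical CDF of \(Z_n\) with the standard normal CDF at a few points:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, reps = 200, 50_000

# Exponential(1) data: mu = 1, sigma = 1
x = rng.exponential(1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0

for q in [-1.5, 0.0, 1.5]:
    phi = 0.5 * (1 + erf(q / sqrt(2)))   # standard normal CDF at q
    print(q, np.mean(z <= q), phi)       # empirical vs limiting CDF
```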
Reference:
- Casella, G., and Berger, R. L. (2002). Statistical Inference. Duxbury Press.
TODO:
I will add something during the weekend, if not on Monday, so stay tuned!
Things I plan to add
36.1.0.3 Comments
Example 36.2 Use MGFs to show that if \(X\sim Poi(\lambda)\) is independent of \(Y\sim Poi(\mu)\), then \(X+Y\sim Poi(\lambda+\mu)\).
Solution:
By independence and Theorem 36.2, \[ M_{X+Y}(t) = M_X(t)M_Y(t) = e^{\lambda(e^t-1)}e^{\mu(e^t-1)} = e^{(\lambda+\mu)(e^t-1)}, \] which is the MGF of a \(Poi(\lambda+\mu)\) random variable. By the uniqueness of the MGF (Theorem 36.1), \(X+Y\sim Poi(\lambda+\mu)\).