Expectation of a Gaussian Likelihood Function

Introduction

I’m currently working on a machine learning problem, and I’ve run into a situation where I need to find the expectation of the Gaussian likelihood, rather than the maximum. To clarify: not the expected value of X (which is just the mean of the distribution), but the expectation of the likelihood function itself.

This problem really puzzled me, especially considering my limited abilities with pure math. Nonetheless, it was an interesting exercise, and I didn’t encounter many relevant resources. Therefore, I’m documenting this as a reference for anyone else who stumbles onto the problem.

Fair warning: this article contains a lot of equations which might be boring to some. If you prefer, feel free to skip directly to the Python code solutions (1 & 2).

This article tackles two variations on the problem.

  1. The first section derives the expectation of the Gaussian likelihood function: given a random variable X \sim \mathcal{N}(\mu_1, \sigma_1^2) with pdf p(X), what is E[p(X)]?
  2. The second case is more complicated, showing the derivation of the expectation of the likelihood of one Gaussian given the parameters of another. Many thanks to Stack Exchange user Cryo, who kindly provided the derivation after I got stuck.

Why calculate the expected likelihood?

Before we jump into the derivation, why would you want to calculate the expected value of the likelihood?

Mostly, you wouldn’t. In normal scenarios the Maximum Likelihood Estimate (MLE) is an asymptotically unbiased estimator [1]; that is, the expected value of the estimated parameter converges to the true value as the sample size increases.

However, there are certain times when that’s not the case.

Consider sampling a number of Gaussian random walks from point A to point B. The MLE corresponds to the shortest, direct path from A to B. Clearly, all other paths will be longer than the MLE path, so the MLE does not represent the length of the average path: it is a biased estimate.

Therefore, in this case (and others) we would like to know the expected likelihood rather than the maximum.
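
To see this effect numerically, here is a minimal sketch under assumed toy settings (the endpoints, step count and noise level below are arbitrary choices, not part of the original problem): it perturbs the straight line between two points with Gaussian noise and compares the average path length with the straight-line length.

import numpy as np
# Toy illustration: perturb the straight line from A = (0, 0) to B = (1, 0) with
# Gaussian noise at the intermediate points and compare the average path length
# against the straight-line (maximum likelihood) path
rng = np.random.default_rng(0)
n_steps, n_paths, noise_sd = 50, 2000, 0.05
xs = np.linspace(0, 1, n_steps)                    # shared x coordinates
ys = rng.normal(0, noise_sd, (n_paths, n_steps))   # noisy y coordinates
ys[:, 0] = 0
ys[:, -1] = 0                                      # pin both endpoints at y = 0
# Arc length of each sampled path
segments = np.sqrt(np.diff(xs)**2 + np.diff(ys, axis=1)**2)
print("straight-line length:", 1.0)
print("mean sampled length :", segments.sum(axis=1).mean())  # always greater than 1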

Expectation of the Gaussian Likelihood

To begin with, start with the definition of the pdf of X [2], which is:

p(x) = \frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}

also, the expectation of a function g(X) of the random variable is [3]:

E[g(X)] = \int_{-\infty}^\infty g(x) \cdot p(x) \ dx

The function we are taking the expectation of here is the likelihood itself, so setting g = p gives:

E[p(X)] = \int_{-\infty}^\infty p(x) \cdot p(x) \ dx

E[p(X)] = \int_{-\infty}^\infty \left(\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\right) \cdot \left(\frac{1}{\sqrt{2\pi}\sigma_1} e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}\right) \ dx

simplifying and rearranging gives:

E[p(X)] = \frac{1}{\sigma_1^2 \cdot 2 \pi} \cdot \int_{-\infty}^\infty \exp\left(-\frac{(x - \mu_1)^2}{\sigma_1^2}\right) \ dx

where the integral on the right is a shifted Gaussian integral [4], which evaluates to \sigma_1 \sqrt{\pi}. That leaves us with the following result:

E[p(X)] = \frac{1}{\sigma_1^2 \cdot 2 \pi} \cdot {\sigma_1 \sqrt{\pi}}
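
which simplifies to:

E[p(X)] = \frac{1}{2\sigma_1\sqrt{\pi}}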

This can be implemented in Python as follows:

Solution 1 Code

import numpy as np

# Analytical expectation of the Gaussian likelihood: sqrt(pi) / (2*pi*sd) = 1 / (2*sd*sqrt(pi))
def expected_likelihood(u, sd):
  '''Expectation of the Gaussian likelihood; note the result does not depend on the mean u'''
  a = np.sqrt(np.pi)
  b = sd*2*np.pi
  return a/b

# Gaussian likelihood (pdf) function for comparison
def gaussian_likelihood(x, u, sd):
  '''Evaluate the pdf of N(u, sd**2) at x'''
  c = 1/(sd*np.sqrt(2*np.pi))
  ll = c*np.exp(-0.5 * ((x - u)/sd)**2)
  return ll

# Sample 10,000 draws from X ~ N(0, 2**2), evaluate the likelihood at each draw and take the mean
X = np.random.normal(0, 2, 10000)
print(np.mean(gaussian_likelihood(X, 0, 2)))
# Compare to the analytical expected likelihood
print(expected_likelihood(0, 2))
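
With \mu_1 = 0 and \sigma_1 = 2, the analytical value is 1/(2 \cdot 2\sqrt{\pi}) \approx 0.141, and the Monte Carlo mean should agree to within sampling error.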

Expectation of p_x(Z)

Now to the more complicated case. What is the expectation of the likelihood of one Gaussian, given the parameters of another?

Precisely, consider two independent Gaussian distributions:

X \sim \mathcal{N}(\mu_1, \sigma_1^2)

Z \sim \mathcal{N}(\mu_2, \sigma_2^2)

where the values of \mu_1, \sigma_1, \mu_2 and \sigma_2 are known.

Suppose I randomly draw values from Z and evaluate their likelihood under the pdf of X, p_x, parameterised by \mu_1 and \sigma_1. What is the expected likelihood?

Thanks again to stats.stackexchange user Cryo for finding the analytical solution; for reference, here is the thread on Stack Exchange [5]. Admittedly, I became stuck working on this problem, so I’d like to thank them for their insights.

Using the definition of the expectation, this time taken over Z, we want to evaluate:

E_z[p_x(z)] = \frac{1}{\sigma_2\sqrt{2\pi}} \int \exp\left(-\frac{(z-\mu_2)^2}{2\sigma_2^2}\right) p_x(z) dz

= \frac{1}{\sigma_2\sqrt{2\pi}} \int \exp\left(-\frac{(z-\mu_2)^2}{2\sigma_2^2}\right) \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{(z-\mu_1)^2}{2\sigma_1^2}\right) dz

Multiplying the two exponentials together and collecting powers of z, the combined exponent can be written as -\left(\alpha^2 z^2 - 2\alpha\beta z + \gamma\right), where:

\alpha^2 = \frac{1}{2\sigma_2^2} + \frac{1}{2\sigma_1^2}

\alpha\beta = \frac{\mu_2}{2\sigma_2^2} + \frac{\mu_1}{2\sigma_1^2}

\gamma = \frac{\mu_2^2}{2\sigma_2^2} + \frac{\mu_1^2}{2\sigma_1^2}

so the expectation becomes:

E_z[p_x(z)] = \frac{1}{\sigma_1\sigma_2\cdot 2\pi}\int \exp\left(-\left(\alpha^2 z^2 - 2\alpha\beta z +\gamma\right)\right) dz

Completing the square, -\left(\alpha^2 z^2 - 2\alpha\beta z + \gamma\right) = -\left(\alpha z - \beta\right)^2 + \beta^2 - \gamma, so:

E_z[p_x(z)] = \frac{1}{\sigma_1\sigma_2\cdot 2\pi} \cdot \exp\left(-\gamma+\beta^2\right) \int \exp\left(-\left(\alpha z - \beta\right)^2\right) dz

= \frac{1}{\sigma_1\sigma_2\cdot 2\pi} \cdot \exp\left(-\gamma+\beta^2\right) \cdot \frac{1}{\alpha} \int \exp\left(-\zeta^2\right) d\zeta

where the substitution \zeta = \alpha z - \beta (so that dz = d\zeta / \alpha) has been made. Finally, the last integral is the standard Gaussian integral, which reduces to \sqrt{\pi} [6].

This leaves us with the final equation:

E_z[p_x(z)] = \frac{1}{2\sigma_1\sigma_2\alpha\cdot \sqrt{\pi}}\cdot\exp\left(-\gamma+\beta^2\right)
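
As a sanity check, setting \mu_2 = \mu_1 and \sigma_2 = \sigma_1 gives \alpha = 1/\sigma_1, \beta = \mu_1/\sigma_1 and \gamma = \beta^2, so the exponential term becomes 1 and the expression reduces to \frac{1}{2\sigma_1\sqrt{\pi}}, matching the first result.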

The code for this equation is given below, and it is easy to verify empirically (a quick check is sketched after the function).

Solution 2 Code

def px_z(u1, u2, sd1, sd2):
  '''Expected likelihood of draws from Z ~ N(u2, sd2**2), evaluated under the pdf of X ~ N(u1, sd1**2)'''
  # Define the constants from the derivation
  alpha = np.sqrt(1/(2*sd2**2) + 1/(2*sd1**2))
  alpha_beta = u2/(2*sd2**2) + u1/(2*sd1**2)
  gamma = (u2**2)/(2*sd2**2) + (u1**2)/(2*sd1**2)
  beta = alpha_beta/alpha
  # Combine the components of the final equation
  p1 = 1/(2*sd1*sd2*alpha*np.sqrt(np.pi))
  p2 = np.exp(-gamma + beta**2)
  return p1*p2
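
For example, reusing the gaussian_likelihood function from Solution 1, the following sketch (the parameter values are arbitrary choices for illustration) compares the analytical result to a simple Monte Carlo estimate:

# Arbitrary example parameters for X and Z
u1, sd1 = 0.0, 2.0
u2, sd2 = 1.0, 0.5
# Draw from Z, evaluate each draw under the pdf of X and take the mean
Z = np.random.normal(u2, sd2, 100000)
print(np.mean(gaussian_likelihood(Z, u1, sd1)))
# Compare to the analytical expected likelihood
print(px_z(u1, u2, sd1, sd2))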

Conclusion

All in all, an interesting exercise and, if you have stumbled across this article in need of a solution, hopefully a useful one.

Finally, another reminder to myself to keep learning the math. I’ve always been more of a practitioner than a theoretician, but there is a deep link between machine learning and math. Everything is easier, and one’s understanding is deeper, if you can follow the derivations.

Did you enjoy this article? Would you like to read more? Continue to the rest of my articles.

References

  1. https://en.wikipedia.org/wiki/Bias_of_an_estimator
  2. https://en.wikipedia.org/wiki/Normal_distribution
  3. https://en.wikipedia.org/wiki/Expected_value
  4. https://en.wikipedia.org/wiki/Gaussian_integral
  5. https://stats.stackexchange.com/questions/633428/expectation-of-the-gaussian-likelihood
  6. https://en.wikipedia.org/wiki/Gaussian_integral
