Censored Gaussian Model

December 31, 2015

Recently I’ve started working with models that include censored observations. To gain some insight into the problem, I’ve prepared a simple censored Gaussian model. Suppose we draw samples from a Gaussian distribution with unknown mean $\mu$ and precision $\tau$ $(= 1/\sigma^2)$, but can only observe the data points that are greater than a given threshold $T$. In this case, the usual closed-form parameter estimation equations for the Gaussian distribution will not work. Instead, we need to take the new observation model into account. The generative model for this scenario is given as

\begin{align}
\tau &\sim \mathcal{G}(\tau; \alpha, \beta) \\
\mu|\tau &\sim \mathcal{N}(\mu; \mu_0, 1/(n_0 \tau)) \\
x_i &\sim \mathcal{N}(x_i; \mu, \tau^{-1}) \quad \text{for } i \in [1, N] \\
y &= \{x_i | x_i > T\}
\end{align}

where $\mathcal{G}$ and $\mathcal{N}$ are the Gamma and Normal distributions respectively:

\begin{align}
\mathcal{N}(x; \mu, \tau^{-1}) &= \sqrt{ \frac{\tau}{2\pi}} \exp \left( -\frac{\tau}{2} (x-\mu)^2 \right) \\
\mathcal{G}(w; \alpha, \beta) &= \frac{\beta^\alpha}{\Gamma(\alpha)} w^{(\alpha-1)} \exp(-w \beta)
\end{align}

Let’s create an example:

[Figure: Censored Gaussian Example]
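Here is a minimal sketch of how such censored data can be generated; the specific values of $\mu$, $\tau$, $T$ and $N$ below are illustrative assumptions, not necessarily those used in the figure:

```python
import numpy as np

# Illustrative assumptions: the true parameters and threshold are arbitrary choices.
rng = np.random.default_rng(0)
mu_true, tau_true, T = 1.0, 0.5, 2.0
N = 1000  # total number of latent draws

x = rng.normal(mu_true, 1.0 / np.sqrt(tau_true), size=N)  # latent samples
y = x[x > T]                                              # observed (censored) data
M = len(y)
print(f"Observed M={M} of N={N} samples above T={T}")
```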

Given all data points, $x_{1:N}$, the likelihood of $\mu$ and $\tau$ is
\begin{align}
p(x_{1:N}| \mu, \tau) \propto \tau^{N/2} \exp \left( -\frac{\tau}{2} \sum_{n=1}^N (x_n - \mu)^2\right)
\end{align}

Estimating Gaussian Parameters

The posterior for $\mu$ and $\tau$ is proportional to:

\begin{align}
p(\mu, \tau | x_{1:N} ) \propto p(x_{1:N} | \mu, \tau) p(\mu | \tau) p(\tau)
\end{align}

The derivation is standard and can be found in this technical report or, better, in Murphy’s report; the resulting conditionals are

\begin{align}
\tau | x_{1:N} &\sim \mathcal{G} \left( \alpha + \frac{N}{2},\quad \beta + \frac{1}{2} \sum_{i=1}^N (x_i - \bar{x})^2 + \frac{N n_0}{2(N+n_0)}(\bar{x} - \mu_0)^2 \right) \\
\mu | \tau, x_{1:N} &\sim \mathcal{N} \left( \frac{N}{N+n_0} \bar{x} + \frac{n_0}{N+n_0} \mu_0,\quad \frac{1}{(N+n_0)\tau} \right)
\end{align}
where $\bar{x} = \sum_i x_i / N$ is the sample mean.
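As a sanity check, here is a sketch of sampling from this conjugate posterior given the full (uncensored) data. The function name is mine, and the hyperparameters $\mu_0$, $n_0$, $\alpha$, $\beta$ are passed in as arguments:

```python
import numpy as np

def sample_posterior_full(x, mu0, n0, alpha, beta, n_samples=1000, rng=None):
    """Draw (mu, tau) from the conjugate posterior given the FULL data x."""
    rng = rng or np.random.default_rng()
    N, xbar = len(x), x.mean()
    alpha_n = alpha + N / 2.0
    beta_n = (beta + 0.5 * np.sum((x - xbar) ** 2)
              + N * n0 * (xbar - mu0) ** 2 / (2.0 * (N + n0)))
    mu_n = N / (N + n0) * xbar + n0 / (N + n0) * mu0
    # NumPy's gamma takes shape and scale (= 1/rate), hence 1/beta_n
    tau = rng.gamma(alpha_n, 1.0 / beta_n, size=n_samples)
    mu = rng.normal(mu_n, 1.0 / np.sqrt((N + n0) * tau))
    return mu, tau
```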

But if we plug the censored observations $y_{1:M}$ into the above formulas as if they were complete data, we won’t be able to estimate the parameters of the censored Gaussian:

[Figure: Gaussian Estimation for Censored Gaussian]

Parameter Estimation for the Censored Gaussian Distribution

Our problem is to find the posterior distribution of the $\{\mu, \tau\}$ pair given the observations $y_{1:M}$. First, we need to write down the probability of a single observation.

The probability of observing $y_i$ is zero for $y_i \leq T$, so each observation follows a truncated Gaussian:

\begin{align}
p(y_i; \mu, \tau) = \frac{1}{Z} \mathcal{N}(y_i; \mu, \tau^{-1}) \, [y_i > T]
\end{align}

where $[\cdot]$ denotes the indicator function and $Z$ is a normalizing constant.

We can calculate the normalizing constant as
\begin{align}
\int_{-\infty}^\infty p(y_i;\mu, \tau) \, dy_i &= 1 \\
\frac{1}{Z} \int_{T}^\infty \mathcal{N}(y_i; \mu, \tau^{-1}) \, dy_i &= 1 \\
\frac{1}{Z} \left[ 1-\int_{-\infty}^T \mathcal{N}(y_i; \mu, \tau^{-1}) \, dy_i \right]&= 1 \\
\frac{1}{Z} \left[ 1- \frac{1}{2} \left[ 1 + \operatorname{erf} \left( (T-\mu)\sqrt{\frac{\tau}{2}} \right) \right] \right]&= 1 \\
Z &= \frac{1}{2} \left[1 - \operatorname{erf} \left( (T-\mu)\sqrt{\frac{\tau}{2}} \right) \right]
\end{align}
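In code, it is numerically safer to compute $\log Z$ via the Gaussian survival function than via erf directly. A minimal sketch (the function name is mine):

```python
import numpy as np
from scipy.stats import norm

def truncated_gauss_logpdf(y, mu, tau, T):
    """Log-density of p(y; mu, tau) truncated to y > T."""
    sigma = 1.0 / np.sqrt(tau)
    # log Z = log P(x > T); norm.logsf is more stable than 1 - cdf
    log_Z = norm.logsf(T, loc=mu, scale=sigma)
    return np.where(y > T, norm.logpdf(y, loc=mu, scale=sigma) - log_Z, -np.inf)
```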

Posterior Distribution for $\{\mu, \tau\}$

We can calculate the posterior distribution for the model parameters as follows
\begin{align}
p(\mu, \tau | y_{1:M}) &\propto p(\mu) p(\tau)p(y_{1:M} | \mu, \tau) \\
&\propto p(\mu) p(\tau)\prod_{i=1}^M p(y_i | \mu, \tau)
\end{align}
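For both optimization and sampling, it is convenient to have this unnormalized log-posterior as a function. A sketch under the model above (the function name and argument order are my own choices):

```python
import numpy as np
from scipy.stats import norm, gamma

def log_posterior(mu, tau, y, T, mu0, n0, alpha, beta):
    """Unnormalized log p(mu, tau | y_{1:M}) for the censored model."""
    if tau <= 0:
        return -np.inf
    sigma = 1.0 / np.sqrt(tau)
    lp = gamma.logpdf(tau, a=alpha, scale=1.0 / beta)              # p(tau)
    lp += norm.logpdf(mu, loc=mu0, scale=1.0 / np.sqrt(n0 * tau))  # p(mu | tau)
    lp += np.sum(norm.logpdf(y, loc=mu, scale=sigma))              # Gaussian terms
    lp -= len(y) * norm.logsf(T, loc=mu, scale=sigma)              # minus M * log Z
    return lp
```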

We can maximize this posterior to get a point estimate for the model parameters. A closed-form solution for this maximization is difficult due to the Gaussian error function (erf) in the formula, but we can draw samples from this posterior distribution via the Metropolis-Hastings algorithm.

A Metropolis-Hastings Algorithm

In order to sample $\{\mu, \tau\}$ pairs from the posterior $p(\mu, \tau | y_{1:M})$, we introduce the following proposal

\begin{align}
q(\mu', \tau'| \mu, \tau) = q_1(\tau'| \tau) q_2(\mu'| \mu,\tau')
\end{align}

such that

\begin{align}
q_1(\tau'| \tau) &= \mathcal{G}(\tau'; \tau, a_0 \beta) \\
q_2(\mu'| \mu,\tau' ) &= \mathcal{N}(\mu'; \mu, b_0 \tau')
\end{align}

Here we propose $\tau'$ from a Gamma distribution whose expectation is $\tau$. The variance of $\tau'$ is controlled by $a_0 \beta$: a larger $a_0$ results in longer jumps. Similarly, $\mu'$ is drawn from a Gaussian distribution centered at $\mu$, whose variance is controlled by $b_0 \tau'$; again, a larger $b_0$ results in longer jumps. We calculate the acceptance ratio as follows:

\begin{align}
a &= \frac{p(\mu', \tau' | y_{1:M}) q(\mu, \tau | \mu', \tau') }{p(\mu, \tau | y_{1:M}) q(\mu', \tau' | \mu, \tau)}
\end{align}
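A sketch of the full sampler, reusing the log_posterior function above. Since the exact Gamma parametrization of $q_1$ is ambiguous as written, I assume a proposal with mean $\tau$ (shape $a_0$, scale $\tau/a_0$); because the proposal is asymmetric, the $q$ ratio must be included:

```python
import numpy as np
from scipy.stats import norm, gamma

def mh_sampler(y, T, mu0, n0, alpha, beta, n_iter=5000, a0=10.0, b0=0.1, rng=None):
    """Metropolis-Hastings for p(mu, tau | y_{1:M}).

    Assumed proposal: tau' ~ Gamma(shape=a0, scale=tau/a0) (mean tau),
    mu' ~ N(mu, variance b0 * tau'), following the text above.
    """
    rng = rng or np.random.default_rng()
    mu, tau = np.mean(y), 1.0 / np.var(y)   # crude (biased) initialization
    lp = log_posterior(mu, tau, y, T, mu0, n0, alpha, beta)
    samples = np.empty((n_iter, 2))
    for t in range(n_iter):
        tau_p = rng.gamma(a0, tau / a0)
        mu_p = rng.normal(mu, np.sqrt(b0 * tau_p))
        lp_p = log_posterior(mu_p, tau_p, y, T, mu0, n0, alpha, beta)
        # Asymmetric proposal: add log q(old | new) - log q(new | old)
        log_q = (gamma.logpdf(tau, a0, scale=tau_p / a0)
                 + norm.logpdf(mu, mu_p, np.sqrt(b0 * tau))
                 - gamma.logpdf(tau_p, a0, scale=tau / a0)
                 - norm.logpdf(mu_p, mu, np.sqrt(b0 * tau_p)))
        if np.log(rng.uniform()) < lp_p - lp + log_q:
            mu, tau, lp = mu_p, tau_p, lp_p
        samples[t] = mu, tau
    return samples
```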

We get the following result:

[Figure: Estimating Censored Gaussian]

Posterior Distribution for the Number of Unobserved Samples

The observations $y_{1:M}$ are generated by sampling from a Gaussian with parameters $\mu, \tau$ and accepting a drawn sample if it is above the threshold. This looks like rejection sampling for $p(y_i|\mu, \tau)$ using the Gaussian proposal $\mathcal{N}(y_i;\mu, \tau^{-1})$. The latent vector $x$ contains both the accepted and rejected samples. We can estimate the total number of proposed particles $N$, i.e. the length of $x$, given the parameters $\{\mu, \tau\}$ and the number of accepted samples $M$.

We call the event that $x_i > T$ an *accept event* and denote by $a_{\mu,\tau}$ the probability of an accept event under the parameters $\mu$ and $\tau$, which is calculated as

\begin{align}
a_{\mu,\tau} := P(x>T) = 1 - P(x\leq T) = \frac{1}{2} \left[ 1 - \operatorname{erf} \left( (T-\mu)\sqrt{\frac{\tau}{2}} \right) \right]
\end{align}

For a given $M \leq N$, each of the $N$ trials is an accept event with probability $a_{\mu,\tau}$ and a reject event with probability $1-a_{\mu,\tau}$. Then, for $N$ trials, the probability of $M$ accepts is given by the Binomial distribution

\begin{align}
p(M|N,a_{\mu,\tau}) = \binom{N}{M} a_{\mu,\tau}^{M} (1-a_{\mu,\tau})^{N-M}
\end{align}

When we know the number of accept events $M$, the total number of trials is distributed as

\begin{align}
p(N|M,a_{\mu,\tau}) \propto p(M|N,a_{\mu,\tau}) p(N)
\end{align}

If we take the prior $p(N)$ flat, we have $p(N|M,a_{\mu,\tau}) \propto p(M|N,a_{\mu,\tau})$.
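A sketch of evaluating this posterior on a grid, under a flat prior truncated at some maximum (the cutoff N_max and the function name are my own choices):

```python
import numpy as np
from scipy.stats import binom, norm

def posterior_N(M, mu, tau, T, N_max=5000):
    """Grid posterior over the total number of trials N, flat prior up to N_max."""
    a = norm.sf(T, loc=mu, scale=1.0 / np.sqrt(tau))  # accept probability a_{mu,tau}
    Ns = np.arange(M, N_max + 1)
    log_p = binom.logpmf(M, Ns, a)                    # p(M | N, a)
    p = np.exp(log_p - log_p.max())
    return Ns, p / p.sum()                            # normalized over the grid
```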

[Figure: Posterior Distribution of N]

You can download this blog post as an IPython Notebook, which includes a Python implementation (requires NumPy, SciPy and Matplotlib).
