# Kullback-Leibler Divergence Between Two Dirichlet (and Beta) Distributions

By | September 10, 2013

Recently I’ve been working on learning parameters of a mixture of Dirichlet distributions, I needed a measure to check how good my algorithm works on synthetic data. I was advised to use Kullback-Leibler divergence, but its derivation was a little difficult. Here is the derivation:

Dirichlet distribution is a multivariate distribution with parameters $\alpha=[\alpha_1, \alpha_2, … , \alpha_K]$, with the following probability density function
$$p(x;\alpha) = \frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K x_k^{\alpha_k-1}$$
Kullback-Leibler divergence is defined as
$$KL(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx = \left < \log \frac{p(x)}{q(x)} \right>_{p(x)}$$
Let’s say we have two Dirichlet distributions $p$ and $q$, with parameters $\alpha$ and $\beta$ respectively. We write the KL divergence as
\begin{align}
KL(p||q) &= \left < \log \frac{p(x)}{q(x)} \right>_{p(x)} \\
&= \left < \log p(x) - \log q(x) \right>_{p(x)} \\
&= \left < \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum (\alpha_k-1) \log x_k \right . \\ & \quad \left . -\log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) - \sum (\beta_k-1) \log x_k \right >_{p(x)} \\
& = \log \Gamma(\alpha_0) – \sum_{k=1}^K \log \Gamma(\alpha_k) -\log \Gamma(\beta_0) \\
& \quad + \sum_{k=1}^K \log \Gamma(\beta_k) + \sum_{k=1}^K (\alpha_k – \beta_k) \left<\log x_k \right>_{p(x)}
\end{align}
where $\alpha_0 = \sum_{k=1}^K \alpha_k$ and similarly $\beta_0 = \sum_{k=1}^K \beta_k$.

Here, the geometric mean $\left<\log x_k \right>_{p(x)}$ is equal to $\psi(\alpha_k)-\psi(\alpha_0)$, where $\psi$ is the digamma function. The details of calculating the geometric mean will be given below. Finally we have
\begin{align*}
KL(p||q) &= \log \Gamma(\alpha_0) – \sum_{k=1}^K \log \Gamma(\alpha_k) -\log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) + \sum_{k=1}^K (\alpha_k – \beta_k) (\psi(\alpha_k)-\psi(\alpha_0))
\end{align*}

### Matlab Code

The matlab code calculating the KL divergence is just a single expression. Given that alpha and beta are row vectors representing the two Dirichlet distribution parameters, the KL divergence is

D = gammaln(sum(alpha)) – gammaln(sum(beta)) – sum(gammaln(alpha)) + …
sum(gammaln(beta)) + (alpha – beta) * (psi(alpha) – psi(sum(alpha)))’;

You can also download it as a script here.

### Calculating the Geometric Mean

Since $\sum x = 1$, $x$ has $K-1$ degree of freedom. Therefore, we only need to take the integral over the first $K-1$ components of $x$:

\begin{align}
\left<\log x_k \right>_{p(x)} &= \int \log x_k \frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \log x_j \prod_{j}^K x_j^{\alpha_j-1} dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \int \log x_j \prod_{j}^K x_j^{\alpha_j-1} dx_{1:K-1} \label{y1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \int \frac{\partial}{\partial \alpha_k} \prod_{j}^K x_j^{\alpha_j-1} dx_{1:K-1} \label{y2}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \int \prod_{j}^K x_j^{\alpha_j-1} dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \left ( \frac{\prod_{j}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)} \right ) \label{y3} \\
&= \frac{\partial}{\partial \alpha_k} \log \left (\frac{\Gamma(\alpha_0)}{\prod_{j}^K \Gamma(\alpha_j)} \right ) \label{y4}\\
&= \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_k) – \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_0)\\
&= \psi(\alpha_k)-\psi(\alpha_0)
\end{align}

We used the following properties:

• From \eqref{y1} to \eqref{y2}: $$\log a x^a = \frac{d}{d a} x^{a} \nonumber$$
• From \eqref{y3} to \eqref{y4}:: $$\frac{\Gamma'(x)}{\Gamma(x)} = \frac{d}{d x} \log \Gamma(x) = \psi(x) \nonumber$$
• The remaining steps are straightforward.

### Thanks

I thank my colleagues Deniz Akyildiz and Hakan Guldas for the “fruitful discussions” 🙂

I also thanks Wikipedia writers of the pages of Dirichlet Distribution, Beta Distribution, Beta, Gamma and Digamma functions 🙂

## One thought on “Kullback-Leibler Divergence Between Two Dirichlet (and Beta) Distributions”

This site uses Akismet to reduce spam. Learn how your comment data is processed.