The Dirichlet distribution is a multivariate distribution over vectors \(x = [x_1, x_2, \dots, x_K]\) on the simplex (i.e., \(x_k \ge 0\) and \(\sum_{k=1}^K x_k = 1\)), with parameters \(\alpha=[\alpha_1, \alpha_2, \dots , \alpha_K]\) and the following probability density function:

$$p(x;\alpha) = \frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K x_k^{\alpha_k-1}$$
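As a quick illustration (not part of the original post), the log of this density can be evaluated numerically with Matlab's *gammaln*; the parameter values below are made up for the example:

```matlab
% Minimal sketch: evaluate the Dirichlet log-density log p(x; alpha)
% at a point x on the simplex. alpha and x are illustrative values.
alpha = [2 3 5];          % assumed example parameters
x     = [0.2 0.3 0.5];    % a point with sum(x) == 1
logp  = gammaln(sum(alpha)) - sum(gammaln(alpha)) + sum((alpha - 1) .* log(x));
```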

The Kullback-Leibler (KL) divergence is defined as

$$KL(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx = \left < \log \frac{p(x)}{q(x)} \right>_{p(x)}$$

Let’s say we have two Dirichlet distributions \(p\) and \(q\), with parameters \(\alpha\) and \(\beta\) respectively. We write the KL divergence as

\begin{align}
KL(p||q) &= \left< \log \frac{p(x)}{q(x)} \right>_{p(x)} \\
&= \left< \log p(x) - \log q(x) \right>_{p(x)} \\
&= \left< \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1) \log x_k \right. \\
& \quad \left. - \log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) - \sum_{k=1}^K (\beta_k-1) \log x_k \right>_{p(x)} \\
&= \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma(\beta_0) \\
& \quad + \sum_{k=1}^K \log \Gamma(\beta_k) + \sum_{k=1}^K (\alpha_k - \beta_k) \left<\log x_k \right>_{p(x)}
\end{align}

where \(\alpha_0 = \sum_{k=1}^K \alpha_k \) and similarly \(\beta_0 = \sum_{k=1}^K \beta_k \).

Here, the expectation \(\left<\log x_k \right>_{p(x)}\) (the log of the geometric mean of \(x_k\) under \(p\)) is equal to \(\psi(\alpha_k)-\psi(\alpha_0)\), where \(\psi\) is the digamma function. The details of calculating this geometric mean are given below. Finally, we have

\begin{align*}
KL(p||q) &= \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) \\
& \quad + \sum_{k=1}^K (\alpha_k - \beta_k) \left( \psi(\alpha_k)-\psi(\alpha_0) \right)
\end{align*}
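As a sanity check, the expression is zero when \(\alpha = \beta\), since every term cancels. For \(K=2\) the Dirichlet reduces to the Beta distribution, and the formula specializes to the KL divergence between \(\mathrm{Beta}(\alpha_1,\alpha_2)\) and \(\mathrm{Beta}(\beta_1,\beta_2)\):

$$KL(p||q) = \log \frac{\Gamma(\alpha_1+\alpha_2)\,\Gamma(\beta_1)\,\Gamma(\beta_2)}{\Gamma(\beta_1+\beta_2)\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)} + (\alpha_1-\beta_1)\left(\psi(\alpha_1)-\psi(\alpha_0)\right) + (\alpha_2-\beta_2)\left(\psi(\alpha_2)-\psi(\alpha_0)\right)$$

with \(\alpha_0 = \alpha_1 + \alpha_2\).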

### Matlab Code

The Matlab code for calculating the KL divergence is just a single expression. Given that *alpha* and *beta* are row vectors holding the parameters of the two Dirichlet distributions, the KL divergence is

```matlab
D = gammaln(sum(alpha)) - gammaln(sum(beta)) - sum(gammaln(alpha)) + ...
    sum(gammaln(beta)) + (alpha - beta) * (psi(alpha) - psi(sum(alpha)))';
```

You can also download it as a script here.
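For a quick check, a couple of hypothetical parameter vectors (the values are arbitrary) can be plugged in; the result is zero when *alpha* and *beta* coincide:

```matlab
% Illustrative usage with arbitrary example parameters
alpha = [1 2 3];
beta  = [2 2 2];
D = gammaln(sum(alpha)) - gammaln(sum(beta)) - sum(gammaln(alpha)) + ...
    sum(gammaln(beta)) + (alpha - beta) * (psi(alpha) - psi(sum(alpha)))';
% D is strictly positive here; setting beta = alpha gives D == 0.
```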

### Calculating the Geometric Mean

Since \(\sum_{k=1}^K x_k = 1\), \(x\) has \(K-1\) degrees of freedom. Therefore, we only need to integrate over the first \(K-1\) components of \(x\), with \(x_K = 1 - \sum_{k=1}^{K-1} x_k\):

\begin{align}
\left<\log x_k \right>_{p(x)} &= \int \log x_k \, \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \int \log x_k \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} \label{y1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \int \frac{\partial}{\partial \alpha_k} \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} \label{y2}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \int \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \left( \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)} \right) \label{y3} \\
&= \frac{\partial}{\partial \alpha_k} \log \left( \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)} \right) \label{y4}\\
&= \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_k) - \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_0)\\
&= \psi(\alpha_k)-\psi(\alpha_0)
\end{align}

We used the following properties:

- From \eqref{y1} to \eqref{y2}: \(\log x_k \, x_k^{\alpha_k-1} = \frac{\partial}{\partial \alpha_k} x_k^{\alpha_k-1}\), which turns the logarithm into a derivative with respect to \(\alpha_k\).
- From \eqref{y2} to \eqref{y3}: the normalization of the Dirichlet density, \(\int \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} = \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)}\), after exchanging the derivative and the integral.
- From \eqref{y3} to \eqref{y4}: \(\frac{1}{f} \frac{\partial f}{\partial \alpha_k} = \frac{\partial}{\partial \alpha_k} \log f\).
- Finally, the definition of the digamma function, \(\psi(\alpha) = \frac{d}{d\alpha} \log \Gamma(\alpha)\).

The remaining steps are straightforward.
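This identity can also be checked numerically. Below is a small Monte Carlo sketch (not from the original post) that assumes *gamrnd* from the Statistics Toolbox is available: Dirichlet samples are drawn by normalizing independent Gamma variates, and the empirical mean of \(\log x_k\) is compared with \(\psi(\alpha_k)-\psi(\alpha_0)\).

```matlab
% Monte Carlo check of <log x_k> = psi(alpha_k) - psi(alpha_0).
% Assumes gamrnd (Statistics Toolbox); alpha is an arbitrary example.
alpha = [2 3 5];
N = 1e5;
G = gamrnd(repmat(alpha, N, 1), 1);   % N x K independent Gamma(alpha_k, 1) draws
X = G ./ sum(G, 2);                   % each row is a Dirichlet(alpha) sample
empirical = mean(log(X));             % empirical <log x_k>
exact     = psi(alpha) - psi(sum(alpha));
disp([empirical; exact])              % the two rows should be close
```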

### Thanks

I thank my colleagues Deniz Akyildiz and Hakan Guldas for the “fruitful discussions” 🙂

I also thank the writers of the Wikipedia pages on the Dirichlet distribution, the Beta distribution, and the Beta, Gamma and Digamma functions 🙂
