The Dirichlet distribution is a multivariate distribution over vectors $x = [x_1, \ldots, x_K]$ satisfying $x_k \ge 0$ and $\sum_{k=1}^K x_k = 1$, with parameters $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_K]$ (all $\alpha_k > 0$) and the following probability density function:
$$p(x;\alpha) = \frac{\Gamma(\sum_{k=1}^K \alpha_k)}{\prod_{k=1}^K \Gamma(\alpha_k)} \prod_{k=1}^K x_k^{\alpha_k-1}$$
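As a side note (my own snippet, in the same Matlab style as the code further below), the log of this density can be evaluated in a numerically stable way with gammaln. Given row vectors x (a point on the simplex) and alpha:

% log p(x; alpha), computed via gammaln to avoid overflow
logp = gammaln(sum(alpha)) - sum(gammaln(alpha)) + (alpha - 1) * log(x)';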
The Kullback-Leibler divergence between two densities \(p\) and \(q\) is defined as
$$KL(p||q) = \int p(x) \log \frac{p(x)}{q(x)} dx = \left < \log \frac{p(x)}{q(x)} \right>_{p(x)}$$
Let’s say we have two Dirichlet distributions \(p\) and \(q\), with parameters \(\alpha\) and \(\beta\) respectively. We write the KL divergence as
\begin{align}
KL(p||q) &= \left < \log \frac{p(x)}{q(x)} \right>_{p(x)} \\
&= \left < \log p(x) - \log q(x) \right>_{p(x)} \\
&= \left < \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) + \sum_{k=1}^K (\alpha_k-1) \log x_k \right . \\
& \quad \left . -\log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) - \sum_{k=1}^K (\beta_k-1) \log x_k \right >_{p(x)} \\
& = \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma(\beta_0) \\
& \quad + \sum_{k=1}^K \log \Gamma(\beta_k) + \sum_{k=1}^K (\alpha_k - \beta_k) \left<\log x_k \right>_{p(x)}
\end{align}
where \(\alpha_0 = \sum_{k=1}^K \alpha_k \) and similarly \(\beta_0 = \sum_{k=1}^K \beta_k \).
Here, the expectation \(\left<\log x_k \right>_{p(x)}\), which is the logarithm of the geometric mean of \(x_k\), is equal to \(\psi(\alpha_k)-\psi(\alpha_0)\), where \(\psi\) is the digamma function. The details of this calculation are given below. Finally we have
\begin{align*}
KL(p||q) &= \log \Gamma(\alpha_0) - \sum_{k=1}^K \log \Gamma(\alpha_k) - \log \Gamma(\beta_0) + \sum_{k=1}^K \log \Gamma(\beta_k) + \sum_{k=1}^K (\alpha_k - \beta_k) (\psi(\alpha_k)-\psi(\alpha_0))
\end{align*}
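Note that for \(K=2\) the Dirichlet distribution reduces to the Beta distribution, so this expression also gives the KL divergence between \(\mathrm{Beta}(\alpha_1, \alpha_2)\) and \(\mathrm{Beta}(\beta_1, \beta_2)\).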
Matlab Code
The Matlab code for calculating the KL divergence is just a single expression. Given that alpha and beta are row vectors holding the parameters of the two Dirichlet distributions, the KL divergence is
% KL(p||q) for p = Dir(alpha), q = Dir(beta); psi is Matlab's digamma function
D = gammaln(sum(alpha)) - gammaln(sum(beta)) - sum(gammaln(alpha)) + ...
    sum(gammaln(beta)) + (alpha - beta) * (psi(alpha) - psi(sum(alpha)))';
You can also download it as a script here.
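As a quick sanity check (my own addition, not part of the script above), the closed-form value can be compared against a Monte Carlo estimate. Here I draw samples from \(p\) by normalizing gamma variates, using randg from base Matlab:

% example parameters
alpha = [1 2 3]; beta = [2 2 2];
D = gammaln(sum(alpha)) - gammaln(sum(beta)) - sum(gammaln(alpha)) + ...
    sum(gammaln(beta)) + (alpha - beta) * (psi(alpha) - psi(sum(alpha)))';

% Monte Carlo estimate: x ~ Dir(alpha) via normalized gamma variates
N = 1e6;
G = randg(repmat(alpha, N, 1));        % unit-scale gamma draws with shapes alpha_k
X = bsxfun(@rdivide, G, sum(G, 2));    % each row is a sample from Dir(alpha)
logp = gammaln(sum(alpha)) - sum(gammaln(alpha)) + log(X) * (alpha - 1)';
logq = gammaln(sum(beta)) - sum(gammaln(beta)) + log(X) * (beta - 1)';
D_mc = mean(logp - logq);              % should be close to D

The two values agree up to Monte Carlo error, which shrinks as N grows.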
Calculating the Geometric Mean
Since \(\sum_{k=1}^K x_k = 1\), \(x\) has \(K-1\) degrees of freedom. Therefore, we only need to take the integral over the first \(K-1\) components of \(x\):
\begin{align}
\left<\log x_k \right>_{p(x)} &= \int \log x_k \, \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \int \log x_k \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} \label{y1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \int \frac{\partial}{\partial \alpha_k} \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} \label{y2}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \int \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1}\\
&= \frac{\Gamma(\alpha_0)}{\prod_{j=1}^K \Gamma(\alpha_j)} \frac{\partial}{\partial \alpha_k} \left ( \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)} \right ) \label{y3} \\
&= \frac{\partial}{\partial \alpha_k} \log \left ( \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)} \right ) \label{y4}\\
&= \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_k) - \frac{\partial}{\partial \alpha_k} \log \Gamma(\alpha_0)\\
&= \psi(\alpha_k)-\psi(\alpha_0)
\end{align}
We used the following properties:
- the identity \(\frac{\partial}{\partial \alpha_k} x_k^{\alpha_k-1} = x_k^{\alpha_k-1} \log x_k\), to go from \eqref{y1} to \eqref{y2};
- the interchange of differentiation and integration in the step after \eqref{y2};
- the normalization constant of the Dirichlet distribution, \(\int \prod_{j=1}^K x_j^{\alpha_j-1} \, dx_{1:K-1} = \frac{\prod_{j=1}^K \Gamma(\alpha_j)}{\Gamma(\alpha_0)}\), in \eqref{y3};
- the identity \(\frac{1}{f} \frac{\partial f}{\partial \alpha_k} = \frac{\partial}{\partial \alpha_k} \log f\), to go from \eqref{y3} to \eqref{y4};
- the definition of the digamma function, \(\psi(x) = \frac{d}{dx} \log \Gamma(x)\), in the last step.
The remaining steps are straightforward.
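The identity \(\left<\log x_k \right>_{p(x)} = \psi(\alpha_k)-\psi(\alpha_0)\) can also be verified numerically (again my own snippet, sampling via randg as in the check above):

alpha = [2 3 4]; N = 1e6; k = 1;
G = randg(repmat(alpha, N, 1));
X = bsxfun(@rdivide, G, sum(G, 2));    % samples from Dir(alpha)
[mean(log(X(:, k))), psi(alpha(k)) - psi(sum(alpha))]   % the two entries should match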
Thanks
I thank my colleagues Deniz Akyildiz and Hakan Guldas for the “fruitful discussions” 🙂
I also thank the Wikipedia writers of the pages on the Dirichlet distribution, the Beta distribution, and the Beta, Gamma and Digamma functions 🙂