Resume
Research
Learning
Blog
Teaching
Jokes
Kernel Papers

# Dirichlet Process

Parent: Stochastic Processes

There are many ways to describe a Dirichlet Process (DP). Some of these include:

• an infinite dimensional generalization of the Dirichlet distribution, akin to how the Gaussian process is an infinite dimensional generalization of the multivariate Gaussian distribution

• A distribution over discrete probability distributions

• A distribution whose marginals follow Dirichlet distributions

## Definition

In all the below definitions, let $$\alpha > 0$$ be the concentration parameter and $$G_0$$ some probability distribution. We want to understand what people mean by

$G \sim DP(\alpha, G_0)$

The punchline is that $$G$$ is a random discrete probability measure. A probability measure is informally probability distribution. Discrete means that $$G$$ is more specifically a discrete probability distribution, and random means that $$G$$ is a random quantity.

### Definition via Stick Breaking Construction

This is a constructive definition. First, sample a sequence of random variables from the base distribution:

$$\phi_1, \phi_2, ... \sim_{iid} G_0$$.

Second, sample a sequence of Beta random variables:

$V_1, V_2, ... \sim_{iid} Beta(1, \alpha)$

Picture taking a stick of length 1, and breaking off $$V_1$$ from the stick, then recursing on the remaining stick. This yields a sequence of monotonically decreasing terms such that the sequence sums to 1 with probability 1. The notion of breaking a stick into smaller and smaller pieces is where the term stick-breaking construction comes from. More formally, define

$\pi_k := V_k \prod_{k\prime < k} (1 - V_{k\prime})$

The random measure $$G := \sum_{k=1}^{\infty} \pi_k \delta_{\phi_k}$$ is distributed $$DP(\alpha, G_0)$$

### Definition via Dirac Measures

When we say that $$G \sim DP(\alpha, G_0)$$, we mean that $$G$$ is a random probability measure that can be expressed in a certain way

is a Dirichlet Process if:

$G = \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k}$

### Definition via Dirichlet Marginals

Briefly, we have to cover measure theory. A measure is a function that assigns a non-negative number to set. One additional requirement is that in order to be a measure, the function must be countably additive, meaning that the measure of a subset is equal to the sum of the measure of smaller, disjoint subsets of that subset. A finite measurable partition of a set is a carving of the set into a finite set of subsets such that the subsets are disjoint and that their union is the original space.

Let $$\Omega$$ be a (discrete or continuous) sample space i.e. set of possible outcomes, and let $$G_0$$ be a distribution over the sample space. Let $$\alpha >0$$ be a real, positive parameter. We say a random distribution $$G$$ is distributed according to a Dirichlet Process i.e. $$G \sim DP(G_0, \alpha)$$ if for all finite measurable partitions of the sample space $$(A_1, A_2, ..., A_k)$$, the $$k$$-dim vector $$(A_1, A_2, ..., A_K)$$ follows a Dirichlet distribution with concentration parameters $$(\alpha G_0, ..., \alpha G_0(A_K))$$:

$(G(A_1), ..., G(A_K)) \sim Dirichlet(\alpha G_0(A_1), ..., \alpha G_0(A_K))$

Explained another way, if we carve up the sample space into a finite number of disjoint subsets, measure each of the $$K$$ finite subsets, and scale the measure by a postive parameter $$\alpha$$, then the scaled measures could be used for a Dirichlet distribution. Now, if a distribution $$G$$ is described by a Dirichlet distribute whose concentration parameters are these scaled measures for all possible finite measurable partitions, then $$G \sim DP(\alpha, G_0)$$.

TODO

### Definition via De Finetti’s Theorem

The DP can also be defined using the Blackwell-MacQueen urn scheme

## Properties

• Each sample path (draw) from a Dirichlet Process is a discrete distribution

• The mean $$\mathbb{E}[G(A_k)] = \mathbb{E}[G_0(A_k)]$$. Proof:

\begin{align*} \mathbb{E}[G(A_k)] &= \frac{\alpha G_0(A_k)}{\sum_i \alpha G_0(A_i)}\\ &= \frac{G_0(A_k)}{\sum_i G_0(A_i)}\\ &= \mathbb{E}[G_0(A_k)] \end{align*}
• The variance $$\mathbb{V}[G(A_k)] = G_0(A_k)(1 - G_0(A_k))/(1 + \alpha)$$. Proof:

• Posterior inference: Let $$G \sim DP(\alpha, G_0)$$. Since $$G$$ is a (random) distribution, we can sample from it. Let $$\theta_1, ..., \theta_N \sim_{i.i.d} G$$, where $$N$$ is the total number of samples. Then the posterior of $$G$$ is given by:

$G | \theta_1, ..., \theta_N \sim DP \Big(\alpha + N, \frac{\alpha}{\alpha + N} G_0 + \frac{N}{\alpha + N} \frac{1}{N}\sum_{n=1}^N \delta_{\theta_n} \Big)$

where $$\delta_{\theta_n}$$ is a Dirac measure located at $$\theta_n$$. Intuitively, the posterior’s concentration parameter is the sum of the pseudo-observations $$\alpha$$ and the real observations $$N$$, and the base distribution becomes a weighted average of the prior distribution $$G_0$$ and the empirical distribution $$\frac{1}{N} \sum_{n=1}^N \delta_{\theta_n}$$.

## Relation to Other Stochastic Processes

### Blackwell-MacQueen Urn Scheme

The Blackwell-MacQueen Urn scheme describes the posterior predictive distribution of samples from a distribution sampled from a DP. Specifically, suppose we have the following generative model:

\begin{align*} G &\sim DP(\alpha, G_0)\\ \theta_1, ..., \theta_N &\sim_{i.i.d.} G \end{align*}

and we want to know the posterior predictive distribution

$P(\theta_{N+1} | \theta_1, ..., \theta_N)$

This posterior predictive distribution is described by the Blackwell-MacQueen urn scheme, which marginalizes out the random distribution $$G$$:

\begin{align*} p(\theta_{N+1} \in A | \theta_1, ..., \theta_N) &= \int p(\theta_{N+1} \in A | G) p(G | \theta_1, ..., \theta_N) dG\\ &= \int G(A) p(G | \theta_1, ..., \theta_N) dG\\ &= \mathbb{E}[G(A) | \theta_1, ..., \theta_N]\\ &= \frac{\alpha}{\alpha + N} G_0(A) + \frac{N}{\alpha + N} \frac{1}{N} \sum_{n=1}^N \delta_{\theta_n}(A)\\ &= \frac{1}{\alpha + n} \Big( \alpha G_0(A) + \sum_{n=1}^N \delta_{\theta_n}(A) \Big) \end{align*}

where the expected value of $$G(A)|\theta_1, ..., \theta_N$$ is given by the expected value of the Dirichlet distribution.

The Blackwell-MacQueen Urn Scheme is an infinite-color generalization of the Polya urn scheme, which in contrast has only two

### GEM Distribution

The GEM distribution describes the ordered weights of the Dirichlet process. More specifically, given $$G \sim DP(\alpha, G_0)$$, we can write

$G := \sum_{k=1}^{\infty} \beta_k \delta_{\phi_k}$

The sequence of decreasing $$(\beta_k)_k$$ obeys a $$GEM(\alpha)$$ distribution.