# Completely Random Measures

Disclaimer: Most of this comes from Tamara Broderick’s excellent paper “Posteriors, conjugacy, and exponential families for completely random measures.”

## Review of BNP Models

Bayesian nonparametric (BNP) models revolve around collections of (trait, frequency or rate) pairs. The principal challenge of Bayesian nonparametrics is how to start from a countable infinity of traits and frequencies in the prior, integrate over these infinitely many possibilities, and compute a finite posterior over traits and frequencies given the data. More specifically, we have traits $$\{\psi_k \in \Psi\}$$ and frequencies or rates $$\theta_k$$. A BNP model starts with a discrete measure on $$\Psi$$:

$\Theta := \sum_{k=1}^K \theta_k \delta_{\psi_k}$

where $$K$$ can be finite or countably infinite. The $$n$$th datum $$X_n$$ is another discrete measure on $$\Psi$$:

$X_n := \sum_{k=1}^{K_n} x_{n,k} \delta_{\psi_{n,k}}$

where $$x_{n,k} \in \mathbb{R}_+$$ is the degree to which the $$n$$th datum possesses the trait $$\psi_{n,k}$$, and $$K_n$$ counts the traits the $$n$$th datum exhibits. Each $$\psi_{n,k} \in \{\psi_k\}$$, but different data can possess different traits.
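To make the notation concrete, here is a small sketch that stores a discrete measure as a list of (weight, location) pairs and evaluates it on a set $$A$$ given as a membership test. The trait names, weights, and the helper `measure_of` are made-up placeholders for illustration, not values from any particular model.

```python
from typing import Callable, Hashable, List, Tuple

# A discrete measure sum_k w_k * delta_{psi_k}, stored as (weight, location) pairs.
Atom = Tuple[float, Hashable]


def measure_of(atoms: List[Atom], in_set: Callable[[Hashable], bool]) -> float:
    """Return the mass the measure assigns to the set A = {psi : in_set(psi)}."""
    return sum(w for w, psi in atoms if in_set(psi))


# Hypothetical Theta with K = 3 traits and their frequencies theta_k.
theta = [(0.7, "cats"), (0.2, "dogs"), (0.05, "birds")]
# Hypothetical datum X_n with K_n = 2 traits, each possessed to degree x_{n,k} = 1.
x_n = [(1.0, "cats"), (1.0, "birds")]

traits = {psi for _, psi in theta}
assert all(psi in traits for _, psi in x_n)  # every psi_{n,k} is one of the psi_k

# Theta({cats, dogs}); equals 0.7 + 0.2 up to floating-point rounding.
mass = measure_of(theta, lambda psi: psi in {"cats", "dogs"})
```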

Using a BNP model requires specifying a prior distribution $$p(\Theta)$$ and a likelihood $$p(X_n|\Theta)$$.
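As one illustration of such a pair, the sketch below assumes a finite beta-Bernoulli model: $$K$$ traits with $$\theta_k \sim \mathrm{Beta}(\alpha/K, 1)$$ at uniformly random locations in $$\Psi = [0, 1)$$, and $$x_{n,k} \sim \mathrm{Bernoulli}(\theta_k)$$. This prior and likelihood are assumptions chosen for illustration (a common finite approximation in feature-allocation models), not something prescribed above; `sample_prior` and `sample_datum` are hypothetical names.

```python
import random


def sample_prior(K, alpha, rng):
    """Assumed prior p(Theta): theta_k ~ Beta(alpha / K, 1), psi_k ~ Uniform[0, 1)."""
    return [(rng.betavariate(alpha / K, 1.0), rng.random()) for _ in range(K)]


def sample_datum(theta, rng):
    """Assumed likelihood p(X_n | Theta): x_{n,k} ~ Bernoulli(theta_k).

    Only atoms with x_{n,k} = 1 are kept, so X_n is a discrete measure
    over a subset of Theta's locations.
    """
    return [(1.0, psi) for w, psi in theta if rng.random() < w]


rng = random.Random(0)
theta = sample_prior(K=50, alpha=3.0, rng=rng)
data = [sample_datum(theta, rng) for _ in range(5)]
```

Each datum only ever picks up locations that already appear in $$\Theta$$, matching the requirement that every $$\psi_{n,k} \in \{\psi_k\}$$.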

## Random Measures

A random measure is a random element whose values are measures. More formally, let $$\Sigma_{\Psi}$$ be a sigma-algebra on some space $$\Psi$$. A measure $$\Theta$$ on $$(\Psi, \Sigma_{\Psi})$$ is random if, for every measurable set $$A \in \Sigma_{\Psi}$$, the quantity $$\Theta(A)$$ is a random variable.
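One way to see that $$\Theta(A)$$ is a random variable is to draw many realizations of a random measure and evaluate each on the same set. The sketch assumes $$\Psi = [0, 1)$$ and a hypothetical random measure with 10 atoms at uniform locations whose weights are normalized $$\mathrm{Gamma}(1, 1)$$ draws (a finite Dirichlet random probability measure).

```python
import random


def sample_random_probability_measure(K, rng):
    """Hypothetical random measure: K atoms at Uniform[0, 1) locations whose
    Gamma(1, 1) weights are normalized to sum to 1."""
    raw = [rng.gammavariate(1.0, 1.0) for _ in range(K)]
    total = sum(raw)
    return [(g / total, rng.random()) for g in raw]


def evaluate(atoms, lo, hi):
    """Theta(A) for the interval A = [lo, hi)."""
    return sum(w for w, psi in atoms if lo <= psi < hi)


rng = random.Random(1)
draws = [sample_random_probability_measure(10, rng) for _ in range(1000)]
theta_A = [evaluate(d, 0.0, 0.5) for d in draws]   # samples of the random variable Theta(A)
theta_Ac = [evaluate(d, 0.5, 1.0) for d in draws]  # samples of Theta(A^c)
```

Because every draw has total mass 1, $$\Theta(A) + \Theta(A^c) = 1$$ always, so $$\Theta(A)$$ and $$\Theta(A^c)$$ are dependent even though $$A$$ and $$A^c$$ are disjoint. This is a random measure, but not a completely random one.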

## Completely Random Measures

A completely random measure (CRM) is a random measure that satisfies one additional property: for any pairwise disjoint, measurable sets $$A_1, \ldots, A_k \in \Sigma_{\Psi}$$, the random variables $$\Theta(A_1), \ldots, \Theta(A_k)$$ are mutually independent.
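A standard example of a CRM is the counting measure of a Poisson point process: the numbers of points falling in disjoint sets are independent Poisson random variables. The sketch below assumes $$\Psi = [0, 1)$$ and a homogeneous process with rate 5 (every atom has weight 1), then checks empirically that $$\Theta(A_1)$$ and $$\Theta(A_2)$$ for $$A_1 = [0, 0.3)$$ and $$A_2 = [0.3, 1)$$ are uncorrelated. A near-zero correlation is only a sanity check, not a proof of independence, and the Poisson sampler (Knuth's multiplication method) is only suitable for small rates.

```python
import math
import random


def poisson(lam, rng):
    """Sample Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_poisson_counting_measure(rate, rng):
    """Homogeneous Poisson process on [0, 1): N ~ Poisson(rate) uniform points, weight 1 each."""
    return [(1.0, rng.random()) for _ in range(poisson(rate, rng))]


def evaluate(atoms, lo, hi):
    """Theta(A) for the interval A = [lo, hi)."""
    return sum(w for w, psi in atoms if lo <= psi < hi)


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / math.sqrt(vx * vy)


rng = random.Random(2)
draws = [sample_poisson_counting_measure(5.0, rng) for _ in range(5000)]
counts_A1 = [evaluate(d, 0.0, 0.3) for d in draws]  # each entry ~ Poisson(1.5)
counts_A2 = [evaluate(d, 0.3, 1.0) for d in draws]  # each entry ~ Poisson(3.5)
print(correlation(counts_A1, counts_A2))  # should be close to 0
```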

## Properties

Kingman (1967) showed that any CRM can be decomposed into the sum of three independent components:

$\Theta = \Theta_{det} + \Theta_{fix} + \Theta_{ord}$

Each measure is explained in more detail below:

### Deterministic Component Measure

$$\Theta_{det}$$ is a deterministic measure.

### Fixed Locations Measure

$$\Theta_{fix}$$ is the “fixed locations” measure.

$\Theta_{fix} = \sum_{k=1}^{K_{fix}} \theta_{fix, k} \delta_{\psi_{fix, k}}$

where $$\theta_{fix,k} \in \mathbb{R}_{\geq 0}$$ are random weights and $$\psi_{fix,k} \in \Psi$$ are fixed (non-random) locations. Since the atoms sit at distinct locations, the independence property of CRMs, applied to the disjoint singleton sets $$\{\psi_{fix,k}\}$$, means the $$\theta_{fix,k}$$ must be independent random variables.
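A minimal sketch of sampling $$\Theta_{fix}$$, assuming three hypothetical fixed locations in $$\Psi = [0, 1)$$ and independent $$\mathrm{Gamma}(a_k, 1)$$ weights; the locations and shapes $$a_k$$ are placeholders. Across draws the locations stay put while the weights vary, and each weight is drawn separately, which respects the independence requirement.

```python
import random

# Hypothetical fixed locations psi_{fix,k} and Gamma shapes a_k (assumptions).
FIXED_LOCATIONS = [0.1, 0.4, 0.75]
SHAPES = [2.0, 0.5, 1.0]


def sample_fixed_component(rng):
    """Draw Theta_fix: one independent Gamma(a_k, 1) weight per fixed location."""
    return [(rng.gammavariate(a, 1.0), psi) for a, psi in zip(SHAPES, FIXED_LOCATIONS)]


rng = random.Random(3)
draws = [sample_fixed_component(rng) for _ in range(2000)]

# The locations never change between draws; only the weights are random.
assert all([psi for _, psi in d] == FIXED_LOCATIONS for d in draws)

# Monte Carlo means of theta_{fix,k}; Gamma(a_k, 1) has mean a_k.
mean_weights = [sum(d[k][0] for d in draws) / len(draws) for k in range(len(SHAPES))]
```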

### Ordinary Measure

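$$\Theta_{ord}$$ is the “ordinary” measure. Explaining it requires some familiarity with Poisson point processes. To generate an ordinary component, start with a Poisson point process on $$\mathbb{R}_{\geq 0} \times \Psi$$ characterized by some rate measure $$\mu(d\theta \times d\psi)$$. Denote its points by $$(\theta_{ord,k}, \psi_{ord,k})$$ for $$k = 1, \ldots, K_{ord}$$, where $$K_{ord}$$ may be finite or countably infinite. The ordinary component places an atom of weight $$\theta_{ord,k}$$ at each location $$\psi_{ord,k}$$: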

$\Theta_{ord} = \sum_{k=1}^{K_{ord}} \theta_{ord, k} \delta_{\psi_{ord, k}}$
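The sketch below simulates $$\Theta_{ord}$$ under an assumed rate measure with finite total mass, $$\mu(d\theta \times d\psi) = \lambda \, e^{-\theta} \, d\theta \, d\psi$$ on $$\mathbb{R}_{\geq 0} \times [0, 1)$$. For a finite rate measure, a Poisson point process can be drawn by sampling $$K_{ord} \sim \mathrm{Poisson}(\mu(\mathbb{R}_{\geq 0} \times \Psi))$$ and then drawing $$K_{ord}$$ i.i.d. points from the normalized $$\mu$$. The rate measures of CRMs commonly used in BNP (for example, the beta and gamma processes) have infinite total mass and yield countably many atoms, so they need truncation or other constructions; this example deliberately sidesteps that.

```python
import math
import random


def poisson(lam, rng):
    """Sample Poisson(lam) with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ordinary_component(total_mass, rng):
    """Draw Theta_ord under the assumed rate measure
    mu(dtheta x dpsi) = total_mass * exp(-theta) dtheta dpsi on [0, inf) x [0, 1).

    Returns the Poisson process points (theta_ord_k, psi_ord_k), i.e. the atoms of Theta_ord.
    """
    k_ord = poisson(total_mass, rng)  # K_ord ~ Poisson(mu(R_{>=0} x Psi))
    # Normalized mu is Exp(1) for the weight times Uniform[0, 1) for the location.
    return [(rng.expovariate(1.0), rng.random()) for _ in range(k_ord)]


rng = random.Random(4)
draws = [sample_ordinary_component(4.0, rng) for _ in range(3000)]
mean_atoms = sum(len(d) for d in draws) / len(draws)  # should be near lambda = 4
mean_total_weight = sum(sum(w for w, _ in d) for d in draws) / len(draws)  # also near 4
```

By Campbell's theorem, $$\mathbb{E}[\Theta_{ord}(\Psi)] = \int \theta \, \mu(d\theta \times d\psi) = \lambda$$ under this assumed rate measure, which the simulated average total weight should approach.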