Processing math: 100%

Rylan Schaeffer

Logo
Resume
Research
Learning
Blog
Teaching
Jokes
Kernel Papers


“All we ever see of stars are their old photographs.”

Stick Breaking Process

Parent: Stochastic Processes

Definition

The Stick-Breaking Process (SBP) is a stochastic process where each sample path is an infinite sequence of random variables π1,π2... such that each variable pii(0,1) and sum of the sequence k=1π1 equals 1 with probability 1. The way the variables is generated is by sampling an i.i.d. sequence of vkBeta(α,β) and then defining

πK=vKKk=1(1vk)

This is the source of the name “stick breaking”: a stick with mass 1 is sequentially broken into smaller and smaller pieces.

Relation to Other Stochastic Processes

Griffiths-Engen-McCloskey (GEM) Distribution

If the stick-breaking weights vk Beta(α,β) with α=1, then the infinite random vector π=(π1,π2,...) is said to be distributed according to the so_called Griffiths-Engen-McCloskey distribution GEM(β).

Dirichlet Process

The SBP is intimately related to the Dirichlet process. Let GDP(α,G0) be a random distribution. The distribution is defined as:

G=Kk=1πkδθk

where π1,π2,... is an infinite sequence of probability masses summing to 1 and θ1,θ2,... is an infinite sequence of elements sampled from the base distribution G0. From this perspective, one can see that an easy way to sample a random distribution from a DP is through the three-step stick-breaking construction:

  1. Sample π1,π2,...i.i.d.SBP(1,α)
  2. Sample θ1,θ2,... from the base distribution G0
  3. Create an infinite mixture distribution G:=k=1πkδθk

This constructed G is distributed according to DP(α,G0), hence the name stick-breaking construction.

Chinese Restaurant Process

The SBP is closely related to the CRP. The difference is that the CRP integrates out the base measure G, meaning that the CRP cares only for how many elements have the same value, regardless of what that value (i.e. the location) is.