
Rylan Schaeffer




25 February 2021

Revisiting k-means: New Algorithms via Bayesian Nonparametrics

by Kulis and Jordan (ICML 2012)

Motivation

One limitation of k-means clustering is that the number of clusters must be specified in advance. Here, Kulis and Jordan show that a low-variance asymptotic analysis of a Dirichlet process Gaussian Mixture Model (DP-GMM) yields a k-means-like clustering algorithm that adds new clusters when justified by the data.

K-Means as a Low-Variance Approximation of Expectation Maximization in a Gaussian Mixture Model

We quickly review k-means clustering and how it can be derived from a low-variance asymptotic approximation of expectation maximization applied to a Gaussian mixture model. For a finite GMM with $N$ observations $\{x_n\}_{n=1}^N$, we want to infer cluster assignments $\{z_n\}_{n=1}^N$ and each cluster’s Gaussian parameters $\{\mu_c, \Sigma_c\}_{c=1}^C$. Let $\pi_c$ denote the mixing proportion of the $c$th class. The marginal distribution of an observation is:

$$p(x_n) = \sum_{c=1}^{C} p(x_n \mid z_n = c) \, p(z_n = c) = \sum_{c=1}^{C} \mathcal{N}(x_n ; \mu_c, \Sigma_c) \, \pi_c$$
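To make this density concrete, here is a minimal NumPy sketch (my own, not code from the paper) that evaluates the marginal under the shared spherical covariance $\Sigma_c = \sigma I$ assumed below; the names `gmm_marginal`, `mus`, and `pis` are illustrative choices, not notation from the paper.

```python
import numpy as np

def gmm_marginal(x, mus, pis, sigma):
    """Evaluate p(x) = sum_c N(x; mu_c, sigma * I) * pi_c for one observation.

    x:     (D,) observation
    mus:   (C, D) cluster means
    pis:   (C,) mixing proportions summing to 1
    sigma: shared isotropic variance of every cluster
    """
    D = x.shape[0]
    # Squared Euclidean distances ||x - mu_c||_2^2 for every cluster c.
    sq_dists = np.sum((mus - x) ** 2, axis=1)
    # Spherical Gaussian densities N(x; mu_c, sigma * I).
    densities = np.exp(-sq_dists / (2 * sigma)) / (2 * np.pi * sigma) ** (D / 2)
    # Mixture: weight each component density by its mixing proportion.
    return float(np.sum(pis * densities))
```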

The cluster assignments and cluster parameters can be learnt using Expectation Maximization. Assuming the clusters each have equal covariance $\sigma I$, the E step takes the following form:

$$p(z_n = c \mid x_n) = \frac{p(x_n \mid z_n = c) \, p(z_n = c)}{p(x_n)} = \frac{\exp\!\left(-\frac{1}{2\sigma} \|x_n - \mu_c\|_2^2\right) \pi_c}{\sum_{c'} \exp\!\left(-\frac{1}{2\sigma} \|x_n - \mu_{c'}\|_2^2\right) \pi_{c'}}$$

As $\sigma \to 0$, all mass is placed on the nearest centroid, just as we do in k-means clustering. The M step then updates each centroid (cluster mean) to the average of the points assigned to that particular cluster:

$$\mu_c = \frac{1}{|X_c|} \sum_{x_n \in X_c} x_n$$

where $X_c = \{x_n : p(z_n = c \mid x_n) = 1\}$ is the set of observations assigned to the $c$th cluster.
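To see the low-variance limit concretely, here is a small NumPy sketch (again my own; `em_step` and its argument names are assumptions) of one EM iteration under the shared covariance $\sigma I$. As `sigma` shrinks, each row of responsibilities approaches a one-hot vector selecting the nearest centroid, and the M step reduces to the k-means centroid update.

```python
import numpy as np

def em_step(X, mus, pis, sigma):
    """One EM iteration for a GMM with shared spherical covariance sigma * I.

    X: (N, D) observations; mus: (C, D) means; pis: (C,) mixing proportions.
    Returns updated means, mixing proportions, and the responsibilities.
    """
    # E step: responsibilities p(z_n = c | x_n), computed in log space for stability.
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (N, C)
    logits = -sq_dists / (2 * sigma) + np.log(pis)                   # (N, C)
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)
    # As sigma -> 0, each row of `resp` tends to a one-hot indicator of the
    # nearest centroid, i.e. the hard k-means assignment.

    # M step: each centroid becomes the responsibility-weighted mean of the data;
    # with one-hot responsibilities this is exactly the k-means mean update.
    weights = resp.sum(axis=0) + 1e-12                               # (C,)
    new_mus = (resp.T @ X) / weights[:, None]
    new_pis = weights / X.shape[0]
    return new_mus, new_pis, resp
```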

We can phrase this as an optimization problem with the objective function:

$$\min_{\{\mu_c\}, \{z_n\}} \; \sum_{n=1}^{N} \sum_{c=1}^{C} \mathbb{I}(z_n = c) \, \|x_n - \mu_c\|_2^2$$

where the cluster centroids are defined above.
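For completeness, here is a tiny sketch (assumed names, not code from the paper) of the hard assignment step and the objective it minimizes; neither the assignment step nor the centroid update can increase this quantity, which is why the k-means loop terminates.

```python
import numpy as np

def assign_clusters(X, mus):
    """Hard E step in the sigma -> 0 limit: each point joins its nearest centroid."""
    sq_dists = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)  # (N, C)
    return sq_dists.argmin(axis=1)                                   # (N,)

def kmeans_objective(X, mus, z):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(np.sum((X - mus[z]) ** 2))
```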

Low-Variance Approximation of Dirichlet Process Gaussian Mixture Model

In a DP-GMM, we now have a countably infinite number of clusters, enabled through the Dirichlet process. Specifically, we draw an infinite sequence of cluster centroids $\{\mu_c\}_{c=1}^{\infty}$ from a base distribution $G_0$ and draw mixing proportions $\{\pi_c\}_{c=1}^{\infty}$ from the Dirichlet process’s stick-breaking (GEM) construction with concentration parameter $\alpha$. Then, for each observation, we posit that we draw a cluster assignment from the mixing proportions and the observation from the corresponding cluster’s Gaussian:

$$\mu_1, \mu_2, \ldots \sim G_0, \qquad \pi \sim \mathrm{GEM}(\alpha), \qquad z_n \mid \pi \sim \mathrm{Discrete}(\pi), \qquad x_n \mid z_n \sim \mathcal{N}(\mu_{z_n}, \sigma I)$$

tags: dirichlet-process - k-means - clustering - bayesian-nonparametrics - mixture-models