Rylan Schaeffer

Logo
Resume
Research
Learning
Blog
Teaching
Jokes
Kernel Papers


K Means

K means is a clustering problem of finding \(K\) points called centroids (typically in Euclidean space) that minimize the total squared distance from a set of given data \(\{x_n \}_{n=1}^N\). Each datum is assigned to the nearest centroid, which defines a cluster \(C_k\). The optimization problem is:

\[\min_{\{C_k\}} L(\{C_k\}) := \min_{\{C_k\}} \sum_{k=1}^K \sum_{x_n \in C_k} \lvert \lvert x_n - \mu_k \lvert \lvert^2\]

where the centroids \(\mu_k := \frac{1}{\lvert C_k \lvert} \sum_{x_n \in C_k} x_n\) are the average of the data assigned to each cluster.

Algorithms