Rylan Schaeffer

Kernel Papers

Lloyd’s Algorithm

Lloyd’s Algorithm is an algorithm to solve the K Means problem that is so ubiquitous it is often called the K Means algorithm, although we should distinguish the K Means problem from any particular algorithm used to solve it. Lloyd’s algorithm repeats the two alternating steps until convergence:

  1. Assign: For each datum \(x_n\), compute the distance between the datum and the \(K\) centroids. Assign the datum to the nearest centroid.
  2. Update centroids: For each cluster, set the centroid \(\mu_k = \frac{1}{|C_k|} \sum_{x_n \in C_k} x_n\) based on the previous assignments.

Theorem 1: Lloyd’s monotonically decreases the K Means objective until local convergence.

Proof: First, note that \(L(\{C_k\}) \geq 0\) because \(\lvert \lvert x_n - \mu_k \lvert \lvert^2 \geq 0\) and the sum of nonnegative terms is itself nonnegative. This means the objective function \(L(\{C_k\})\) cannot be lowered indefinitely.

We show that each step cannot increase the loss.

\[L_{post} = L_{pre} - \lvert \lvert x_n - \mu_{pre} \lvert \lvert^2 + \lvert \lvert x_n - \mu_{post} \lvert \lvert^2\]

Per the rules of the assignment step, \(x_n\) is only reassigned if

\[\lvert \lvert x_n - \mu_{pre} \lvert \lvert^2 > \lvert \lvert x_n - \mu_{post} \lvert \lvert^2\]

If \(\mu_{pre} = \mu_{post}\) i.e. the datum is not assigned to a new cluster, the loss is unchanged. If the datum is assigned to a new cluster, we know that

\[L_{post} - L_{pre} < 0\]


\[\begin{align*} &\sum_n \lvert \lvert z_n - z \lvert \lvert^2\\ &= \sum_n \lvert \lvert z_n - \bar{z} + \bar{z} - z \lvert \lvert^2\\ &= \sum_n \lvert \lvert z_n - \bar{z} \lvert \lvert^2 + \lvert \lvert \bar{z} - z \lvert \lvert^2 + 2 \sum_n (z_n \bar{z} - z_n z - \bar{z}\bar{z} + \bar{z} z)\\ &= \sum_n \lvert \lvert z_n - \bar{z} \lvert \lvert^2 + \lvert \lvert \bar{z} - z \lvert \lvert^2\\ &\geq \sum_n \lvert \lvert z_n - \bar{z} \lvert \lvert^2\\ \end{align*}\]

This lemma tells us that relocated the centroids from the previously assigned points to the new points cannot increase the sum of squared distances. Hence, the loss is nonincreasing.