k-means


An unsupervised learning algorithm that partitions $n$ data points into $k$ clusters by iteratively assigning points to the nearest cluster center and updating centers as the mean of assigned points.

Given data $\{x_{i}\}_{i=1}^{n}$ and number of clusters $k$, it minimizes:
$J = \sum_{i=1}^{n} \sum_{j=1}^{k} r_{ij} ∥ x_{i} - μ_{j} ∥^{2}$
where $r_{ij} = 1$ if point $i$ belongs to cluster $j$ (else 0), and $μ_{j}$ is the centroid of cluster $j$.
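As a small illustration of the objective, the sketch below evaluates $J$ for a fixed assignment. The names `sq_dist`, `objective`, and `labels` are illustrative choices, not part of the note; `labels[i]` stores the cluster index of point $i$, which is equivalent to the indicator $r_{ij}$.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def objective(points, centroids, labels):
    """J = sum over points of the squared distance to the assigned centroid.

    labels[i] is the cluster index of points[i], so r_ij = 1 exactly when
    labels[i] == j and the double sum collapses to a single sum.
    """
    return sum(sq_dist(x, centroids[c]) for x, c in zip(points, labels))


# Four points, two centroids, each point at distance 1 from its centroid.
points = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 2.0)]
centroids = [(0.0, 1.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
print(objective(points, centroids, labels))  # 4.0
```

Swapping the labels to `[1, 1, 0, 0]` assigns every point to the far centroid and raises $J$ to 104.0, which is why the assignment step always picks the nearest centroid.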

First, initialize $k$ centroids $μ_{1}, ..., μ_{k}$ (often by picking $k$ data points at random).
The algorithm then alternates between two steps until convergence:

Assignment: Assign each point $x_{i}$ to the nearest centroid: $c_{i} = \arg\min_{j} ∥ x_{i} - μ_{j} ∥^{2}$

Update: Recompute each centroid as the mean of its assigned points: for each cluster $j$, $μ_{j} = \frac{1}{|S_{j}|} \sum_{i \in S_{j}} x_{i}$

where $S_{j} = \{i : c_{i} = j\}$ is the set of indices of the points assigned to cluster $j$.
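The two steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a tuned implementation: the names `kmeans`, `sq_dist`, `mean`, `max_iter`, and `seed` are assumptions made here, and leaving an empty cluster's centroid unchanged is one possible convention that the note does not specify.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Initialization: k data points drawn at random (without replacement).
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: c_i = argmin_j ||x_i - mu_j||^2
        new_labels = [
            min(range(k), key=lambda j: sq_dist(x, centroids[j])) for x in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids would not move
        labels = new_labels
        # Update step: mu_j = mean of the points in S_j
        for j in range(k):
            members = [x for x, c in zip(points, labels) if c == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (100.0, 100.0), (100.0, 101.0), (101.0, 100.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

On this well-separated toy data the first three points and the last three points end up in different clusters; which cluster gets index 0 depends on the random initialization.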

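Neither step can increase the objective $J$, so $J$ is non-increasing across iterations, and because there are only finitely many possible assignments the procedure stops after finitely many iterations. However, it is only guaranteed to reach a local minimum, not the global one.
The final clustering depends heavily on initialization: poor starting centroids can lead to suboptimal solutions.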
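A small worked example (constructed here for illustration) shows how initialization can trap the algorithm. Take the four points $(0,0)$, $(0,1)$, $(4,0)$, $(4,1)$ with $k = 2$.
Starting from $μ_{1} = (0,0)$, $μ_{2} = (4,1)$, the assignment step groups the points by their $x$-coordinate, the update gives $μ_{1} = (0, 0.5)$, $μ_{2} = (4, 0.5)$, and $J = 4 \cdot 0.5^{2} = 1$.
Starting from $μ_{1} = (0,0)$, $μ_{2} = (0,1)$, the points are instead grouped by their $y$-coordinate, and the update gives $μ_{1} = (2,0)$, $μ_{2} = (2,1)$. Each point is now at squared distance $4$ from its own centroid and $2^{2} + 1^{2} = 5$ from the other, so no assignment changes and the algorithm stops with $J = 4 \cdot 4 = 16$.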

Practical considerations

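Choosing $k$: Often determined by domain knowledge or by heuristics such as the elbow method (plot the final objective $J$, i.e. the within-cluster sum of squares, against $k$ and look for the point where increasing $k$ further yields only small reductions) or silhouette analysis.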
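For silhouette analysis, the sketch below computes the mean silhouette coefficient of a given labelling; one would run it for several values of $k$ and prefer the clustering with the highest score. For each point, $a$ is the mean distance to the other members of its own cluster, $b$ is the smallest mean distance to the members of any other cluster, and the coefficient is $(b - a) / \max(a, b)$. The function name `silhouette_score` and the choice of Euclidean distance are assumptions for this illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (between -1 and 1)."""
    clusters = {}
    for x, c in zip(points, labels):
        clusters.setdefault(c, []).append(x)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for x, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of x's own cluster
        # (the distance from x to itself is 0, so it does not affect the sum)
        a = sum(math.dist(x, y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(x, y) for y in other) / len(other)
            for label, other in clusters.items()
            if label != c
        )
        denom = max(a, b)
        if denom > 0:  # guard against all points coinciding
            total += (b - a) / denom
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(silhouette_score(points, [0, 0, 1, 1]))  # compact, well-separated clusters
print(silhouette_score(points, [0, 1, 0, 1]))  # clusters that cut across the gap
```

The first labelling scores close to 1 and the second scores below 0, matching the intuition that the silhouette rewards tight, well-separated clusters.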

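Initialization: k-means++ improves on uniform random initialization by spreading the initial centroids out: the first is drawn uniformly from the data, and each subsequent one is drawn with probability proportional to its squared distance from the nearest centroid already chosen. This typically leads to faster convergence and better final clusters.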
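A minimal sketch of k-means++ seeding, assuming the standard rule of sampling each new centroid with probability proportional to its squared distance from the nearest centroid chosen so far. The names `kmeans_pp_init`, `sq_dist`, and `seed` are illustrative, and the fallback for the degenerate case where all weights are zero is an assumption.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(points, k, seed=0):
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform over the data
    # d2[i] = squared distance from points[i] to its nearest chosen centroid
    d2 = [sq_dist(x, centroids[0]) for x in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Degenerate case: every point coincides with a chosen centroid.
            centroids.append(rng.choice(points))
        else:
            # Points far from all current centroids are more likely to be picked;
            # points with weight 0 (already chosen) are never picked.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
        d2 = [min(d, sq_dist(x, centroids[-1])) for x, d in zip(points, d2)]
    return centroids


points = [(0.0, 0.0), (0.0, 1.0), (50.0, 50.0), (50.0, 51.0)]
print(kmeans_pp_init(points, k=2))
```

The returned list can replace the random initialization in any k-means loop; with well-separated groups like the ones above, the second seed is very likely to land in the group that the first seed missed.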

Assumptions: k-means implicitly assumes roughly spherical clusters of similar size. It struggles with elongated or overlapping clusters, where algorithms like Gaussian mixture models or DBSCAN may perform better.

The algorithm’s $O(nkd)$ complexity per iteration (for $d$-dimensional data) makes it efficient for large datasets. Variants include mini-batch k-means for even larger data and kernel k-means for non-linearly separable clusters.

Max Wolf's Second Brain
