The elbow method helps determine the optimal number of clusters (K) by analyzing how the Sum of Squared Errors (SSE) changes with different K values.
Key Points:
Try it: Generate new data to see how the elbow curve changes. Look for where the curve starts to level off; that point often suggests a reasonable choice for K.
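The elbow curve can be sketched with plain NumPy. The `kmeans_sse` helper below is a minimal k-means (random-point initialization, best of a few restarts), not the interactive demo above, and the three-blob dataset is invented for illustration:

```python
import numpy as np

def kmeans_sse(X, k, n_iter=50, restarts=5, seed=0):
    """Run a minimal k-means `restarts` times; return the lowest SSE found."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(restarts):
        # Initialize centroids by sampling k distinct data points.
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assignment step: nearest centroid by squared Euclidean distance.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: move each centroid to the mean of its points.
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = X[labels == j].mean(axis=0)
        # SSE after the final update.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, d2.min(axis=1).sum())
    return best

# Hypothetical dataset: three well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

sses = [kmeans_sse(X, k) for k in range(1, 7)]
for k, sse in zip(range(1, 7), sses):
    print(f"K={k}: SSE={sse:.1f}")
```

Because the data come from three clusters, the SSE should drop sharply up to K=3 and then flatten; that bend is the elbow.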
Start with randomly chosen centroids \(\mu_1\), \(\mu_2\), ..., \(\mu_k\).
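A common way to realize this initialization is to sample \(k\) distinct data points as the starting centroids. A minimal sketch (the dataset and `k` here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # hypothetical dataset: 100 points in 2-D
k = 3

# Pick k distinct rows of X at random to serve as the initial centroids.
centroids = X[rng.choice(len(X), size=k, replace=False)]
print(centroids.shape)  # -> (3, 2)
```

Sampling without replacement guarantees the initial centroids are distinct points, which avoids starting with an empty cluster.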
Let \(\,X = \{x_1, x_2, ..., x_n\}\) be our dataset, where each \(\,x_i\,\) is a \(\,d\)-dimensional vector.
The squared Euclidean distance between two points \(\,x = (x_1, ..., x_d)\) and \(\,y = (y_1, ..., y_d)\), written component-wise, is calculated as:
$$d(x,y)^2 = \sum_{j=1}^d (x_j - y_j)^2 = \|x - y\|_2^2$$
This distance is used in the assignment step to determine the nearest centroid for each point.
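In NumPy, the distance computation and assignment step can be vectorized with broadcasting. The small dataset and centroids below are made up for illustration:

```python
import numpy as np

# Hypothetical example: 5 points in 2-D and 2 centroids.
X = np.array([[0.0, 0.0], [0.2, 0.1], [4.9, 5.1], [5.0, 5.0], [0.1, 0.3]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Squared Euclidean distance from every point to every centroid:
# d2[i, j] = sum over components of (X[i] - centroids[j])**2
d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# Assignment step: each point goes to its nearest centroid.
labels = d2.argmin(axis=1)
print(labels)  # -> [0 0 1 1 0]
```

Note that `argmin` over the squared distances gives the same assignment as over the distances themselves, so the square root is never needed.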
The objective function to be minimized is:
$$SSE = \sum_{i=1}^n \sum_{j=1}^k w^{(i,j)} \|\mathbf{x}^{(i)} - \boldsymbol{\mu}^{(j)}\|_2^2$$
Where:
- \(\mathbf{x}^{(i)}\) is the \(i\)-th point of the dataset,
- \(w^{(i,j)} = 1\) if \(\mathbf{x}^{(i)}\) is assigned to cluster \(j\), and \(w^{(i,j)} = 0\) otherwise,
- \(\boldsymbol{\mu}^{(j)}\) is the centroid of cluster \(j\).
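The objective can be evaluated directly from the indicator matrix \(w\). A small worked example, with dataset, centroids, and assignments invented for illustration:

```python
import numpy as np

# Hypothetical example: 4 points, 2 clusters, hard assignments.
X = np.array([[0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 7.0]])
mu = np.array([[0.0, 1.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])

# w[i, j] = 1 if point i is assigned to cluster j, else 0.
w = np.zeros((len(X), len(mu)))
w[np.arange(len(X)), labels] = 1.0

# SSE = sum_i sum_j w[i, j] * ||x_i - mu_j||^2
sse = (w * ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)).sum()
print(sse)  # -> 4.0
```

Here each point sits at squared distance 1 from its centroid, so the four terms sum to 4. In practice the indicator matrix is rarely materialized; indexing by `labels` gives the same sum, but the explicit `w` mirrors the formula above.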