K-Means Clustering Algorithm


Place Starting Positions Manually

Information

Overview of K-Means

How does it work

Elbow Method

K-means works by creating clusters from a data set: it divides the data into groups based on patterns in the data. It is an unsupervised learning algorithm, which means there is no target variable to predict. Instead, the algorithm looks at the data itself and groups similar observations into clusters.


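One way to find the optimal number of clusters is the elbow method. Plot a line chart with the number of clusters (the value of k) on the horizontal axis and a measure of how spread out the clusters are, such as the within-cluster sum of squared distances, on the vertical axis, then join the points. The values drop rapidly at first and then level off, so the line forms an elbow shape; the k at the elbow is a reasonable choice.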
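Locating the elbow by eye can be backed up with a simple calculation. The sketch below is one possible heuristic, not a standard definition: it scores each k by how much the drop before it exceeds the drop after it, and returns the k with the largest score. The function name `find_elbow` and the numbers in `example_wcss` are made up for illustration.

```python
def find_elbow(wcss_by_k):
    """Return the k where the curve bends most sharply, or None if too short.

    wcss_by_k maps consecutive values of k to the within-cluster sum of
    squares measured for that k. The bend at k is scored as
    (drop before k) - (drop after k).
    """
    ks = sorted(wcss_by_k)
    best_k, best_bend = None, float("-inf")
    # The smallest and largest k lack a neighbour on one side, so skip them.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        drop_before = wcss_by_k[prev_k] - wcss_by_k[k]
        drop_after = wcss_by_k[k] - wcss_by_k[next_k]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Made-up values, for illustration only: a steep fall up to k = 3, then a plateau.
example_wcss = {1: 900.0, 2: 500.0, 3: 100.0, 4: 80.0, 5: 70.0}
elbow_k = find_elbow(example_wcss)
print(elbow_k)  # 3
```

On real data the curve is often smoother than this, so the result is best treated as a suggestion to check against the plot.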


A target number k is chosen first. This is the number of centroids you need; each centroid is an imaginary location representing the centre of a cluster. The algorithm then allocates every data point to the cluster of its nearest centroid, trying to keep the distances between the points and their centroid as small as possible.
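The quantity being kept small can be written down explicitly. As a sketch, using symbols that do not appear in the text above ($C_j$ for the set of points in cluster $j$, $\mu_j$ for its centroid, $J$ for the total), the usual objective is the within-cluster sum of squared Euclidean distances:

```latex
J = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x
```

Assigning each point to its nearest centroid cannot increase $J$, and neither can moving each centroid to the mean of its cluster.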


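The algorithm will halt when the centroids have stabilised, meaning their values no longer change between iterations, or when the defined number of iterations has been reached.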

Steps

  1. Choose the number of clusters (k), for example using the elbow method
  2. Select k random points from the data as the initial centroids
  3. Assign every point to the cluster with the closest centroid
  4. Recompute the centroid of each newly formed cluster
  5. Repeat 3 and 4 until the centroids are stable in value or the defined number of iterations is reached
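The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: distance is Euclidean, points are tuples of numbers, a cluster that ends up empty keeps its previous centroid, and k is supplied by the caller (step 1). The names `k_means`, `closest_centroid`, `mean_point`, `max_iterations` and `seed` are chosen for this example.

```python
import math
import random


def closest_centroid(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Return (centroids, labels) after running steps 2 to 5."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iterations):
        # Step 3: assign every point to its closest centroid.
        labels = [closest_centroid(p, centroids) for p in points]
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Assumption: an empty cluster simply keeps its old centroid.
            new_centroids.append(mean_point(members) if members else centroids[i])
        # Step 5: stop early once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids, labels)
```

With these six points the first three and the last three end up in separate clusters. Which cluster is numbered 0 depends on the random starting centroids.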

Why is it used

Assumptions on how K Means Works

Definitions

Advantages
  - Scales to large data sets.
  - Simple to implement.
  - Adapts to new examples.
  - Generalizes to clusters of different shapes and sizes.

Disadvantages
  - Choosing k manually may take a long time.
  - Results depend on initial values, such as the chosen k and the starting centroids.
  - Outliers may end up in a cluster of their own instead of being ignored.
  - Clustering data of varying sizes and density can cause issues.
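The dependence on initial values can be seen on a tiny example. The sketch below is illustrative only: it runs a bare-bones k-means twice on the four corners of a 4 by 1 rectangle, with starting centroids placed by hand rather than drawn from the data, and compares the within-cluster sum of squares (WCSS) of the two results. The function `run_k_means` and the data are invented for this demonstration.

```python
import math


def run_k_means(points, centroids, max_iterations=100):
    """Plain k-means from the given starting centroids; returns (centroids, wcss)."""
    for _ in range(max_iterations):
        # Group the points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its group (empty groups stay put).
        updated = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    # WCSS: squared distance from every point to its nearest final centroid.
    wcss = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
    return centroids, wcss


corners = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
_, wcss_left_right = run_k_means(corners, [(0.0, 0.5), (4.0, 0.5)])
_, wcss_top_bottom = run_k_means(corners, [(2.0, 0.0), (2.0, 1.0)])
print(wcss_left_right, wcss_top_bottom)  # 1.0 16.0
```

Both runs stop after one pass because the centroids are already stable, yet the second is stuck with a much worse grouping. Running the algorithm from several starting positions and keeping the result with the lowest WCSS is a common way to reduce this risk.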