K-Means Visualiser

N (the number of nodes):

K (the number of clusters):

Draw Centroids:

Place Starting Positions Manually

How does it work

K-means works by grouping the points of a data set into clusters. It divides the data into groups based on patterns in the data, so that points in the same group are more similar to each other than to points in other groups. It is an unsupervised learning algorithm, which means there is no target variable to predict; the clusters are formed from the structure of the data alone.

One way to find a suitable number of clusters is the elbow method. Run the algorithm for a range of values of k and, for each one, record the within-cluster sum of squared distances (how far the points lie from their assigned centroids). Plot these values against k as a line chart. The value drops rapidly at first and then levels off, and the point where the line bends like an elbow is a reasonable choice of k.
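Picking the elbow by eye can be approximated in code. The sketch below assumes the within-cluster sum of squares has already been computed for k = 1, 2, 3 and so on, and uses the discrete second difference to find the sharpest bend. The function name `elbow_k` and the sample values are made up for illustration, and the second difference is only one of several possible heuristics.

```python
def elbow_k(wcss):
    """Return the k at which the curve of WCSS values bends most sharply.

    wcss[i] is assumed to hold the within-cluster sum of squares for
    k = i + 1. The bend is estimated with the discrete second difference.
    """
    if len(wcss) < 3:
        raise ValueError("need WCSS values for at least three values of k")
    best_k, best_bend = None, float("-inf")
    # The first and last k have no neighbour on one side, so skip them.
    for i in range(1, len(wcss) - 1):
        bend = wcss[i - 1] - 2 * wcss[i] + wcss[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Made-up WCSS values for k = 1..6, for illustration only.
example_wcss = [1000.0, 600.0, 200.0, 170.0, 150.0, 140.0]
print(elbow_k(example_wcss))  # prints 3
```

On noisy curves this heuristic can pick a different k than a person would, so it is worth checking the chart as well.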

Once k has been chosen, it is the number of centroids you need; each centroid is an imaginary location representing the centre of a cluster. The algorithm then allocates every data point to the cluster with the nearest centroid, trying to keep the distances between the points and their centroid as small as possible.
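The quantity being kept small can be written down directly. The following is a minimal sketch, assuming points and centroids are tuples of numbers and that distance means Euclidean distance; the function name `wcss` is chosen here for illustration.

```python
import math


def wcss(points, centroids):
    """Within-cluster sum of squares.

    Each point contributes the squared Euclidean distance to its
    nearest centroid, which is the centroid it would be assigned to.
    """
    total = 0.0
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        total += nearest ** 2
    return total


# Two points one unit either side of a single centroid.
print(wcss([(0, 0), (2, 0)], [(1, 0)]))  # prints 2.0
```

A lower value means the points sit closer to their centroids.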

The algorithm halts when either:

The centroids stabilise, meaning their positions no longer change between iterations because the clustering has settled.

The declared maximum number of iterations has been reached.
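The two halting conditions above could be checked as in the sketch below. The names `has_converged` and `should_stop`, the tolerance `tol` and the default `max_iter` are assumptions made for this example, not part of the visualiser.

```python
import math


def has_converged(old_centroids, new_centroids, tol=1e-6):
    """True when no centroid moved further than tol since the last iteration."""
    return all(
        math.dist(old, new) <= tol
        for old, new in zip(old_centroids, new_centroids)
    )


def should_stop(old_centroids, new_centroids, iteration, max_iter=100, tol=1e-6):
    """Stop when the centroids are stable or the iteration budget is used up."""
    return iteration >= max_iter or has_converged(old_centroids, new_centroids, tol)
```

A small tolerance is used instead of exact equality so that tiny floating-point differences do not keep the loop running.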

Steps

1. Choose the number of clusters (k), for example using the elbow method

2. Select k random points from the data as the initial centroids

3. Assign every point to the cluster with the closest centroid

4. Recompute the centroid of each newly formed cluster

5. Repeat steps 3 and 4 until the centroids are stable or the defined number of iterations is reached
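The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the visualiser's own code: it assumes points are tuples of numbers, uses Euclidean distance, takes k as given, and leaves an empty cluster's centroid where it was. The function name `k_means` and the parameters `max_iter` and `seed` are chosen for this example.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centroids, labels), where labels[i] is the index of the
    centroid that points[i] was assigned to.
    """
    rng = random.Random(seed)
    # Pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to the closest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[c])
        # Stop early once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two small, well-separated groups of points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(data, k=2)
print(centroids, labels)
```

Because the starting centroids are random, different seeds can give different results on less clearly separated data.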

Why is it used

Customer segmentation – divides customers into groups based on common characteristics

Document clustering – puts documents into groups depending on their similarities

Image segmentation – groups the pixels of an image into regions with similar values, such as colour

Recommendation engines – make recommendations depending on what users like, e.g. songs

Definitions

K Value: the number of clusters, and therefore centroids, to form in a data set.

Elbow Method: a heuristic used to determine the number of clusters in a data set.

Clustering: grouping data points together so that points in the same group (a cluster) are similar to one another.

Centroids: the locations representing the centre of each cluster, calculated as the mean of the points assigned to it.

Advantages	Disadvantages
Scales to large data sets.	Choosing k manually may take a long time.
Simple to implement.	Results depend on initial values, such as the starting centroid positions and the k value.
Adapts to new examples.	Outliers can drag centroids or end up in a cluster of their own instead of being ignored.
Generalises to clusters of different shapes and sizes.	Clustering data of varying sizes and density can cause issues.

