R - K-means clustering


About

This page shows how to run K-means clustering in R with the kmeans function and how to plot the result.

Steps

Generate Data

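See R - Cluster Generation for how the sample data (xclustered and its original grouping, which) is generated.

K-means works in any number of dimensions, but with two dimensions we can plot the data.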
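The generation itself is covered on the linked page. As a minimal sketch of data with the shape used here (100 points in two dimensions, three groups), xclustered and which could be built as follows. The seed, the normal noise and the spread of the group centres (sd = 4) are assumptions made for illustration, not values taken from this page.

```r
set.seed(101)                                  # arbitrary seed, assumed for reproducibility

n <- 100                                       # number of points (the output on this page lists 100 labels)
x <- matrix(rnorm(n * 2), n, 2)                # standard normal noise in two dimensions
xmean <- matrix(rnorm(3 * 2, sd = 4), 3, 2)    # assumed: one random centre per group
which <- sample(1:3, n, replace = TRUE)        # original group of each point
xclustered <- x + xmean[which, ]               # shift every point by the centre of its group

plot(xclustered, col = which, pch = 19)        # colour the points by their original group
```

Because the centres are drawn at random, the groups may overlap more or less depending on the seed.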

KMeans

The kmeans function is in the stats package, which is loaded by default.

km.out=kmeans(xclustered,3,nstart=15)
km.out

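where:

  • 3 is the number of clusters to search for
  • nstart=15 runs the algorithm from 15 random sets of starting centers and keeps the best result

Output:

K-means clustering with 3 clusters of sizes 33, 28, 39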

Cluster means:
       [,1]       [,2]
1 -1.107234 -6.7087012
2  1.803277 -0.0341333
3 -3.900942 -2.1654215

Clustering vector:
  [1] 1 2 3 3 2 1 3 1 2 2 2 3 3 1 3 3 1 1 2 1 1 2 3 3 3 3 3 3 3 3 1 2 3 3 1 2 1 1 3 2 3 1 3 1 3 3 2 2 2 1 1 3 1 1
 [55] 1 1 3 1 1 2 3 1 3 1 2 1 3 3 2 3 1 3 2 2 1 2 3 3 1 3 1 3 2 2 3 2 2 2 2 2 2 3 3 3 1 3 1 2 1 1

Within cluster sum of squares by cluster:
[1]  63.30585  54.25311 114.05169
 (between_SS / total_SS =  84.5 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"        
[8] "iter"         "ifault"  
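The components listed above can be read directly from the returned object. The sketch below is self-contained, so it rebuilds stand-in data first (the seed and the group centres are assumptions, so the numbers will differ from the output shown on this page) and then recomputes the between_SS / total_SS percentage from the components.

```r
set.seed(101)                                          # arbitrary seed (assumption)
which <- sample(1:3, 100, replace = TRUE)              # stand-in for the original grouping
centres <- matrix(rnorm(6, sd = 4), 3, 2)              # assumed group centres
xclustered <- matrix(rnorm(200), 100, 2) + centres[which, ]

km.out <- kmeans(xclustered, 3, nstart = 15)

km.out$centers        # 3 x 2 matrix of cluster means
km.out$size           # number of points in each cluster
km.out$withinss       # within-cluster sum of squares, one value per cluster

# Share of the total sum of squares explained by the clustering,
# the "between_SS / total_SS" figure in the printed summary
explained <- 100 * km.out$betweenss / km.out$totss
round(explained, 1)

# The total sum of squares splits into a within part and a between part
all.equal(km.out$totss, km.out$tot.withinss + km.out$betweenss)
```

A higher percentage means the clusters are tighter relative to the overall spread of the data.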

Plot

Plot the data:

  • with the colours of the kmeans cluster output (km.out$cluster)
  • as empty circles (pch=1)
  • magnified two times (cex=2)

so that the points of the originally generated grouping can be drawn inside them.

plot(xclustered,col=km.out$cluster,pch=1,cex=2)

Add the points in their original grouping:

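# Colour each point by its original group number
points(xclustered,col=which,pch=19)
# Same points, with the group colours remapped to match the K-means colours
points(xclustered,col=c(1,3,2)[which],pch=19)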

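where

col=c(1,3,2)[which]

remaps the colours of the original groups so that they match the colours of the K-means clusters. K-means numbers its clusters arbitrarily, so in this run original group 2 is drawn with colour 3 and group 3 with colour 2.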
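The mapping c(1,3,2) depends on the run, since cluster numbers are assigned arbitrarily. One way to find it, sketched below on stand-in data (the seed, the group centres and the name group.to.cluster are assumptions made for illustration), is to cross-tabulate the K-means clusters against the original groups and pick, for each group, the cluster that holds most of its points.

```r
set.seed(101)                                          # arbitrary seed (assumption)
which <- sample(1:3, 100, replace = TRUE)              # stand-in for the original grouping
centres <- matrix(rnorm(6, sd = 4), 3, 2)              # assumed group centres
xclustered <- matrix(rnorm(200), 100, 2) + centres[which, ]

km.out <- kmeans(xclustered, 3, nstart = 15)

# Rows: K-means cluster, columns: original group
tab <- table(cluster = factor(km.out$cluster, levels = 1:3),
             group = factor(which, levels = 1:3))
tab

# For each original group, the cluster that holds most of its points
group.to.cluster <- unname(apply(tab, 2, which.max))
group.to.cluster

# Empty circles: K-means clusters. Filled points: original groups, remapped colours
plot(xclustered, col = km.out$cluster, pch = 1, cex = 2)
points(xclustered, col = group.to.cluster[which], pch = 19)
```

If two original groups overlap heavily, both may map to the same cluster, so the table is worth inspecting before trusting the mapping.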

R Original Vs Kmeans






