R - Cluster Generation

About

This page shows how to generate clustered data in R.

To generate clustered data, start from one randomly generated set of points, split it into groups, and shift each group by its own mean.

Steps

Create data points

set.seed(101)
x=matrix(rnorm(100*2),100,2)

where:

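  • set.seed(101) fixes the seed of the random number generator, so the same values are produced on every run
  • rnorm is the random generation function for the normal distribution; here it generates 200 (100*2) values with mean 0 and standard deviation 1
  • matrix arranges these values into a matrix of 100 rows and 2 columns, one row per point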
x[1:100,]
              [,1]        [,2]
  [1,] -0.56843578  0.24912228
  [2,]  0.77859810 -0.16461954
  [3,] -0.15684682  0.37593032
  [4,] -1.81059190 -0.79511759
  [5,] -1.90281490 -0.13780093
  [6,]  2.33700231  1.88560945
  [7,] -0.46189692 -0.93481448
  [8,]  0.54721322  1.26122751
  ....................
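As a quick check of the two points above (the seed and the matrix layout), the following sketch may help. The names v, a and b are illustrative only and do not appear in the original steps.

```r
# Re-setting the same seed reproduces the same draws.
set.seed(101)
a <- rnorm(5)
set.seed(101)
b <- rnorm(5)
same_draws <- identical(a, b)

# matrix() fills column by column by default, so the first 100 values
# become the first coordinate and the last 100 the second coordinate.
set.seed(101)
v <- rnorm(100 * 2)
x <- matrix(v, 100, 2)
first_column_matches <- identical(x[, 1], v[1:100])

print(dim(x))
print(same_draws)
print(first_column_matches)
```

Each row of x is then one point in the plane.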

plot(x,pch=19)

Random Data R

Randomly assign each point to one of three clusters

which=sample(1:3,100,replace=TRUE)

where:

  • sample draws 100 values from the integers 1 to 3, with replacement (replace=TRUE), giving one cluster label per point
[1] 1 3 3 3 1 3 2 1 2 3 3 2 1 1 2 3 2 3 3 1 2 3 2 2 1 3 2 2 1 1 3 3 3 1 3 1 1 1 1 2 3 3 1 2 1 2 1 2 2 3 2 3 3 1
 [55] 1 2 1 1 2 2 3 2 2 1 1 3 2 3 3 2 1 3 3 1 3 3 3 3 1 2 2 3 1 3 3 3 1 2 3 3 2 1 2 1 1 3 2 1 3 3
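To see how many points ended up in each cluster, the labels can be tabulated. This is a small sketch that repeats the setup so it runs on its own; counts is an illustrative name. The exact labels may differ from the listing above depending on the R version, since the default sampling method changed in R 3.6.0.

```r
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)

# One label (1, 2 or 3) per point, drawn with replacement.
which <- sample(1:3, 100, replace = TRUE)

# Number of points assigned to each cluster.
counts <- table(which)
print(counts)
```

Note that the variable name which masks the base function which() in name only; calls such as which(x > 0) still work because R looks up a function in that position.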

plot(x,col=which,pch=19)

Random Data 3 Group R

Create 3 random points (the cluster centers)

xmean=matrix(rnorm(3*2,sd=4),3,2)

where:

  • rnorm generates 6 (3*2) values from a normal distribution whose standard deviation (sd) is 4, so the centers are spread further apart than the points within a cluster
  • matrix will make a matrix of 3 rows and 2 columns
          [,1]        [,2]
[1,] -4.235016 -1.84473873
[2,]  1.632360 -0.03466352
[3,] -1.100477 -7.02588458

3 Random Points R

Shift each point by the center of its cluster

xclusterd=x+xmean[which,]
plot(xclusterd,col=which,pch=19)

where:

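  • xmean[which,] picks, for each of the 100 points, the row of xmean that matches its cluster label; this 100 x 2 matrix of centers is added to x
  • pch=19 draws each point as a solid circle
  • col sets the colour of each point from its cluster label (i.e. the group)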
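The indexing step can be checked with a short sketch, assuming the same setup as above. The names shift and centers are illustrative. The idea is that the mean of each shifted group should land close to the corresponding row of xmean, because the original points are centered on 0.

```r
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)
which <- sample(1:3, 100, replace = TRUE)
xmean <- matrix(rnorm(3 * 2, sd = 4), 3, 2)

# Row i of shift is the center of the cluster that point i belongs to.
shift <- xmean[which, ]
xclusterd <- x + shift

# Mean of each cluster, computed column by column (a 3 x 2 matrix).
centers <- apply(xclusterd, 2, function(column) tapply(column, which, mean))

# Differences between empirical cluster means and the chosen centers.
print(round(centers - xmean, 2))
```

The differences are not exactly zero because each group holds only about a third of the 100 points, so its sample mean deviates slightly from 0 before the shift.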

Clustered Data Generated R




