# R - Cluster Generation

How to generate clustered data in R.

To generate clustered data, draw one group of random points, assign each point to a cluster, and shift each point by the mean (centre) of its cluster.

## 3 - Steps

### 3.1 - Create data points

```
set.seed(101)
x=matrix(rnorm(100*2),100,2)
```

where:

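• set.seed fixes the seed of the random number generator so that the results are reproducible
• rnorm is the random generation function for the normal distribution; here it generates 200 (100*2) values with mean 0 and standard deviation 1
• matrix arranges those values into 100 rows and 2 columns, i.e. 100 points with two coordinates each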
`x[1:100,]`
```
              [,1]        [,2]
  [1,] -0.56843578  0.24912228
  [2,]  0.77859810 -0.16461954
  [3,] -0.15684682  0.37593032
  [4,] -1.81059190 -0.79511759
  [5,] -1.90281490 -0.13780093
  [6,]  2.33700231  1.88560945
  [7,] -0.46189692 -0.93481448
  [8,]  0.54721322  1.26122751
....................
```
`plot(x,pch=19)`
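As a quick check of this step, the sketch below confirms the shape of the matrix and shows how `matrix` fills it column by column. The variable name `values` is introduced here for illustration only and is not part of the original walkthrough.

```r
set.seed(101)
values <- rnorm(100 * 2)           # 200 draws from the standard normal distribution
x <- matrix(values, 100, 2)        # matrix() fills column by column

dim(x)                             # number of rows, number of columns
identical(x[, 1], values[1:100])   # first 100 draws form column 1
identical(x[, 2], values[101:200]) # last 100 draws form column 2
```

Because of the column-wise fill, each row pairs draw i with draw i+100. This does not matter here, since all draws are independent.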

### 3.2 - Randomly assign each point to one of the three clusters

`which=sample(1:3,100,replace=TRUE)`

where:

• sample draws 100 integers from 1 to 3 with replacement (replace=TRUE); the i-th value is the cluster label of the i-th point
```
 1 3 3 3 1 3 2 1 2 3 3 2 1 1 2 3 2 3 3 1 2 3 2 2 1 3 2 2 1 1 3 3 3 1 3 1 1 1 1 2 3 3 1 2 1 2 1 2 2 3 2 3 3 1
 1 2 1 1 2 2 3 2 2 1 1 3 2 3 3 2 1 3 3 1 3 3 3 3 1 2 2 3 1 3 3 3 1 2 3 3 2 1 2 1 1 3 2 1 3 3
```
`plot(x,col=which,pch=19)`
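Since the labels are drawn at random, the clusters will generally not have equal sizes. A minimal sketch for inspecting the sizes is shown below. It uses the name `labels` instead of `which` (an assumption made here to avoid masking the base R function `which`), and it does not reproduce the draws shown above because it starts from a fresh seed.

```r
set.seed(101)
labels <- sample(1:3, 100, replace = TRUE)  # one cluster label per point

table(labels)        # number of points assigned to each cluster
range(labels)        # smallest and largest label drawn
```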

### 3.3 - Create 3 random points

`xmean=matrix(rnorm(3*2,sd=4),3,2)`

where:

• rnorm generates 6 (3*2) values from a normal distribution with mean 0 and standard deviation (sd) 4, so the centres are spread more widely than the points themselves
• matrix arranges them into 3 rows and 2 columns, one row per cluster centre
```
          [,1]        [,2]
[1,] -4.235016 -1.84473873
[2,]  1.632360 -0.03466352
[3,] -1.100477 -7.02588458
```
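Each row of this matrix serves as one cluster centre, and a row can be looked up by its cluster label. The toy example below (with made-up centres and labels, named `centres` and `labels` for illustration) sketches how indexing rows with a vector of labels repeats the matching row once per label.

```r
# Three made-up centres: rows are (10, -1), (20, -2), (30, -3)
centres <- matrix(c(10, 20, 30,
                    -1, -2, -3), nrow = 3, ncol = 2)
labels <- c(1, 3, 3, 2)            # cluster label of four hypothetical points

expanded <- centres[labels, ]      # one row per label, in label order
expanded
dim(expanded)                      # 4 rows (one per point), 2 columns
```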

### 3.4 - Shift the points toward the 3 points

```
xclusterd=x+xmean[which,]
plot(xclusterd,col=which,pch=19)
```

where:

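• xmean[which,] selects, for each of the 100 points, the row of xmean that matches its cluster label, giving a 100 x 2 matrix; adding it to x shifts every point by the coordinates of its cluster centre
• pch=19 draws each point as a solid circle
• col=which colours each point by its cluster label (i.e. the group)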
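The four steps can be combined into one function. The sketch below is a hypothetical helper, not part of the original walkthrough: the name `make_clusters` and its parameters `n`, `k` and `centre_sd` are assumptions chosen for illustration. It also compares the per-cluster sample means with the generated centres, which should be close because the unshifted points have mean 0.

```r
# Hypothetical helper wrapping steps 3.1 to 3.4
make_clusters <- function(n = 100, k = 3, centre_sd = 4) {
  x <- matrix(rnorm(n * 2), n, 2)                        # 3.1: n standard normal points
  labels <- sample(1:k, n, replace = TRUE)               # 3.2: random cluster labels
  centres <- matrix(rnorm(k * 2, sd = centre_sd), k, 2)  # 3.3: k random centres
  points <- x + centres[labels, , drop = FALSE]          # 3.4: shift by own centre
  list(points = points, labels = labels, centres = centres)
}

set.seed(101)
sim <- make_clusters()

# Per-cluster mean of each coordinate, to compare with the centres
empirical <- apply(sim$points, 2, function(col) tapply(col, sim$labels, mean))
empirical
sim$centres

plot(sim$points, col = sim$labels, pch = 19)
```

Returning the labels and centres together with the points makes it possible to check a clustering algorithm against the known grouping afterwards.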