R - Factor (Category, Enumerated Type)

About

The factor function encodes a vector as a factor (i.e. categorical data).

A binning function such as cut, applied to a numeric or a date vector, returns a factor.

A numeric vector can also be turned into a category directly, for instance when the numbers are identifiers rather than quantities.

A factor is also known as a category or an enumerated type.

Factors can be unordered or ordered.
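
As a minimal sketch of the difference, the example below builds the same data once as an unordered and once as an ordered factor. The size values and variable names are made up for illustration.

```r
sizes <- c("small", "large", "medium", "small")

# Unordered: the levels are plain categories with no ranking.
unordered <- factor(sizes, levels = c("small", "medium", "large"))
is.ordered(unordered)   # FALSE

# Ordered: the levels are ranked small < medium < large.
ranked <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
is.ordered(ranked)      # TRUE

# Comparisons are only meaningful on an ordered factor.
ranked < "large"        # TRUE FALSE TRUE TRUE
max(ranked)             # large
```

The same comparison on the unordered factor returns NA with a warning, since its levels have no rank.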

A factor is stored as an integer vector in which each integer has a label. Use the str function to see this structure.

Modelling functions such as lm() and glm() treat factors specially.

Syntax

factor(
         x = character(), 
         levels, 
         labels = levels,
         exclude = NA, 
         ordered = is.ordered(x), 
         nmax = NA
      )

where:

  • x is the vector of data to encode
  • levels is an optional vector of the values that x may take (the data domain). Its order sets the order of the levels. This is important in linear modelling because the first level is used as the baseline level.
  • labels is an optional vector of labels for the levels
  • exclude is a vector of values to be excluded when forming the set of levels
  • ordered is a logical flag to determine if the levels should be regarded as ordered.
  • nmax is an upper bound on the number of levels
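
The effect of the level order on the baseline can be sketched with a small model. The data and variable names below are invented for illustration, and the example assumes R's default treatment contrasts.

```r
colour <- factor(c("Green", "Blue", "Red", "Green", "Blue", "Red"))
y <- c(1, 2, 3, 1.5, 2.5, 3.5)

# The levels are Blue, Green, Red, so Blue is the baseline:
# it is absorbed into the intercept and gets no coefficient of its own.
fit <- lm(y ~ colour)
names(coef(fit))         # "(Intercept)" "colourGreen" "colourRed"

# relevel() moves the chosen level to the first position, making it the baseline.
colour2 <- relevel(colour, ref = "Red")
fit2 <- lm(y ~ colour2)
names(coef(fit2))        # "(Intercept)" "colour2Blue" "colour2Green"
```

Passing levels = c("Red", "Blue", "Green") to factor would have the same effect as the relevel call.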

Management

Simple Initialization

A factor of colours with 4 values and 3 levels:

> x=factor(c("Green","Blue","Red","Green"))
> x
[1] Green Blue  Red   Green
Levels: Blue Green Red

We can see that a factor is composed of labels and an integer vector (2 1 3 2):

> str(x)
 Factor w/ 3 levels "Blue","Green",..: 2 1 3 2
> unclass(x)
[1] 2 1 3 2
attr(,"levels")
[1] "Blue"  "Green" "Red" 
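
As a small sketch of how the two parts fit together, looking up each integer code in the vector of levels rebuilds the original values. The variable names are illustrative.

```r
x <- factor(c("Green", "Blue", "Red", "Green"))

codes  <- as.integer(x)   # 2 1 3 2
labels <- levels(x)       # "Blue" "Green" "Red"

# Each code is an index into the levels.
rebuilt <- labels[codes]  # "Green" "Blue" "Red" "Green"
identical(rebuilt, as.character(x))  # TRUE
```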

Level

The same factor of colours as above, but with only two colours in the levels (the domain). The value Red is not in the levels and becomes NA. If you want NA as a level, see the How to section.

> x=factor(c("Green","Blue","Red","Green"),levels=c("Green","Blue"))
> x
[1] Green Blue  <NA>  Green
Levels: Green Blue

You can get the levels with the levels function:

> levels(x)
[1] "Green" "Blue" 

Label

A factor of colours with two levels and different labels for those levels:

> x=factor(c("Green","Blue","Green"),levels=c("Green","Blue"),labels=c("LabelGreen","LabelBlue"))
> x
[1] LabelGreen LabelBlue  LabelGreen
Levels: LabelGreen LabelBlue

Exclude

A factor of colours with a colour excluded:

> x=factor(c("Green","Blue","Green"),exclude="Green")
> x
[1] <NA> Blue <NA>
Levels: Blue

How to

Count the number of elements by level

Use the table function:

> x=factor(c("Green","Blue","Red","Green"))
> table(x)
x
 Blue Green   Red 
    1     2     1 
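
One detail worth a short sketch: table counts every level, including levels that never occur in the data. The example below declares an unused level and then drops it with droplevels. The variable names are illustrative.

```r
# Red is declared as a level but never appears in the data.
x <- factor(c("Green", "Blue", "Green"), levels = c("Blue", "Green", "Red"))

counts <- table(x)       # Blue 1, Green 2, Red 0
nlevels(x)               # 3

# droplevels() removes the levels that have no values.
kept <- droplevels(x)
levels(kept)             # "Blue" "Green"
```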

Have NA as level

If you want NA as a level (i.e. to treat missing values as a category of their own):

> x = factor(c("Blue", NA), exclude = NULL)
> x
[1] Blue <NA>
Levels: Blue <NA>
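
For a factor that already exists, a possible alternative is addNA, which appends NA to the levels. The sketch below also shows the useNA argument of table, which counts missing values without changing the factor. The data is made up.

```r
x <- factor(c("Blue", NA, "Red", NA))
levels(x)                     # "Blue" "Red" (NA is not a level)

# addNA() returns a copy of the factor with NA as an extra level.
with_na <- addNA(x)
levels(with_na)               # "Blue" "Red" NA

# Counting missing values without touching the levels.
counts <- table(x, useNA = "ifany")
sum(counts)                   # 4: two colours plus two missing values
```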

Transform it back to a vector

as.character(x)  # the labels, as a character vector
as.numeric(x)    # the integer codes, not the original values
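
When the factor was built from numbers, as.numeric returns the codes rather than the numbers. A minimal sketch of the usual workaround, with made-up data, converts the labels instead:

```r
f <- factor(c(10, 2, 10, 5))
levels(f)                              # "2" "5" "10"

# The codes are positions in the levels, not the data.
codes <- as.numeric(f)                 # 3 1 3 2

# Go through the labels to recover the numbers.
via_character <- as.numeric(as.character(f))   # 10 2 10 5
via_levels    <- as.numeric(levels(f))[f]      # 10 2 10 5
```

The second form converts each distinct level only once, which can matter for long vectors with few levels.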

Order

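By default, the levels are the sorted unique values of the input, which means alphabetical order for character data.

  • The reorder function reorders the levels of a factor according to the values of a second variable.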
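
A minimal sketch of reorder, using an invented size variable: the levels are sorted by the mean size within each level instead of alphabetically.

```r
colour <- factor(c("Green", "Blue", "Red", "Green"))
size   <- c(10, 30, 20, 12)

levels(colour)            # "Blue" "Green" "Red" (alphabetical)

# Mean size per level: Green 11, Red 20, Blue 30.
by_size <- reorder(colour, size, FUN = mean)
levels(by_size)           # "Green" "Red" "Blue"

# The values themselves are unchanged, only the level order differs.
as.character(by_size)     # "Green" "Blue" "Red" "Green"
```

To impose a fixed order that does not depend on another variable, pass the levels argument to factor instead.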

Continuous to Factor

Date to weekday

Example: creating a weekday factor with the days in calendar order

data_frame$CREATED_ON_WEEKDAY <- factor(weekdays(data_frame$CREATED_ON),levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
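
Note that weekdays returns names in the current locale, so hard-coded English levels yield NA on a non-English system. The sketch below is one possible locale-independent variant. It assumes the platform supports the %u format (ISO weekday number, Monday is 1), and the dates and variable names are made up.

```r
created_on <- as.Date(c("2024-01-01", "2024-01-06", "2024-01-03"))

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# format(..., "%u") gives "1" to "7"; the labels map these numbers to names.
weekday <- factor(format(created_on, "%u"),
                  levels = as.character(1:7),
                  labels = day_names)

as.character(weekday)     # "Monday" "Saturday" "Wednesday"
levels(weekday)[1]        # "Monday"
```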

Number to bin

df$ageFactor <- cut(df$age, breaks=c(0, 15, 45, 56, Inf))
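
A short sketch of what this call produces, on made-up ages. By default the intervals are open on the left and closed on the right, so a value equal to the lowest break falls outside every bin. The bin labels in the second call are hypothetical.

```r
age <- c(0, 5, 15, 30, 50, 80)

# Default bins: (0,15], (15,45], (45,56], (56,Inf]
default_bins <- cut(age, breaks = c(0, 15, 45, 56, Inf))
levels(default_bins)      # "(0,15]" "(15,45]" "(45,56]" "(56,Inf]"
is.na(default_bins[1])    # TRUE: age 0 is not inside (0,15]

# include.lowest = TRUE closes the first interval on the left,
# and labels replaces the interval notation with names.
named_bins <- cut(age, breaks = c(0, 15, 45, 56, Inf),
                  include.lowest = TRUE,
                  labels = c("child", "adult", "middle", "senior"))
as.character(named_bins)  # "child" "child" "child" "adult" "middle" "senior"
```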

Note that levels are always stored as character strings. When the input itself consists of numbers stored as strings, the levels are sorted in dictionary order rather than by magnitude (for example, "10" comes before "2").
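
A small sketch of how the level order depends on the type of the input, with made-up values: numbers stored as strings are sorted as text, while a numeric vector is sorted by magnitude before its levels are converted to strings.

```r
# Character input: sorted as text.
as_text <- factor(c("10", "2", "1"))
levels(as_text)           # "1" "10" "2"

# Numeric input: sorted by magnitude, then stored as strings.
as_number <- factor(c(10, 2, 1))
levels(as_number)         # "1" "2" "10"

# For character input, an explicit levels argument restores the numeric order.
fixed <- factor(c("10", "2", "1"), levels = c("1", "2", "10"))
levels(fixed)             # "1" "2" "10"
```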

Documentation / Reference

?factor




