R - Subset Operators (Extract or Replace Parts of an Object)

1 - About

In the Data Manipulation category, you have the subset operators:

  • [
  • [[
  • $

These operators act on vectors, matrices, arrays, lists and data frames to extract or replace parts.

See also: the subset function

3 - Usage

# Extraction
x[columnsSelector]
x[rowsSelector, columnsSelector, drop = TRUE] # or drop = FALSE
x[[columnSelector]] # exactly one element; for a data frame, one column returned as a vector
x[[rowSelector, columnSelector]] # exactly one cell of a data frame or matrix
x$name

# Replace
x[rowsSelector, columnsSelector] <- value

where:

  • x is an object (data frame, …)
  • rowsSelector: (optional) rows selector by index or name to extract or replace. Default all rows.
  • columnsSelector: columns selector by index or name to extract or replace. For example:
2:3                # columns 2 to 3
c("colA", "colC")  # columns by name
  • drop: (optional) logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left (the column becomes a vector), but not to drop if only one row is left (the row stays in a data frame).
  • $ is used to extract elements of a list or data frame by name.
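  • [ returns an object of the same class as the original and can be used to select more than one element. The one exception is the drop behavior described above: with two indices, a single selected column is returned as a vector unless drop = FALSE.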
  • [[ is used to extract elements of a list or a data frame; it can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame
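The difference between the three operators is easiest to see by checking the class of what each one returns. A minimal sketch, assuming a small data frame named d (a hypothetical name used only for this illustration):

```r
# Hypothetical data frame, used only for this illustration
d <- data.frame(colA = 1:3, colB = c("x", "y", "z"))

class(d["colA"])                   # "data.frame": [ keeps the class of the original
class(d[["colA"]])                 # "integer":    [[ returns the element itself
class(d$colA)                      # "integer":    $ extracts the element by name
class(d[, "colA"])                 # "integer":    the exception, a single column is dropped
class(d[, "colA", drop = FALSE])   # "data.frame": drop = FALSE keeps the data frame
```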

4 - Demo Data

We will subset the following data frame

> df = data.frame(colA=1:5,colB=2:6,colC=3:7,row.names=letters[1:5])
> df
  colA colB colC
a    1    2    3
b    2    3    4
c    3    4    5
d    4    5    6
e    5    6    7

Examples for lists are available here

5 - Subset Type

5.1 - Column

5.1.1 - One Column

  • Extract the column by name:
df$colA
[1] 1 2 3 4 5
  • Retrieve the column by index number and return a vector. The second:
df[,2]  # returns an integer vector because drop defaults to TRUE when a single column is selected (not for a single row)
df[,2, drop = TRUE] # same
[1] 2 3 4 5 6
  • Retrieve the column by index number and return a data frame. The second:
df[2]
df[,2,drop = FALSE] # same
  colB
a    2
b    3
c    4
d    5
e    6
  • Retrieve the column by name.
df[,"colB"]
[1] 2 3 4 5 6
  • Exclude the second column (the result is a new data frame; df itself is not modified)
df[-2]
  colA colC
a    1    3
b    2    4
c    3    5
d    4    6
e    5    7
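The [[ operator listed in the Usage section can also extract a single column, or a single cell when given a row and a column. A short sketch on the demo data frame:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

df[["colB"]]      # column by name, returned as a vector (same as df$colB)
df[[2]]           # column by position
df[[4, "colB"]]   # single cell: row 4 of colB
```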

5.1.2 - Multiple Columns

  • Retrieve columns by name or by position
# By name
df[,c("colA","colB")]
# or
df[c("colA","colB")]
# By position
df[,c(1,2)]
# or
df[c(1,2)]
  colA colB
a    1    2
b    2    3
c    3    4
d    4    5
e    5    6
  • Retrieve columns by range
df[2:3]
# or
df[,2:3]
  colB colC
a    2    3
b    3    4
c    4    5
d    5    6
e    6    7
  • Removing Multiple Columns
df[-c(1,2)]
  colC
a    3
b    4
c    5
d    6
e    7
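Negative indices work with positions, not with names. To exclude columns by name, one option (a sketch, not the only way) is to build a logical vector with names() and %in%. The variable keep is a hypothetical name:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

# TRUE for every column whose name is not in the exclusion list
keep <- !(names(df) %in% c("colA", "colB"))
df[keep]   # only colC remains, as with df[-c(1, 2)]
```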

5.2 - Row

5.2.1 - One Row

  • Retrieve the row by index. The fourth (a single row is not dropped by default, so the result is still a data frame)
df[4,]
  colA colB colC
d    4    5    6
  • Retrieve the row by index with an explicit drop = FALSE (same result as the default for a row). The fourth
df[4,,drop=FALSE]
  colA colB colC
d    4    5    6
  • Retrieve the row by name
df["d",]
  colA colB colC
d    4    5    6
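Because a single row is not dropped by default, getting something other than a data frame requires an explicit step. The sketch below compares three options; the variable names are hypothetical:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

row_df   <- df[4, ]                # one-row data frame (default for a single row)
row_list <- df[4, , drop = TRUE]   # plain list with one element per column
row_vec  <- unlist(df[4, ])        # named vector with one value per column

is.data.frame(row_df)     # TRUE
is.data.frame(row_list)   # FALSE
is.list(row_list)         # TRUE
```

Note that unlist() coerces all columns to a common type, so it is mainly useful when every column has the same type, as in the demo data.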

5.2.2 - Multiple Rows Indexing

  • Retrieve two rows by naming
df[c("b","e"),]
  colA colB colC
b    2    3    4
e    5    6    7
  • Retrieve the first and third rows by logical vector
df[c(TRUE,FALSE,TRUE,FALSE,FALSE),]
  colA colB colC
a    1    2    3
c    3    4    5
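Rows can also be selected with a vector of positions, and negative positions exclude rows in the same way as for columns. A short sketch on the demo data frame:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

df[c(1, 3), ]    # rows a and c
df[-1, ]         # every row except the first
df[-(1:3), ]     # rows d and e
```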

5.2.3 - Multiple Rows Filtering

  • Retrieve the rows by logical vector where colB > 3
df[df$colB>3,]
# or
df[df[,"colB"]>3,]
# or
df[df[,2]>3,]
  colA colB colC
c    3    4    5
d    4    5    6
e    5    6    7
  • Retrieve the rows by logical vector where colB > 3 and colC <= 6
df[df$colB>3&df$colC<=6,]
  colA colB colC
c    3    4    5
d    4    5    6
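The same filter can be written with the subset function mentioned in the About section. A sketch comparing both forms; the variable names are hypothetical:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

by_bracket <- df[df$colB > 3 & df$colC <= 6, ]
by_subset  <- subset(df, colB > 3 & colC <= 6)   # columns are referenced without df$

all.equal(by_bracket, by_subset)   # TRUE
```

The two forms differ when the condition evaluates to NA: subset() treats NA as FALSE and leaves those rows out, while [ returns a row of NA values for each of them.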

5.3 - Vertical and Horizontal

  • Extracting the intersection of rows and columns
df[2:4,2:3]
  colB colC
b    3    4
c    4    5
d    5    6
  • Update the total time column where the success flag is 3 and the error text contains 46066 (res is another data frame, not the demo data)
res[res$SUCCESS_FLG==3 & grepl("46066",res$ERROR_TEXT),"TOTAL_TIME_SEC"] <- 200
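Since res is not defined on this page, here is a runnable sketch of the same replacement pattern on the demo data frame. The replacement values are arbitrary and chosen only for illustration:

```r
df <- data.frame(colA = 1:5, colB = 2:6, colC = 3:7, row.names = letters[1:5])

# Set colC to 0 for every row where colB > 3
df[df$colB > 3, "colC"] <- 0L

# Replace a whole column, then a single cell addressed by row and column name
df$colA <- df$colA * 10L
df["a", "colB"] <- 99L

df
```

Unlike extraction, replacement modifies df in place, so it is worth keeping a copy if the original values are still needed.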

6 - Documentation / Reference

?":"
?"["
?"$"
?"[["
?"[.data.frame"