R - Data frame Object

1 - About

A data frame is R's in-memory counterpart of a table in a relational database.

A data frame is an R object of class "data.frame" and, like any object, has attributes.

It is a list of variables (columns) that all have the same number of rows, with unique row names.

It is a matrix-like structure whose columns may be of differing classes (data types): numeric, logical, factor, character and so on.

It's used as the fundamental data structure by most of R's modeling software.

A matrix implementation (a two-dimensional array) also exists, but all of its elements must be of the same type.

Data frames share many of the properties of matrices and lists.

They can be seen as lists in which every element has the same length.
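
The list-like and matrix-like sides of a data frame can be checked directly. The sketch below uses made-up column names (id, name, valid) that are not part of the original page.

```r
# A small data frame with columns of three different classes (made-up data)
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  valid = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

is.list(df)                        # a data frame is a list of columns
is.data.frame(df)
length(df)                         # number of list elements = number of columns
col_classes <- sapply(df, class)   # one class per column
col_lengths <- lengths(df)         # every column has the same length
dim(df)                            # matrix-like view: rows x columns
```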

3 - Creation

3.1 - Constructor

data.frame(..., 
     row.names = NULL, 
     check.rows = FALSE,
     check.names = TRUE,
     stringsAsFactors = default.stringsAsFactors()
     )

where:

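  • … is a list of objects (vectors, matrices, data frames) that have the same number of rows.
  • row.names is a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
  • check.rows, if TRUE, checks the rows for consistency of length and names.
  • check.names, if TRUE, checks the column names so that they are syntactically valid and unique.
  • stringsAsFactors, if TRUE, converts character vectors to factors. The default was TRUE before R 4.0.0 and is FALSE since.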
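
Because the default of stringsAsFactors depends on the R version, it can help to set it explicitly. A minimal sketch with made-up data (the variable names are illustrative only):

```r
name <- c("Nico", "Klaas", "Santa")

# Strings kept as character vectors (the default since R 4.0.0)
df_chr <- data.frame(id = 1:3, name = name, stringsAsFactors = FALSE)
class(df_chr$name)

# Strings converted to factors (the default before R 4.0.0)
df_fct <- data.frame(id = 1:3, name = name, stringsAsFactors = TRUE)
class(df_fct$name)
levels(df_fct$name)   # the distinct values, stored as factor levels
```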

4 - Persistence

4.1 - Import (By reading a file)

  • file with a table format
read.table()
  • file with a table format and a comma separator (R - Csv)
read.csv()
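
A round trip through a temporary file shows how the two readers relate. This is only a sketch: the data and the temporary file are made up for the illustration.

```r
df <- data.frame(id = 1:3, name = c("Nico", "Klaas", "Santa"),
                 stringsAsFactors = FALSE)

csv_file <- tempfile(fileext = ".csv")   # throwaway file for the example
write.csv(df, csv_file, row.names = FALSE)

# read.csv() is a wrapper around read.table() with defaults suited to
# comma-separated files (among them header = TRUE and sep = ",")
from_csv   <- read.csv(csv_file, stringsAsFactors = FALSE)
from_table <- read.table(csv_file, header = TRUE, sep = ",",
                         stringsAsFactors = FALSE)

unlink(csv_file)   # clean up
```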

Or from the RStudio GUI.

4.2 - Export

To the clipboard, tab-separated, so that it can be pasted into Excel (the "clipboard" target works on Windows):

writeToClipboard <- function(x,row.names=FALSE,col.names=TRUE,...) {
  write.table(x,"clipboard",sep="\t",row.names=row.names,col.names=col.names,...)
}
writeToClipboard(data_frame)

5 - Construction Example

5.1 - Simple

colA=c(8,3,6,5,5)
colB=c("Nico","Klaas","Santa","Klaus","Piet")
colC=1:5
df = data.frame(colA,colB,colC)
df
  colA  colB colC
1    8  Nico    1
2    3 Klaas    2
3    6 Santa    3
4    5 Klaus    4
5    5  Piet    5

5.2 - row.names

By default, if the arguments are all named and simple objects (not lists, matrices or data frames), then the argument names give the column names. The row names can be set with the row.names argument:

  • The row names are taken from the column colB:
data.frame(colA,colB,colC,row.names=colB)
      colA  colB colC
Nico     8  Nico    1
Klaas    3 Klaas    2
Santa    6 Santa    3
Klaus    5 Klaus    4
Piet     5  Piet    5
  • The row names are given by a vector of letters:
data.frame(colA,colB,colC,row.names=letters[1:5])
  colA  colB colC
a    8  Nico    1
b    3 Klaas    2
c    6 Santa    3
d    5 Klaus    4
e    5  Piet    5

5.3 - check.rows

check.rows checks that the row names agree when two matrix-like structures are given as arguments.

> df1 = data.frame(A=1:2,B=2:1, row.names=letters[1:2])
> df1
  A B
a 1 2
b 2 1
> df2 = df1[2:1,]
> df2
  A B
b 2 1
a 1 2
> data.frame(df1,df2,check.rows=TRUE)
Error in data.row.names(row.names, rowsi, i) : 
  mismatch of row names in arguments of 'data.frame', item 2

The call fails because the row names of df1 (a, b) are not in the same order as those of df2 (b, a).

5.4 - check.names

Duplicate column names are allowed, but only with check.names = FALSE. With the default (check.names = TRUE), the names are adjusted to be unique and syntactically valid.
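
A short sketch of both behaviours, with made-up columns:

```r
# Default (check.names = TRUE): duplicated names are made unique
df_checked <- data.frame(a = 1:2, a = 3:4)
names(df_checked)     # "a" "a.1"

# check.names = FALSE keeps the names exactly as given
df_raw <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(df_raw)         # "a" "a"

# With duplicated names, $ returns the first matching column
first_a <- df_raw$a
```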

6 - Transformation

6.1 - Selection, Modification

See R - Subset Operators (Extract or Replace Parts of an Object).

Example:

  • Select all rows where SUCCESS_FLG equals 3
res[res$SUCCESS_FLG==3,]
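
One detail of this kind of selection is worth a sketch: a comparison with NA yields NA, and logical indexing then returns a row full of NAs. The data below is made up; only the name SUCCESS_FLG comes from the example above.

```r
res <- data.frame(ID = 1:5, SUCCESS_FLG = c(3, 1, NA, 3, 2))  # made-up data

# Logical indexing keeps a row of NAs for the NA comparison
by_index  <- res[res$SUCCESS_FLG == 3, ]

# which() drops the NA, and so does subset()
by_which  <- res[which(res$SUCCESS_FLG == 3), ]
by_subset <- subset(res, SUCCESS_FLG == 3)

nrow(by_index)   # 3 (two matches plus one NA row)
nrow(by_which)   # 2

# Modification uses the same operators on the left-hand side
res$SUCCESS_FLG[which(res$SUCCESS_FLG == 3)] <- 0
```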

6.2 - Adding a column

data_frame$newColName <- a.vector
data_frame[, "newColName"] <- a.vector
data_frame["newColName"] <- a.vector
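
The three forms are equivalent, as this sketch with made-up data suggests (the column names viaDollar, viaMatrix and viaList are illustrative only):

```r
data_frame <- data.frame(colA = c(8, 3, 6))   # made-up data
a.vector   <- c(10, 20, 30)                   # one value per row here

data_frame$viaDollar      <- a.vector   # list-style, by name
data_frame[, "viaMatrix"] <- a.vector   # matrix-style, by column
data_frame["viaList"]     <- a.vector   # list-style, single bracket

names(data_frame)
```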

6.3 - Join
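
In base R, joins between data frames are usually done with merge(). A minimal sketch, assuming two made-up data frames (people, scores) that share an id column:

```r
people <- data.frame(id = c(1, 2, 3), name = c("Nico", "Klaas", "Santa"))
scores <- data.frame(id = c(2, 3, 4), score = c(10, 20, 30))

inner <- merge(people, scores, by = "id")                # ids 2 and 3
left  <- merge(people, scores, by = "id", all.x = TRUE)  # ids 1 to 3, score is NA for id 1
full  <- merge(people, scores, by = "id", all = TRUE)    # ids 1 to 4
```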

6.4 - Apply a function

  • lapply: Apply a Function over a List or Vector
  • by: Apply a Function to a Data Frame Split by Factors
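
Both functions can be sketched on a small made-up data frame (the columns group, x and y are assumptions for the illustration):

```r
df <- data.frame(group = c("a", "a", "b", "b"),
                 x = c(1, 2, 3, 4),
                 y = c(10, 20, 30, 40))   # made-up data

# lapply() iterates over the columns, because a data frame is a list
col_means <- lapply(df[c("x", "y")], mean)

# by() splits the rows by a factor and applies the function to each piece
sum_by_group <- by(df[c("x", "y")], factor(df$group),
                   function(piece) colSums(piece))
```

lapply() returns a list with one element per column, while by() returns one result per level of the factor.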

6.5 - Sort
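
A common base R approach is to compute the row order with order() and use it as the row index. A sketch on data shaped like the construction example:

```r
df <- data.frame(colA = c(8, 3, 6, 5, 5),
                 colB = c("Nico", "Klaas", "Santa", "Klaus", "Piet"))

# order() returns the row permutation; use it as the row index
ascending  <- df[order(df$colA), ]

# colA descending, ties broken by colB ascending
descending <- df[order(-df$colA, df$colB), ]
```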

6.6 - Update

7 - How to

7.1 - Get the number of rows and columns

> # Number of rows
> nrow(df)
[1] 5
> # Number of columns
> ncol(df)
[1] 3

7.2 - Check the attributes

> attributes(df)
$names
[1] "colA" "colB" "colC"
 
$row.names
[1] 1 2 3 4 5
 
$class
[1] "data.frame"

7.3 - Get the value of a cell

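  • With indexing (colB is shown as a factor, which was the default for strings before R 4.0.0):
> df[2,1]
[1] 3
> df[1,2]
[1] Nico
Levels: Klaas Klaus Nico Piet Santa
  • With row and column name (df2 is built with letters as row names, as in 5.2):
> df2 <- data.frame(colA,colB,colC,row.names=letters[1:5])
> df2["d","colA"]
[1] 5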

7.4 - Convert it to a matrix

data.matrix()
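
A short sketch of the conversion, on made-up data, contrasted with as.matrix():

```r
df <- data.frame(colA = c(8, 3, 6), colC = 1:3,
                 colB = c("Nico", "Klaas", "Santa"))

# data.matrix() returns a numeric matrix
m_num <- data.matrix(df[c("colA", "colC")])

# as.matrix() on mixed columns falls back to a character matrix
m_chr <- as.matrix(df)

dim(m_num)
typeof(m_chr)
```

Only the numeric columns are passed to data.matrix() here, because it would replace strings by integer codes.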

7.5 - Get the number of rows and columns

> df <- data.frame(A=1:2,B=1:2,C=letters[1:2])
> nrow(df)
[1] 2
> ncol(df)
[1] 3

7.6 - See the header and the tail

The first two rows:

head(df,2)

The last two rows:

tail(df,2)

7.7 - Attach the variable names

attach() lets a user access the variables (columns) of a data frame directly by name. detach() undoes it.
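
A minimal sketch with made-up data. with() is shown as an alternative that limits the effect to a single expression; it is not mentioned in the original page.

```r
df <- data.frame(colA = c(8, 3, 6), colC = 1:3)   # made-up data

attach(df)              # puts the columns on the search path
total <- sum(colA)      # colA is found without the df$ prefix
detach(df)              # remove it again to avoid name clashes

# with() evaluates one expression inside the data frame
total_with <- with(df, sum(colA))
```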

8 - Documentation / Reference

lang/r/data.frame.txt · Last modified: 2018/04/24 10:42 by gerardnico