About
The read.table function reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables corresponding to fields in the file.
Syntax
read.table(
file,
header = FALSE,
sep = "",
quote = "\"'",
dec = ".",
row.names,
col.names,
as.is = !stringsAsFactors,
na.strings = "NA",
colClasses = NA,
nrows = -1,
skip = 0,
check.names = TRUE,
fill = !blank.lines.skip,
strip.white = FALSE,
blank.lines.skip = TRUE,
comment.char = "#",
allowEscapes = FALSE,
flush = FALSE,
stringsAsFactors = default.stringsAsFactors(),
fileEncoding = "",
encoding = "unknown",
text
)
where:
- file can be a file name, a URL or a connection
- header indicates whether the file has a header line
- sep is a string indicating how the columns are separated
- colClasses, a character vector indicating the class of each column in the dataset
- nrows, the maximum number of rows to read from the dataset
- comment.char, a character string indicating the comment character
- skip, the number of lines to skip from the beginning
- stringsAsFactors, should character variables be coded as factors?
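As an illustration of the parameters above, here is a minimal sketch that reads a small table. The data, the semicolon separator and the variable names are assumptions made up for the example; the text argument is used so that no file on disk is needed.

```r
# Hypothetical sample data: one title line, one header line, two data lines.
sample_lines <- c(
  "Exported scores",
  "id;name;score",
  "1;alice;9.5",
  "2;bob;7.25"
)

scores <- read.table(
  text = sample_lines,
  skip = 1,                 # ignore the title line
  header = TRUE,            # the next line holds the column names
  sep = ";",                # fields are separated by semicolons
  stringsAsFactors = FALSE  # keep name as character, not factor
)

# Show the resulting column names and classes.
str(scores)
```

Setting stringsAsFactors explicitly keeps the result the same across R versions, since its default has changed over time.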
Performance
By default, read.table will:
- figure out the class of each column (colClasses), that is, what type of variable each column holds
- check every line for a comment character (comment.char); setting comment.char = "" disables this check
Supplying these parameters makes R run faster because it no longer has to work them out itself.
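A minimal sketch of such a call, assuming a small space-separated file whose column types and row count are known in advance. The temporary file and its contents are hypothetical and exist only to make the example runnable.

```r
# Hypothetical sample file written to a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("x y label", "1 2.5 a", "2 3.5 b", "3 4.5 c"), path)

fast <- read.table(
  path,
  header = TRUE,
  colClasses = c("integer", "numeric", "character"),  # no type guessing
  comment.char = "",                                  # no comment scanning
  nrows = 3                                           # known number of data rows
)

# Each column has exactly the class that was requested.
sapply(fast, class)
```

On a file this small the gain is not measurable. The benefit is expected to show on large files.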
Memory
The dataset must not be larger than the amount of RAM available.
1,000,000 rows, 10 columns with numeric data = 1,000,000 * 10 * 8 bytes = 80,000,000 bytes, or about 76 MB (counting 2^20 bytes per MB)
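The same estimate can be reproduced in R. This is a rough sketch that assumes 8 bytes per numeric value and ignores the overhead of the data frame itself; the variable names are made up for the example.

```r
rows <- 1e6             # number of rows
cols <- 10              # number of numeric columns
bytes_per_numeric <- 8  # a double takes 8 bytes

total_bytes <- rows * cols * bytes_per_numeric
total_mb <- total_bytes / 2^20  # convert bytes to MB (2^20 bytes each)

round(total_mb, 1)  # 76.3
```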
Options
colClasses
colClasses = "numeric"
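To figure out the class of each column, you can use this snippet:
# Read only the first 100 rows so that guessing the types stays cheap
mySubsetDataTable <- read.table("myFile.txt", nrows = 100)
# Collect the class that R inferred for each column
classes <- sapply(mySubsetDataTable, class)
# Read the whole file with the classes fixed in advance
myDataTable <- read.table("myFile.txt", colClasses = classes)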
nrows
Use the Linux tool wc (wc -l) to count the number of lines in a file.
Setting nrows, even as a mild overestimate, helps with memory usage.
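When wc is not available, the line count can also be obtained from R. The sketch below uses a hypothetical temporary file and readLines for portability. Note that readLines loads every line into memory, so for very large files counting outside R is likely to be the cheaper option.

```r
# Hypothetical sample file: one header line and three data lines.
path <- tempfile(fileext = ".txt")
writeLines(c("x y", "1 2", "3 4", "5 6"), path)

# Count the lines of the file from within R.
n_lines <- length(readLines(path))

# Subtract the header line to get the number of data rows.
data_rows <- read.table(path, header = TRUE, nrows = n_lines - 1)

nrow(data_rows)
```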