Data Mining - (Missing Value|Not Available) NA


About

Is there significance in the fact that a value is missing?

A “missing” value can be:

  • Unknown
  • Unrecorded
  • Irrelevant

Most learning algorithms deal with missing values but they may make different assumptions about them.

What to do with Missing values

  • Omit instances where the attribute value is missing?
  • Treat “missing” as a separate possible value?
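The two options above can be sketched in plain Python. This is only an illustration under stated assumptions: a missing value is represented by None, and the function names, the toy rows, and the "missing" label are hypothetical, not taken from any particular tool.

```python
# Assumption of this sketch: None stands in for a missing value.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": None, "windy": "yes"},
    {"outlook": "rainy", "windy": None},
]

def omit_incomplete(rows):
    """Option 1: keep only instances with no missing attribute value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def missing_as_value(rows, label="missing"):
    """Option 2: recode a missing value as its own category (new rows are returned)."""
    return [{k: (label if v is None else v) for k, v in r.items()} for r in rows]

complete = omit_incomplete(rows)   # only the first row survives
recoded = missing_as_value(rows)   # all rows kept, None becomes "missing"
```

Option 1 discards two of the three instances here, which shows how quickly omission can shrink a dataset. Option 2 keeps every instance and lets a learner pick up on the fact that a value was missing.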


Other

As a rule of thumb, remove every attribute that has 33% or more missing values, provided that the fact that the value is missing is not itself significant.
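A minimal sketch of this threshold rule, assuming that rows are dictionaries and that None marks a missing value. The helper names and the toy data are hypothetical; only the 33% threshold comes from the text.

```python
def missing_fraction(rows, attribute):
    """Share of instances whose value for `attribute` is missing (None)."""
    return sum(1 for r in rows if r[attribute] is None) / len(rows)

def drop_sparse_attributes(rows, threshold=0.33):
    """Keep only attributes whose missing fraction is below the threshold."""
    attributes = list(rows[0].keys())
    keep = [a for a in attributes if missing_fraction(rows, a) < threshold]
    return [{a: r[a] for a in keep} for r in rows]

data = [
    {"age": 31, "income": None, "city": "Oslo"},
    {"age": None, "income": None, "city": "Rome"},
    {"age": 45, "income": 52000, "city": "Lima"},
    {"age": 28, "income": 41000, "city": None},
]

# "income" is missing in 2 of 4 rows (50%), so it is dropped;
# "age" and "city" are each missing in 1 of 4 rows (25%), so they stay.
reduced = drop_sparse_attributes(data)
```

The check on whether missingness is significant is a judgment call that the code does not make; it only applies the numeric cut-off.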

In general, it's better to replace missing values rather than delete them entirely, since in many cases these attributes will contribute some useful information.

In Weka, the ReplaceMissingValues filter replaces missing values in numeric attributes with the mean, and missing values in nominal attributes with the mode, i.e., the most frequent value. With this method, the means and modes are calculated over the whole dataset. For each fold of the cross-validation, some of the attribute values in the training set have therefore been contaminated with information from the test set (although the effect is probably very small). The results could thus differ slightly from those obtained with a completely independent test set in which missing values are replaced by means/modes from that test set.
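One common way to avoid the contamination described above is to compute the replacement values from the training fold only and reuse them on the test fold. The sketch below illustrates that idea in plain Python. It is not Weka's implementation: the function names and toy data are hypothetical, None marks a missing value, and every attribute is assumed to have at least one observed value in the training fold.

```python
from statistics import mean, mode

def fit_replacements(train_rows):
    """Compute the mean (numeric) or mode (nominal) per attribute from training rows only."""
    replacements = {}
    for attribute in train_rows[0]:
        observed = [r[attribute] for r in train_rows if r[attribute] is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            replacements[attribute] = mean(observed)   # numeric attribute
        else:
            replacements[attribute] = mode(observed)   # nominal attribute
    return replacements

def apply_replacements(rows, replacements):
    """Fill missing values with the previously fitted replacements."""
    return [{a: (replacements[a] if v is None else v) for a, v in r.items()} for r in rows]

train = [
    {"temp": 20.0, "outlook": "sunny"},
    {"temp": None, "outlook": "sunny"},
    {"temp": 30.0, "outlook": "rainy"},
]
test = [{"temp": None, "outlook": None}]

fitted = fit_replacements(train)                 # statistics come from the training fold only
train_filled = apply_replacements(train, fitted)
test_filled = apply_replacements(test, fitted)   # the test fold never influences the statistics
```

In a cross-validation, the fit step would be repeated inside each fold, so that no fold's test instances feed into the means and modes used to fill its training instances.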
