# Statistics - Sample (Variable|Attribute)

A (statistician|data miner) studying a population would be interested in collecting information about different characteristics of the units (like their length, or weight, or age) in a sample. Those characteristics are called variables.

In data modeling, they are just columns.

They can take on multiple values. In contrast, a constant has only one value.

A variable has:

In mathematics, variables are listed among the arguments that a function takes.

Data contains values grouped into variables (columns) and observations (rows).
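The grouping into variables and observations, and the contrast with a constant, can be sketched in a few lines of plain Python. This is only an illustration: the sample values and the helper names `column` and `is_constant` are hypothetical and do not come from any library.

```python
# Hypothetical sample: each dict is one observation (a row),
# and each key is one variable (a column).
observations = [
    {"species": "tortoise", "length_cm": 31.0, "age_years": 12},
    {"species": "tortoise", "length_cm": 45.5, "age_years": 30},
    {"species": "tortoise", "length_cm": 38.2, "age_years": 21},
]


def column(rows, name):
    """Return every value recorded for one variable."""
    return [row[name] for row in rows]


def is_constant(rows, name):
    """A constant takes a single value across all observations."""
    return len(set(column(rows, name))) == 1


# Variable names, in column order.
variables = list(observations[0].keys())

# Columns that never change in this sample behave like constants.
constants = [name for name in variables if is_constant(observations, name)]
```

In this sample `species` never changes, so it behaves like a constant, while `length_cm` and `age_years` vary from one observation to the next.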

## 3 - Data Type

Variables (of an instance) are of two data types:

• discrete (called nominal, categorical or qualitative)
• or continuous (called numeric, numerical or quantitative).

and have one of four levels of measurement: nominal, ordinal, interval or ratio.

### 3.1 - Categorical

When a characteristic can be neatly placed into well-defined groups, or categories that do not depend on order, it is called a categorical variable (some statisticians use the word qualitative).

### 3.2 - Numerical

When a characteristic is a count or a measurement, such as the total number of each species of tortoise or how many individuals there are per square kilometre, it is called a numerical (or quantitative) variable.
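As a rough illustration of the two data types, the sketch below labels a variable from its recorded values. The function `data_type` and the sample values are hypothetical, and the rule is deliberately naive: it looks only at how the values are stored.

```python
from numbers import Number


def data_type(values):
    """Label a variable 'numerical' if every value is a number, else 'categorical'."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Hypothetical values for two variables of a tortoise survey.
species = ["Aldabra", "Galapagos", "Aldabra"]
density_per_km2 = [12.5, 3.0, 8.25]

species_type = data_type(species)
density_type = data_type(density_per_km2)
```

A check like this would mislabel categories that are stored as numbers (a postal code, for instance), so the stored type is a hint and not a substitute for knowing what the variable means.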

## 4 - Usage

| Type of attribute | Type of model | Description |
|---|---|---|
| (predictors\|feature) | supervised | Predictors that affect a given outcome |
| outcome | supervised | Outcome that is affected by predictors |
| descriptors | (unsupervised\|descriptive) | Items of information being analysed for natural groupings or associations |
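The roles in the table can be made concrete with a small sketch. The data, the column names and the helpers `split_supervised` and `descriptors` are all hypothetical; the point is only that the same columns play different roles depending on the type of model.

```python
# Hypothetical records: three candidate predictors and one outcome column.
rows = [
    {"age": 34, "weight": 70.5, "smoker": "no", "disease": "no"},
    {"age": 58, "weight": 82.0, "smoker": "yes", "disease": "yes"},
]

OUTCOME = "disease"  # assumed name of the outcome variable


def split_supervised(rows, outcome):
    """Supervised model: separate the predictors from the outcome."""
    predictors = [
        {name: value for name, value in row.items() if name != outcome}
        for row in rows
    ]
    outcomes = [row[outcome] for row in rows]
    return predictors, outcomes


def descriptors(rows):
    """Unsupervised model: every attribute is a descriptor, none is singled out."""
    return [dict(row) for row in rows]


X, y = split_supervised(rows, OUTCOME)
D = descriptors(rows)
```

Here `X` holds the predictors and `y` the outcome for a supervised model, while `D` keeps every column as a descriptor for an unsupervised one.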

## 5 - Name Glossary

### 5.3 - Quasi-independent

A quasi-independent variable is a variable that cannot be randomly assigned (examples: concussion, gender, sexual orientation).

Because subjects cannot be randomly assigned to the levels of a quasi-independent variable, arguments about causality are not as strong as they are with a true independent variable.

## 6 - Others

### 6.2 - Case id

A case id uniquely identifies each record, which helps with model repeatability.
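One way a case id can support repeatability, offered here as an assumption and not as the method this page has in mind, is to derive the train/test partition from the id itself. The function `assign_partition` and its `test_fraction` parameter are hypothetical names used for illustration.

```python
import hashlib


def assign_partition(case_id, test_fraction=0.25):
    """Map a case id to 'train' or 'test' in a way that never changes between runs."""
    # Hash the id so the assignment depends only on the id,
    # not on row order or on a random seed.
    digest = hashlib.sha256(str(case_id).encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "test" if bucket < test_fraction else "train"


# Hypothetical case ids; running this twice yields the same partitions.
partitions = {case_id: assign_partition(case_id) for case_id in range(1, 11)}
```

Because the partition is a pure function of the case id, a record stays in the same partition even when the data is reordered or new records are appended.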