Data Mining - Outlier Cases
1 - About
Outliers are cases that are unusual because they fall outside the distribution that is considered normal for the data.
The distance from the centre of a normal distribution indicates how typical a given point is with respect to the distribution of the data. Each case can be ranked according to the probability that it is either typical or atypical.
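One common way to express this distance is the z-score: the number of standard deviations a value lies from the mean. The snippet below is a minimal sketch, not a prescribed method; the function name `rank_by_zscore` and the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev

def rank_by_zscore(values):
    """Return (value, z-score) pairs sorted from most to least atypical."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    scored = [(v, (v - mu) / sigma) for v in values]
    # The larger the absolute z-score, the further the case is from the centre.
    return sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical measurements: 25.0 sits far from the other values.
data = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0, 10.2]
ranking = rank_by_zscore(data)

for value, z in ranking:
    print(f"{value:6.1f}  z = {z:+.2f}")
```

The first entry of `ranking` is the most atypical case. Note that the mean and standard deviation are themselves pulled towards extreme values, so a z-score ranking is only a rough first pass.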
The presence of outliers can have a deleterious effect on many forms of data mining. Anomaly detection can be used to identify outliers before mining the data.
In a multidimensional dataset, outliers may only become apparent when several dimensions are examined together, whereas in any single dimension they may lie close to the mean or median.
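The multidimensional case can be illustrated with the Mahalanobis distance, which measures distance from the mean while taking the correlation between dimensions into account. This is a sketch under assumptions: the technique is one possible choice rather than the one this article prescribes, and the function name `mahalanobis_sq_2d` and the data points are hypothetical.

```python
from statistics import mean, stdev

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    n = len(points)
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(covariance) * [dx dy]^T, expanded for 2x2.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical data: x and y rise together, except for the last point (3, 7).
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1),
          (6, 6.0), (7, 6.8), (8, 8.1), (9, 9.0), (3, 7.0)]

d2 = mahalanobis_sq_2d(points)
odd_index = d2.index(max(d2))

# Per-dimension z-scores of the same point stay small.
xs = [p[0] for p in points]
ys = [p[1] for p in points]
zx = (points[odd_index][0] - mean(xs)) / stdev(xs)
zy = (points[odd_index][1] - mean(ys)) / stdev(ys)
print(points[odd_index], round(zx, 2), round(zy, 2))
```

The point (3, 7.0) lies within one standard deviation of the mean on each axis taken separately, yet it has the largest Mahalanobis distance because it breaks the relationship between the two dimensions.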
2 - Articles Related
3 - Example
For example, census data might show:
- a median household income of $70,000
- and a mean household income of $80,000,
but one or two households might have an income of $200,000. These cases would probably be identified as outliers.
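The census example can be sketched with made-up numbers. The income list below is hypothetical and was constructed so that the median is $70,000 and the mean is $80,000. The flagging rule used here, Tukey's 1.5 x IQR fences, is an assumption chosen for illustration and is not stated in the example above.

```python
from statistics import mean, median, quantiles

# Hypothetical household incomes in dollars (not real census data).
incomes = [30_000, 35_000, 38_000, 42_000, 48_000, 70_000,
           70_000, 72_000, 75_000, 200_000, 200_000]

print("median:", median(incomes))
print("mean:  ", mean(incomes))

# Quartiles using the default "exclusive" method of statistics.quantiles.
q1, _, q3 = quantiles(incomes, n=4)
iqr = q3 - q1

# Tukey's fences: values beyond 1.5 * IQR from the quartiles are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in incomes if x < lower_fence or x > upper_fence]

print("outliers:", outliers)
```

The two $200,000 households pull the mean above the median, and they are the only values that fall outside the fences.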