Data Mining - (Function|Model)
1 - About
The notion of automatic discovery refers to the execution of data mining models.
The process of applying a model to new data is known as scoring (or predicting).
A Model object is the result of applying an algorithm to data.
Models can be used in several operations. They can be:
- Inspected, for example to examine the rules produced by a decision tree or an association model
- Tested for accuracy
- Applied to data for scoring
“Essentially, all models are wrong, but some are useful.” (George Box)
Since we can't model everything, we have to find the best trade-off between accuracy and simplicity.
2 - Articles Related
3 - Complexity
The (complexity|flexibility) of a model is given by:
- in a polynomial model: the degree of the polynomial. A higher degree means a higher complexity.
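As a rough illustration of complexity growing with the degree, the sketch below fits polynomials of degree 1, 2 and 3 to the same points and compares their training errors. The data is made up, the helper names are hypothetical, and the least-squares fit is written in plain Python through the normal equations, which is only suitable for small, well-conditioned examples like this one.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y, with X[i][j] = xs[i] ** j
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs  # coeffs[i] multiplies x ** i

def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

def training_error(xs, ys, coeffs):
    """Mean squared error on the data used to fit the model."""
    return sum((predict(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
ys = [1.1, 0.2, 0.1, 0.3, 0.9, 2.4]  # made-up, roughly quadratic data

models = {d: fit_polynomial(xs, ys, d) for d in (1, 2, 3)}
errors = {d: training_error(xs, ys, c) for d, c in models.items()}
```

Each extra degree adds one coefficient, so the more complex model can never fit the build data worse than the simpler one. A lower training error does not by itself mean a better model.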
4 - Function
Each data mining function specifies a class of problems that can be modelled and solved. A function in this sense is not a mathematical function but a category of problem.
Data mining problems can be divided into two types of “learning”:
- Supervised Learning (“Training”)
- Unsupervised Learning (sometimes: “Mining”)
Notions of supervised and unsupervised learning are derived from the science of machine learning, which has been called a sub-area of artificial intelligence.
“Artificial intelligence refers to the implementation and study of systems that exhibit autonomous intelligence or behavior of their own. Machine learning deals with techniques that enable devices to learn from their own performance and modify their own functioning. Data mining applies machine learning concepts to data.” (Oracle Documentation)
|Function|Learning|Type|Description|
|---|---|---|---|
|Attribute Importance|Supervised|Predictive|Identifies the attributes that are most important in predicting a target attribute|
|Classification|Supervised|Predictive|Assigns items to discrete classes and predicts the class to which an item belongs|
|Regression|Supervised|Predictive|Approximates and forecasts continuous values|
|Anomaly Detection|Unsupervised|Descriptive|Identifies items (outliers) that do not satisfy the characteristics of “normal” data|
|Association Rules|Unsupervised|Descriptive|Finds items that tend to co-occur in the data and specifies the rules that govern their co-occurrence|
|Clustering|Unsupervised|Descriptive|Finds natural groupings in the data|
|Feature Extraction|Unsupervised|Descriptive|Creates new attributes (features) using linear combinations of the original attributes|
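To make the supervised/unsupervised distinction concrete, here is a minimal sketch with one function of each kind. Both techniques (a nearest-class-mean classifier and a z-score outlier test) are simple stand-ins chosen for illustration. They are not the algorithms used by Oracle Data Mining, and all names and data are hypothetical.

```python
from statistics import mean, pstdev

# Supervised (classification): the build data carries a known target (the label).
def build_classifier(rows):
    """rows is a list of (value, label) pairs. The model is the mean value per class."""
    by_label = {}
    for value, label in rows:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(model, value):
    # Predict the class whose mean is nearest to the value
    return min(model, key=lambda label: abs(model[label] - value))

# Unsupervised (anomaly detection): there is no target, only the data itself.
def build_anomaly_detector(values, max_z=2.0):
    """The model describes what 'normal' data looks like (mean and spread)."""
    return {"mean": mean(values), "std": pstdev(values), "max_z": max_z}

def is_outlier(model, value):
    # Flag values further than max_z standard deviations from the mean
    return abs(value - model["mean"]) > model["max_z"] * model["std"]

classifier = build_classifier([(1.0, "low"), (2.0, "low"), (8.0, "high"), (9.0, "high")])
detector = build_anomaly_detector([10.0, 11.0, 9.0, 10.0, 10.0])

predicted = [classify(classifier, v) for v in (3.0, 7.0)]
flags = [is_outlier(detector, v) for v in (10.5, 25.0)]
```

The classifier predicts a target it was shown during the build (predictive), while the detector only describes the structure of the data it was given (descriptive).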
5 - Trade-off
- Prediction accuracy versus interpretability. The most accurate model is not always the easiest to interpret.
- Good fit versus over-fitting or under-fitting.
- Parsimony versus black-box. A simpler model involving fewer variables is preferable over a black-box predictor involving them all.
6 - Property
6.1 - Sparse
A sparse model involves only a subset of the variables.
6.2 - Dense
A dense model involves all variables.
6.3 - True
The true model is the model that perfectly represents the response without noise.
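In the usual statistical-learning notation (an assumption here, since the text gives no formula), the observed response is the true model plus noise. The symbols $Y$, $X$, $f$ and $\varepsilon$ are introduced only for this illustration:

```latex
Y = f(X) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0
```

Here $f$ plays the role of the true model and $\varepsilon$ is the noise. A fitted model is an estimate $\hat{f}$ of $f$, and even a perfect estimate cannot predict the noise term.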
7 - Type
8 - Supermodels
Mining models are known as supermodels, because they contain the instructions for their own Data Preparation.
Data transformations are automatic and embedded in the data mining model.
In Automatic Data Preparation (ADP) mode, the model itself transforms the build data according to the requirements of the algorithm. The transformation instructions are embedded in the model and reused whenever the model is applied.
You can choose to add your own transformations to those performed automatically by Oracle Data Mining. These are embedded along with the automatic transformation instructions and reused with them whenever the model is applied. In this case, you only have to specify your transformations once — for the build data. The model itself will transform the data appropriately when it is applied.
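The idea of a model that carries its own data preparation can be sketched as follows. This is a toy illustration and not Oracle Data Mining's ADP: the class `SuperModel`, the min-max normalization and the threshold rule are all hypothetical choices, and the sketch assumes labels 0 and 1 with at least two distinct build values.

```python
import math

class SuperModel:
    """Toy model that embeds its own data preparation (min-max normalization)."""

    def __init__(self, user_transform=None):
        # Optional user-supplied transformation, specified once for the build data
        self.user_transform = user_transform or (lambda v: v)
        self.lo = self.hi = self.threshold = None

    def _prepare(self, value):
        # User transformation first, then the automatic normalization
        v = self.user_transform(value)
        return (v - self.lo) / (self.hi - self.lo)

    def build(self, rows):
        """rows is a list of (value, label) pairs with labels 0 or 1."""
        transformed = [self.user_transform(v) for v, _ in rows]
        # The normalization parameters are learned here and stored in the model
        self.lo, self.hi = min(transformed), max(transformed)
        prepared = [(self._prepare(v), label) for v, label in rows]
        ones = [p for p, label in prepared if label == 1]
        zeros = [p for p, label in prepared if label == 0]
        # Threshold halfway between the class means, on the normalized scale
        self.threshold = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
        return self

    def apply(self, value):
        # Raw values go in: the stored transformations are reused automatically
        return int(self._prepare(value) > self.threshold)

build_data = [(10, 0), (100, 0), (10_000, 1), (100_000, 1)]
model = SuperModel(user_transform=math.log10).build(build_data)
scores = [model.apply(50), model.apply(50_000)]
```

The caller passes raw values to `apply`. The transformation given at build time and the learned normalization parameters travel with the model, so they never have to be specified again at scoring time.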
9 - Documentation / Reference
- See the Oracle Data Mining Application Developer's Guide for a discussion of scoring and deployment in Oracle Data Mining.