
Data Mining - Linear Regression

“Linear regression” is a standard mathematical technique for predicting a numeric outcome.

This is a classical statistical method dating back more than two centuries (to 1805).

The linear model is an important example of a parametric model.

Linear regression is very extensible and, with suitable transformations of the inputs, can be used to capture non-linear effects.

This is a very simple model, which makes it easy to interpret.

There is, typically, a small number of coefficients. If a small number of features are the important ones, it predicts future data quite well in many cases, despite its simplicity.

You have a cloud of data points in (2|n) dimensions and are looking for the best straight (line|hyperplane) fit.

You might have more than 2 dimensions. It's a standard matrix problem.

- Linear Regression assumes that the dependence of the target <math>Y</math> on the predictors <math>X_1, \dots, X_p</math> is linear, even though true regression functions are almost never exactly linear.

It produces a model that is a linear function (i.e., a weighted sum) of the input attributes. There are non-linear methods that build trees of linear models.

In two dimensions, it's a line, in three a plane, in N, a hyperplane.

Formula:

<math>Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_p X_p</math>

where:

- the B values are the weights (also known as regression coefficients or **parameters**).
- the X values are the values of the predictor variables.
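The formula above reduces to a standard matrix problem: stack the predictors into a matrix (with a leading column of ones for the intercept) and solve for the weights by least squares. A minimal sketch with numpy, on synthetic data invented for illustration:

```python
import numpy as np

# Minimal sketch of ordinary least squares for Y = B_0 + B_1*X_1 + B_2*X_2.
# The data below is synthetic, generated from known weights for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # 100 observations, p = 2 predictors
true_B = np.array([3.0, 1.5, -2.0])      # B_0 (intercept), B_1, B_2
y = true_B[0] + X @ true_B[1:] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so the intercept B_0 becomes an ordinary weight.
X1 = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of the overdetermined system X1 @ B = y.
B, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(B)  # close to [3.0, 1.5, -2.0]
```

With only mild noise, the recovered weights land close to the ones that generated the data.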

Linear Regression works naturally with numeric classes (not with nominal ones) because the predictors are multiplied by weights, but it can be used for classification as well.

- Calculate the weights (B) from the training data. A large correlation between predicted and actual values is good; by definition, a correlation cannot be greater than 1.

Linear regression can be used for binary classification as well:

- Calculate a linear function using regression
- and then apply a threshold to decide whether it's 0 or 1 (two-valued nominal classes).

Steps:

- On the training dataset: convert the class to binary attributes (0 and 1)
- Use the regression output and the nominal class as input for OneR (One Rule) in order to define a threshold
- Use this threshold for predicting class 0 or 1
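The steps above can be sketched as follows. Note one simplification: instead of learning the cut-off from the regression output (e.g. with OneR, as the text suggests), this sketch simply fixes the threshold at 0.5; the data is synthetic.

```python
import numpy as np

# Sketch: binary classification via linear regression plus a threshold.
# Labels are coded 0/1. The threshold is fixed at 0.5 here as a
# simplification; the text suggests learning it (e.g. with OneR).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic two-class problem

X1 = np.column_stack([np.ones(len(X)), X])
B, *_ = np.linalg.lstsq(X1, y, rcond=None)   # regress on the 0/1 labels

scores = X1 @ B                          # numeric regression output
pred = (scores >= 0.5).astype(float)     # apply the threshold
accuracy = (pred == y).mean()
print(accuracy)
```

On this separable toy problem the thresholded regression recovers the class boundary almost perfectly.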

For more than two class labels, the following methods can be used:

- multi-response linear regression
- pairwise linear regression

Steps:

- Training: perform one regression per class (n regressions for a problem with n different classes). Set the output to 1 for training instances that belong to the class and 0 for instances that don't
- Prediction:
- choose the class with the largest output
- or use “pairwise linear regression”, which performs a regression for every pair of classes
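The multi-response variant can be sketched compactly: one least-squares fit per class against a 0/1 indicator target, then pick the class whose model gives the largest output. The three well-separated clusters below are invented for illustration.

```python
import numpy as np

# Sketch of multi-response linear regression: one regression per class,
# targets coded 1 for that class and 0 otherwise; predict the class whose
# model produces the largest output. Synthetic three-class data.
rng = np.random.default_rng(2)
n_classes = 3
means = np.array([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])  # class centres
y = rng.integers(0, n_classes, size=300)
X = rng.normal(size=(300, 2)) + means[y]

X1 = np.column_stack([np.ones(len(X)), X])
# 0/1 indicator matrix: one target column per class.
Y_ind = (y[:, None] == np.arange(n_classes)).astype(float)
B, *_ = np.linalg.lstsq(X1, Y_ind, rcond=None)   # one weight column per class

pred = np.argmax(X1 @ B, axis=1)   # choose the class with the largest output
print((pred == y).mean())
```

Because `np.linalg.lstsq` accepts a matrix right-hand side, the n per-class regressions are solved in a single call.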

**Example for multi-response linear regression:**

For a three-class problem, we create three prediction models, where the target is 1 for the model's class and 0 for the others.
If the actual and predicted outputs for the third instance are:

Instance Id | Model | Numeric Class | Prediction |
---|---|---|---|
3 | Blue | 0 | 0.359 |
3 | Green | 1 | 0.322 |
3 | Red | 0 | 0.32 |

then the predicted class is Blue, because the first model produces the largest output. The actual class of instance 3 is Green, because the numeric class is 1 in the second model.
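The argmax choice from the table above can be expressed directly:

```python
# Picking the predicted class for instance 3: the model with the
# largest output wins (values taken from the table above).
preds = {"Blue": 0.359, "Green": 0.322, "Red": 0.32}
predicted = max(preds, key=preds.get)
print(predicted)  # Blue
```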

By replacing ordinary least squares fitting with an alternative fitting procedure, the simple linear model can be improved in terms of:

- Prediction Accuracy: especially when p > n, to control the variance.
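One such alternative is ridge regression, which shrinks the weights to control variance; a sketch of its closed form <math>B = (X^T X + \lambda I)^{-1} X^T y</math> on synthetic data with p > n (everything below is invented for illustration):

```python
import numpy as np

# Sketch of ridge regression, one alternative to ordinary least squares.
# With p > n, X'X is singular and OLS is ill-posed; the ridge penalty
# lambda*I makes the system solvable and stabilises the weights.
rng = np.random.default_rng(3)
n, p = 30, 50                        # more predictors than observations
X = rng.normal(size=(n, p))
true_B = np.zeros(p)
true_B[:3] = [2.0, -1.0, 0.5]        # only three predictors matter
y = X @ true_B + rng.normal(scale=0.1, size=n)

lam = 1.0                            # tuning parameter (strength of shrinkage)
B_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.round(B_ridge[:3], 2))
```

The tuning parameter lambda trades a little bias for a large reduction in variance, which is exactly what is needed when p approaches or exceeds n.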

M5P (a model tree learner) often performs quite a lot better than Linear Regression.

Weka has a **supervised** attribute filter (not the “unsupervised” one) called NominalToBinary that converts a nominal attribute into the same set of binary attributes used by LinearRegression and M5P.

To show the original instance numbers alongside the predictions, use the AddID unsupervised attribute filter, and the “Output additional attributes” option from the Classifier panel “More options …” menu. Be sure to use the attribute *index* (e.g., 1) rather than the attribute *name* (e.g., ID).

data_mining/linear_regression.txt · Last modified: 2015/07/13 22:14 by gerardnico

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported