Statistics - Bayesian Information Criterion (BIC)

About

BIC is like AIC and Mallow's Cp, but it comes from a Bayesian argument. The formulas are very similar.

Formula

<MATH> BIC = \frac{1}{n}\left(RSS + \log(n) \, d \, \hat{\sigma}^2\right) </MATH>

The formula takes the residual sum of squares (RSS) and adds an adjustment term: the log of the number of observations, n, times d, the number of parameters in the model (the intercept plus the regression coefficients), times the estimate of the error variance.

As in AIC and Cp, sigma-hat squared is an estimate of the error variance, typically obtained from the full model containing all predictors; it may or may not be available depending on whether n is greater than p or less than p.
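
As a minimal sketch (not from the source page), assuming RSS, n, d and an estimate of the error variance are already in hand, the formula above translates directly into a small R helper:

# BIC in the scaled form given above: (RSS + log(n) * d * sigma2) / n
# rss    : residual sum of squares of the candidate model
# n      : number of observations
# d      : number of parameters (intercept plus regression coefficients)
# sigma2 : estimate of the error variance (e.g. from the full model)
bic_value <- function(rss, n, d, sigma2) {
  (rss + log(n) * d * sigma2) / n
}

With hypothetical values rss = 420, n = 100, d = 3 and sigma2 = 4.2, bic_value() returns about 4.78.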

With BIC, we are estimating the average test-set RSS across the observations, and we want it to be as small as possible. In feature selection, we therefore choose the model with the smallest BIC.
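
A hedged sketch of that selection step, assuming a list called models of lm() fits already exists (one best model per size, all fit to the same observations, with the full model last); the object names are hypothetical:

# Hypothetical setup: 'models' is a list of lm() fits, full model last.
n      <- nobs(models[[length(models)]])              # number of observations
sigma2 <- summary(models[[length(models)]])$sigma^2   # error variance estimated from the full model
bic    <- sapply(models, function(m) {
  rss <- sum(resid(m)^2)            # residual sum of squares of this candidate
  d   <- length(coef(m))            # intercept plus regression coefficients
  (rss + log(n) * d * sigma2) / n   # same scaled BIC formula as above
})
best_model <- models[[which.min(bic)]] # the candidate with the smallest BIC

If the leaps package is used instead, summary(regsubsets(...)) also exposes a bic component (on a different, relative scale), and which.min() applies in the same way.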

AIC and BIC

The only difference between AIC and BIC is the choice of log(n) versus 2. Since e^2 is about 7.39, log(n) is greater than 2 whenever n is greater than 7. So if you have more than seven observations in your data, BIC puts a heavier penalty on a large model; in other words, BIC tends to choose smaller models than AIC does.
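
For comparison, in the same scaled form, Cp uses 2 in place of log(n) (and, for least squares models, AIC is proportional to Cp):

<MATH> C_p = \frac{1}{n}\left(RSS + 2 d \hat{\sigma}^2\right) </MATH>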

BIC will therefore tend to select models with fewer variables than either Cp or AIC.
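
A tiny numeric illustration of this effect (the sample sizes are hypothetical, not from the source page): the per-parameter penalty factor is a constant 2 for AIC/Cp but log(n) for BIC, so it grows with the sample size.

# Compare the per-parameter penalty factors of AIC/Cp (constant 2)
# and BIC (log(n)) for a few hypothetical sample sizes.
n <- c(10, 100, 1000, 10000)
data.frame(n = n,
           aic_cp_factor = 2,
           bic_factor    = round(log(n), 2))   # 2.30, 4.61, 6.91, 9.21
# Once n >= 8, log(n) > 2, so each extra parameter costs more under BIC
# and BIC tends to pick a smaller model.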





Discover More

Data Mining - (Test|Expected|Generalization) Error
Machine Learning - (Overfitting|Overtraining|Robust|Generalization) (Underfitting)
R - Feature Selection - Indirect Model Selection
R - Feature selection - Model Generation (Best Subset and Stepwise)
Statistics - Adjusted R^2
Statistics - Model Selection


