# Statistics - Confidence Interval


## 1 - About

By definition, if the experiment is repeated many times and a 95% confidence interval is computed each time, about 95% of those intervals will contain the true value of the parameter (mean, …).

- A 95% confidence interval is defined as a range of values such that with 95% probability, the range will contain the true unknown value of the parameter.
- A degree of confidence of 95% means that you have 95% confidence that the true score should be in the confidence interval.
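The repeated-experiment reading of the definition can be checked with a small simulation. The sketch below is only an illustration: the population (normal, mean 50, standard deviation 10), the sample size of 10, and the hard-coded critical value 2.262 (two-sided 95%, 9 degrees of freedom, from a standard t table) are all assumptions chosen for the example.

```python
import random
import statistics

# Assumptions for this illustration (not from the article):
N = 10                  # sample size of each simulated experiment
T_CRIT_95_DF9 = 2.262   # two-sided 95% t critical value for df = N - 1 = 9


def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) bounds of M +/- t * SE for the sample mean."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error
    return m - t_crit * se, m + t_crit * se


def coverage(true_mean=50.0, true_sd=10.0, experiments=5000, seed=0):
    """Fraction of repeated experiments whose interval contains true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        lower, upper = mean_confidence_interval(sample, T_CRIT_95_DF9)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / experiments


print(f"Share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Note that the 95% describes the procedure over many samples, not any single interval.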

Reporting an interval acknowledges that the estimate is subject to sampling error.

The logic of the confidence intervals is to report:

- a range of values, rather than just a single value
- in other words, an **interval estimate** rather than a point estimate

The phrase confidence interval comes from the fact that researchers and writers will be (or should be) more confident in the accuracy of what they're reporting if they report an interval estimate rather than a point estimate.

A confidence interval is an interval estimate of a population parameter based on one random sample.

Confidence intervals take a different approach from null hypothesis significance testing (NHST): instead of producing a reject or do-not-reject decision, they report a range around the sample statistic.

The width of a confidence interval is calculated from the standard error. It is therefore influenced by:

- Variance in the population (and sample)
- Sample size
- Degree of confidence desired
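The effect of these factors on the width can be sketched numerically. The example below is a rough illustration under a large-sample assumption: it uses the normal quantile z in place of the t value, and the function name `interval_width` and the input values are hypothetical.

```python
from statistics import NormalDist


def interval_width(sd, n, confidence):
    """Width of a two-sided interval M +/- z * SE, with SE = sd / sqrt(n)."""
    # Large-sample assumption: normal quantile z instead of a t value.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sd / n ** 0.5
    return 2 * z * se


# Hypothetical standard deviations and confidence levels, fixed sample size.
for sd in (5.0, 10.0):
    for confidence in (0.90, 0.95, 0.99):
        width = interval_width(sd, n=100, confidence=confidence)
        print(f"sd={sd:>4}  confidence={confidence:.2f}  width={width:.2f}")
```

A larger standard deviation or a higher degree of confidence both give a wider interval.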

A confidence interval can be constructed for any statistic.

Confidence intervals are a frequentist concept: the interval, and not the true parameter, is considered random. For this reason, the following statement is not a correct reading of a confidence interval: “If I perform a linear regression and get confidence interval from 0.4 to 0.5, then there is a 95% probability that the true parameter is between 0.4 and 0.5.” Even a Bayesian would not necessarily agree with it, as it would depend on his/her prior distribution.

## 2 - Articles Related

## 3 - Hypothesis test

Hypothesis testing is a closely related idea: a hypothesis test and a confidence interval do equivalent things.

If the hypothesis test:

- rejects the null hypothesis, we conclude that the slope is not 0. Correspondingly, the confidence interval constructed from that data for the parameter **will not** contain 0.
- does not reject the null hypothesis, we cannot conclude that the predictor X has an effect. Its slope may be 0. The confidence interval for that parameter will contain 0.
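The correspondence between the two outcomes can be illustrated with a simple linear regression computed by hand. This is a minimal sketch, not a reference implementation: the two data sets are made up, and the critical value 2.228 (two-sided 95%, 10 degrees of freedom, from a standard t table) is an assumption tied to the 12 data points used here.

```python
import statistics

# Assumption: two-sided 95% t critical value for df = n - 2 = 10.
T_CRIT_95_DF10 = 2.228


def slope_test_and_interval(x, y, t_crit):
    """Return (t statistic, (lower, upper)) for the slope of y on x."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return slope / se, (slope - t_crit * se, slope + t_crit * se)


# Hypothetical data: one response with a clear trend, one without.
x = list(range(1, 13))
y_trend = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1, 9.9, 11.2, 11.8, 13.1]
y_flat = [5.1, 4.7, 5.3, 4.9, 5.2, 4.8, 5.0, 5.3, 4.6, 5.1, 4.9, 5.0]

for label, y in (("trend", y_trend), ("flat", y_flat)):
    t_stat, (lower, upper) = slope_test_and_interval(x, y, T_CRIT_95_DF10)
    rejected = abs(t_stat) > T_CRIT_95_DF10   # two-sided test of slope = 0
    contains_zero = lower <= 0 <= upper
    print(f"{label}: reject H0={rejected}, interval contains 0={contains_zero}")
```

In both cases the null hypothesis is rejected exactly when the interval excludes 0, because the test and the interval are built from the same slope, standard error and critical value.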

The confidence interval therefore performs the hypothesis test, and it also tells how big the effect is.

It is then always good to compute confidence intervals as well as to run the hypothesis test.

## 4 - Around

### 4.1 - Sample Mean

The sample mean (M) is the easiest and most obvious place to start when talking about confidence intervals.

<MATH> \begin{array}{rrl} \text{Upper bound} & = & M + t \cdot SE \\ \text{Lower bound} & = & M - t \cdot SE \\ \end{array} </MATH>

where:

- M is the sample mean, when we only have one sample
- SE is the standard error
- t is a value from the t distribution. It depends on the level of confidence desired and on the sample size (through the degrees of freedom)
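As a worked illustration of the two bounds, the sketch below computes a 95% interval for the mean of one small sample. The scores are hypothetical, and the small lookup table of two-sided 95% t critical values (keyed by degrees of freedom, N - 1) is copied from a standard t table as an assumption, so the function only handles the sample sizes listed there.

```python
import statistics

# Assumption: two-sided 95% t critical values by degrees of freedom (N - 1),
# copied from a standard t table. Extend the table for other sample sizes.
T_CRIT_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}


def mean_confidence_interval(sample):
    """Return (lower, upper) = (M - t * SE, M + t * SE) at 95% confidence."""
    n = len(sample)
    t = T_CRIT_95[n - 1]                       # t value for df = N - 1
    m = statistics.mean(sample)                # sample mean M
    se = statistics.stdev(sample) / n ** 0.5   # standard error SE = SD / sqrt(N)
    return m - t * se, m + t * se


scores = [72, 85, 78, 90, 66, 81, 77, 88, 74, 79]  # hypothetical sample
lower, upper = mean_confidence_interval(scores)
print(f"M = {statistics.mean(scores)}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The interval is symmetric around M, and its half-width is t times SE.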

As sample size increases, the width of the confidence interval typically decreases, because the standard error (the standard deviation divided by the square root of the sample size) gets smaller.