The population variance indicates how spread out a data set is. Unfortunately, it is typically impossible to know the exact value of this population parameter. To compensate for this lack of knowledge, we use a topic from inferential statistics called confidence intervals. We will work through an example of how to calculate a confidence interval for a population variance.
Confidence Interval Formula
The formula for the (1 - α) confidence interval for the population variance is given by the following string of inequalities:
$$\frac{(n - 1)s^2}{B} < \sigma^2 < \frac{(n - 1)s^2}{A}.$$
Here $n$ is the sample size and $s^2$ is the sample variance. The number $A$ is the point of the chi-square distribution with $n - 1$ degrees of freedom at which exactly $\alpha/2$ of the area under the curve lies to the left of $A$. Similarly, the number $B$ is the point of the same chi-square distribution with exactly $\alpha/2$ of the area under the curve to the right of $B$.
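The formula translates directly into code. The following is a minimal Python sketch; the function name `variance_confidence_interval` and its parameter names are hypothetical, and it assumes the caller supplies the chi-square values $A$ and $B$ (from a table or software) rather than computing them.

```python
def variance_confidence_interval(n, s2, a, b):
    """Return (left, right) endpoints of a confidence interval for sigma^2.

    n  -- sample size
    s2 -- sample variance (computed with divisor n - 1)
    a  -- chi-square value (n - 1 df) with alpha/2 of the area to its left
    b  -- chi-square value (n - 1 df) with alpha/2 of the area to its right
    """
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 < a < b:
        raise ValueError("expected 0 < a < b")
    numerator = (n - 1) * s2
    # Dividing by the larger value b gives the smaller (left) endpoint.
    return numerator / b, numerator / a


# Usage with the numbers from the worked example (df = 9, alpha = 0.05).
left, right = variance_confidence_interval(10, 277, 2.7004, 19.023)
print(f"({left:.1f}, {right:.1f})")
```

Note that $B$ is the larger of the two chi-square values, which is why it appears in the left (smaller) endpoint.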
We begin with a data set of 10 values, obtained from a simple random sample:
97, 75, 124, 106, 120, 131, 94, 97, 96, 102
Some exploratory data analysis is needed to check that there are no outliers. A stem-and-leaf plot of these values suggests that the data likely come from a distribution that is approximately normal. This matters because the chi-square interval for a variance relies on the population being normally distributed. With that check done, we can proceed with finding a 95% confidence interval for the population variance.
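One way to carry out this screening in code is sketched below in Python. It prints a text stem-and-leaf plot (stems are tens digits and leaves are units digits, which assumes non-negative integer data like these) and flags outliers with the common 1.5 × IQR rule. The IQR rule and the helper names `stem_and_leaf` and `iqr_outliers` are assumptions for illustration, not the procedure used in the text, and neither tool proves normality; they only screen for obvious problems.

```python
import statistics


def stem_and_leaf(values):
    """Text stem-and-leaf lines; assumes non-negative integers (stem = tens, leaf = units)."""
    values = sorted(values)
    lines = []
    for stem in range(values[0] // 10, values[-1] // 10 + 1):
        leaves = [v % 10 for v in values if v // 10 == stem]
        lines.append(f"{stem:>3} | " + " ".join(str(leaf) for leaf in leaves))
    return lines


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], using the statistics module's quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [97, 75, 124, 106, 120, 131, 94, 97, 96, 102]
for line in stem_and_leaf(data):
    print(line)
print("outliers:", iqr_outliers(data))
```

For this sample the fences lie well outside the smallest and largest values, so no points are flagged.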
We estimate the population variance with the sample variance, denoted by $s^2$, so we begin by calculating this statistic. Essentially, we are averaging the squared deviations from the mean; however, rather than dividing their sum by $n$, we divide it by $n - 1$.
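The calculation described above can be sketched in Python as follows. The helper name `sample_variance` is hypothetical; the standard library's `statistics.variance` computes the same sample variance (divisor $n - 1$) and is printed alongside as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sample variance s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)


data = [97, 75, 124, 106, 120, 131, 94, 97, 96, 102]
print(sample_variance(data))       # about 277.29, i.e. 2495.6 / 9
print(statistics.variance(data))   # stdlib sample variance, also divides by n - 1
```

Dividing by $n - 1$ rather than $n$ is what makes $s^2$ an unbiased estimator of the population variance; `statistics.pvariance` is the divide-by-$n$ counterpart.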
We find that the sample mean is 104.2. Using this, the sum of squared deviations from the mean is:
$$(97 - 104.2)^2 + (75 - 104.2)^2 + \cdots + (96 - 104.2)^2 + (102 - 104.2)^2 = 2495.6.$$
We divide this sum by 10 - 1 = 9 to obtain a sample variance of approximately 277 (2495.6 / 9 ≈ 277.3).
We now turn to the chi-square distribution. Since we have 10 data values, we have 9 degrees of freedom. Since we want the middle 95% of the distribution, we need 2.5% in each of the two tails. Consulting a chi-square table or software, we find that 2.7004 has 2.5% of the area to its left and 19.023 has 2.5% of the area to its right, so together they enclose the middle 95% of the distribution's area. These numbers are $A$ and $B$, respectively.
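With SciPy installed, `scipy.stats.chi2.ppf(0.025, 9)` and `scipy.stats.chi2.ppf(0.975, 9)` return these values directly. As a dependency-free alternative, the Python sketch below computes the chi-square CDF from the series expansion of the regularized lower incomplete gamma function and inverts it by bisection. The helper names `chi2_cdf` and `chi2_ppf` are hypothetical, and the accuracy is only meant for table-style lookups like this one, not for extreme tail probabilities.

```python
import math


def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_k z^k / ((a+1)...(a+k))
    for the regularized lower incomplete gamma function, with a = df/2, z = x/2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-16 * total:   # add terms until they stop mattering
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total


def chi2_ppf(p, df, iterations=200):
    """Inverse CDF: the x with chi2_cdf(x, df) == p, found by bisection."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:   # widen the bracket until it contains p
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


A = chi2_ppf(0.025, 9)   # 2.5% of the area to the left
B = chi2_ppf(0.975, 9)   # 2.5% of the area to the right
print(round(A, 4), round(B, 3))   # should agree with the table values 2.7004 and 19.023
```

Bisection is slower than a Newton-type solver but needs only the CDF and is guaranteed to converge once the bracket contains $p$, since the CDF is increasing.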
We now have everything we need to assemble the confidence interval. The formula for the left endpoint is $(n - 1)s^2 / B$, so our left endpoint is:
$$\frac{9 \times 277}{19.023} \approx 131.$$
The right endpoint is found by replacing $B$ with $A$:
$$\frac{9 \times 277}{2.7004} \approx 923.$$
And so we are 95% confident that the population variance lies between 131 and 923.
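The endpoint arithmetic can be checked with a short Python sketch. It uses the table values for $A$ and $B$ from this example and compares the interval built from the rounded sample variance 277 with one built from the unrounded value 2495.6 / 9; storing the results in a dictionary is just an illustrative choice.

```python
import statistics

data = [97, 75, 124, 106, 120, 131, 94, 97, 96, 102]
n = len(data)
A, B = 2.7004, 19.023  # chi-square table values for 9 df, 2.5% in each tail

intervals = {}
for label, s2 in [("rounded", 277), ("unrounded", statistics.variance(data))]:
    # Left endpoint divides by B (the larger value), right endpoint by A.
    intervals[label] = ((n - 1) * s2 / B, (n - 1) * s2 / A)
    left, right = intervals[label]
    print(f"{label:>9}: ({left:.1f}, {right:.1f})")
```

Rounding $s^2$ to 277 moves the endpoints only slightly (roughly 131.1 and 923.2 versus 131.2 and 924.2), so rounding has little effect at this precision.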
Population Standard Deviation
Of course, since the standard deviation is the square root of the variance, this method could be used to construct a confidence interval for the population standard deviation. All that we would need to do is to take square roots of the endpoints. The result would be a 95% confidence interval for the standard deviation.
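Because the square root is an increasing function, it keeps the endpoints in order and the 95% coverage carries over unchanged. A minimal Python sketch, recomputing the variance endpoints from the values used in this example ($n = 10$, $s^2 \approx 277$, $A = 2.7004$, $B = 19.023$) and then taking square roots:

```python
import math

n, s2 = 10, 277
A, B = 2.7004, 19.023  # chi-square values for 9 df, 2.5% in each tail

var_left = (n - 1) * s2 / B
var_right = (n - 1) * s2 / A
# sqrt is increasing, so it preserves the order of the endpoints
sd_left, sd_right = math.sqrt(var_left), math.sqrt(var_right)
print(f"95% CI for sigma: ({sd_left:.1f}, {sd_right:.1f})")
```

This gives roughly 11.4 to 30.4 for the population standard deviation.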