Parameters and Estimates
The task of statistical inference is to estimate an unknown population parameter using observed data from a sample. In a sampling model, the collection of elements in the urn is called the population. A parameter is a number that summarizes data for an entire population.
We want to estimate the proportion of blue beads in the urn, the parameter \(p\). The proportion of red beads in the urn is \(1-p\) and the spread is \(2p-1\).
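A minimal sketch of this sampling model in R, assuming an urn with a known proportion `p` of blue beads (the values of `p` and `N` are illustrative; in practice `p` is unknown):

```r
set.seed(1)
p <- 0.45   # true proportion of blue beads (unknown in practice)
N <- 1000   # number of beads drawn

# Draw N beads with replacement: 1 = blue, 0 = red
x <- sample(c(1, 0), size = N, replace = TRUE, prob = c(p, 1 - p))

x_bar <- mean(x)          # sample proportion, an estimate of p
spread <- 2 * x_bar - 1   # estimate of the spread 2p - 1
```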
The Central Limit Theorem in Practice
Because \(\bar{X}\) is the sum of independent random draws divided by a constant (the sample size \(N\)), the Central Limit Theorem tells us that the distribution of \(\bar{X}\) is approximately normal.
We can convert \(\bar{X}\) to a standard normal random variable \(Z\): \[Z=\frac{\bar{X}-E(\bar{X})}{SE(\bar{X})}\]
The probability that \(\bar{X}\) is within .01 of the actual value of \(p\) is: \[Pr(Z\leq0.01/\sqrt{p(1-p)/N})-Pr(Z\leq-0.01/\sqrt{p(1-p)/N})\]
The Central Limit Theorem (CLT) still works if \(\bar{X}\) is used in place of \(p\) in the standard error. This is called a plug-in estimate. Hats over values denote estimates. Therefore:
\[\hat{SE}(\bar{X})=\sqrt{\bar{X}(1-\bar{X})/N}\]
Using the CLT, the probability that \(\bar{X}\) is within .01 of the actual value of p is:
\[Pr(Z\leq0.01/\sqrt{\bar{X}(1-\bar{X})/N})-Pr(Z\leq-0.01/\sqrt{\bar{X}(1-\bar{X})/N})\]
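A sketch of this plug-in calculation in R, assuming an observed sample proportion `x_hat` from `N` draws (both values illustrative):

```r
N <- 1000
x_hat <- 0.48                            # observed proportion of blue beads
se_hat <- sqrt(x_hat * (1 - x_hat) / N)  # plug-in estimate of SE(X-bar)

# CLT approximation: probability that X-bar is within .01 of p
pnorm(0.01 / se_hat) - pnorm(-0.01 / se_hat)
```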
Confidence Intervals and p-Values
We can use statistical theory to compute the probability that a given interval contains the true parameter p.
95% confidence intervals are intervals constructed to have a 95% chance of including p. The interval \(\bar{X}\) plus or minus the margin of error (2 standard errors) is approximately a 95% confidence interval.
The start and end of these confidence intervals are random variables. To calculate any size confidence interval, we need to calculate the value z for which \(Pr(−z≤Z≤z)\) equals the desired confidence. For example, a 99% confidence interval requires calculating z for \(Pr(−z≤Z≤z)=0.99\).
For a confidence interval of size q, we solve for the z that satisfies \(Pr(−z≤Z≤z)=q\); this z is the \(1−\frac{1−q}{2}\) quantile of the standard normal distribution.
To determine a 95% confidence interval, use z <- qnorm(0.975). This value is slightly smaller than 2, so the margin of error is approximately 2 standard errors.
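A sketch of both calculations in R, using the same illustrative `x_hat` and `N` as above:

```r
N <- 1000
x_hat <- 0.48
se_hat <- sqrt(x_hat * (1 - x_hat) / N)

# 95% CI: q = 0.95, so z is the 0.975 quantile (slightly smaller than 2)
z95 <- qnorm(0.975)
c(x_hat - z95 * se_hat, x_hat + z95 * se_hat)

# 99% CI: q = 0.99, so z is the 1 - (1 - 0.99)/2 = 0.995 quantile
z99 <- qnorm(0.995)
c(x_hat - z99 * se_hat, x_hat + z99 * se_hat)
```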
Statistical Models
Poll aggregators combine the results of many polls to simulate polls with a large sample size and therefore generate more precise estimates than individual polls. Polls can be simulated with a Monte Carlo simulation and used to construct an estimate of the spread and confidence intervals.
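A sketch of poll aggregation via Monte Carlo simulation, assuming a known spread and illustrative sample sizes for twelve simulated polls:

```r
set.seed(2)
p <- 0.51
d <- 2 * p - 1   # true spread

# Illustrative sample sizes for twelve polls
Ns <- c(1298, 533, 1342, 897, 774, 254, 812, 324, 1291, 1056, 2172, 516)

# Simulate each poll and record its estimated spread
spreads <- sapply(Ns, function(N) {
  x_hat <- mean(sample(c(1, 0), N, replace = TRUE, prob = c(p, 1 - p)))
  2 * x_hat - 1
})

# Aggregate: weight each poll's spread by its sample size
d_hat <- sum(spreads * Ns) / sum(Ns)
d_hat   # typically closer to d than most individual polls
```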
The actual data science exercise of forecasting elections involves more complex statistical modeling, but these underlying ideas still apply.
Bayesian Statistics
In the urn model, it does not make sense to talk about the probability of p being greater than a certain value because p is a fixed value. With Bayesian statistics, we assume that p is in fact random, which allows us to calculate probabilities related to p. Hierarchical models describe variability at different levels and incorporate all these levels into a model for estimating p.
Election Forecasting
Pollsters tend to make probabilistic statements about the results of the election. For example, “The chance that Obama wins the electoral college is 91%” is a probabilistic statement about a parameter which in previous sections we have denoted with d . We showed that for the 2016 election, FiveThirtyEight gave Clinton an 81.4% chance of winning the popular vote. To do this, they used the Bayesian approach we described.
We assume a hierarchical model similar to what we did to predict the performance of a baseball player.
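A sketch of the hierarchical update in R. All numbers are illustrative assumptions: a prior on the spread d with mean `mu` and standard deviation `tau`, and an observed poll average `Y` with standard error `sigma`:

```r
mu <- 0         # prior mean for the spread d
tau <- 0.035    # prior standard deviation (historical variability)
Y <- 0.021      # observed average spread across polls (illustrative)
sigma <- 0.005  # standard error of Y

# Posterior mean shrinks Y toward the prior mean mu
B <- sigma^2 / (sigma^2 + tau^2)
posterior_mean <- B * mu + (1 - B) * Y
posterior_se <- sqrt(1 / (1 / sigma^2 + 1 / tau^2))

# A probabilistic statement about the parameter: chance the spread is positive
pnorm(0, posterior_mean, posterior_se, lower.tail = FALSE)
```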
Association Tests
Fisher's exact test determines the p-value as the probability of observing an outcome as extreme or more extreme than the observed outcome given the null distribution.
Data from a binary experiment are often summarized in two-by-two tables.
The p-value can be calculated from a two-by-two table using Fisher's exact test with the function fisher.test().
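A minimal example with fisher.test(), using an illustrative two-by-two table (the counts here follow the classic "lady tasting tea" design: four cups with milk poured first, four with tea first):

```r
# Rows: what was guessed; columns: the truth
tab <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(guess = c("milk_first", "tea_first"),
                              truth = c("milk_first", "tea_first")))

# One-sided p-value under the null of no association
fisher.test(tab, alternative = "greater")$p.value
```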