

Bootstrapping
Megan Ayers
Math 141 | Spring 2026
Wednesday, Week 6
Review how sampling distributions can be used to assess sampling variability
Discuss bootstrapping as a way of approximating the sampling distribution
An October 2020 poll by Marist College surveyed registered voters in Pennsylvania by phone, asking:
If November’s election were held today, whom would you support?
How confident should we be in the accuracy of our estimate, \(\widehat{p} = 0.46\)?
There are about \(9\) million registered voters in Pennsylvania. Marist College surveyed only \(1020\) of them (about \(0.01\%\) of the population).
We should be skeptical that our estimate is exactly equal to the true proportion.
But we should feel confident that our estimate is close to the true proportion.
Why?
The sampling distribution tells us how much variability to expect from sample to sample.
Using probability theory, we know that the standard error of the sampling distribution of the sample proportion with sample size \(n\) is \(SE = \sqrt{\frac{p(1-p)}{n}}\).
With \(n = 1020\) and \(p = 0.4884\), the standard error is \(SE \approx 0.016\).
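As a quick check of the arithmetic, the formula can be evaluated directly. This is a minimal Python sketch; the function name `se_proportion` is an illustrative choice, not something from the course materials.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

se = se_proportion(p=0.4884, n=1020)
print(round(se, 3))  # 0.016
```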
Suppose the true proportion of support for Trump/Pence was actually \(p = 0.49\)
Let’s draw \(5000\) simulated samples of size 1020 to see how many have \(\widehat{p}\) far from \(p = 0.49\).

In 95% of samples, the sample proportion \(\widehat{p}\) is at most 0.03 away from the true proportion \(p\)!
Implication: If 49% of voters support Trump/Pence, then in \(\approx 95 \%\) of samples of size 1020, the sample proportion of Trump/Pence support will satisfy \(46\% \leq \widehat{p} \leq 52\%\).
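A simulation along these lines can be sketched in Python. This is one possible version under stated assumptions, not necessarily how the slide's figure was produced: it treats each respondent as an independent draw with probability \(p = 0.49\) (a close approximation when the sample is a tiny fraction of the population), and the seed and the helper name `simulate_p_hat` are arbitrary.

```python
import random

random.seed(141)  # arbitrary seed so the run is reproducible

p_true = 0.49     # assumed true proportion, as supposed above
n = 1020          # sample size
reps = 5000       # number of simulated samples

def simulate_p_hat(p, n):
    """Draw one simulated sample of n voters and return the sample proportion."""
    supporters = sum(random.random() < p for _ in range(n))
    return supporters / n

p_hats = [simulate_p_hat(p_true, n) for _ in range(reps)]

# Share of simulated samples whose p-hat lands within 0.03 of the true p
share_close = sum(abs(ph - p_true) <= 0.03 for ph in p_hats) / reps
print(f"{share_close:.1%} of samples are within 0.03 of p")
```

The printed share should come out near 95%, up to simulation noise.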
Q: How does this contextualize conclusions based on the poll’s sample statistic?
For sampling distributions that are \(\approx\) bell-shaped, about 95% of sample statistics will be within 2 standard errors of the true parameter.
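As a worked instance with the numbers used earlier (\(p = 0.49\), \(SE \approx 0.016\)), the two-standard-error range is

\[ p \pm 2 \cdot SE \approx 0.49 \pm 2(0.016) = 0.49 \pm 0.032, \]

that is, roughly \(0.46\) to \(0.52\), which agrees with the range seen in the simulated samples.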
This helps us assess how close a sample statistic tends to be to the population parameter.
But in practice, we don’t know the sampling distribution!
The fix?
The term bootstrapping refers to the phrase “to pull oneself up by one’s bootstraps”
Originated in the 19th century as a reference to a ludicrous or impossible feat
By the mid-20th century, the meaning had changed to suggest success by one’s own efforts, without outside help (the “American Dream” myth)
Its use in statistics (dating from 1979) alludes to both interpretations.


The sample “represents” the population. Many copies of the sample still “represent” the population.
Idea: Copy the sample many times to create “a bootstrap population”, and then sample from this to get “bootstrap samples”
Same result, save time: Just sample with replacement from the one original sample.
To generate a bootstrap distribution given a sample of size \(n\) from the population:
1. Generate a bootstrap sample of size \(n\) by resampling with replacement from the original sample.
2. Repeat (1) a large number of times (with technology, at least 1000 times).
3. For each bootstrap sample, calculate the appropriate statistic (called the bootstrap statistic).
The collection of bootstrap statistics forms the bootstrap distribution.
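The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not course-provided code: the function name `bootstrap_distribution`, the default of 1000 repetitions, and the toy data are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic=statistics.mean, reps=1000):
    """Return a list of `reps` bootstrap statistics computed from `sample`."""
    n = len(sample)
    boot_stats = []
    for _ in range(reps):  # repeat the resampling many times
        # Resample n values WITH replacement from the original sample
        boot_sample = random.choices(sample, k=n)
        # Compute the bootstrap statistic for this bootstrap sample
        boot_stats.append(statistic(boot_sample))
    return boot_stats

random.seed(141)                          # arbitrary seed for reproducibility
toy_sample = [2, 5, 5, 7, 8, 10, 11, 13]  # made-up data for illustration
boot_means = bootstrap_distribution(toy_sample, reps=2000)
print(len(boot_means), round(statistics.mean(boot_means), 2))
```

Passing a different `statistic` (for example `statistics.median`) would give the bootstrap distribution of that statistic instead.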
Q: How does this process of generating a bootstrap distribution differ from the process of generating the sampling distribution?
We sample from the original sample here; for sampling distributions, we sample from the population
We sample with replacement here; for sampling distributions, we sample without replacement
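One way to see why replacement matters when resampling from the sample itself: drawing all \(n\) values without replacement only reshuffles the sample, so the statistic never changes. The short Python sketch below uses made-up data to illustrate this.

```python
import random
import statistics

random.seed(141)           # arbitrary seed for reproducibility
sample = [3, 9, 4, 12, 7]  # made-up sample for illustration

# Without replacement: drawing all n values just reshuffles the sample,
# so the mean is identical every time.
reshuffled = random.sample(sample, k=len(sample))

# With replacement: values can repeat or be left out, so the mean can vary.
resampled = random.choices(sample, k=len(sample))

print(sorted(reshuffled) == sorted(sample))
print(statistics.mean(reshuffled), statistics.mean(resampled))
```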
Population: Consider a very large deck of cards (5200 cards) with 400 of each card value.
Question: What’s the mean value from the deck of cards?
How We’ll Answer:
Since we have the deck of cards, we can look at:
The population distribution
The sampling distribution for sample means
The single sample’s distribution
The bootstrap distribution for sample means
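The four distributions listed above could be generated roughly as follows. This Python sketch rests on several assumptions that the slides do not state: card values are coded 1 through 13, the sample size is 25, and each distribution uses 5000 repetitions. Because the seed is arbitrary, the printed numbers will not match any particular figure or table exactly.

```python
import random
import statistics

random.seed(141)  # arbitrary seed for reproducibility

# Population: 400 copies of each card value 1..13 (5200 cards); coding is assumed
deck = [value for value in range(1, 14) for _ in range(400)]

n = 25        # assumed sample size (not stated on the slide)
reps = 5000   # assumed number of repetitions

# Sampling distribution: repeated samples from the POPULATION, without replacement
sampling_means = [statistics.mean(random.sample(deck, n)) for _ in range(reps)]

# One observed sample, then the bootstrap distribution: repeated resamples
# from that SAMPLE, with replacement
one_sample = random.sample(deck, n)
bootstrap_means = [statistics.mean(random.choices(one_sample, k=n))
                   for _ in range(reps)]

for label, values in [("population", deck), ("sampling dist.", sampling_means),
                      ("sample", one_sample), ("bootstrap dist.", bootstrap_means)]:
    print(f"{label:16s} mean={statistics.mean(values):.3f} "
          f"sd={statistics.stdev(values):.3f}")
```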
Q: Compare the Sampling Distribution and the Bootstrap Distribution. How are they similar? How do they differ?
We can compute some relevant statistics:
Population:
| mean_value | sd_value |
|---|---|
| 7 | 3.742017 |
Sampling Distribution:
| mean_xbar | sd_xbar |
|---|---|
| 7.01703 | 0.7331422 |
Sample:
| mean_value | sd_value |
|---|---|
| 6.52 | 3.513308 |
Bootstrap Distribution:
| mean_xbar | sd_xbar |
|---|---|
| 6.52426 | 0.6902801 |
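The sampling distribution and the bootstrap distribution have slightly different means but similar standard deviations!
The mean of the sampling distribution is (approximately) the true population mean
The mean of the bootstrap distribution is (approximately) the sample mean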