Confidence Intervals II

Megan Ayers

Math 141 | Spring 2026
Wednesday, Week 7

Midterm logistics

Section 2: when arriving for the midterm, please be mindful of students from the previous section who are still finishing their exam
Lab 5 grades are posted
Final office hours before midterm: Megan’s office today from 3:30-5pm

Goals for today

Review the concept of a 95% confidence interval
Discuss one way of creating confidence intervals with different confidence levels
Interpret confidence intervals and discuss common misconceptions

Review

Setting

There is some population we’re interested in studying.
- e.g., Reed College students
There is a population parameter we want to know
- e.g., average nightly hours of sleep (\(\mu\))
We draw a sample from the population with sample size \(n\)
- e.g., We survey \(n=50\) Reed students
We provide a point estimate of the parameter with a statistic
- e.g., average hours of sleep in our sample (\(\bar{x}=8.02 \ \text{hours}\))
We construct an interval estimate centered at our statistic
- e.g., \(8.02 \pm 0.16 \ \text{hours}\)

Warm-Up

What are the differences between a sampling distribution and a bootstrap distribution?
Suppose we created confidence intervals based on distinct samples of size \(n=10\) and \(n=100\). How might they differ?

Confidence Intervals

A confidence interval gives a range of plausible values for a parameter. It usually takes the form:

\[ \textrm{Statistic }\pm \textrm{ Margin of Error (ME)} \]

The Margin of Error (ME) is partially determined by our sample size:
- Larger samples yield a smaller ME and a narrower interval (more data \(\implies\) more certainty)
- Smaller samples yield a larger ME and a wider interval (less data \(\implies\) less certainty)

Every confidence interval has a confidence level:
- the percentage of samples that would yield a corresponding confidence interval that contains the true value of the parameter.

For example, if we were to:
- Repeatedly draw samples from the population
- Create a 95% confidence interval from each sample

then about 95% of those confidence intervals would contain the true parameter.
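
A small simulation can make this definition concrete. The sketch below is only an illustration: the population (normal, mean 8, SD 1), the sample size, and the number of repetitions are all made-up assumptions, and the SE is estimated with the formula \(s/\sqrt{n}\) for a sample mean rather than with the bootstrap.

```r
set.seed(141)

# Hypothetical population parameters and study design (assumptions)
mu <- 8        # true mean, known only because we are simulating
sigma <- 1
n <- 50
reps <- 2000   # number of repeated samples

# For each repeated sample, build Statistic +/- 2 * SE and
# record whether the interval contains the true mean
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)
  lower <- mean(x) - 2 * se
  upper <- mean(x) + 2 * se
  lower <= mu && mu <= upper
})

# Proportion of intervals that captured the parameter
coverage <- mean(covered)
coverage
```

The value of `coverage` should land close to 0.95, though it will vary a little with the seed.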

Sampling Distributions can determine the Margin of Error

Reminders:
- The sampling distribution is often bell-shaped, with its mean equal to the parameter
- In bell-shaped (normal) distributions, about 95% of observations lie in the range:

\[ \text{Mean} \pm 2*\text{Standard Error} \]
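
As a quick check of the 2 SE rule, the sketch below asks R for the area within 2 standard deviations of the mean under a normal curve. It uses only base R's `pnorm()`.

```r
# Proportion of a normal distribution within 2 standard deviations of its mean
within_2_sd <- pnorm(2) - pnorm(-2)
within_2_sd
```

The result is a little over 0.95, which is why 2 is a convenient rounded multiplier.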

Implication: in the sampling distribution, about 95% of sample statistics lie in the range: \[ \text{Parameter} \pm 2*\text{Standard Error} \] where the Standard Error is the standard deviation of the sampling distribution

So, about 95% of sample statistics are within \(2*\mathrm{SE}\) of the parameter!
Thus, 95% confidence intervals usually take the form:

\[ \textrm{Statistic }\pm 2*\textrm{Standard Error (SE)} \]

We typically estimate the SE (i) with the bootstrap, or (ii) with a formula

Bootstrapping Confidence Intervals

Example: Reproduction Rate for COVID-19

Researchers are interested in the COVID-19 reproduction rate (the average number of individuals each infected person further infects)
We sample 50 infected individuals and perform contact tracing.

  infected  n
1        0  5
2        1 13
3        2 14
4        3 12
5        4  5
6        6  1

  mean_infected
1          2.06
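
The `covid` data frame itself is not shown on the slide. One way to rebuild a data set consistent with the counts above, assuming one row per infected individual and a single column named `infected`:

```r
# Rebuild the sample from the count table:
# each value of `infected` is repeated n times
covid <- data.frame(
  infected = rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))
)

nrow(covid)                        # sample size
mean_infected <- mean(covid$infected)
mean_infected                      # sample mean of further infections
```

This reproduces the sample size of 50 and the sample mean of 2.06 reported above.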

Goal: Create an interval of plausible values for the reproduction rate.
Q: What is the population? What is the parameter?
Q: What is the sample? What is the statistic?

Bootstrap Reproduction Rate

We can use our sample to create a 95% confidence interval. What is each step doing, and why?

Step 1:

library(dplyr)  # provides %>%, group_by(), summarize()
library(infer)  # provides rep_sample_n()
set.seed(121)
bootstrap_samples <- covid %>% rep_sample_n(size = 50, replace = TRUE, reps = 5000)

Step 2:

bootstrap_stats <- bootstrap_samples %>% group_by(replicate) %>% summarize(x_bar = mean(infected))

Step 3: (plot of the bootstrap distribution; figure not shown)

Step 4:

bootstrap_stats %>% summarize(SE = sd(x_bar))

# A tibble: 1 × 1
     SE
  <dbl>
1 0.181

Step 5:

\[ \bar x \pm 2 \cdot SE \implies 2.06 \pm 2 \cdot 0.181 \]

Bootstrap Reproduction Rate

We can use our sample to create a 95% confidence interval.

Create the bootstrap samples:

set.seed(121)
bootstrap_samples <- covid %>% rep_sample_n(size = 50, replace = TRUE, reps = 5000)

Compute bootstrap statistics within each bootstrap sample:

bootstrap_stats <- bootstrap_samples %>% group_by(replicate) %>% summarize(x_bar = mean(infected))

Graph the bootstrap distribution to check shape:

Estimate the standard error

bootstrap_stats %>% summarize(SE = sd(x_bar))

# A tibble: 1 × 1
     SE
  <dbl>
1 0.181

Because the bootstrap distribution is bell-shaped, we use \(2 \times\) the estimated SE as our margin of error to create a 95% confidence interval

\[ \bar x \pm 2 \cdot SE \implies 2.06 \pm 2 \cdot 0.181 \]
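
The same steps can be sketched without `rep_sample_n()`, using only base R. This is an alternative illustration, not the code used above: the `covid` data frame is rebuilt from the count table under the assumption of one row per individual, and because the random draws differ, the SE will be close to but not exactly 0.181.

```r
# Rebuild the sample from the count table (assumed structure)
covid <- data.frame(
  infected = rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))
)

set.seed(121)

# Steps 1 and 2: resample 50 rows with replacement, 5000 times,
# and keep the mean of each bootstrap sample
x_bar_boot <- replicate(
  5000,
  mean(sample(covid$infected, size = 50, replace = TRUE))
)

# Step 4: the SE is the standard deviation of the bootstrap statistics
se_boot <- sd(x_bar_boot)

# Step 5: Statistic +/- 2 * SE
ci_95 <- mean(covid$infected) + c(-2, 2) * se_boot
se_boot
ci_95
```

A histogram such as `hist(x_bar_boot)` would cover the shape check in Step 3.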

Generalized Confidence Intervals

In the previous example, we used our knowledge that for approximately bell-shaped sampling distributions, 95% of sample statistics are within 2 SE of the population parameter
- Suppose we instead want a different success rate for our estimation method
- Or suppose we want interval estimates for sampling distributions that are NOT bell-shaped
We can handle both cases by again using the bootstrap approximation to the sampling distribution:

General Confidence Intervals

A \(C\%\) confidence interval for a parameter is an interval estimate that is computed from sample data by a method that captures the parameter for \(C\%\) of all samples.

Review: Percentiles and Quantiles

For a number \(k\) between \(0\) and \(100\), the \(k\)th percentile of a distribution is the value so that \(k\%\) of the data is less than or equal to that value.
- The median is the 50th percentile of a distribution
- 1st/3rd quartiles (Q1/Q3) are the 25th and 75th percentiles, respectively.

For a number \(p\) between \(0\) and \(1\), the \(p\) quantile of a distribution is the value so that a proportion \(p\) of the data is less than or equal to that value.
- The median is the \(0.5\) quantile of a distribution
- 1st/3rd quartiles (Q1/Q3) are the \(0.25\) and \(0.75\) quantiles, respectively.
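
R's `quantile()` function computes these values. A minimal sketch on a made-up data set, the integers 0 through 100:

```r
# Toy data (an assumption for illustration): the integers 0, 1, ..., 100
x <- 0:100

# The 0.25, 0.5, and 0.75 quantiles, i.e., Q1, the median, and Q3
quartiles <- quantile(x, c(0.25, 0.5, 0.75))
quartiles

# The 0.5 quantile agrees with the median
median(x)
```

On other data sets, `quantile()` may interpolate between neighboring observations, so its result can differ slightly from a value picked out by hand.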

Quantiles and Percentiles

By definition, 2.5% of the data is less than the .025 quantile, and 2.5% of the data is greater than the .975 quantile

This means that 95% of the data is between the .025 and the .975 quantiles

For sampling distributions that are bell-shaped, the .025 quantile is about \(2\cdot SE\) below the mean, and the .975 quantile is about \(2\cdot SE\) above the mean
So using the .025 and .975 quantiles is roughly equivalent to forming a 95% CI as: \(\text{Statistic} \pm 2*\text{SE}\)!
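
For an exactly normal distribution, the multiplier can be checked with base R's `qnorm()`, which returns quantiles of the standard normal distribution (mean 0, SD 1):

```r
# The .025 and .975 quantiles of the standard normal distribution,
# measured in standard deviations from the mean
z <- qnorm(c(0.025, 0.975))
z
```

Both values are about 1.96 in absolute value, which the 2 SE rule rounds to 2.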

95% Confidence Interval: 2 ways

\(\color{red}{\text{Statistic} \pm 2*\text{SE}}\)
- \(\mathrm{Statistic} \ (\bar{x}) = 2.06\)
- \(\mathrm{SE} = 0.18\)
- \(95\% \ \mathrm{CI} = 1.70 \ \text{to} \ 2.42\)

\(\color{red}{\text{Percentile Method}}\)

quantile(bootstrap_stats$x_bar, c(0.025, 0.975))

 2.5% 97.5% 
 1.72  2.40

Percentile Method for Confidence Intervals

Percentile Method: For a \(C\%\) Confidence Interval, report the quantiles of the bootstrap distribution such that:
- \(C\%\) lies in the middle
- \(\frac{(100-C)}{2}\%\) lies on each end (i.e., “the rest” is split evenly between the two ends)
Ex: For \(C\% = 95\%\) (i.e., a 95% Confidence Interval), we want
- 95% in the middle
- \(\frac{(100-C)}{2}\% = \frac{(100-95)}{2}\% = {2.5\%}\) on each end
Report the 0.025 quantile (or 2.5th percentile) and the 0.975 quantile (or 97.5th percentile)!
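
The rule above is simple enough to write as a small function. The name `ci_quantile_probs` is a hypothetical helper introduced here for illustration; it is not part of any package.

```r
# Hypothetical helper: quantile probabilities for a C% percentile interval
ci_quantile_probs <- function(C) {
  tail_pct <- (100 - C) / 2   # percent left over on each end
  c(lower = tail_pct / 100, upper = 1 - tail_pct / 100)
}

ci_quantile_probs(95)   # the .025 and .975 quantiles
ci_quantile_probs(90)   # the .05 and .95 quantiles
```

The returned pair can be passed directly as the second argument of `quantile()`.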

The Percentile Method: Example

Suppose we want to construct a 90% confidence interval for the reproduction rate
- Find the .05 and .95 quantiles in the bootstrap distribution.
- 90% of bootstrap sample statistics will be between these values

We can use the quantile function in R to calculate the .05 and .95 quantiles

quantile(bootstrap_stats$x_bar, c(.05, .95))

  5%  95% 
1.76 2.36

Our 90% confidence interval is therefore 1.76 to 2.36
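
The whole percentile method fits in one self-contained sketch. Here the bootstrap is done with base R's `sample()` and `replicate()` rather than `rep_sample_n()`, and `covid` is rebuilt from the count table (assuming one row per individual), so the endpoints should be close to, but not necessarily identical to, the values above. The function name `percentile_ci` is hypothetical.

```r
# Rebuild the sample from the count table (assumed structure)
covid <- data.frame(
  infected = rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))
)

set.seed(121)

# Bootstrap distribution of the sample mean
boot_x_bar <- replicate(
  5000,
  mean(sample(covid$infected, size = nrow(covid), replace = TRUE))
)

# Hypothetical helper: C% percentile interval from bootstrap statistics
percentile_ci <- function(boot_stats, C) {
  tail_prob <- (100 - C) / 200   # proportion left on each end
  quantile(boot_stats, c(tail_prob, 1 - tail_prob))
}

ci_90 <- percentile_ci(boot_x_bar, 90)
ci_95 <- percentile_ci(boot_x_bar, 95)
ci_90
ci_95
```

Comparing `ci_90` with `ci_95` shows the 95% interval reaching at least as far out on both sides.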

Percentile Method: Practice

With neighbor(s), name the quantiles of the bootstrap distribution you would need for:

80% Confidence Interval
99% Confidence Interval
2% Confidence Interval (You would never do this! This is just for fun (: )

Answers:

80%: 0.10 and 0.90 quantiles
99%: 0.005 and 0.995 quantiles
2%: 0.49 and 0.51 quantiles

Width of Confidence Intervals

Two factors determine the width of a confidence interval:

Sample Size
- The Standard Error of the sampling distribution decreases as sample size increases.
- Smaller sample size \(\implies\) larger interval
- Larger sample size \(\implies\) smaller interval
Confidence Level
- Decreasing the confidence level brings the relevant quantiles closer to the middle, decreasing the width of the interval.
- Higher confidence level \(\implies\) larger interval
- Lower confidence level \(\implies\) smaller interval

Discuss with Neighbor(s): Confidence Intervals get smaller with:

a larger sample size
a lower confidence level

Intuitively, why does this make sense?

Note: These reasons for getting smaller are competing in terms of certainty!

With a larger sample size, the interval gets smaller because we’re more certain the statistic is close to the parameter.
With a lower confidence level, the interval gets smaller because we accept being less certain that it will contain the true parameter.

Reminder: While a lower confidence level gives you a smaller interval, there is a cost! (i.e., lower success rate)
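
Both effects can be seen in a short simulation. This is a sketch under made-up assumptions: the two samples are drawn from a hypothetical normal population (mean 8, SD 1), and the helper names `boot_means` and `interval_width` are introduced here only for illustration.

```r
set.seed(141)

# Bootstrap distribution of the sample mean for a numeric vector x
boot_means <- function(x, reps = 2000) {
  replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
}

# Width of the percentile interval at a given confidence level (e.g., 0.95)
interval_width <- function(boot, level) {
  alpha <- 1 - level
  unname(diff(quantile(boot, c(alpha / 2, 1 - alpha / 2))))
}

# Two hypothetical samples of different sizes from the same population
small_sample <- rnorm(10, mean = 8, sd = 1)
large_sample <- rnorm(100, mean = 8, sd = 1)
boot_small <- boot_means(small_sample)
boot_large <- boot_means(large_sample)

# Effect of sample size at a fixed 95% confidence level
width_n10 <- interval_width(boot_small, 0.95)
width_n100 <- interval_width(boot_large, 0.95)

# Effect of confidence level at a fixed sample size (n = 100)
widths_by_level <- sapply(
  c(0.80, 0.95, 0.99),
  function(level) interval_width(boot_large, level)
)

c(n10 = width_n10, n100 = width_n100)
widths_by_level
```

The interval from the sample of 10 should be noticeably wider than the one from the sample of 100, and the widths should grow as the confidence level moves from 80% to 99%.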

Confidence Interval Misunderstandings

Misunderstanding 1

Suppose we wish to estimate the number of hours a Reed student sleeps on a typical night. We obtain the following 95% confidence interval: \((7.86, 8.34)\)

A 95% confidence interval does not contain 95% of observations in the population.

Saying that 95% of all Reed students sleep between 7.86 and 8.34 hours should just feel wrong. That’s a pretty narrow interval!
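
A quick simulation shows how far off this reading is. Everything about the population below is hypothetical (10,000 students with normally distributed sleep, mean 8.1 hours and SD 0.85), and the interval uses the formula-based SE \(s/\sqrt{n}\) for simplicity.

```r
set.seed(141)

# Hypothetical population of nightly sleep hours (an assumption)
population <- rnorm(10000, mean = 8.1, sd = 0.85)

# One sample of 50 students and its 95% confidence interval for the mean
sleep_sample <- sample(population, size = 50)
se <- sd(sleep_sample) / sqrt(length(sleep_sample))
ci <- mean(sleep_sample) + c(-2, 2) * se

# Proportion of individual students whose sleep falls inside the interval
prop_inside <- mean(population >= ci[1] & population <= ci[2])
ci
prop_inside
```

The proportion of students inside the interval comes out far below 95%, because the interval describes uncertainty about the mean, not the spread of individual students.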

Misunderstanding 2

A 95% confidence interval does not mean that 95% of all sample means fall within the given range.

Q: Why do the sampling distribution and bootstrap distribution look different?

Misunderstanding 3

Given a 95% confidence interval, Do Not Say: “There is a 95% chance that the true parameter falls within my interval.”

Once we take a sample and calculate a confidence interval, there’s no more randomness!
- The interval either does or doesn’t contain the (unknown) parameter.
This may seem like arguing over semantics, but it’s an important distinction!

Instead, say either:

“If we were to take many samples and calculate a 95% confidence interval for each, then 95% of them would contain the true parameter”
“We are 95% confident that the true parameter is in our confidence interval”

Next time

Hypothesis testing!

With any time left…

Midterm review questions?