Confidence Intervals II



Megan Ayers

Math 141 | Spring 2026
Wednesday, Week 7

Midterm logistics

  • Please be mindful of students finishing the exam from the previous section when arriving for the midterm (section 2)

  • Lab 5 grades are posted

  • Final office hours before midterm: Megan’s office today from 3:30-5pm

Goals for today

  • Review the concept of a 95% confidence interval

  • Discuss one way of creating confidence intervals with different confidence levels

  • Interpret confidence intervals and discuss common misconceptions

Review

Setting

  • There is some population we’re interested in studying.
    • e.g., Reed College students
  • There is a population parameter we want to know
    • e.g., average nightly hours of sleep (\(\mu\))
  • We draw a sample from the population with sample size \(n\)
    • e.g., We survey \(n=50\) Reed students
  • We provide a point estimate of the parameter with a statistic
    • e.g., average hours of sleep in our sample (\(\bar{x}=8.02 \ \text{hours}\))
  • We construct an interval estimate centered at our statistic
    • e.g., \(8.02 \pm 0.16 \ \text{hours}\)

Warm-Up

  1. What are the differences between a sampling distribution and a bootstrap distribution?

  2. Suppose we created confidence intervals based on distinct samples of size \(n=10\) and \(n=100\). How might they differ?

Confidence Intervals

  • A confidence interval gives a range of plausible values for a parameter. It usually takes the form:

\[ \textrm{Statistic }\pm \textrm{ Margin of Error (ME)} \]

  • The Margin of Error (ME) is partially determined by our sample size:
    • Bigger samples yield a smaller ME and a narrower interval (more data \(\implies\) more certainty)
    • Smaller samples yield a larger ME and a wider interval (less data \(\implies\) less certainty)


  • Every confidence interval has a confidence level:
    • the percentage of samples that would yield a corresponding confidence interval that contains the true value of the parameter.


  • For example, suppose we were to:
    • Repeatedly draw samples from the population
    • Create a 95% confidence interval from each sample
  • Then about 95% of those confidence intervals would contain the true parameter
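
The repeated-sampling idea can be checked with a small simulation. This is a minimal sketch under stated assumptions: the population (normal, mean 8, standard deviation 1) is hypothetical, and the SE is estimated with the standard formula \(s/\sqrt{n}\) for a sample mean, which these slides mention but do not spell out.

```r
# Hypothetical population: nightly hours of sleep, assumed normal with
# mean 8 and standard deviation 1 (illustrative values only).
set.seed(141)
mu <- 8
sigma <- 1
n <- 50
n_samples <- 2000

covers <- replicate(n_samples, {
  x <- rnorm(n, mean = mu, sd = sigma)   # draw one sample of size n
  se <- sd(x) / sqrt(n)                  # formula-based estimate of the SE of the mean
  lower <- mean(x) - 2 * se
  upper <- mean(x) + 2 * se
  lower <= mu && mu <= upper             # TRUE if this interval captures mu
})

# Proportion of the 2000 intervals that contain the true mean
coverage <- mean(covers)
coverage
```

The proportion should land near 0.95. Any single interval either contains \(\mu\) or does not; the 95% describes the method across many samples.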

Sampling Distributions can determine the Margin of Error

  • Reminders:
    • Sampling distribution is often bell-shaped, with the mean equal to the parameter
    • In bell-shaped (normal) distributions, about 95% of observations lie in the range:

\[ \text{Mean} \pm 2*\text{Standard Deviation} \]

Implication: in the sampling distribution, about 95% of sample statistics lie in the range:

\[ \text{Parameter} \pm 2*\text{Standard Error} \]

where the Standard Error is the standard deviation of the sampling distribution.

  • So, about 95% of sample statistics are within \(2*\mathrm{SE}\) of the parameter!

  • Thus, 95% confidence intervals usually take the form:

\[ \textrm{Statistic }\pm 2*\textrm{Standard Error (SE)} \]

  • We typically estimate the SE (i) with the bootstrap, or (ii) with a formula

Bootstrapping Confidence Intervals

Example: Reproduction Rate for COVID-19

  • Researchers are interested in the COVID-19 reproduction rate (the average number of individuals each infected person further infects)

  • Sample 50 infected individuals and perform contact tracing.

  infected  n
1        0  5
2        1 13
3        2 14
4        3 12
5        4  5
6        6  1
  mean_infected
1          2.06
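
For readers who want to follow along without the course data file, the frequency table can be expanded back into raw data. This is a sketch that assumes the table lists every individual in the sample (the counts do sum to 50); the names `covid` and `infected` follow the code used later in these slides.

```r
# Rebuild the raw data from the frequency table: one row per infected person.
covid <- data.frame(
  infected = rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))
)

n_rows <- nrow(covid)            # 50 individuals in the sample
x_bar <- mean(covid$infected)    # 2.06, matching mean_infected above
n_rows
x_bar
```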

  • Goal: Create an interval of plausible values for the reproduction rate.

  • Q: What is the population? What is the parameter?

  • Q: What is the sample? What is the statistic?

Bootstrap Reproduction Rate

We can use our sample to create a 95% confidence interval. What is each step doing, and why?

  1. Step 1:
set.seed(121)
bootstrap_samples <- covid %>%
  rep_sample_n(size = 50, replace = TRUE, reps = 5000)
  2. Step 2:
bootstrap_stats <- bootstrap_samples %>%
  group_by(replicate) %>%
  summarize(x_bar = mean(infected))
  3. Step 3:
[Figure: graph of the bootstrap distribution]
  4. Step 4:
bootstrap_stats %>% summarize(SE = sd(x_bar))
# A tibble: 1 × 1
     SE
  <dbl>
1 0.181
  5. Step 5:

\[ \bar x \pm 2 \cdot SE \implies 2.06 \pm 2 \cdot 0.181 \]

Bootstrap Reproduction Rate

We can use our sample to create a 95% confidence interval.

  1. Create the bootstrap samples:
library(dplyr)   # %>%, group_by(), summarize()
library(infer)   # rep_sample_n()
set.seed(121)
bootstrap_samples <- covid %>%
  rep_sample_n(size = 50, replace = TRUE, reps = 5000)
  2. Compute bootstrap statistics within each bootstrap sample:
bootstrap_stats <- bootstrap_samples %>%
  group_by(replicate) %>%
  summarize(x_bar = mean(infected))
  3. Graph the bootstrap distribution to check shape:
[Figure: graph of the bootstrap distribution]
  4. Estimate the standard error:
bootstrap_stats %>% summarize(SE = sd(x_bar))
# A tibble: 1 × 1
     SE
  <dbl>
1 0.181
  5. Because the bootstrap distribution is bell-shaped, we use \(2 \times\) the estimated SE as our margin of error to create a 95% confidence interval:

\[ \bar x \pm 2 \cdot SE \implies 2.06 \pm 2 \cdot 0.181 = 2.06 \pm 0.362 \implies (1.70,\ 2.42) \]
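
The same steps can be sketched in base R, with no packages loaded. This is an illustration rather than the course's method: it assumes the sample is rebuilt from the frequency table shown earlier, and because the resampling code differs from `rep_sample_n()`, the results will be close to, but not identical to, the output above. The names `x_bar_boot`, `se_boot`, and `ci_95` are made up for this sketch.

```r
set.seed(121)

# The sample, rebuilt from the frequency table (assumption: the table is complete)
infected <- rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))

# Steps 1-2: draw 5000 bootstrap samples of size 50 (with replacement)
# and record the mean of each one
x_bar_boot <- replicate(5000, mean(sample(infected, size = 50, replace = TRUE)))

# Step 4: the SE estimate is the standard deviation of the bootstrap means
se_boot <- sd(x_bar_boot)

# Step 5: statistic plus or minus 2 * SE
x_bar <- mean(infected)
ci_95 <- c(lower = x_bar - 2 * se_boot, upper = x_bar + 2 * se_boot)
se_boot
ci_95
```

Step 3 (graphing) is omitted here; a call such as `hist(x_bar_boot)` would show whether the bootstrap distribution is roughly bell-shaped.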

Generalized Confidence Intervals

  • In the previous example, we used our knowledge that for approximately bell-shaped sampling distributions, 95% of sample statistics are within 2 SE of the population parameter
    • Suppose we instead want a different success rate for our estimation method
    • Or suppose we want interval estimates for sampling distributions that are NOT bell-shaped
  • We can handle both cases by again using the bootstrap approximation to the sampling distribution to construct:

General Confidence Intervals

The \(C\%\) confidence interval for a parameter is an interval estimate that is computed from sample data by a method that captures the parameter for \(C\%\) of all samples.

Review: Percentiles and Quantiles

  • For a number \(k\) between \(0\) and \(100\), the \(k\)th percentile of a distribution is the value such that \(k\%\) of the data is less than or equal to that value.
    • The median is the 50th percentile of a distribution
    • 1st/3rd quartiles (Q1/Q3) are the 25th and 75th percentiles, respectively.

  • For a number \(p\) between \(0\) and \(1\), the \(p\) quantile of a distribution is the value such that a proportion \(p\) of the data is less than or equal to that value.
    • The median is the \(0.5\) quantile of a distribution
    • 1st/3rd quartiles (Q1/Q3) are the \(0.25\) and \(0.75\) quantiles, respectively.
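
As a quick illustration, R's `quantile()` function takes proportions between 0 and 1. The data here are hypothetical (the integers 0 through 100), chosen so the answers are easy to check by eye. Note that `quantile()` interpolates between data values by default, so on small data sets its output can differ slightly from the "less than or equal to" definition above.

```r
# Hypothetical data: the integers 0 through 100
x <- 0:100

q_median <- quantile(x, 0.50)               # 0.5 quantile = 50th percentile
q_quartiles <- quantile(x, c(0.25, 0.75))   # Q1 and Q3

q_median      # 50, the same value median(x) returns
q_quartiles   # 25 and 75
```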

Quantiles and Percentiles

By definition, 2.5% of the data is less than the .025 quantile, and 2.5% of the data is greater than the .975 quantile

  • This means that 95% of the data is between the .025 and the .975 quantiles.

  • For sampling distributions that are bell-shaped, the .025 quantile is about \(2\cdot SE\) below the mean, and the .975 quantile is about \(2\cdot SE\) above the mean

  • So using the .025 and .975 quantiles is roughly equivalent to forming a 95% CI as: \(\text{Statistic} \pm 2*\text{SE}\)!
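
The "about 2 SE" rule can be checked against the normal distribution directly. This sketch uses base R's `qnorm()` and `pnorm()` on the standard normal distribution (mean 0, standard deviation 1), so the quantiles are in units of standard deviations.

```r
# The .025 and .975 quantiles of the standard normal distribution
z <- qnorm(c(0.025, 0.975))
z          # roughly -1.96 and 1.96, i.e. about 2 SDs below and above the mean

# Share of a normal distribution within 2 standard deviations of its mean
within_2 <- pnorm(2) - pnorm(-2)
within_2   # roughly 0.954, slightly more than 95%
```

This is why the two methods give similar, but not identical, intervals when the bootstrap distribution is bell-shaped.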

95% Confidence Interval: 2 ways

  1. \(\color{red}{\text{Statistic} \pm 2*\text{SE}}\)
    • \(\mathrm{Statistic} \ (\bar{x}) = 2.06\)
    • \(\mathrm{SE} = 0.18\)
    • \(95\% \ \mathrm{CI} = 1.70 \ \text{to} \ 2.42\)
  2. \(\color{red}{\text{Percentile Method}}\)

quantile(bootstrap_stats$x_bar, c(0.025, 0.975))
 2.5% 97.5% 
 1.72  2.40 

Percentile Method for Confidence Intervals

  • Percentile Method: For a \(C\%\) Confidence Interval, report the quantiles of the bootstrap distribution such that:
    • \(C\%\) lies in the middle
    • \(\frac{(100-C)}{2}\%\) lies on either end (i.e., “the rest” is evenly distributed on the ends)
  • Ex: For \(C\% = 95\%\) (i.e., a 95% Confidence Interval), we want
    • 95% in the middle
    • \(\frac{(100-C)}{2}\% = \frac{(100-95)}{2}\% = {2.5\%}\) on either end
  • Report the 0.025 quantile (or 2.5th percentile) and the 0.975 quantile (or 97.5th percentile)!
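
The recipe above can be wrapped in a small helper. This is a sketch: `percentile_ci()` is a hypothetical function written for illustration, not part of any package, and the bootstrap means are generated in base R from the sample rebuilt from the frequency table, so the numbers will be close to, but not exactly, those in the slides.

```r
# Hypothetical helper: percentile-method interval for confidence level C (in percent)
percentile_ci <- function(boot_stats, C = 95) {
  tail_prop <- (100 - C) / 2 / 100     # proportion left in each tail
  quantile(boot_stats, probs = c(tail_prop, 1 - tail_prop))
}

# Bootstrap means for the COVID sample (assumption: table-rebuilt data)
set.seed(121)
infected <- rep(c(0, 1, 2, 3, 4, 6), times = c(5, 13, 14, 12, 5, 1))
x_bar_boot <- replicate(5000, mean(sample(infected, size = 50, replace = TRUE)))

ci_95 <- percentile_ci(x_bar_boot, C = 95)   # .025 and .975 quantiles
ci_90 <- percentile_ci(x_bar_boot, C = 90)   # .05 and .95 quantiles
ci_95
ci_90
```

Changing the confidence level only changes which quantiles are requested; the bootstrap distribution itself is reused.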

The Percentile Method: Example

  • Suppose we want to construct a 90% confidence interval for the reproduction rate
    • Find the .05 and .95 quantiles in the bootstrap distribution.
    • 90% of bootstrap sample statistics will be between these values

  • We can use the quantile function in R to calculate the .05 and .95 quantiles
quantile(bootstrap_stats$x_bar, c(.05, .95))
  5%  95% 
1.76 2.36 

  • Our 90% confidence interval is therefore 1.76 to 2.36

Percentile Method: Practice

With your neighbor(s), name the quantiles of the bootstrap distribution you would need for:

  1. 80% Confidence Interval

  2. 99% Confidence Interval

  3. 2% Confidence Interval (You would never do this! This is just for fun (: )

Answers:

  1. 0.10 and 0.90 quantiles

  2. 0.005 and 0.995 quantiles

  3. 0.49 and 0.51 quantiles
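
These answers follow from the \(\frac{(100-C)}{2}\%\) rule and can be checked with a few lines of arithmetic. The variable names below are made up for this sketch.

```r
conf_levels <- c(80, 99, 2)                  # confidence levels in percent

lower_q <- (100 - conf_levels) / 2 / 100     # proportion in the lower tail
upper_q <- 1 - lower_q                       # everything except the upper tail

answers <- data.frame(level = conf_levels, lower = lower_q, upper = upper_q)
answers
```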

Width of Confidence Intervals

Two factors determine the width of a confidence interval:

  1. Sample Size
    • The Standard Error of the sampling distribution decreases as sample size increases.
    • Smaller sample size \(\implies\) wider interval
    • Larger sample size \(\implies\) narrower interval
  2. Confidence Level
    • Decreasing the confidence level brings the relevant quantiles closer to the middle, decreasing the width of the interval.
    • Higher confidence level \(\implies\) wider interval
    • Lower confidence level \(\implies\) narrower interval
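
Both effects can be seen numerically. This sketch rests on assumptions not made in these slides: it uses the standard formula \(s/\sqrt{n}\) for the SE of a sample mean and a normal quantile as the multiplier (about 2 at the 95% level), rather than the bootstrap. The function `ci_width()` and the value `s = 1.28` are hypothetical, chosen only for illustration.

```r
# Hypothetical sample standard deviation
s <- 1.28

# Width of a formula-based interval for a mean (assumes a bell-shaped
# sampling distribution): width = upper - lower = 2 * margin of error
ci_width <- function(n, level) {
  se <- s / sqrt(n)                          # SE shrinks as n grows
  multiplier <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  2 * multiplier * se
}

w_base  <- ci_width(n = 50,  level = 0.95)
w_big_n <- ci_width(n = 200, level = 0.95)   # four times the data: half the width
w_low_c <- ci_width(n = 50,  level = 0.90)   # lower confidence level: narrower interval
c(base = w_base, bigger_n = w_big_n, lower_level = w_low_c)
```

Because the SE depends on \(\sqrt{n}\), quadrupling the sample size is needed to halve the width.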

Discuss with Neighbor(s): Confidence Intervals get smaller with:

  • a larger sample size

  • a lower confidence level

Intuitively, why does this make sense?

Note: These two ways of getting a smaller interval pull in opposite directions in terms of certainty!

  • With a larger sample size, the interval gets smaller because we’re more certain the statistic is close to the parameter.
  • With a lower confidence level, we become less certain the interval will contain the true parameter.

Reminder: While a lower confidence level gives you a smaller interval, there is a cost! (i.e., lower success rate)

Confidence Interval Misunderstandings

Misunderstanding 1

Suppose we wish to estimate the number of hours a Reed student sleeps on a typical night. We obtain the following 95% confidence interval: \((7.86, 8.34)\)

  1. A 95% confidence interval does not contain 95% of observations in the population.

  • Saying that 95% of all Reed students sleep between 7.86 and 8.34 hours should just feel wrong. That’s a pretty narrow interval!

Misunderstanding 2

  1. A 95% confidence interval does not mean that 95% of all sample means fall within the given range.

  • Q: Why do the sampling distribution and bootstrap distribution look different?

Misunderstanding 3

  1. Given a 95% confidence interval, Do Not Say: “There is a 95% chance that the true parameter falls within my interval.”
  • Once we take a sample and calculate a confidence interval, there’s no more randomness!
    • The interval either does or doesn’t contain the (unknown) parameter.
  • This may seem like arguing over semantics, but it’s an important distinction!

Instead, say either:

  • “If we were to take many samples and calculate a 95% confidence interval for each, then 95% of them would contain the true parameter”

  • “We are 95% confident that the true parameter is in our confidence interval”

Next time

  • Hypothesis testing!


With any time left…

  • Midterm review questions?