Data Visualization

Megan Ayers

Math 141 | Spring 2026
Wednesday, Week 1

Announcements/reminders

Waitlist movement - let me know if you’re still on the waitlist and haven’t heard from me
Readings posted Fridays for the following week, slides right after lecture

Last Time

Introductions
Statistical thinking
Introduced data frames

Goals for Today

Review data frames
Motivate data visualizations
Develop language to talk about the components of a graphic
Practice deconstructing graphics
Discuss good graphical practices

Data Frames

Data in spreadsheet-like format where:

Rows = Observations/cases
Columns = Variables

ID	kind	.pred_AI	.pred_class	detector	native	name	model
1	Human	0.9999942	AI	Sapling	No	Real TOEFL	Human
2	Human	0.8281448	AI	Crossplag	No	Real TOEFL	Human
3	Human	0.0002137	Human	Crossplag	Yes	Real College Essays	Human
4	AI	0.0000000	Human	ZeroGPT	NA	Fake CS224N - GPT3	GPT3
5	AI	0.0017841	Human	OriginalityAI	NA	Fake CS224N - GPT3, PE	GPT4
6	Human	0.0001783	Human	HFOpenAI	Yes	Real CS224N	Human

Data from GPT Detectors Are Biased Against Non-Native English Writers. Weixin Liang, Mert Yuksekgonul, Yining Mao, Eric Wu, James Zou. CellPress Patterns and available in the R package detectors.

Data Frames

ID	kind	.pred_AI	.pred_class	detector	native	name	model
1	Human	0.9999942	AI	Sapling	No	Real TOEFL	Human
2	Human	0.8281448	AI	Crossplag	No	Real TOEFL	Human
3	Human	0.0002137	Human	Crossplag	Yes	Real College Essays	Human
4	AI	0.0000000	Human	ZeroGPT	NA	Fake CS224N - GPT3	GPT3
5	AI	0.0017841	Human	OriginalityAI	NA	Fake CS224N - GPT3, PE	GPT4
6	Human	0.0001783	Human	HFOpenAI	Yes	Real CS224N	Human

Columns = Variables

Variables: Describe characteristics of the observations

Quantitative: Numerical in nature
Categorical: Values are categories
Identification: Uniquely identify each case

Types of variables

Identification: Uniquely identify each case

Quantitative: Numerical in nature
- Those that can take a range of values are called continuous (e.g., age, income)
- Those that only take particular (often whole number) values are called discrete (e.g., result of a dice roll)
- Not every variable involving numbers is quantitative!

Categorical: Values represent categories
- Usually take non-numeric values (e.g., Race/Ethnicity or Gender)
- But can take numeric values! (e.g., zip code)
- The values that a categorical variable can take are called its levels
- Ordinal categorical variables can be ordered (e.g., level of education, variables collected with a Likert scale)
- Categorical variables that cannot be ordered are called nominal (e.g., Race/Ethnicity)

Why construct a graph?

To explore the data.

To summarize the data.

To showcase trends and make comparisons.

To tell a compelling story.

Doing any of this by only looking at a data frame (even a small one) would be hard!

Challenger

On January 27th, 1986, engineers from Morton Thiokol recommended NASA delay launch of space shuttle Challenger due to cold weather.
- Believed cold weather impacted the o-rings that held the rockets together.
- Used 13 charts in their argument.
After a two hour conference call, the engineer’s recommendation was overruled due to lack of persuasive evidence and the launch proceeded.
The Challenger exploded 73 seconds into launch.

Challenger

Here’s one of those charts.

Challenger

Here’s another one of those charts.

Challenger

Here’s a graphic created in R from Statistician Edward Tufte’s data.

Challenger

This adaptation is a recreation of Edward Tufte’s graphic.

Now let’s learn the Grammar of Graphics.

We will use this grammar to:

Decompose and understand existing graphs.

Create our own graphs with the R package ggplot2.

Grammar of Graphics

data: Data frame that contains the raw data
- Columns are variables used in the graph
geom: Geometric form that the data are mapped to.
- EX: Point, line, bar, text, …
aesthetic: Visual properties of the geom
- EX: X (horizontal) position, y (vertical) position, color, fill, shape
scale: Controls how data are mapped to the visual values of the aesthetic.
- EX: particular colors, log scale
guide: Legend/key to help user convert visual display back to the data

For right now, we won’t focus on the names of particular types of graphs (e.g., scatterplot) but on the elements of graphs.

What are the variables?
What geom (i.e. shape, form) are the variables mapped to?
What are the aesthetics (visual properties) of the geom?
How is each variable mapped to an aesthetic?
What additional context is provided? Is any missing?
What story is the graph telling?

What are the variables?
What geom are the variables mapped to?
What are the aesthetics of the geom?
How is each variable mapped to an aesthetic?
What additional context is provided? Is any missing?
What story is the graph telling?

Visualization Considerations

What additional context should my graphs have?

For context, at a minimum include
- Axis labels (with units reported).
- Legends.
- Data source.
Think about the stories/questions your visualization answers.
Determine what context/background information your viewer needs.
Visualizing data involves editorial choices.
- What to highlight.
- What comparisons to make easy to see.
- What scales to use.

Context Example

Water Lilies

What visual cues are easier to compare?

What to consider with color?

Consider color blindness (click here for a helpful tool for picking colors).

Color Palettes – Sequential

Maps, like the Dude map are also a great way to provide context!

Color Palettes – Diverging

Adam Pearce’s 2015 NBA Games

Color Palettes – Qualitative

information is beautiful’s Best in Show

Bad Graphics

Because of all the design choices, it is much easier to make a bad graph than a good graph.

Misleading Graphics

Be careful that your design choices don’t cause your viewer to draw incorrect conclusions about the data:

Just letting the software make all the design choices can still lead to misleading graphs (recall the Georgia COVID graph).

Summary Thoughts on Graphical Considerations

Good graphics are one’s where the findings and insights are obvious to the viewer.
- Add information and key context.
Facilitate the comparisons that correspond to the research question.
- Recall the three Georgia COVID counts graphs from Day 1!
Data visualizations are not neutral.
It is easier to see the differences and similarities between different types of graphics if we learn the grammar of graphics.
Practicing decomposing graphics should make it easier for us to compose our own graphics.

Next time

Tomorrow in lab: intro to using RStudio to code and explore data frames
Friday: We’ll learn about the ggplot2 package so that we can use the grammar of graphics to create beautiful visualizations!

Announcements/reminders

Last Time

Goals for Today

Data Frames

Data Frames

Types of variables

Why construct a graph?

Challenger

Challenger

Challenger

Challenger

Challenger

Now let’s learn the Grammar of Graphics.

Grammar of Graphics

Example 1: Think-pair-share

Example 2: Think-pair-share

Visualization Considerations

What additional context should my graphs have?

Context Example

Water Lilies

What visual cues are easier to compare?

What to consider with color?

Color Palettes – Sequential

Color Palettes – Diverging

Color Palettes – Qualitative

Bad Graphics

Misleading Graphics

Summary Thoughts on Graphical Considerations

Next time