Tidy Tuesdays
  • Home
  • Weekly Challenges
  • About

On this page

  • Introduction
  • The Data
  • Air Quality Over Time
  • Comparing Pollutant Levels
  • Air Quality Category Distribution
  • Correlation Between Pollutants
  • Key Findings
  • Conclusions
  • Sources and Further Reading

Week 2: Global Air Quality

time series
air quality
environmental
Analyzing air quality measurements across major global cities
Published

February 12, 2025

Introduction

This week’s Tidy Tuesday focused on global air quality data. Air pollution is a major environmental risk to health worldwide, with outdoor air pollution responsible for millions of deaths annually. Understanding air quality trends across major cities can help inform policy decisions and public health initiatives.

The Data

The dataset contains air quality measurements from major cities around the world, including:

  • Air Quality Index (AQI)
  • PM2.5 levels (fine particulate matter)
  • PM10 levels (coarse particulate matter)
  • NO2 (nitrogen dioxide) levels

Let’s take a look at the first few rows:

# Using kable for better table formatting
head(air_quality) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = FALSE)
city date aqi pm25 pm10 no2 quality_category
London 2024-01-01 178 70.11916 106.88484 48.54721 Unhealthy
New York 2024-01-01 198 107.92362 197.37991 44.10613 Unhealthy
Tokyo 2024-01-01 33 11.32796 21.75825 10.47382 Good
Paris 2024-01-01 189 93.51966 117.13194 38.54324 Unhealthy
Beijing 2024-01-01 69 23.44470 57.88696 31.75724 Moderate
London 2024-01-02 137 49.93188 99.80880 68.19650 Unhealthy for Sensitive Groups

Air Quality Over Time

Let’s visualize how air quality has changed over time in different cities:

ggplot(air_quality, aes(x = date, y = aqi, color = city)) +
  geom_line() +
  geom_smooth(method = "loess", se = FALSE, alpha = 0.5) +
  facet_wrap(~city, ncol = 2) +
  labs(
    title = "Air Quality Index Over Time",
    subtitle = "January - March 2024",
    x = "Date",
    y = "Air Quality Index (AQI)",
    color = "City"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.background = element_rect(fill = "lightblue", color = NA),
    strip.text = element_text(face = "bold")
  )

Comparing Pollutant Levels

Let’s compare the different pollutant levels across cities:

air_quality %>%
  group_by(city) %>%
  summarize(
    avg_aqi = mean(aqi),
    avg_pm25 = mean(pm25),
    avg_pm10 = mean(pm10),
    avg_no2 = mean(no2)
  ) %>%
  tidyr::pivot_longer(
    cols = starts_with("avg_"),
    names_to = "pollutant",
    values_to = "value"
  ) %>%
  mutate(
    pollutant = case_when(
      pollutant == "avg_aqi" ~ "Air Quality Index",
      pollutant == "avg_pm25" ~ "PM2.5",
      pollutant == "avg_pm10" ~ "PM10",
      pollutant == "avg_no2" ~ "NO2"
    )
  ) %>%
  ggplot(aes(x = reorder(city, value), y = value, fill = city)) +
  geom_col() +
  facet_wrap(~pollutant, scales = "free_y", ncol = 2) +
  coord_flip() +
  labs(
    title = "Average Pollutant Levels by City",
    subtitle = "January - March 2024",
    x = NULL,
    y = "Concentration"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    strip.background = element_rect(fill = "lightgray", color = NA),
    strip.text = element_text(face = "bold")
  )

Air Quality Category Distribution

Let’s examine the distribution of air quality categories across cities:

# Plot directly with the original styling
ggplot(air_quality, aes(x = city, y = aqi, fill = quality_category)) +
  geom_boxplot(alpha = 0.8) +
  scale_fill_manual(values = c(
    "Good" = "#91CF60",               # A softer green
    "Moderate" = "#FFFFBF",           # Pale yellow
    "Unhealthy for Sensitive Groups" = "#FEE08B", # Soft orange
    "Unhealthy" = "#FC8D59",          # Muted red-orange
    "Very Unhealthy" = "#D73027",     # Muted red
    "Hazardous" = "#A50026"           # Darker red
  )) +
  labs(
    title = "Distribution of Air Quality Index by City",
    x = "City",
    y = "Air Quality Index (AQI)",
    fill = "Air Quality Category"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Correlation Between Pollutants

Let’s explore the correlation between different pollutants:

air_quality %>%
  select(aqi, pm25, pm10, no2) %>%
  cor() %>%
  as.data.frame() %>%
  tibble::rownames_to_column("pollutant1") %>%
  tidyr::pivot_longer(-pollutant1, names_to = "pollutant2", values_to = "correlation") %>%
  ggplot(aes(x = pollutant1, y = pollutant2, fill = correlation)) +
  geom_tile() +
  geom_text(aes(label = round(correlation, 2)), color = "white", size = 4) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0.5) +
  labs(
    title = "Correlation Between Pollutants",
    x = NULL,
    y = NULL,
    fill = "Correlation"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid = element_blank()
  )

Key Findings

From our analysis, we can observe:

  1. City Differences: Beijing shows the highest average AQI levels, followed by Tokyo and New York.
  2. Pollutant Relationships: PM10 and AQI show the strongest correlation, which is expected as PM10 is a major component of air quality calculations.
  3. Seasonal Patterns: All cities show some fluctuation in air quality over the period, with some improvement trends visible in specific cities.
  4. Quality Categories: London has the highest percentage of “Good” air quality days, while Beijing has the highest percentage of “Unhealthy” days.

Conclusions

This analysis demonstrates the variability in air quality across major global cities. The data shows that while some cities struggle with consistently poor air quality, others maintain relatively good air quality levels throughout the period. These findings highlight the need for continued monitoring and targeted interventions in cities with persistent air quality issues.

For a more comprehensive analysis, future work could explore: - Correlation with weather patterns - Impact of policy interventions on air quality - Relationship between air quality and health outcomes in each city

Sources and Further Reading

  • World Health Organization Air Quality Guidelines
  • Environmental Protection Agency Air Quality Index
  • Our World in Data - Air Pollution
 
  • © 2025 | Tidy Tuesdays Personal Project