Calculate and Compare Carbon Footprints of Different Flights

Turn flight data into climate insight with Python

High School · Sustainability · 2–3 weeks
Python · pandas · matplotlib
Last reviewed: March 2026

Overview

Aviation contributes roughly 2–3% of global CO2 emissions, but not all flights are created equal. A short-haul regional jet burns far more fuel per passenger-kilometer than a fully loaded wide-body on a long-haul route. In this project, you'll use real public data from the Bureau of Transportation Statistics (BTS) to quantify these differences and build a simple model that predicts a flight's carbon footprint.

You'll work with the BTS T-100 Domestic Segment dataset, which reports passengers, seats, departures, and distance for every domestic airline route in the United States, together with the fuel consumption figures airlines report to BTS. Using pandas for data wrangling and matplotlib for visualization, you'll calculate CO2 emissions per passenger-km for dozens of aircraft types, from regional turboprops to Boeing 777s, and discover which factors matter most.

Finally, you'll train a simple linear regression model that predicts CO2 per passenger-km from route distance and aircraft seat capacity. This is a real application of data science to one of the most pressing challenges in aerospace: making aviation sustainable.

What You'll Learn

  • Download and clean real aviation data from government databases
  • Calculate CO2 emissions using standard fuel-burn-to-CO2 conversion factors
  • Create informative visualizations comparing aircraft efficiency
  • Build and evaluate a simple linear regression model
  • Understand the relationship between aircraft size, route length, and carbon intensity

Step-by-Step Guide

1

Download BTS Flight Data

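Navigate to the Bureau of Transportation Statistics (transtats.bts.gov) and download the T-100 Domestic Segment dataset for a recent year. Select columns including: carrier, origin, destination, aircraft type, departures, seats, passengers, distance, and, critically, fuel consumed (gallons). If your T-100 download has no fuel column, note that airlines report fuel to BTS separately in the Form 41 financial schedules, so you may need to join those figures to the segment table (for example by carrier and period) before continuing.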

Save the data as a CSV file. You'll have roughly 250,000 rows covering every domestic airline route. This is the same data airlines report to the federal government every month.
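As a starting point, the download can be loaded and sanity-checked along these lines. This is a minimal sketch: the inline sample stands in for the real file, the column names are assumptions modeled on typical T-100 exports, and `FUEL_GALLONS` in particular is a hypothetical name for wherever your fuel figures end up.

```python
import io

import pandas as pd

# Tiny stand-in for the downloaded file. The column names are assumptions;
# FUEL_GALLONS is a hypothetical name for the fuel column.
SAMPLE_CSV = """UNIQUE_CARRIER,ORIGIN,DEST,AIRCRAFT_TYPE,DEPARTURES_PERFORMED,SEATS,PASSENGERS,DISTANCE,FUEL_GALLONS
AA,LAX,SFO,614,60,9000,7200,337,66000
UA,JFK,ORD,694,55,9900,8400,740,110000
DL,ATL,DFW,614,0,0,0,731,0
"""


def load_segments(source):
    """Read a segment CSV from a file path or file-like object."""
    df = pd.read_csv(source)
    # Normalize headers so later code can rely on upper-case names.
    df.columns = df.columns.str.strip().str.upper()
    return df


df = load_segments(io.StringIO(SAMPLE_CSV))
print(df.shape)   # (rows, columns)
print(df.dtypes)  # check that counts and distances were parsed as numbers
```

With the real data, pass the path of your saved CSV to `load_segments` instead of the in-memory sample.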

2

Clean and Explore the Data

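Load the CSV into a pandas DataFrame. Filter out rows with zero passengers or zero fuel consumption (ferry flights, repositioning moves). Create new columns: fuel_per_pax (gallons per passenger) and co2_per_pax_km (kilograms of CO2 per passenger-kilometer). For the second column, use the standard conversion factor of 21.1 pounds of CO2 per gallon of jet fuel (Jet-A), convert pounds to kilograms (1 lb ≈ 0.4536 kg), convert the reported distance from miles to kilometers (1 mi ≈ 1.609 km), and divide total CO2 by passengers × distance.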
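The cleaning and unit conversion can be sketched as follows. It assumes each row holds period totals for fuel and passengers, that distance is reported in miles, and that the columns are named as shown (`FUEL_GALLONS` is a hypothetical name); the result is expressed in kilograms of CO2 per passenger-kilometer.

```python
import pandas as pd

LB_CO2_PER_GALLON = 21.1  # conversion factor quoted in this guide
KG_PER_LB = 0.45359237
KM_PER_MILE = 1.609344


def add_emission_columns(df):
    """Drop empty rows and add fuel_per_pax and co2_per_pax_km columns."""
    # Keep only rows that carried passengers and burned fuel.
    out = df[(df["PASSENGERS"] > 0) & (df["FUEL_GALLONS"] > 0)].copy()
    out["fuel_per_pax"] = out["FUEL_GALLONS"] / out["PASSENGERS"]
    co2_kg = out["FUEL_GALLONS"] * LB_CO2_PER_GALLON * KG_PER_LB
    pax_km = out["PASSENGERS"] * out["DISTANCE"] * KM_PER_MILE
    out["co2_per_pax_km"] = co2_kg / pax_km
    return out


# Made-up rows for illustration only (assumed column names).
df = pd.DataFrame({
    "AIRCRAFT_TYPE": [614, 694, 614],
    "PASSENGERS": [7200, 8400, 0],
    "DISTANCE": [337, 740, 731],          # miles
    "FUEL_GALLONS": [66000, 110000, 0],   # hypothetical column
})
clean = add_emission_columns(df)
print(clean[["fuel_per_pax", "co2_per_pax_km"]])
```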

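Use df.describe() and df.groupby('AIRCRAFT_TYPE').mean(numeric_only=True) to get a feel for the data (numeric_only=True keeps recent pandas versions from raising an error on text columns such as carrier codes). Which aircraft types appear most often? What's the range of distances?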
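A short exploration pass might look like this. The frame below is made-up sample data with assumed column names; on the real dataset the same calls answer the two questions above.

```python
import pandas as pd

# Made-up sample rows (assumed column names).
df = pd.DataFrame({
    "AIRCRAFT_TYPE": [614, 614, 694, 694, 694, 626],
    "UNIQUE_CARRIER": ["AA", "DL", "UA", "UA", "AA", "DL"],
    "DISTANCE": [337, 731, 740, 2475, 1200, 2586],
    "PASSENGERS": [7200, 6900, 8400, 5100, 7700, 4300],
})

print(df.describe())  # summary statistics for numeric columns

# Which aircraft types appear most often?
type_counts = df["AIRCRAFT_TYPE"].value_counts()
print(type_counts.head(10))

# Per-type averages; numeric_only skips text columns such as the carrier code.
by_type = df.groupby("AIRCRAFT_TYPE").mean(numeric_only=True)
print(by_type)

# What's the range of distances?
distance_range = (df["DISTANCE"].min(), df["DISTANCE"].max())
print(distance_range)
```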

3

Visualize Emissions by Aircraft Type

Use matplotlib to create a horizontal bar chart of average CO2 per passenger-km for the 15 most common aircraft types. Color-code by aircraft category (regional, narrow-body, wide-body). You'll see a clear pattern: larger aircraft on longer routes are significantly more fuel-efficient per passenger.
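One possible shape for the bar chart is sketched below. The `aircraft_name` and `category` columns are hypothetical (you would build them yourself from the aircraft type codes), and the numbers are made up purely to make the example run; they are not results.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch

CATEGORY_COLORS = {
    "regional": "tab:orange",
    "narrow-body": "tab:blue",
    "wide-body": "tab:green",
}


def plot_co2_by_type(df, top_n=15):
    """Horizontal bar chart of mean CO2 per passenger-km for the most common types."""
    common = df["aircraft_name"].value_counts().head(top_n).index
    summary = (
        df[df["aircraft_name"].isin(common)]
        .groupby("aircraft_name")
        .agg(co2=("co2_per_pax_km", "mean"), category=("category", "first"))
        .sort_values("co2")
    )
    fig, ax = plt.subplots(figsize=(8, 5))
    colors = summary["category"].map(CATEGORY_COLORS).tolist()
    ax.barh(summary.index, summary["co2"], color=colors)
    ax.set_xlabel("Average kg CO2 per passenger-km")
    ax.set_title("Emissions intensity by aircraft type")
    handles = [Patch(color=c, label=name) for name, c in CATEGORY_COLORS.items()]
    ax.legend(handles=handles, title="Category")
    fig.tight_layout()
    return fig, summary


# Illustrative values only.
df = pd.DataFrame({
    "aircraft_name": ["CRJ-200", "CRJ-200", "B737-800", "B737-800", "B777-200", "B777-200"],
    "category": ["regional", "regional", "narrow-body", "narrow-body", "wide-body", "wide-body"],
    "co2_per_pax_km": [0.20, 0.22, 0.11, 0.13, 0.09, 0.10],
})
fig, summary = plot_co2_by_type(df)
```

Call `fig.savefig(...)` with a file name of your choice to keep the chart for your report.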

Create a second plot: a scatter plot of route distance vs. CO2 per passenger-km, with each dot colored by aircraft size category. This reveals the "short-haul penalty": short flights burn disproportionately more fuel per km because the fuel-intensive takeoff and climb make up a larger share of the trip.
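The scatter plot can be built by drawing one series per category, which gives each group its own color and legend entry. A minimal sketch, again with hypothetical `distance_km` and `category` columns and made-up values:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_distance_vs_co2(df):
    """Scatter of distance against CO2 intensity, one color per category."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for category, group in df.groupby("category"):
        ax.scatter(group["distance_km"], group["co2_per_pax_km"],
                   label=category, alpha=0.6)
    ax.set_xlabel("Route distance (km)")
    ax.set_ylabel("kg CO2 per passenger-km")
    ax.legend(title="Aircraft category")
    fig.tight_layout()
    return fig, ax


# Illustrative values only.
df = pd.DataFrame({
    "distance_km": [300, 450, 900, 1500, 3800, 4100],
    "co2_per_pax_km": [0.24, 0.20, 0.13, 0.11, 0.09, 0.10],
    "category": ["regional", "regional", "narrow-body",
                 "narrow-body", "wide-body", "wide-body"],
})
fig, ax = plot_distance_vs_co2(df)
```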

4

Calculate Route-Level Comparisons

Pick 3–4 popular city pairs (e.g., LAX–SFO, JFK–ORD, ATL–DFW) and compare the carbon footprint across airlines operating those routes. Do airlines using newer aircraft (A320neo vs. older 737s) show measurably lower emissions? Create a grouped bar chart showing the comparison.
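A route-by-carrier comparison could be assembled roughly as below. The sketch treats both directions of a city pair as one route, sums fuel and passenger-km before dividing (so busy months count for more), and lets pandas draw the grouped bars. Column names are assumptions, `FUEL_GALLONS` is hypothetical, and the rows are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd

KG_CO2_PER_GALLON = 21.1 * 0.45359237  # 21.1 lb/gal converted to kg
KM_PER_MILE = 1.609344


def route_carrier_intensity(df, routes):
    """Return kg CO2 per passenger-km, indexed by route with one column per carrier."""
    out = df.copy()
    # Direction-independent key: SFO->LAX and LAX->SFO both become "LAX-SFO".
    out["route"] = ["-".join(sorted(pair)) for pair in zip(out["ORIGIN"], out["DEST"])]
    out = out[out["route"].isin(routes)].copy()
    out["co2_kg"] = out["FUEL_GALLONS"] * KG_CO2_PER_GALLON
    out["pax_km"] = out["PASSENGERS"] * out["DISTANCE"] * KM_PER_MILE
    totals = out.groupby(["route", "UNIQUE_CARRIER"])[["co2_kg", "pax_km"]].sum()
    return (totals["co2_kg"] / totals["pax_km"]).unstack("UNIQUE_CARRIER")


# Illustrative rows only.
df = pd.DataFrame({
    "UNIQUE_CARRIER": ["AA", "AA", "UA", "UA", "DL", "DL"],
    "ORIGIN": ["LAX", "SFO", "LAX", "JFK", "JFK", "ATL"],
    "DEST": ["SFO", "LAX", "SFO", "ORD", "ORD", "DFW"],
    "PASSENGERS": [7200, 7000, 6000, 8400, 8000, 9000],
    "DISTANCE": [337, 337, 337, 740, 740, 731],
    "FUEL_GALLONS": [66000, 65000, 50000, 110000, 115000, 120000],
})
intensity = route_carrier_intensity(df, routes=["LAX-SFO", "JFK-ORD"])
ax = intensity.plot.bar(rot=0)  # one group of bars per route, one bar per carrier
ax.set_ylabel("kg CO2 per passenger-km")
```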

Calculate the total annual CO2 for each route by multiplying per-flight emissions by departure frequency. Which routes contribute the most total emissions? Are they long-haul high-frequency routes or short-haul shuttles?
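If each row already holds fuel totals for a reporting period, summing rows over the year gives the same answer as multiplying per-flight emissions by departures. The sketch below takes that route, under the assumption that the columns are named as shown (`FUEL_GALLONS` is hypothetical) and with made-up rows.

```python
import pandas as pd

KG_CO2_PER_GALLON = 21.1 * 0.45359237  # 21.1 lb/gal converted to kg


def annual_route_totals(df):
    """Total tonnes of CO2 and departures per route, largest emitters first."""
    out = df.copy()
    out["route"] = ["-".join(sorted(pair)) for pair in zip(out["ORIGIN"], out["DEST"])]
    out["co2_tonnes"] = out["FUEL_GALLONS"] * KG_CO2_PER_GALLON / 1000
    totals = out.groupby("route").agg(
        departures=("DEPARTURES_PERFORMED", "sum"),
        co2_tonnes=("co2_tonnes", "sum"),
    )
    totals["tonnes_per_flight"] = totals["co2_tonnes"] / totals["departures"]
    return totals.sort_values("co2_tonnes", ascending=False)


# Illustrative rows only.
df = pd.DataFrame({
    "ORIGIN": ["LAX", "SFO", "JFK"],
    "DEST": ["SFO", "LAX", "ORD"],
    "DEPARTURES_PERFORMED": [60, 60, 55],
    "FUEL_GALLONS": [66000, 65000, 110000],
})
totals = annual_route_totals(df)
print(totals)
```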

5

Build a Regression Model

Use scikit-learn's LinearRegression to predict CO2 per passenger-km from two features: route distance and aircraft seat capacity. Split your data 80/20 into training and test sets. Fit the model on the training set and check the R-squared score on the test set. How well do just two variables explain emission intensity?
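The split, fit, and score steps can be sketched as follows. The data here is synthetic, generated from an invented formula so the example runs on its own; swap in your cleaned DataFrame and its real column names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (invented relationship, for illustration only).
rng = np.random.default_rng(0)
n = 200
flights = pd.DataFrame({
    "distance_km": rng.uniform(300, 4000, n),
    "seats": rng.uniform(50, 350, n),
})
flights["co2_per_pax_km"] = (
    0.07 + 25 / flights["distance_km"] + 2 / flights["seats"]
    + rng.normal(0, 0.005, n)
)

X = flights[["distance_km", "seats"]]
y = flights["co2_per_pax_km"]

# 80/20 split; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # R-squared on held-out data
print(f"Test R-squared: {r2:.3f}")
print(dict(zip(X.columns, model.coef_)))
```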

Plot the predicted vs. actual values. Where does the model fail? Flights with very low load factors (few passengers relative to capacity) are hard to predict because the model doesn't know how full the plane is.
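A predicted-vs-actual plot with a diagonal reference line makes the misses easy to spot: points on the dashed line are perfect predictions. The sketch below uses small made-up arrays in place of your test targets and `model.predict` output.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_predicted_vs_actual(y_true, y_pred):
    """Scatter actual vs. predicted values with a perfect-prediction line."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(y_true, y_pred, alpha=0.6)
    lo = min(y_true.min(), y_pred.min())
    hi = max(y_true.max(), y_pred.max())
    ax.plot([lo, hi], [lo, hi], linestyle="--", color="gray",
            label="perfect prediction")
    ax.set_xlabel("Actual kg CO2 per passenger-km")
    ax.set_ylabel("Predicted kg CO2 per passenger-km")
    ax.legend()
    fig.tight_layout()
    return fig, ax


# Made-up values standing in for y_test and model.predict(X_test).
y_true = np.array([0.09, 0.11, 0.15, 0.22, 0.30])
y_pred = np.array([0.10, 0.11, 0.14, 0.18, 0.19])
fig, ax = plot_predicted_vs_actual(y_true, y_pred)

# Residuals point to the rows worth inspecting by hand.
residuals = y_true - y_pred
worst = int(np.argmax(np.abs(residuals)))
print(f"Largest miss at row {worst}: residual {residuals[worst]:.2f}")
```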

6

Add Features and Improve

Add more features to your model: load factor (passengers / seats), number of stops, and carrier (as a one-hot encoded variable). Retrain and compare R-squared on the same test set. Load factor should dramatically improve predictions because a half-empty plane has roughly double the per-passenger emissions of a full one.
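Adding load factor and a one-hot encoded carrier might look like the sketch below, which scores both feature sets on the same held-out rows. The data is synthetic and built so that load factor matters, so the improvement it shows is by construction; number of stops is left out of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (invented relationship, for illustration only).
rng = np.random.default_rng(0)
n = 400
flights = pd.DataFrame({
    "distance_km": rng.uniform(300, 4000, n),
    "seats": rng.integers(50, 350, n),
    "carrier": rng.choice(["AA", "DL", "UA"], n),
})
flights["passengers"] = (flights["seats"] * rng.uniform(0.4, 1.0, n)).round()
flights["load_factor"] = flights["passengers"] / flights["seats"]
flights["co2_per_pax_km"] = (
    (0.06 + 25 / flights["distance_km"]) / flights["load_factor"]
    + rng.normal(0, 0.005, n)
)

base_cols = ["distance_km", "seats"]
# One-hot encode carrier; drop_first avoids a redundant column.
X_full = pd.get_dummies(
    flights[base_cols + ["load_factor", "carrier"]],
    columns=["carrier"], drop_first=True, dtype=float,
)
y = flights["co2_per_pax_km"]

# Split row labels once so both models see identical train and test rows.
train_idx, test_idx = train_test_split(
    flights.index.to_numpy(), test_size=0.2, random_state=42
)


def holdout_r2(X):
    """Fit on the training rows and return R-squared on the test rows."""
    model = LinearRegression().fit(X.loc[train_idx], y.loc[train_idx])
    return model.score(X.loc[test_idx], y.loc[test_idx])


r2_base = holdout_r2(flights[base_cols])
r2_full = holdout_r2(X_full)
print(f"distance + seats:        {r2_base:.3f}")
print(f"+ load factor + carrier: {r2_full:.3f}")
```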

Try a polynomial regression (degree 2) to capture the non-linear relationship between distance and efficiency. Does it improve the fit?
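A degree-2 fit can be expressed as a scikit-learn pipeline, so the same `fit` and `score` calls work for both models. This sketch uses synthetic data with an invented curved relationship between distance and intensity; the scaling step is a design choice to keep the squared term numerically well behaved, not a requirement.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (invented relationship, for illustration only).
rng = np.random.default_rng(0)
distance_km = rng.uniform(300, 4000, 300)
co2 = 0.07 + 25 / distance_km + rng.normal(0, 0.003, 300)

X = distance_km.reshape(-1, 1)  # scikit-learn expects a 2-D feature array
X_train, X_test, y_train, y_test = train_test_split(
    X, co2, test_size=0.2, random_state=42
)

linear = LinearRegression().fit(X_train, y_train)

# Scale, expand to [x, x^2], then fit an ordinary linear model on those terms.
poly = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X_train, y_train)

r2_linear = linear.score(X_test, y_test)
r2_poly = poly.score(X_test, y_test)
print(f"linear:   {r2_linear:.3f}")
print(f"degree 2: {r2_poly:.3f}")
```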

7

Create a Summary Report

Compile your findings into a clear report with 4–5 key visualizations. Answer the questions: Which aircraft types are cleanest? How much does route length matter? How much does load factor matter? What's the single biggest lever for reducing aviation's carbon footprint?

Include your model's performance metrics and discuss its limitations. A strong conclusion might note that the biggest gains come from filling seats and replacing older aircraft — not from choosing different routes.

Go Further

  • Compare with rail and road — add CO2 data for train and car travel on the same city pairs to see where flying is (and isn't) the worst option
  • Model SAF impact — estimate how switching to sustainable aviation fuel (SAF) blends would change the numbers
  • Build an interactive dashboard — use Plotly or Streamlit to let users pick a route and see the emissions comparison live
  • Track trends over time — download data from multiple years and plot how fleet-wide emissions intensity is changing