Contrail Prediction and Avoidance with ML

Predict where contrails form and reroute flights to avoid them

Advanced | Sustainability | 6–8 weeks
Last reviewed: March 2026

Overview

Contrails, the white lines left by aircraft in cold, humid air, may cause as much warming as all of aviation's CO2 emissions combined. When contrails persist and spread into cirrus clouds, they trap outgoing longwave radiation, creating a net warming effect. The key insight is that persistent contrails form only under specific atmospheric conditions: ice-supersaturated regions (ISSRs), where the temperature is below about -40 °C and relative humidity with respect to ice exceeds 100%. These regions are often vertically thin, so a flight can frequently avoid them with an altitude change of as little as 2,000 feet, which would substantially reduce its contrail climate impact.

In this project, you'll build a machine learning model that predicts contrail formation probability from atmospheric reanalysis data (ERA5 from ECMWF), using contrail detections from GOES-16 satellite imagery, such as Google's OpenContrails dataset, as labels and validation data. You'll use xarray to handle multi-dimensional atmospheric data (latitude, longitude, pressure level, time) and PyTorch to train a convolutional model that aims to outperform simple humidity thresholds.

This is an active area of research with real impact: Google, Breakthrough Energy, and American Airlines have run operational contrail avoidance trials, and the models they use build on the same ideas you'll develop here. The challenge lies in the complexity of the atmospheric data: ice supersaturation is poorly resolved in weather models and varies on scales smaller than the grid spacing.

What You'll Learn

  • Process multi-dimensional atmospheric reanalysis data using xarray and NetCDF
  • Understand the Schmidt-Appleman criterion for contrail formation
  • Train a convolutional neural network on gridded atmospheric fields
  • Evaluate probabilistic predictions using Brier score and reliability diagrams
  • Estimate the climate impact trade-off between contrail avoidance and extra fuel burn from rerouting

Step-by-Step Guide

1

Acquire ERA5 Atmospheric Data

Register with the Copernicus Climate Data Store (CDS) and use the CDS API to download ERA5 pressure-level data for a region and time period of your choice (start with a single month over the North Atlantic or continental US). Request temperature, relative humidity, specific humidity, u/v wind components, and geopotential at pressure levels from 150 to 350 hPa. In a standard atmosphere this spans roughly 27,000–45,000 feet, which brackets typical cruise altitudes.
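To relate ERA5 pressure levels to flight altitudes, a quick conversion based on the International Standard Atmosphere (ISA) can help. The sketch below is only an illustration: the function name is hypothetical, and the real atmosphere deviates from ISA, so treat the results as pressure altitudes rather than true heights.

```python
import math

# International Standard Atmosphere constants
P0_HPA = 1013.25          # sea-level pressure, hPa
T0_K = 288.15             # sea-level temperature, K
LAPSE_K_PER_M = 0.0065    # tropospheric lapse rate, K/m
G0 = 9.80665              # gravitational acceleration, m/s^2
R_AIR = 287.053           # specific gas constant of dry air, J/(kg K)
H_TROPOPAUSE_M = 11_000.0
T_TROPOPAUSE_K = T0_K - LAPSE_K_PER_M * H_TROPOPAUSE_M  # 216.65 K
FT_PER_M = 3.28084


def pressure_to_altitude_ft(p_hpa: float) -> float:
    """Convert pressure (hPa) to ISA pressure altitude (feet)."""
    exponent = R_AIR * LAPSE_K_PER_M / G0
    # Pressure at the ISA tropopause (about 226 hPa)
    p_tropopause = P0_HPA * (T_TROPOPAUSE_K / T0_K) ** (1.0 / exponent)
    if p_hpa >= p_tropopause:
        # Troposphere: temperature decreases linearly with height
        h_m = (T0_K / LAPSE_K_PER_M) * (1.0 - (p_hpa / P0_HPA) ** exponent)
    else:
        # Lower stratosphere: isothermal layer above 11 km
        scale_height_m = R_AIR * T_TROPOPAUSE_K / G0
        h_m = H_TROPOPAUSE_M + scale_height_m * math.log(p_tropopause / p_hpa)
    return h_m * FT_PER_M


for level in (350, 300, 250, 225, 200, 175, 150):
    print(f"{level} hPa -> {pressure_to_altitude_ft(level):,.0f} ft")
```

The same function is handy later when you translate an altitude change in feet into a shift between ERA5 pressure levels.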

Also download ERA5 single-level data (incoming solar radiation and outgoing longwave radiation at the top of the atmosphere), as these affect whether contrails have a warming or cooling effect. The data arrives in NetCDF format, which xarray handles natively with xr.open_dataset().
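A download request might look roughly like the sketch below. It assumes the cdsapi package is installed and a CDS API key is configured; the helper names are hypothetical, and the request keys (in particular data_format) follow the current CDS conventions as far as I know, so check them against the request that the CDS web form generates. The single-level fields would need a second, analogous request against the reanalysis-era5-single-levels dataset.

```python
import calendar

DATASET = "reanalysis-era5-pressure-levels"


def build_pressure_level_request(year: int, month: int, area: list) -> dict:
    """Build a CDS request for one month of hourly pressure-level data.

    area is [north, west, south, east] in degrees.
    """
    n_days = calendar.monthrange(year, month)[1]
    return {
        "product_type": ["reanalysis"],
        "variable": [
            "temperature",
            "relative_humidity",
            "specific_humidity",
            "u_component_of_wind",
            "v_component_of_wind",
            "geopotential",
        ],
        "pressure_level": ["150", "175", "200", "225", "250", "300", "350"],
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, n_days + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": area,
        "data_format": "netcdf",  # assumption: key name used by the current CDS
    }


def download_era5(request: dict, target: str) -> None:
    """Submit the request; needs cdsapi and valid CDS credentials."""
    import cdsapi  # imported lazily so the request can be built without it

    client = cdsapi.Client()
    client.retrieve(DATASET, request, target)


# Example: January 2023 over the North Atlantic (illustrative bounding box)
request = build_pressure_level_request(2023, 1, [65, -60, 35, -10])
# download_era5(request, "era5_pl_2023_01.nc")  # uncomment to download
```

Once the file is on disk, xr.open_dataset("era5_pl_2023_01.nc") returns a dataset indexed by time, pressure level, latitude, and longitude.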

2

Compute the Schmidt-Appleman Criterion

Implement the Schmidt-Appleman criterion (SAC), the thermodynamic threshold for contrail formation. This requires calculating whether the mixing line between the engine exhaust plume and ambient air crosses the liquid saturation curve on a diagram of water vapor partial pressure versus temperature. The key parameters are ambient temperature, pressure, and humidity, plus three fuel and engine properties that set the slope of the mixing line: the water vapor emission index of the fuel, its specific combustion heat, and the overall propulsive efficiency of the engine.
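One way to sketch the criterion is to walk along the mixing line numerically and test whether it ever reaches liquid saturation. The slope used here is the standard form G = EI_H2O * c_p * p / (epsilon * Q * (1 - eta)). Everything else is an assumption for illustration: the kerosene values (EI_H2O, Q), the default efficiency of 0.35, the function names, and the Magnus-type saturation formula, which is extrapolated below the range it was fitted for.

```python
import math

EPSILON = 0.622    # ratio of molar masses, water vapour to dry air
CP_AIR = 1004.0    # specific heat of air at constant pressure, J/(kg K)
EI_H2O = 1.23      # kg water vapour emitted per kg kerosene (assumed)
Q_FUEL = 43.2e6    # specific combustion heat of kerosene, J/kg (assumed)


def e_sat_liquid_pa(t_k: float) -> float:
    """Saturation vapour pressure over liquid water (Pa), Magnus-type fit."""
    t_c = t_k - 273.15
    return 610.94 * math.exp(17.625 * t_c / (t_c + 243.04))


def mixing_line_slope(p_pa: float, eta: float = 0.35) -> float:
    """Slope G of the exhaust/ambient mixing line in Pa/K."""
    return EI_H2O * CP_AIR * p_pa / (EPSILON * Q_FUEL * (1.0 - eta))


def sac_satisfied(t_amb_k: float, p_pa: float, rh_liquid: float,
                  eta: float = 0.35) -> bool:
    """True if the mixing line reaches liquid saturation.

    rh_liquid is ambient relative humidity over liquid water (0 to 1).
    """
    g = mixing_line_slope(p_pa, eta)
    e_amb = rh_liquid * e_sat_liquid_pa(t_amb_k)
    # Walk up the mixing line from ambient towards the warm exhaust in
    # 0.1 K steps, covering 80 K above ambient.
    for step in range(1, 801):
        t = t_amb_k + 0.1 * step
        e_mix = e_amb + g * (t - t_amb_k)
        if e_mix >= e_sat_liquid_pa(t):
            return True
    return False


# Cold cruise-level air at 250 hPa versus warmer air at the same pressure
print(sac_satisfied(213.15, 25_000.0, 0.5))   # ambient at -60 °C
print(sac_satisfied(243.15, 25_000.0, 0.3))   # ambient at -30 °C
```

A closed-form threshold temperature exists in the literature and is faster on large grids; the brute-force scan is used here because it maps directly onto the geometric description of the criterion.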

Apply the SAC to every grid point in your ERA5 data to create a binary "contrail possible" field. Then compute relative humidity with respect to ice (RHi) from ERA5's temperature, specific humidity, and pressure level to determine where contrails would persist (RHi > 100%). Together, the SAC mask and the RHi threshold form your physics-based baseline.
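The RHi calculation can be sketched as follows: convert specific humidity to water vapor partial pressure, then divide by the saturation vapor pressure over ice. The function names are hypothetical and the Magnus-type ice formula is one common choice among several. The version below works on scalars; replacing math.exp with numpy.exp should let it operate on whole ERA5 arrays.

```python
import math

EPSILON = 0.622  # ratio of molar masses, water vapour to dry air


def e_sat_ice_pa(t_k: float) -> float:
    """Saturation vapour pressure over ice (Pa), Magnus-type fit."""
    t_c = t_k - 273.15
    return 611.21 * math.exp(22.587 * t_c / (t_c + 273.86))


def vapour_pressure_pa(q: float, p_pa: float) -> float:
    """Water vapour partial pressure from specific humidity q (kg/kg)."""
    return q * p_pa / (EPSILON + (1.0 - EPSILON) * q)


def rhi(q: float, t_k: float, p_pa: float) -> float:
    """Relative humidity with respect to ice as a fraction (1.0 = 100%)."""
    return vapour_pressure_pa(q, p_pa) / e_sat_ice_pa(t_k)


def persistent_contrail_possible(sac_ok: bool, q: float, t_k: float,
                                 p_pa: float) -> bool:
    """Baseline rule: SAC satisfied and ambient air ice-supersaturated."""
    return sac_ok and rhi(q, t_k, p_pa) > 1.0


# Example grid point: 250 hPa, -55 °C, q = 5.73e-5 kg/kg
print(f"RHi = {rhi(5.73e-5, 218.15, 25_000.0):.2f}")
```

Because ERA5 tends to underestimate ice supersaturation, some studies lower the RHi threshold slightly below 100%; if you try that, report it as a tuned parameter of the baseline.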

3

Obtain Contrail Labels

For supervised learning, you need ground truth. Use Google's OpenContrails dataset (released on Kaggle), which contains GOES-16 satellite images with human-annotated contrail masks. Alternatively, use outputs of the CoCiP (Contrail Cirrus Prediction) model, available from Breakthrough Energy, as a softer label source; keep in mind that these are model estimates rather than observations.

Align the contrail observations with your ERA5 data spatially and temporally. This registration step is tricky — satellite pixels don't align perfectly with reanalysis grid cells, so you'll need to interpolate ERA5 data to satellite observation locations and times using xarray's .interp() method.
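The pointwise interpolation can be sketched like this. When every indexer passed to .interp() is a DataArray sharing one dimension (here called obs, a name chosen for this example), xarray returns one value per observation instead of a full grid. The field below is synthetic, and the dimension names time, latitude, and longitude are assumptions; check the names in your own ERA5 files. Linear interpolation in xarray relies on SciPy being installed.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an ERA5 field on a 0.25 degree grid.
# ERA5 stores latitude in descending order, which is mimicked here.
times = np.arange("2023-01-01T00", "2023-01-01T03",
                  dtype="datetime64[h]").astype("datetime64[ns]")
lats = np.arange(50.0, 39.75, -0.25)
lons = np.arange(-60.0, -49.75, 0.25)
hours = np.arange(times.size, dtype=float)
# A field that is linear in every coordinate, so interpolation is exact
values = (hours[:, None, None]
          + lats[None, :, None]
          + 2.0 * lons[None, None, :])
field = xr.DataArray(
    values,
    dims=("time", "latitude", "longitude"),
    coords={"time": times, "latitude": lats, "longitude": lons},
    name="synthetic_field",
)


def sample_at_observations(da, obs_time, obs_lat, obs_lon):
    """Interpolate a gridded field to individual observation points."""
    da = da.sortby("latitude")  # interpolation expects ascending coordinates
    return da.interp(
        time=xr.DataArray(obs_time, dims="obs"),
        latitude=xr.DataArray(obs_lat, dims="obs"),
        longitude=xr.DataArray(obs_lon, dims="obs"),
        method="linear",
    )


# Two hypothetical satellite pixels with their observation times
obs_time = np.array(["2023-01-01T00:30", "2023-01-01T01:45"],
                    dtype="datetime64[ns]")
obs_lat = np.array([45.1, 41.9])
obs_lon = np.array([-55.3, -51.2])
sampled = sample_at_observations(field, obs_time, obs_lat, obs_lon)
print(sampled.values)
```

For training a CNN you will usually go the other way and regrid ERA5 onto the satellite image grid; the same call works if you pass 2-D DataArrays of pixel latitudes and longitudes as indexers.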

4

Design and Train the CNN

Build a U-Net or lightweight CNN in PyTorch that takes multi-channel atmospheric fields (temperature, RHi, wind shear — stacked as channels at each pressure level) and predicts a contrail probability map. The architecture should capture spatial context because contrail-favorable regions are spatially coherent — a 5x5 patch of humid cold air is more likely to produce contrails than an isolated grid cell.
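As a starting point before moving to a full U-Net, a lightweight fully convolutional model could look like the sketch below. The class name, hidden width, and channel count (3 variables at 7 pressure levels) are assumptions for illustration. Two stacked 3x3 convolutions give each output pixel a 5x5 receptive field, which matches the spatial-context argument above.

```python
import torch
from torch import nn


class ContrailCNN(nn.Module):
    """Small fully convolutional network producing per-pixel logits."""

    def __init__(self, in_channels: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            # 1x1 convolution maps features to one logit per pixel
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Returns raw logits; apply torch.sigmoid to get probabilities
        return self.net(x)


# Example: temperature, RHi, and wind shear at 7 levels -> 21 channels
model = ContrailCNN(in_channels=21)
batch = torch.randn(2, 21, 64, 64)   # (batch, channels, lat, lon)
logits = model(batch)
print(logits.shape)
```

Returning logits rather than probabilities keeps the model compatible with the numerically stable loss functions in torch.nn.functional.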

Use binary cross-entropy loss for training since this is fundamentally a pixel-wise classification problem. Apply class weighting or focal loss to handle the severe class imbalance — contrails cover only a small fraction of the sky at any given time. Train with the Adam optimizer, starting with a learning rate of 1e-4, and monitor validation Brier score.
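A binary focal loss and a single training step might be sketched as follows. The alpha and gamma defaults are illustrative rather than tuned values, and the helper names are hypothetical. A simpler alternative is plain class weighting through the pos_weight argument of nn.BCEWithLogitsLoss.

```python
import torch
import torch.nn.functional as F
from torch import nn


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.75, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss on raw logits, averaged over all pixels.

    alpha weights the positive (contrail) class; gamma down-weights
    pixels the model already classifies confidently.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets,
                                             reduction="none")
    p = torch.sigmoid(logits)
    # Probability the model assigns to the true class of each pixel
    p_t = p * targets + (1.0 - p) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()


def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    """Run one optimisation step and return the loss value."""
    model.train()
    optimizer.zero_grad()
    loss = focal_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


# Stand-in model and random data, with roughly 2% positive pixels
model = nn.Conv2d(21, 1, kernel_size=3, padding=1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(4, 21, 32, 32)
y = (torch.rand(4, 1, 32, 32) < 0.02).float()
print(train_step(model, optimizer, x, y))
```

With gamma set to 0 and alpha to 0.5 the function reduces to half the ordinary binary cross-entropy, which is a useful sanity check when you change the implementation.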

5

Evaluate Against the Physics Baseline

Compare your ML model against the SAC baseline using precision, recall, Brier score, and reliability diagrams. The SAC baseline will have high recall (it identifies most contrail-possible regions) but low precision (many SAC-positive regions don't actually produce visible contrails because of sub-grid variability). Your ML model should improve precision by learning patterns the threshold criterion misses.
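The metrics named above can be computed with a few lines of NumPy. This is a minimal sketch with hypothetical function names; libraries such as scikit-learn provide equivalent, better-tested routines.

```python
import numpy as np


def brier_score(prob, label) -> float:
    """Mean squared difference between predicted probability and outcome."""
    prob = np.asarray(prob, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.mean((prob - label) ** 2))


def reliability_curve(prob, label, n_bins: int = 10):
    """Per-bin mean predicted probability, observed frequency, and count.

    Empty bins are skipped. Plotting observed frequency against mean
    prediction gives the reliability diagram; a perfectly calibrated
    model lies on the diagonal.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    label = np.asarray(label, dtype=float).ravel()
    bin_idx = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    mean_pred, obs_freq, counts = [], [], []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            mean_pred.append(prob[mask].mean())
            obs_freq.append(label[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(mean_pred), np.array(obs_freq), np.array(counts)


def precision_recall(pred_binary, label):
    """Precision and recall for a binary mask such as the SAC baseline."""
    pred = np.asarray(pred_binary, dtype=bool)
    truth = np.asarray(label, dtype=bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return float(precision), float(recall)
```

To compare like with like, score the binary SAC mask as probabilities of 0 and 1 in the Brier score, and threshold the ML probabilities before computing precision and recall.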

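Create spatial maps showing where the two approaches disagree. Does the ML model learn that contrails are less likely near the boundaries of ISSRs? Does it capture any diurnal cycle in contrail occurrence? The time of day matters for the later impact analysis, because nighttime contrails have a different climate impact from daytime ones: at night there is no reflected sunlight to offset the longwave warming.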

6

Simulate Contrail Avoidance Routing

Using your contrail probability maps, simulate a simple avoidance strategy: for a set of flight trajectories (from OpenSky Network ADS-B data), check whether each flight segment passes through a high-contrail-probability region. If so, compute the altitude change needed (typically a shift of 2,000–4,000 feet) and estimate the extra fuel burn from flying at a non-optimal altitude.
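The per-waypoint decision could be sketched as below. The function takes the column of predicted contrail probabilities at one waypoint (one value per candidate flight level, sampled from your probability map) and picks the nearest level that falls below the threshold. The function name, the threshold, and the tie-breaking rule are all assumptions made for this example.

```python
def choose_flight_level(prob_by_level, levels_ft, current_ft,
                        threshold=0.5, max_shift_ft=4000):
    """Return (new_level_ft, shift_ft) for one waypoint.

    prob_by_level[i] is the predicted contrail probability at
    levels_ft[i]. The aircraft stays at current_ft unless the
    probability there reaches the threshold.
    """
    current_idx = levels_ft.index(current_ft)
    if prob_by_level[current_idx] < threshold:
        return current_ft, 0

    # Candidate levels within reach whose probability is acceptable,
    # ranked by shift size, then probability, then altitude
    candidates = [
        (abs(level - current_ft), prob, level)
        for level, prob in zip(levels_ft, prob_by_level)
        if abs(level - current_ft) <= max_shift_ft and prob < threshold
    ]
    if not candidates:
        return current_ft, 0  # nothing better within reach; stay put

    _, _, best = min(candidates)
    return best, best - current_ft


levels = [32000, 34000, 36000, 38000, 40000]
probs = [0.1, 0.2, 0.8, 0.7, 0.1]
print(choose_flight_level(probs, levels, current_ft=36000))
```

A fuller simulation would apply this along each trajectory and smooth the result so that the aircraft does not change level at every waypoint.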

Calculate the trade-off: the additional CO2 from extra fuel vs. the warming avoided by preventing contrails. Research suggests the climate benefit of contrail avoidance outweighs the fuel penalty by a factor of 10–100, but this depends heavily on the accuracy of the prediction model, because every false positive means wasted fuel with no climate benefit.
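The trade-off can be written as a small expected-value calculation. It uses the standard emission factor of about 3.16 kg of CO2 per kg of jet fuel; the contrail warming has to be expressed in CO2-equivalent terms, which is itself uncertain and depends on the chosen metric and time horizon. The function names and the example numbers are placeholders, not results.

```python
CO2_PER_KG_FUEL = 3.16  # kg CO2 emitted per kg of jet fuel burned


def expected_net_benefit_kg_co2e(p_contrail: float,
                                 contrail_co2e_kg: float,
                                 extra_fuel_kg: float) -> float:
    """Expected climate benefit of rerouting one flight segment.

    p_contrail: predicted probability that a persistent contrail forms
    contrail_co2e_kg: warming of that contrail in kg CO2-equivalent
    extra_fuel_kg: additional fuel burned by the avoidance manoeuvre
    """
    avoided = p_contrail * contrail_co2e_kg
    penalty = extra_fuel_kg * CO2_PER_KG_FUEL
    return avoided - penalty


def break_even_probability(contrail_co2e_kg: float,
                           extra_fuel_kg: float) -> float:
    """Contrail probability above which rerouting has a net benefit."""
    return extra_fuel_kg * CO2_PER_KG_FUEL / contrail_co2e_kg


# Placeholder inputs for illustration only
print(expected_net_benefit_kg_co2e(0.5, 10_000.0, 100.0))
print(break_even_probability(10_000.0, 100.0))
```

The break-even probability gives a first answer to the operational question of which prediction threshold should trigger a reroute, and it shows directly how false positives erode the benefit.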

7

Analyze Uncertainty and Write Up

Quantify the uncertainty in your predictions using Monte Carlo dropout or an ensemble of models trained on different data splits. Map the uncertainty — is it highest at the edges of ISSRs (as expected) or in regions where ERA5 resolution is insufficient? Discuss the operational implications: an airline would only reroute if the contrail probability exceeds some threshold — what threshold balances climate benefit against fuel cost?
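Monte Carlo dropout can be sketched as follows: keep the dropout layers stochastic at inference time, run several forward passes, and use the spread of the predictions as an uncertainty estimate. The helper name and the small demonstration model are assumptions for this example; your own network needs dropout layers for this to have any effect.

```python
import torch
from torch import nn


def mc_dropout_predict(model: nn.Module, x: torch.Tensor,
                       n_samples: int = 20):
    """Return per-pixel mean and standard deviation of the probability."""
    model.eval()  # keeps BatchNorm and similar layers in inference mode
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()  # re-enable only the dropout layers
    with torch.no_grad():
        samples = torch.stack(
            [torch.sigmoid(model(x)) for _ in range(n_samples)]
        )
    return samples.mean(dim=0), samples.std(dim=0)


# Small stand-in model with a dropout layer
demo_model = nn.Sequential(
    nn.Conv2d(21, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout2d(p=0.2),
    nn.Conv2d(16, 1, kernel_size=1),
)
fields = torch.randn(1, 21, 32, 32)
mean_prob, std_prob = mc_dropout_predict(demo_model, fields)
print(mean_prob.shape, float(std_prob.max()))
```

Plotting std_prob as a map next to the ISSR boundaries is a direct way to test whether the uncertainty concentrates at the edges.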

Write a research-style report with an abstract, methodology, results, and discussion. Include a section on limitations: ERA5's humidity biases in the upper troposphere, the satellite detection limitations (thin contrails are invisible to GOES), and the simplifying assumptions in your routing simulation.

Go Further

  • Use the full CoCiP model — implement Breakthrough Energy's open-source Contrail Cirrus Prediction model and compare its outputs to your ML approach
  • Add radiative forcing — compute the actual warming effect (W/m2) of predicted contrails using longwave and shortwave radiation data
  • Multi-objective optimization — frame contrail avoidance as a Pareto optimization between fuel cost, CO2, and non-CO2 warming
  • Temporal models — use recurrent architectures to predict contrail evolution over time, not just formation probability