Classify Aircraft Types from ADS-B Trajectories

Identify what's flying overhead from how it flies

Undergraduate | Signal Processing | 4–6 weeks
Last reviewed: March 2026

Overview

Every aircraft type has a distinctive flight signature. A Boeing 737 climbs at roughly 2,500 feet per minute (fpm) and cruises at about 450 knots; a Cessna 172 climbs at roughly 700 fpm and cruises at about 110 knots; a military fighter can pull 4 g turns that no airliner could match. If you can extract these kinematic features from an aircraft's trajectory data, you can often infer what type of aircraft it is purely from how it moves, even when the aircraft does not explicitly report its type.

In this project, you'll use the OpenSky Network's historical ADS-B data to download thousands of flight trajectories, extract kinematic features (climb rate, cruise speed, turn rate, acceleration profiles), and train a scikit-learn classifier to predict aircraft category. You'll work with real data that contains many of the challenges of production ML: missing values, noisy measurements, class imbalance, and ambiguous labels.

This is a genuine research problem with applications in air traffic management (verifying aircraft type for wake turbulence separation), defense (identifying unknown aircraft from radar tracks), and aviation analytics (understanding fleet composition at airports). The feature engineering skills you build here — extracting meaningful signals from noisy spatiotemporal data — are transferable to any trajectory analysis domain.

What You'll Learn

  • Query the OpenSky Network API for historical aircraft trajectory data at scale
  • Engineer kinematic features from noisy GPS/ADS-B trajectories (speed, climb rate, turn rate, acceleration)
  • Handle real-world data quality issues: missing values, outliers, inconsistent sampling rates
  • Train and compare multiple classifiers on a multi-class problem with imbalanced classes
  • Evaluate classification performance with confusion matrices and understand which aircraft types are most confusable

Step-by-Step Guide

1

Design the Data Collection Strategy

Decide on your aircraft categories. A good starting set: large jet (B737, A320 families), regional jet (CRJ, ERJ), turboprop (ATR, Dash-8), light piston (C172, PA28), and helicopter. Each category has distinctive flight characteristics that should be classifiable.

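Use the OpenSky Network's historical database interface (Trino, which replaced the older Impala shell; access is free for academic use but must be requested) or their Python API to download trajectories. For each flight, you need the state vector time series: timestamp, latitude, longitude, altitude, velocity, heading, and vertical rate. OpenSky reports these in SI units (altitude in meters, velocity and vertical rate in meters per second), so convert to feet, knots, and fpm before applying the thresholds quoted in this guide. Collect 100–200 flights per category from multiple airports to ensure diversity. Cross-reference ICAO24 addresses with the OpenSky aircraft database to get ground-truth aircraft types.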
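The cross-referencing step can be sketched with a pandas merge. This is a minimal illustration, not OpenSky's own tooling: the column names icao24 and typecode are assumptions about how you have loaded the aircraft database, and TYPE_TO_CATEGORY is a hypothetical, deliberately incomplete mapping from ICAO type designators to the five categories above.

```python
import pandas as pd

# Hypothetical mapping from ICAO type designators to project categories.
# Extend it to cover the types that actually appear in your sample.
TYPE_TO_CATEGORY = {
    "B738": "large_jet", "A320": "large_jet",
    "CRJ9": "regional_jet", "E145": "regional_jet",
    "AT76": "turboprop", "DH8D": "turboprop",
    "C172": "light_piston", "P28A": "light_piston",
    "EC35": "helicopter", "R44": "helicopter",
}


def label_flights(flights: pd.DataFrame, aircraft_db: pd.DataFrame) -> pd.DataFrame:
    """Attach a category to each flight; drop flights whose type is unknown.

    Assumes both frames have an 'icao24' column and the database has a
    'typecode' column (assumed names, adjust to your files).
    """
    # Normalise the join key: addresses may differ in case or whitespace.
    db = aircraft_db.assign(icao24=aircraft_db["icao24"].str.lower().str.strip())
    db = db.drop_duplicates("icao24")
    fl = flights.assign(icao24=flights["icao24"].str.lower().str.strip())

    merged = fl.merge(db[["icao24", "typecode"]], on="icao24", how="left")
    merged["category"] = merged["typecode"].map(TYPE_TO_CATEGORY)
    # Flights with no database entry or an unmapped type get no label.
    return merged.dropna(subset=["category"]).reset_index(drop=True)
```

Counting the rows per category after this step is a quick way to see how imbalanced the labelled set is before any modelling.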

2

Clean and Segment Trajectories

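ADS-B data is noisy: position accuracy is typically on the order of 10 meters, but reported velocity and vertical rate can jump erratically, for example because of multipath effects and receiver noise. Apply a Savitzky-Golay filter or moving average to smooth the trajectories before feature extraction. Both filters assume roughly uniform sampling, so resample each trajectory to a fixed time step first.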
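A minimal smoothing sketch using SciPy's savgol_filter is shown below. The window length and polynomial order are illustrative assumptions to tune against your own data, and the helper name smooth_series is hypothetical. The input is assumed to be already resampled to a uniform time step.

```python
import numpy as np
from scipy.signal import savgol_filter


def smooth_series(values, window=11, polyorder=2):
    """Savitzky-Golay smoothing of a 1-D series sampled at a uniform rate."""
    values = np.asarray(values, dtype=float)
    # The window cannot be longer than the series; keep it odd so it is
    # centred on each sample.
    window = min(window, len(values))
    if window % 2 == 0:
        window -= 1
    # Too short to fit the polynomial: return the data unchanged.
    if window <= polyorder:
        return values.copy()
    return savgol_filter(values, window_length=window, polyorder=polyorder)
```

A wider window removes more noise but also flattens genuine manoeuvres, so it is worth plotting raw against smoothed vertical rate for a few flights before settling on values.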

Segment each flight into phases: climb (vertical rate above +200 fpm), cruise (vertical rate within ±200 fpm), and descent (vertical rate below −200 fpm). Handle edge cases: flights with data gaps, short trajectories (aircraft only partially in coverage), and anomalous altitude jumps (common ADS-B artifacts). Drop trajectories with fewer than 10 minutes of data.
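The threshold rule above can be sketched as follows. The function name label_phases is hypothetical, and the input is assumed to be a smoothed vertical rate in meters per second, which is converted to fpm before thresholding. Note that this simple rule labels any level segment as cruise, including low-altitude level-offs, so you may want an additional altitude condition.

```python
import numpy as np

FPM_PER_MPS = 196.85  # 1 m/s is about 196.85 ft/min


def label_phases(vertical_rate_mps, threshold_fpm=200.0):
    """Label each sample as 'climb', 'cruise', 'descent' or 'unknown'."""
    vr_fpm = np.asarray(vertical_rate_mps, dtype=float) * FPM_PER_MPS
    phases = np.full(vr_fpm.shape, "cruise", dtype=object)
    phases[vr_fpm > threshold_fpm] = "climb"
    phases[vr_fpm < -threshold_fpm] = "descent"
    # Missing vertical-rate samples cannot be assigned to a phase.
    phases[np.isnan(vr_fpm)] = "unknown"
    return phases
```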

3

Engineer Trajectory Features

For each flight, compute a feature vector that captures its kinematic signature. Speed features: mean and max ground speed, cruise speed (median speed during cruise phase), speed at 10,000 feet (a common reference altitude; many light piston aircraft and helicopters never reach it, so expect this feature to be missing for some flights). Climb features: mean and max climb rate, time from takeoff to FL100 (10,000 feet). Turn features: mean absolute turn rate, max turn rate, number of significant turns (heading change > 30°).
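Turn rate needs one piece of care that the other features do not: heading wraps around at 360°, so a change from 359° to 1° is a 2° turn, not a 358° one. A minimal sketch, assuming headings in degrees and strictly increasing timestamps in seconds (the function names are hypothetical):

```python
import numpy as np


def turn_rates(heading_deg, t_s):
    """Signed turn rate in degrees per second between consecutive samples."""
    heading = np.asarray(heading_deg, dtype=float)
    t = np.asarray(t_s, dtype=float)
    # Wrap heading differences into [-180, 180) so crossing north is handled.
    dh = (np.diff(heading) + 180.0) % 360.0 - 180.0
    dt = np.diff(t)  # assumed strictly positive
    return dh / dt


def turn_features(heading_deg, t_s):
    """Mean and max absolute turn rate for one flight."""
    rates = np.abs(turn_rates(heading_deg, t_s))
    return {
        "mean_abs_turn_rate": float(rates.mean()),
        "max_turn_rate": float(rates.max()),
    }
```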

Also compute variability features as a proxy for acceleration behavior: standard deviation of speed (smooth cruise vs. variable speed) and standard deviation of vertical rate (stable vs. oscillating climb). Add geometric features as well: maximum altitude achieved, total distance flown, and the ratio of distance flown to great-circle distance between the first and last points (a measure of how direct the flight was). These 15–20 features should capture the essential differences between aircraft categories.
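The directness ratio can be sketched with the haversine formula. This assumes a spherical Earth with a mean radius of 6371 km, which is accurate enough for a feature of this kind; the function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def path_directness(lat, lon):
    """Distance flown divided by the great-circle distance from start to end.

    Close to 1 for a direct flight, larger for circuitous ones; NaN when
    the flight starts and ends at the same point.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    flown = haversine_km(lat[:-1], lon[:-1], lat[1:], lon[1:]).sum()
    direct = haversine_km(lat[0], lon[0], lat[-1], lon[-1])
    return float(flown / direct) if direct > 0 else float("nan")
```

Training circuits and sightseeing flights return close to their origin, so their ratio is very large or undefined; treat it as a missing value or cap it rather than feeding infinities to the classifier.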

4

Train and Compare Classifiers

Split data 70/15/15 (train/validation/test) with stratified sampling. Train several classifiers: Random Forest (robust baseline), Gradient Boosting (XGBoost via the xgboost package, which is a separate library with a scikit-learn-compatible interface), and SVM with an RBF kernel. Use StandardScaler to normalize features for the SVM, fitting the scaler on the training set only.
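One way to get a stratified 70/15/15 split is to call train_test_split twice. The sketch below also wraps the SVM in a pipeline so that the scaler is fitted only on whatever data the pipeline is trained on. The helper name split_70_15_15 and the seed are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def split_70_15_15(X, y, seed=0):
    """Stratified train/validation/test split in 70/15/15 proportions."""
    # First cut: 70% train, 30% held out.
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    # Second cut: split the held-out 30% evenly into validation and test.
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, test_size=0.50, stratify=y_rest, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)


# Scaling lives inside the pipeline, so it never sees validation or test data
# during fitting.
svm_model = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
```

If several flights come from the same airframe, consider splitting by ICAO24 address instead, so that the same aircraft does not appear in both training and test sets.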

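Tune hyperparameters with GridSearchCV: for Random Forest, tune n_estimators and max_depth; for XGBoost, tune learning_rate, max_depth, and n_estimators. Note that GridSearchCV scores candidates by cross-validation within the data you pass it, not on a separate validation set. Either run it on the training set and keep the validation set for comparing the tuned models, or pass a PredefinedSplit as the cv argument to score on your fixed validation split. Class imbalance (many airliners, few helicopters) should be addressed with class_weight='balanced' (available for Random Forest and SVC; for XGBoost, pass per-sample weights via sample_weight instead) or SMOTE oversampling from the imbalanced-learn package, applied to the training data only.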
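A minimal tuning sketch for the Random Forest, assuming cross-validated search on the training set and macro-averaged F1 as the score (chosen here because it weights rare classes equally; the source does not prescribe a metric). The grid values and the helper name tune_random_forest are illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def tune_random_forest(X_train, y_train, param_grid=None, seed=0):
    """Grid-search a class-weighted Random Forest with stratified 5-fold CV."""
    if param_grid is None:
        # Illustrative grid; widen or narrow it to fit your compute budget.
        param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10, 20]}
    search = GridSearchCV(
        RandomForestClassifier(class_weight="balanced", random_state=seed),
        param_grid=param_grid,
        scoring="f1_macro",
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)
    return search
```

After fitting, search.best_params_ holds the selected settings and search.best_estimator_ is refitted on all of the data passed in, ready to be scored on the validation set.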

5

Analyze the Confusion Matrix

Generate the confusion matrix for your best model on the test set. Study the misclassification patterns: large jets vs. regional jets will likely be the hardest pair (similar speeds, the main difference is climb performance and altitude). Helicopters should be easy to separate (low speed, unique flight patterns). Turboprops may get confused with light piston aircraft if they fly similar routes.

For each confusion pair, investigate why the model struggles. Extract the feature distributions for the confused classes and plot them: which features overlap? Are there additional features that could separate them? This diagnostic process is one of the most valuable skills in applied ML.
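To find which pairs deserve that investigation, you can rank the off-diagonal entries of the confusion matrix. The sketch below adds the errors in both directions for each pair of classes; the function name most_confused_pairs is hypothetical.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def most_confused_pairs(y_true, y_pred, labels, top=3):
    """Return up to `top` (class_a, class_b, n_errors) tuples, worst first."""
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    errors = cm.copy()
    np.fill_diagonal(errors, 0)       # ignore correct predictions
    pair_counts = errors + errors.T   # a->b and b->a errors combined
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if pair_counts[i, j] > 0:
                pairs.append((labels[i], labels[j], int(pair_counts[i, j])))
    return sorted(pairs, key=lambda p: -p[2])[:top]
```

Because classes are imbalanced, it is also worth looking at the row-normalized matrix (confusion_matrix with normalize="true"), since a handful of helicopter errors can matter more than the raw counts suggest.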

6

Feature Importance and Ablation Study

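Use Random Forest feature_importances_ and permutation importance to rank features. The two can disagree: impurity-based feature_importances_ are computed on the training data and tend to split credit among correlated features, while permutation importance is measured on held-out data. Which kinematic variables are most discriminative? Cruise speed and climb rate will likely rank highly, since these are the primary physical differences between aircraft categories.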
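Permutation importance is available in scikit-learn as sklearn.inspection.permutation_importance. A minimal sketch, assuming an already fitted model and a held-out validation set; the helper name rank_features, the number of repeats, and the macro-F1 scoring are illustrative choices.

```python
from sklearn.inspection import permutation_importance


def rank_features(model, X_val, y_val, feature_names, seed=0):
    """Rank features by mean drop in macro-F1 when each one is shuffled."""
    result = permutation_importance(
        model, X_val, y_val,
        n_repeats=10, random_state=seed, scoring="f1_macro")
    order = result.importances_mean.argsort()[::-1]  # most important first
    return [(feature_names[i], float(result.importances_mean[i])) for i in order]
```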

Run an ablation study: remove features one at a time, retrain, and measure the drop in performance (given the class imbalance, track balanced accuracy or macro-F1 alongside plain accuracy). This tells you which features are essential vs. redundant. Also try PCA or t-SNE to visualize the feature space in 2D: do the classes form separable clusters? Document the complete analysis in a format suitable for a conference paper or portfolio project.
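A leave-one-feature-out ablation can be sketched with cross_val_score. This is one reasonable setup rather than a prescribed protocol: it retrains a fresh copy of the model for each removed feature and scores with balanced accuracy. The function name ablation_study is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score


def ablation_study(model, X, y, feature_names, cv=5, scoring="balanced_accuracy"):
    """Return the baseline score and the score drop from removing each feature."""
    X = np.asarray(X)
    baseline = cross_val_score(clone(model), X, y, cv=cv, scoring=scoring).mean()
    drops = {}
    for i, name in enumerate(feature_names):
        X_without = np.delete(X, i, axis=1)  # all columns except feature i
        score = cross_val_score(clone(model), X_without, y, cv=cv, scoring=scoring).mean()
        drops[name] = float(baseline - score)
    return float(baseline), drops
```

A near-zero drop does not prove a feature is useless: if two features are strongly correlated (for example mean and max ground speed), removing either one alone may cost little because the other carries the same information.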

Go Further

Extend your trajectory analysis skills:

  • Sequence models — instead of hand-crafted features, feed raw trajectory sequences into an LSTM or Transformer to learn features automatically
  • Anomaly detection — train on normal flights and flag trajectories that don't match any known aircraft type, potentially detecting unusual operations
  • Real-time classification — build a streaming classifier that identifies aircraft type from partial trajectories (first 5 minutes of flight only)
  • Airport-level fleet analysis — apply your classifier to a full day of traffic at a major airport and compare the predicted fleet mix against published statistics