Predict Jet Engine Thrust from Sensor Data with Random Forest

Use NASA engine data to learn what sensor readings reveal about thrust

Undergraduate Propulsion 4–6 weeks
Last reviewed: March 2026

Overview

A modern turbofan engine is instrumented with dozens of sensors that measure temperatures, pressures, spool speeds, and flow rates at stations throughout the gas path. Together these measurements define the engine's operating state and carry enough information to estimate its thrust, which is difficult to measure directly in flight. Airlines and engine manufacturers use sensor-based models to estimate real-time thrust for performance monitoring, fuel burn optimization, and health diagnostics.

NASA's Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset is one of the most widely used benchmarks in prognostics and engine health management. It provides simulated sensor readings from a fleet of turbofan engines operating under varying conditions and degradation levels. In this project you will use C-MAPSS data to train a Random Forest regression model that predicts a thrust-related performance parameter from the sensor suite.

Beyond building the model, you will explore feature importance to discover which sensors carry the most information about thrust. This analysis connects directly to gas turbine thermodynamics: you should find that the spool speeds and HPC outlet temperature rank near the top, which is consistent with what Brayton cycle analysis suggests. The project bridges propulsion theory with practical data science.

What You'll Learn

  • Load, clean, and explore the NASA C-MAPSS turbofan engine dataset
  • Understand the relationship between gas path sensor measurements and engine thrust
  • Train a Random Forest regression model with hyperparameter tuning via cross-validation
  • Analyze feature importances and connect them to gas turbine thermodynamic principles
  • Evaluate model accuracy across different operating conditions and degradation levels

Step-by-Step Guide

1

Download and Understand the C-MAPSS Dataset

Download the C-MAPSS dataset from the NASA Prognostics Data Repository. Start with the FD001 subset: its training file (train_FD001.txt) contains run-to-failure data from 100 engines operating under a single flight condition with one fault mode (HPC degradation). Each row records one operating cycle of one engine, with three operational settings and 21 sensor channels.

Read the dataset documentation carefully. The 21 sensors include: fan inlet temperature (T2), LPC outlet temperature (T24), HPC outlet temperature (T30), LPT outlet temperature (T50), fan speed (Nf), core speed (Nc), and several pressure measurements. Three operational setting columns define the flight condition. Understand what each sensor measures and where it sits in the gas path.
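As a starting point, the loading step might look like the sketch below. It assumes the file is whitespace-delimited with 26 columns per row (engine ID, cycle, three settings, 21 sensors), as described in the dataset documentation. The column labels (unit, cycle, setting_1, s1 and so on) and the load_cmapss helper are illustrative names chosen here, not names defined by the dataset, and the two sample rows are made up so the snippet runs without the download.

```python
import io

import pandas as pd

# Hypothetical column labels: the raw file has no header row.
COLUMNS = (
    ["unit", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"s{i}" for i in range(1, 22)]
)


def load_cmapss(path_or_buffer):
    """Read a whitespace-delimited C-MAPSS file into a labelled DataFrame."""
    return pd.read_csv(path_or_buffer, sep=r"\s+", header=None, names=COLUMNS)


# Two made-up rows in the 26-column layout, standing in for train_FD001.txt.
sample = io.StringIO(
    "1 1 " + " ".join(["0.5"] * 24) + "\n"
    "1 2 " + " ".join(["0.6"] * 24) + "\n"
)
df = load_cmapss(sample)
print(df.shape)
```

With the real data, pass the path to train_FD001.txt instead of the in-memory sample, then rename the sensor columns to the physical names from the documentation once you have matched them up.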

2

Explore and Visualize Sensor Trends

Pick 3–4 engines and plot each sensor reading vs. operating cycle. Some sensors trend clearly as the engine degrades (T30 and T50 increase, efficiency-related measurements decrease). Others are nearly flat and carry little information. Use correlation analysis to identify which sensors are most correlated with each other and with the operating settings.
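Alongside the plots, a quick numeric screen can flag which channels drift and which are flat. The sketch below uses synthetic data for a single engine (one drifting channel labelled T30 and one flat channel labelled P2, both invented for illustration) and ranks channels by their spread and by their Pearson correlation with cycle count. The 1e-6 threshold for "flat" is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cycles = np.arange(1, 201)

# Synthetic stand-in for one engine: T30 drifts upward, P2 stays constant.
engine = pd.DataFrame({
    "cycle": cycles,
    "T30": 1585.0 + 0.05 * cycles + rng.normal(0.0, 1.0, cycles.size),
    "P2": np.full(cycles.size, 14.62),
})

sensors = ["T30", "P2"]
spread = engine[sensors].std()

# Skip (near-)constant channels: correlation is undefined for them.
informative = spread[spread > 1e-6].index
trend = engine[informative].corrwith(engine["cycle"])

print(spread)
print(trend)
```

A correlation near +1 or -1 with cycle count indicates a channel that tracks degradation; channels excluded as flat are candidates to drop in the next step.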

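Since C-MAPSS doesn't provide explicit thrust values, use a thrust proxy as your regression target. A common choice is fan speed corrected for inlet conditions, Nf_corrected = Nf / sqrt(T2 / T_ref). C-MAPSS reports temperatures in degrees Rankine, so use T_ref = 518.67 °R (the same sea-level standard temperature as 288.15 K) rather than the Kelvin value. The dataset also includes a corrected fan speed channel (NRf) that you can use directly. Note that T2 is constant in FD001, so the computed proxy is numerically equal to Nf there; think about whether Nf and NRf should stay in the feature set, because a model that sees them can reproduce the target trivially.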
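The corrected-speed calculation can be sketched as follows. It assumes the sensor columns have already been given the labels Nf and T2 (hypothetical names for this example) and that T2 is in degrees Rankine as in the C-MAPSS files, so the reference temperature is 518.67 °R, which is the same physical temperature as 288.15 K. The sample values are illustrative only.

```python
import numpy as np
import pandas as pd

T_REF_RANKINE = 518.67  # sea-level standard temperature (equals 288.15 K)


def corrected_fan_speed(nf, t2_rankine):
    """Return Nf / sqrt(theta), where theta = T2 / T_ref in matching units."""
    theta = np.asarray(t2_rankine, dtype=float) / T_REF_RANKINE
    return np.asarray(nf, dtype=float) / np.sqrt(theta)


# Illustrative values; with T2 at the reference temperature, theta is 1.
df = pd.DataFrame({"Nf": [2388.06, 2388.04], "T2": [518.67, 518.67]})
df["Nf_corrected"] = corrected_fan_speed(df["Nf"], df["T2"])
print(df)
```

When T2 equals the reference temperature the correction factor is exactly 1, which is why the proxy collapses to the raw fan speed under a single sea-level condition. The correction only starts to matter in subsets with varying inlet conditions.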

3

Prepare the Feature Matrix

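Select the sensor columns and operational settings as your features. Drop any constant or near-constant sensors (some channels in FD001 have zero variance because the operating condition is fixed). Standardize the remaining features with StandardScaler, fitting the scaler on the training engines only and applying it unchanged to the test engines. Random Forests are insensitive to feature scaling, so this step mainly keeps the pipeline ready for other models you may compare later.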
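A minimal sketch of this preparation step is shown below, using small synthetic frames in place of the real training and test features. The column labels and the 1e-8 variance threshold are assumptions for illustration. The scaler is fitted on the training frame only and then reused for the test frame.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_frame(n_rows):
    """Synthetic features: two varying channels and one constant channel."""
    return pd.DataFrame({
        "T30": rng.normal(1590.0, 5.0, n_rows),
        "Nc": rng.normal(9050.0, 20.0, n_rows),
        "P2": np.full(n_rows, 14.62),
    })


X_train, X_test = make_frame(50), make_frame(20)

# Keep only columns that actually vary in the training data.
keep = X_train.columns[X_train.std() > 1e-8]

# Fit on training data only, then apply the same transform to test data.
scaler = StandardScaler().fit(X_train[keep])
X_train_scaled = pd.DataFrame(
    scaler.transform(X_train[keep]), columns=keep, index=X_train.index
)
X_test_scaled = pd.DataFrame(
    scaler.transform(X_test[keep]), columns=keep, index=X_test.index
)
print(list(keep))
```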

Add rolling window features: for each sensor, compute the mean and standard deviation over a trailing window of 5 cycles, separately for each engine so that a window never spans two engines. These rolling statistics smooth sensor noise and capture short-term variability, and they often improve model accuracy because the model sees recent history, not just the instantaneous value.
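One way to compute per-engine rolling statistics with pandas is sketched below. The helper name add_rolling_features, the column labels, and the choice to let the window include the current cycle are assumptions made for this example. The first cycles of each engine use a shorter window, and a standard deviation that is undefined for a single sample is set to zero.

```python
import pandas as pd


def add_rolling_features(df, sensor_cols, window=5, id_col="unit"):
    """Append trailing-window mean and std columns, computed per engine."""
    out = df.copy()
    grouped = df.groupby(id_col)
    for col in sensor_cols:
        out[f"{col}_mean{window}"] = grouped[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
        out[f"{col}_std{window}"] = grouped[col].transform(
            lambda s: s.rolling(window, min_periods=1).std()
        ).fillna(0.0)
    return out


# Tiny illustrative frame: two engines, sorted by cycle within each engine.
df = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
    "T30": [10.0, 12.0, 14.0, 100.0, 104.0],
})
feat = add_rolling_features(df, ["T30"], window=2)
print(feat)
```

Rows must be sorted by cycle within each engine before calling the helper. In the output, the first row of engine 2 restarts the window, so its rolling mean is 100.0 and is not contaminated by engine 1.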

4

Train the Random Forest Model

Split the data by engine ID (not randomly!) — use engines 1–80 for training and 81–100 for testing. This prevents data leakage from the same engine appearing in both sets. Train a RandomForestRegressor with 200 trees:

from sklearn.ensemble import RandomForestRegressor

# X_train, y_train: features and thrust-proxy target for engines 1-80
rf = RandomForestRegressor(n_estimators=200, max_depth=15, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)

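Use 5-fold cross-validation on the training set to tune max_depth (try 10, 15, 20, None) and min_samples_leaf (try 1, 5, 10). Group the folds by engine ID (for example with scikit-learn's GroupKFold) so that no engine contributes to both the fitting and validation portions of a fold, for the same leakage reason as the train/test split. Track RMSE and R² for each fold.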
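The tuning loop could be set up along these lines. This is a sketch on synthetic data, not the C-MAPSS features: the engine IDs, feature matrix, and target are generated here purely so the example runs, and the forest is kept small (25 trees) for speed. It uses GroupKFold so each engine stays within a single fold, and GridSearchCV with two scorers so RMSE and R² are both recorded.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(42)

# Synthetic stand-in: 10 engines with 30 cycles each, 4 features.
n_engines, cycles_per_engine = 10, 30
groups = np.repeat(np.arange(1, n_engines + 1), cycles_per_engine)
X = rng.normal(size=(groups.size, 4))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=groups.size)

param_grid = {
    "max_depth": [10, 15, 20, None],
    "min_samples_leaf": [1, 5, 10],
}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=25, random_state=42),
    param_grid,
    cv=GroupKFold(n_splits=5),  # folds never split an engine
    scoring={"rmse": "neg_root_mean_squared_error", "r2": "r2"},
    refit="rmse",  # pick the setting with the lowest mean RMSE
)
search.fit(X, y, groups=groups)

best = search.best_index_
print(search.best_params_)
print("CV RMSE:", -search.cv_results_["mean_test_rmse"][best])
print("CV R2:", search.cv_results_["mean_test_r2"][best])
```

Per-fold scores are available in search.cv_results_ under keys such as split0_test_rmse. With the real data, pass the engine ID column of the training rows as groups.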

5

Analyze Feature Importance

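Extract and plot the feature importances from the trained Random Forest. Expect core speed (Nc), HPC outlet temperature (T30), and fan speed (Nf) to rank near the top, since these are the thermodynamic quantities most directly linked to thrust through the Brayton cycle. Keep in mind that any feature from which your target was computed will rank first for trivial reasons, so interpret its importance separately from the others.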

Run a permutation importance analysis (scikit-learn's permutation_importance) on the held-out test engines to cross-check. Permutation importance is model-agnostic and avoids the bias that tree-based importance has toward high-cardinality features, although strongly correlated sensors can still share or mask each other's importance under both methods. Do the rankings agree? Discuss any differences in your report.
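The comparison of the two importance measures can be sketched as below. The data are synthetic: the feature labels are borrowed from the sensor list for readability, but the target is an invented linear combination, so the resulting ranking says nothing about the real dataset. Permutation importance is computed on the held-out rows.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
features = ["Nc", "T30", "P30", "noise"]

# Invented target: depends strongly on "Nc", weakly on "T30", not on the rest.
X = pd.DataFrame(rng.normal(size=(400, 4)), columns=features)
y = 3.0 * X["Nc"] + 1.0 * X["T30"] + rng.normal(scale=0.1, size=400)

X_train, X_test = X.iloc[:300], X.iloc[300:]
y_train, y_test = y.iloc[:300], y.iloc[300:]

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Shuffle each feature in the held-out set and measure the score drop.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)

ranking = pd.DataFrame(
    {"impurity": rf.feature_importances_, "permutation": perm.importances_mean},
    index=features,
).sort_values("permutation", ascending=False)
print(ranking)
```

The impurity column sums to 1 by construction, while the permutation column is a drop in the model's score (R² by default for a regressor), so compare the rankings rather than the raw values.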

6

Evaluate Across Degradation Levels

Divide each test engine's life into early-life (first 30% of its cycles), mid-life (for example, 30–70%), and late-life (the remainder) segments. Evaluate model accuracy separately on each segment. Does the model perform worse on degraded engines? If so, this suggests that the relationship between sensor readings and thrust changes as the engine degrades, which is a key insight for engine health management.

Plot predicted vs. actual for the full test set, color-coded by engine life stage. A well-performing model will show tight clustering around the diagonal regardless of degradation level.
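The per-stage evaluation might be organized as in the sketch below. It assumes a results table with one row per test cycle and columns named unit, cycle, y_true, and y_pred (hypothetical names), and it uses 30% and 70% of each engine's final cycle count as the stage boundaries, which is one possible choice. The numbers in the example table are made up to show the mechanics.

```python
import numpy as np
import pandas as pd


def add_life_stage(df, id_col="unit", cycle_col="cycle"):
    """Label each row early/mid/late by its fraction of the engine's life."""
    out = df.copy()
    life_frac = out[cycle_col] / out.groupby(id_col)[cycle_col].transform("max")
    out["stage"] = pd.cut(
        life_frac, bins=[0.0, 0.3, 0.7, 1.0], labels=["early", "mid", "late"]
    )
    return out


def rmse_by_stage(df, true_col="y_true", pred_col="y_pred"):
    """Root-mean-square error of the predictions within each life stage."""
    sq_err = (df[true_col] - df[pred_col]) ** 2
    return np.sqrt(sq_err.groupby(df["stage"], observed=True).mean())


# Made-up results for one engine with 10 cycles and growing error.
results = pd.DataFrame({
    "unit": 81,
    "cycle": np.arange(1, 11),
    "y_true": 0.0,
    "y_pred": [0.1] * 3 + [0.2] * 4 + [0.4] * 3,
})
staged = add_life_stage(results)
stage_rmse = rmse_by_stage(staged)
print(stage_rmse)
```

The stage column can also be passed as the color variable in the predicted-versus-actual scatter plot.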

7

Document and Discuss

Write a technical report covering the dataset, feature engineering decisions, model selection and tuning, and results. Include the feature importance plot and the degradation-stage analysis. Discuss how this approach could be deployed in a real engine health monitoring system — what would be needed to go from this offline analysis to a real-time thrust estimation model running on an aircraft?

Compare your ML approach to the physics-based alternative: a full engine performance model (like GasTurb or NPSS) that computes thrust from first principles. When would you use each approach?

Go Further

  • Extend to the FD002–FD004 subsets of C-MAPSS, which include multiple operating conditions and fault modes, making the prediction problem significantly harder.
  • Replace the Random Forest with a gradient boosting model (XGBoost or LightGBM) and compare accuracy and training time.
  • Build a remaining useful life (RUL) prediction model on the same dataset — a natural extension that is one of the most-studied problems in prognostics.
  • Implement a real-time dashboard using Plotly Dash that displays sensor readings and the model's thrust estimate as you step through engine cycles.