Remaining Useful Life Prediction with scikit-learn

Predict when a turbofan engine will fail before it does.

Undergraduate · Predictive Maintenance · 3–5 weeks

Last reviewed: March 2026

Overview

Turbofan engines in commercial aviation are instrumented with dozens of sensors that continuously report health indicators such as fan speed, exhaust gas temperature, and fuel flow. The challenge for maintenance engineers is translating this high-dimensional time-series data into an actionable answer: how many flight cycles remain before this engine needs to be pulled from service? Remaining Useful Life (RUL) prediction is the core task of predictive maintenance, and it sits at the intersection of signal processing, statistics, and machine learning.

This project uses the NASA C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) dataset, the standard benchmark for engine prognostics research. You will build a scikit-learn pipeline that ingests raw multi-sensor readings, constructs informative features (rolling statistics, degradation indices, sensor differences), and trains regression models ranging from Ridge regression to Gradient Boosting. The focus is on the full engineering workflow: exploratory analysis, feature selection, cross-validation on run-to-failure trajectories, and calibrated uncertainty estimates.

By the end of the project you will have a reproducible ML pipeline that achieves competitive RMSE on the C-MAPSS test set, a feature importance analysis explaining which sensors drive predictions, and a report linking your approach to real-world maintenance decision thresholds. These skills translate directly to roles in airline MRO operations, engine OEM health monitoring teams, and defense sustainment programs.

What You'll Learn

✓ Preprocess and align multi-variate time-series sensor data for supervised regression
✓ Engineer degradation-aware features including rolling windows, exponential smoothing, and health indices
✓ Apply and tune scikit-learn regressors (Ridge, Random Forest, Gradient Boosting) with proper cross-validation
✓ Evaluate prognostic models using RUL RMSE and the asymmetric NASA scoring function
✓ Interpret feature importances and connect sensor physics to model behavior

Step-by-Step Guide

Load and explore the C-MAPSS dataset

Download all four C-MAPSS sub-datasets (FD001–FD004) from the NASA Prognostics Data Repository. Load them into pandas DataFrames and perform exploratory analysis: plot sensor trajectories for sample engines, compute correlations, and identify sensors with near-zero variance or monotone degradation trends.
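As a starting point for the loading step, the sketch below reads one C-MAPSS file and flags near-constant sensors. It assumes the commonly described layout of the raw files (whitespace-delimited, no header row, 26 columns: engine unit, cycle, three operational settings, 21 sensors). The column names, the helpers load_cmapss and near_constant_sensors, and the tolerance are illustrative choices, and a small synthetic table stands in for a real file such as train_FD001.txt.

```python
import io

import numpy as np
import pandas as pd

# Illustrative column names; the raw files are assumed to have no header row.
COLUMNS = (
    ["unit", "cycle"]
    + [f"op_{i}" for i in range(1, 4)]
    + [f"s_{i}" for i in range(1, 22)]
)


def load_cmapss(path_or_buffer):
    """Read one whitespace-delimited C-MAPSS file into a DataFrame."""
    df = pd.read_csv(path_or_buffer, sep=r"\s+", header=None, names=COLUMNS)
    df[["unit", "cycle"]] = df[["unit", "cycle"]].astype(int)
    return df


def near_constant_sensors(df, tol=1e-8):
    """Return sensor columns whose standard deviation is at most tol."""
    sensor_cols = [c for c in df.columns if c.startswith("s_")]
    stds = df[sensor_cols].std()
    return stds[stds <= tol].index.tolist()


# Synthetic stand-in for a real file: 2 engines, 5 cycles each.
rng = np.random.default_rng(0)
rows = []
for unit in (1, 2):
    for cycle in range(1, 6):
        sensors = rng.normal(size=21)
        sensors[0] = 518.67  # force s_1 to be constant for the demo
        rows.append([unit, cycle, 0.0, 0.0, 100.0, *sensors])
text = "\n".join(" ".join(f"{v:.4f}" for v in row) for row in rows)

demo = load_cmapss(io.StringIO(text))
flat = near_constant_sensors(demo)
print(demo.shape, flat)
```

With real data, pass the file path instead of the StringIO buffer and review the flagged sensors alongside the trajectory plots before dropping anything.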

Construct the RUL target variable

Because each C-MAPSS training trajectory runs to failure, the raw RUL at any cycle is the engine's final cycle number minus the current cycle. Use a piecewise-linear target: cap RUL at a maximum (125 cycles is a common choice) so the early healthy period is modeled as flat, then let it decrease linearly to zero at failure. Experiment with the cap value and document how it affects model bias early in life.
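A minimal sketch of the capped target, assuming a DataFrame with columns named unit and cycle (illustrative names) and a hypothetical helper add_capped_rul:

```python
import pandas as pd


def add_capped_rul(df, cap=125):
    """Add a piecewise-linear RUL column: flat at `cap`, then linear to 0."""
    out = df.copy()
    last_cycle = out.groupby("unit")["cycle"].transform("max")
    out["rul"] = (last_cycle - out["cycle"]).clip(upper=cap)
    return out


# Two toy engines: one fails after 200 cycles, one after 50.
demo = pd.DataFrame({
    "unit": [1] * 200 + [2] * 50,
    "cycle": list(range(1, 201)) + list(range(1, 51)),
})
labelled = add_capped_rul(demo, cap=125)
print(labelled.groupby("unit")["rul"].agg(["max", "min"]))
```

The long-lived engine is held at the cap until 125 cycles remain, while the short-lived engine never reaches the cap and starts at 49.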

Engineer features and build a preprocessing pipeline

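Compute rolling-window statistics (mean and standard deviation over the last 30 cycles) separately for each engine so that a window never spans two engines. Then use scikit-learn Pipeline and ColumnTransformer to chain normalization and polynomial interaction terms, and use SelectKBest (for example with mutual_info_regression as the score function) to prune low-value features before model training.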
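One way to arrange this step is sketched below. Rolling statistics are computed in pandas per engine, because they depend on the engine grouping, and the remaining steps are chained in a scikit-learn Pipeline. The helper add_rolling_features, the column names, the synthetic data, and the choice of f_regression (used here only because it is deterministic) are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def add_rolling_features(df, sensor_cols, window=30):
    """Append per-engine rolling mean and std for each sensor column."""
    out = df.sort_values(["unit", "cycle"]).reset_index(drop=True)
    for col in sensor_cols:
        by_engine = out.groupby("unit")[col]
        out[f"{col}_mean"] = by_engine.transform(
            lambda s: s.rolling(window, min_periods=1).mean())
        # The std of a single sample is undefined, so fill it with 0.
        out[f"{col}_std"] = by_engine.transform(
            lambda s: s.rolling(window, min_periods=1).std()).fillna(0.0)
    return out


# Synthetic stand-in: 3 engines, 40 cycles, one drifting and one noisy sensor.
rng = np.random.default_rng(0)
frames = []
for unit in range(1, 4):
    cycle = np.arange(1, 41)
    frames.append(pd.DataFrame({
        "unit": unit,
        "cycle": cycle,
        "s_a": 0.05 * cycle + rng.normal(scale=0.2, size=cycle.size),
        "s_b": rng.normal(size=cycle.size),
    }))
raw = pd.concat(frames, ignore_index=True)
raw["rul"] = raw.groupby("unit")["cycle"].transform("max") - raw["cycle"]

feats = add_rolling_features(raw, ["s_a", "s_b"], window=30)
feature_cols = [c for c in feats.columns if c.startswith("s_")]

preprocess = ColumnTransformer(
    [("scale", StandardScaler(), feature_cols)], remainder="drop")
model = Pipeline([
    ("prep", preprocess),
    ("poly", PolynomialFeatures(degree=2, interaction_only=True,
                                include_bias=False)),
    ("select", SelectKBest(f_regression, k=4)),
    ("reg", Ridge(alpha=1.0)),
])
model.fit(feats[feature_cols], feats["rul"])
preds = model.predict(feats[feature_cols])
```

Keeping the scaler and selector inside the Pipeline means they are refit on each training fold during cross-validation, which avoids leaking validation statistics into preprocessing.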

Train and tune regression models

Fit Ridge, RandomForestRegressor, and HistGradientBoostingRegressor models. Use GroupKFold (grouped by engine ID) so that no engine's cycles appear in both the training and validation folds, which would leak information. Tune hyperparameters with RandomizedSearchCV and record validation RMSE for each configuration.

Evaluate on the test set with the NASA scoring function

Each C-MAPSS test trajectory is truncated at some point before failure, and the ground-truth RUL at its last observed cycle is provided in a separate file. Predict RUL at the last cycle of each test engine and compute both RMSE and the asymmetric scoring function, which penalises late predictions (overestimated RUL) more heavily than early ones. Plot predicted against actual RUL as a scatter plot.
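The asymmetric score is commonly written in terms of the error d = predicted RUL - actual RUL, summing exp(-d/13) - 1 over early predictions (d < 0) and exp(d/10) - 1 over late ones. The sketch below assumes that form and those constants, so check them against the dataset documentation; the function names rmse and nasa_score are illustrative.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted RUL."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def nasa_score(y_true, y_pred):
    """Asymmetric score: late predictions (d > 0) cost more than early ones."""
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    penalties = np.where(d < 0, np.exp(-d / 13.0) - 1.0, np.exp(d / 10.0) - 1.0)
    return float(np.sum(penalties))


# Same absolute error of 10 cycles, opposite directions.
late = nasa_score([50.0], [60.0])   # predicted failure later than actual
early = nasa_score([50.0], [40.0])  # predicted failure earlier than actual
print(rmse([50.0], [60.0]), rmse([50.0], [40.0]), late > early)
```

RMSE treats both errors identically, while the score is larger for the late prediction. Note that the score is a sum rather than a mean, so it grows with the number of test engines.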

Analyse uncertainty and write a maintenance decision memo

Use the spread of per-tree predictions from a RandomForestRegressor, or quantile regression, to attach uncertainty bounds to each RUL estimate. Write a one-page memo explaining how a maintenance planner would set an alert threshold given those bounds, balancing the cost of unscheduled removals against the cost of premature maintenance.
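A rough sketch of the per-tree approach follows. It reads the 5th and 95th percentiles of the individual tree predictions as a band and raises an alert when the lower bound falls below a threshold. The synthetic data, the 30-cycle threshold, and the percentile choice are assumptions, and the band reflects disagreement among trees rather than a calibrated prediction interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in: the first feature acts as a degradation index in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = 125.0 * (1.0 - X[:, 0]) + rng.normal(scale=5.0, size=300)

forest = RandomForestRegressor(
    n_estimators=100, min_samples_leaf=5, random_state=0).fit(X, y)

# One fairly healthy engine and one heavily degraded engine.
X_new = np.array([[0.1, 0.5], [0.9, 0.5]])

# Shape (n_trees, n_samples): each row is one tree's RUL predictions.
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])
mean = per_tree.mean(axis=0)
lower, upper = np.percentile(per_tree, [5, 95], axis=0)

ALERT_THRESHOLD = 30.0  # hypothetical planning threshold, in cycles
alert = lower < ALERT_THRESHOLD
for m, lo, hi, a in zip(mean, lower, upper, alert):
    print(f"RUL ~ {m:6.1f}  band [{lo:6.1f}, {hi:6.1f}]  alert={a}")
```

Alerting on the lower bound rather than the mean is the conservative option, and the memo can discuss how moving the percentile or the threshold trades unscheduled removals against premature maintenance.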

Go Further

Stack the individual regressors into a weighted ensemble and show whether the ensemble outperforms the best single model.
Add a conformal prediction wrapper to produce RUL intervals with a marginal coverage guarantee at a chosen level, which holds when calibration and test data are exchangeable.
Re-implement the pipeline using the Dask-ML API so it can process a dataset 10× larger than RAM.
Compare your results against published LSTM baselines and write a structured comparison report.
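For the conformal prediction extension in the list above, a split-conformal wrapper is one simple option. The sketch below uses a Ridge model on synthetic data, absolute residuals as the conformity score, and a separate calibration set; all of these are illustrative choices. On C-MAPSS the calibration set would need to be split off by engine, and the exchangeability assumption deserves scrutiny because cycles within an engine are correlated.

```python
import math

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)


def make_data(n):
    """Synthetic RUL data: one degradation index plus Gaussian noise."""
    X = rng.uniform(0.0, 1.0, size=(n, 1))
    y = 125.0 * (1.0 - X[:, 0]) + rng.normal(scale=8.0, size=n)
    return X, y


X_fit, y_fit = make_data(300)   # used to train the model
X_cal, y_cal = make_data(200)   # held out for calibration only
X_new, y_new = make_data(500)   # fresh data to check coverage

model = Ridge(alpha=1.0).fit(X_fit, y_fit)

alpha = 0.1  # target miscoverage, i.e. 90% intervals
scores = np.sort(np.abs(y_cal - model.predict(X_cal)))
n = scores.size
k = math.ceil((n + 1) * (1 - alpha))          # 1-based rank of the quantile
q = scores[k - 1] if k <= n else np.inf       # too few points: infinite width

pred = model.predict(X_new)
lower, upper = pred - q, pred + q
coverage = float(np.mean((y_new >= lower) & (y_new <= upper)))
print(f"half-width {q:.1f} cycles, empirical coverage {coverage:.3f}")
```

The intervals here have constant width. Scaling the residuals by a per-sample uncertainty estimate is a common refinement if wider bands are wanted where the model is less reliable.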

Related Projects

Deep Learning for Engine Prognostics (Undergraduate). Teach an LSTM to hear an engine degrading cycle by cycle.
Kaggle Turbofan Competition Pipeline (Undergraduate). Go beyond homework: build a competition-grade predictive maintenance pipeline.
