Deep Learning for Engine Prognostics

Teach an LSTM to hear an engine degrading cycle by cycle.

Undergraduate · Predictive Maintenance · 4–6 weeks
Last reviewed: March 2026

Overview

Classical machine learning models treat each time step as an independent feature vector and discard the sequential structure of sensor data. Deep learning architectures — particularly Long Short-Term Memory (LSTM) networks and 1D Convolutional Neural Networks (CNNs) — are designed to exploit temporal patterns, making them natural fits for engine health monitoring where gradual degradation manifests as slow drift and oscillation changes across hundreds of flight cycles.

This project builds on the NASA C-MAPSS dataset used in introductory ML courses but pushes deeper: you will implement a windowed sequence loader that produces overlapping input sequences, design stacked LSTM and CNN-LSTM hybrid models in TensorFlow/Keras, and apply modern training techniques including cosine learning rate decay, dropout regularisation, and early stopping with model checkpointing. You will also implement a lightweight attention mechanism that highlights which time steps in the input window the model is focusing on, giving interpretable insight into the learned degradation signatures.

The skills developed here — sequence modelling, attention mechanisms, and model interpretability for safety-critical predictions — are directly applicable across aerospace health management, industrial IoT, and any domain where time-series prognosis matters. TensorFlow is widely used in deployed aerospace ML systems, so proficiency with it transfers directly to industry roles.

What You'll Learn

  • Implement sliding-window sequence datasets for time-series regression in TensorFlow/Keras
  • Design, train, and regularise LSTM and 1D-CNN architectures for RUL prediction
  • Apply learning rate scheduling, early stopping, and model checkpointing for robust training
  • Implement a simple additive attention mechanism and visualise attention weight distributions
  • Compare deep learning RUL accuracy against scikit-learn baselines using RMSE and NASA scoring
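The NASA scoring metric mentioned above is the asymmetric exponential score from the PHM08 prognostics challenge: it penalises late predictions (overestimated RUL) more heavily than early ones. A minimal sketch — the function name `nasa_score` is illustrative:

```python
import numpy as np

def nasa_score(y_true, y_pred):
    """PHM08 asymmetric score on the error d = predicted - true RUL.

    Early predictions (d < 0) are scaled by 1/13, late predictions
    (d >= 0) by 1/10, so overestimating remaining life costs more.
    Lower is better; a perfect prediction scores 0.
    """
    d = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sum(np.where(d < 0, np.exp(-d / 13.0), np.exp(d / 10.0)) - 1.0))
```

Reporting this score alongside RMSE shows whether a model errs on the safe (early) side.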

Step-by-Step Guide

1. Prepare the C-MAPSS sequence dataset

Extend the C-MAPSS preprocessing from the scikit-learn project: normalise sensors per operating condition cluster, compute the piecewise-linear RUL target, and write a TensorFlow tf.data.Dataset pipeline that yields overlapping windows of length 30 cycles with the corresponding RUL label. Verify shapes and inspect sample batches before training.
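The window-slicing logic is easy to verify in plain NumPy before wiring it into `tf.data`. A minimal sketch — the `make_windows` name and `stride` parameter are illustrative:

```python
import numpy as np

def make_windows(features, rul, window=30, stride=1):
    """Slide a fixed-length window over one engine unit's cycle history.

    features: (n_cycles, n_sensors) array for a single unit
    rul:      (n_cycles,) piecewise-linear RUL target
    Returns X of shape (n_windows, window, n_sensors) and y of shape
    (n_windows,), where each label is the RUL at the window's last cycle.
    """
    X, y = [], []
    for end in range(window, len(features) + 1, stride):
        X.append(features[end - window:end])
        y.append(rul[end - 1])
    return np.stack(X), np.array(y)
```

Generating windows per engine unit (rather than over the concatenated file) avoids windows that straddle two units; the resulting arrays can then be fed to `tf.data.Dataset.from_tensor_slices((X, y))`, shuffled, and batched.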

2. Build and train a stacked LSTM baseline

Define a two-layer LSTM model in Keras with dropout between layers. Compile with the Adam optimiser and Huber loss (more robust to early-life RUL outliers than MSE). Add EarlyStopping on validation loss and ModelCheckpoint to save the best weights. Train on FD001 and report validation RMSE per epoch.
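A minimal sketch of this baseline, assuming a 30-cycle window over 14 selected sensors (adjust shapes, units, and the checkpoint filename to your pipeline):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm(window=30, n_sensors=14):
    """Two-layer LSTM regressor with dropout between layers."""
    model = keras.Sequential([
        keras.Input(shape=(window, n_sensors)),
        layers.LSTM(64, return_sequences=True),  # pass full sequence down
        layers.Dropout(0.3),
        layers.LSTM(32),                         # final hidden state only
        layers.Dropout(0.3),
        layers.Dense(1),                         # scalar RUL prediction
    ])
    model.compile(optimizer=keras.optimizers.Adam(1e-3),
                  loss=keras.losses.Huber(),     # robust to early-life outliers
                  metrics=[keras.metrics.RootMeanSquaredError()])
    return model

callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                  restore_best_weights=True),
    keras.callbacks.ModelCheckpoint("best_lstm.keras",
                                    monitor="val_loss", save_best_only=True),
]
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```

`restore_best_weights=True` means the in-memory model already holds the best epoch's weights after early stopping; the checkpoint file is a backup for later reloading.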

3. Design a 1D-CNN and a CNN-LSTM hybrid

Implement a 1D convolutional feature extractor (three Conv1D layers with increasing filters, followed by GlobalAveragePooling1D) and evaluate it as a standalone model. Then combine the CNN extractor with an LSTM decoder layer to create a CNN-LSTM hybrid. Compare all three architectures on validation RMSE and training speed.
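One way to sketch the hybrid: reuse the three-layer Conv1D stack but replace `GlobalAveragePooling1D` with an LSTM decoder, which summarises the convolved sequence while preserving temporal order (shapes are assumptions to adjust):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn_lstm(window=30, n_sensors=14):
    """Conv1D feature extractor feeding an LSTM decoder for RUL regression."""
    inputs = keras.Input(shape=(window, n_sensors))
    # Increasing filter counts, 'same' padding keeps the time axis intact
    x = layers.Conv1D(32, 5, padding="same", activation="relu")(inputs)
    x = layers.Conv1D(64, 5, padding="same", activation="relu")(x)
    x = layers.Conv1D(128, 5, padding="same", activation="relu")(x)
    x = layers.LSTM(64)(x)          # decoder replaces global pooling
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(1)(x)    # scalar RUL
    return keras.Model(inputs, outputs, name="cnn_lstm")
```

For the standalone CNN, swap the `LSTM(64)` line for `layers.GlobalAveragePooling1D()`, which makes the two architectures directly comparable.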

4. Add an attention mechanism

Implement a Bahdanau-style additive attention layer that produces a weighted sum of LSTM hidden states. Train the attention-augmented model and extract the attention weight matrices for a set of test engines. Plot the attention heatmaps over the input window and correlate high-attention time steps with known degradation events in the sensor traces.
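A minimal sketch of the additive attention layer: it scores each LSTM hidden state, softmaxes the scores over the time axis, and returns both the weighted context vector and the weights themselves for heatmap plotting (the class name and `units` size are illustrative):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class AdditiveAttention(layers.Layer):
    """Bahdanau-style attention over a sequence of hidden states."""

    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.w = layers.Dense(units, activation="tanh")  # score MLP
        self.v = layers.Dense(1)                         # scalar score per step

    def call(self, hidden_states):
        # hidden_states: (batch, time, features)
        scores = self.v(self.w(hidden_states))           # (batch, time, 1)
        weights = tf.nn.softmax(scores, axis=1)          # normalise over time
        context = tf.reduce_sum(weights * hidden_states, axis=1)  # (batch, features)
        return context, tf.squeeze(weights, -1)          # weights for heatmaps
```

In the model, place this after an `LSTM(..., return_sequences=True)` layer and regress RUL from the context vector; the returned weights give one scalar per input cycle to plot against the sensor traces.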

5. Tune hyperparameters with Keras Tuner

Use Keras Tuner with a Hyperband strategy to search over LSTM units (32–256), dropout rates (0.1–0.5), and window length (15–50). Run the search on a validation split of FD001 and retrain the best configuration from scratch on the full training set. Document how much tuning improved over the hand-designed baseline.
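The tuner drives a model-builder function that draws each hyperparameter from the `hp` handle. A sketch under the same assumed input shape as earlier steps; note that window length changes the dataset itself, so it is tuned by rebuilding the windows per trial rather than inside the builder:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    """Model-builder for Keras Tuner; hp supplies sampled hyperparameters."""
    units = hp.Int("lstm_units", min_value=32, max_value=256, step=32)
    dropout = hp.Float("dropout", min_value=0.1, max_value=0.5, step=0.1)
    model = keras.Sequential([
        keras.Input(shape=(30, 14)),   # window length tuned outside the builder
        layers.LSTM(units, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(units // 2),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss=keras.losses.Huber())
    return model

# With the keras_tuner package installed:
# import keras_tuner as kt
# tuner = kt.Hyperband(build_model, objective="val_loss", max_epochs=50)
# tuner.search(train_ds, validation_data=val_ds)
# best = tuner.get_best_hyperparameters(1)[0]
```

After the search, rebuild the model from the best hyperparameters and retrain from scratch so the reported score is not biased by Hyperband's partial-training schedule.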

6. Generalise across operating conditions and report

Train and evaluate models on all four C-MAPSS sub-datasets (FD001–FD004), which vary in number of operating conditions and fault modes. Analyse why performance differs across sub-datasets. Write a technical report structured as an AIAA-style paper with an abstract, methodology, results table, attention visualisation figures, and a conclusion.

Go Further

  • Implement a Temporal Convolutional Network (TCN) and compare its convergence speed and accuracy against the LSTM and CNN-LSTM models.
  • Apply SHAP DeepExplainer to attribute RUL predictions to individual sensor inputs and validate the attributions against domain knowledge.
  • Export the best model to TensorFlow Lite and benchmark inference latency on a Raspberry Pi 4 to evaluate embedded deployment feasibility.
  • Implement Monte Carlo dropout at inference time to produce probabilistic RUL distributions and compare coverage against the conformal prediction baseline from the scikit-learn project.
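For the Monte Carlo dropout extension, the core trick is calling the trained model with `training=True` so dropout stays active at inference time; repeated stochastic forward passes then sample a predictive distribution. A minimal sketch (the helper name is illustrative):

```python
import numpy as np
import tensorflow as tf

def mc_dropout_predict(model, x, n_samples=100):
    """Sample a predictive RUL distribution via Monte Carlo dropout.

    Runs n_samples stochastic forward passes with dropout enabled and
    returns the per-input mean and standard deviation of the predictions.
    """
    preds = np.stack([model(x, training=True).numpy()
                      for _ in range(n_samples)])   # (n_samples, batch, 1)
    return preds.mean(axis=0), preds.std(axis=0)
```

The standard deviation gives per-engine uncertainty bands whose empirical coverage can be checked against the conformal prediction intervals from the scikit-learn project.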