Predict Pilot Fatigue Risk from Flight Schedule Data

Model the invisible threat to flight safety with ML

Advanced · Human Factors · 6–8 weeks
Last reviewed: March 2026

Overview

Pilot fatigue is a documented factor in numerous aviation accidents and incidents, yet it remains one of the hardest risks to quantify. Unlike mechanical failures, fatigue is invisible — a pilot may not recognize their own impairment. Regulatory approaches (maximum duty hours, minimum rest periods) are blunt instruments that don't account for individual variation, circadian disruption, or the cumulative effects of multiple early-morning report times.

The SAFTE (Sleep, Activity, Fatigue, and Task Effectiveness) model, developed by the U.S. Department of Defense, provides a bio-mathematical framework for predicting cognitive effectiveness based on sleep history and circadian phase. In this project, you'll implement a simplified SAFTE model to generate fatigue scores for realistic flight schedules, then train an ML model that predicts those scores from schedule features alone — without needing the full bio-mathematical simulation at inference time.

The ML approach has a practical advantage. A trained model scores a schedule with a single fast prediction instead of a minute-by-minute simulation, so airlines could screen large numbers of proposed crew schedules in real time and flag high-fatigue pairings before they're published. You'll explore sequence models (LSTMs) that capture the temporal dynamics of fatigue accumulation: how three consecutive early-morning duties compound into a level of impairment that no single duty would produce. This is research-grade work at the intersection of human factors, operations research, and machine learning.

What You'll Learn

  • Implement a bio-mathematical fatigue model based on the SAFTE framework
  • Generate synthetic labeled data from a physics-informed model for ML training
  • Design features that capture temporal dependencies in sequential schedule data
  • Train and evaluate an LSTM for time-series fatigue prediction
  • Understand the regulatory and ethical dimensions of fatigue risk management in aviation

Step-by-Step Guide

1

Implement the SAFTE Fatigue Model

Build a simplified version of the SAFTE model in Python. The model tracks three interacting processes: the homeostatic sleep reservoir (depletes during wakefulness, replenishes during sleep), the circadian oscillator (a sinusoidal rhythm with ~24h period, nadir around 04:00 local time), and sleep inertia (grogginess immediately after waking). The output is a "cognitive effectiveness" score from 0 to 100.

Key parameters include the reservoir capacity (~24 hours of wakefulness), the replenishment rate during sleep (roughly proportional to the current sleep deficit), the circadian amplitude (~15–20% of total effectiveness), and the inertia decay constant (~30 minutes). Validate your implementation against published SAFTE output from the literature. Hursh et al. (2004) provides reference curves.
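Below is a minimal sketch of the three processes. It assumes a boolean sleep/wake input sampled at a fixed step, usually 1 minute. All names and default values are illustrative assumptions that you should calibrate against the Hursh et al. (2004) reference curves, including the reservoir capacity discussed above:

- a 2880-unit reservoir drained at 0.5 units per awake minute
- a refill of 0.5% of the current deficit per sleeping minute
- an 8-point circadian amplitude (a 16-point peak-to-trough swing) from a single 24 h sinusoid
- 5 points of initial sleep inertia

Effectiveness is computed as the reservoir level in percent plus the circadian and inertia terms, clipped to 0–100.

```python
import numpy as np

def simulate_effectiveness(asleep, dt_min=1.0, start_hour=0.0,
                           capacity=2880.0, deplete_per_min=0.5,
                           recovery_rate_per_min=0.005,
                           circ_amp=8.0, nadir_hour=4.0,
                           inertia_max=5.0, inertia_tau_min=30.0):
    """Simplified SAFTE-style effectiveness (0-100) for each time step.

    asleep: boolean array, True where the pilot is asleep.
    start_hour: local clock hour of the first sample.
    All default parameter values are illustrative assumptions.
    """
    asleep = np.asarray(asleep, dtype=bool)
    eff = np.empty(asleep.size)
    reservoir = capacity            # assume the pilot starts fully rested
    minutes_since_wake = np.inf     # no sleep inertia at t = 0
    # Fraction of the deficit recovered per step, capped so it cannot overshoot
    step_recovery = min(recovery_rate_per_min * dt_min, 1.0)
    for i, sleeping in enumerate(asleep):
        if sleeping:
            # Homeostatic refill, proportional to the current sleep deficit
            reservoir += step_recovery * (capacity - reservoir)
            minutes_since_wake = 0.0
        else:
            # Linear depletion during wakefulness
            reservoir = max(reservoir - deplete_per_min * dt_min, 0.0)
            minutes_since_wake += dt_min
        hour = (start_hour + i * dt_min / 60.0) % 24.0
        # 24 h sinusoid with its minimum at nadir_hour
        circadian = -circ_amp * np.cos(2 * np.pi * (hour - nadir_hour) / 24.0)
        # Sleep inertia: a dip right after waking that decays exponentially
        inertia = 0.0 if sleeping else -inertia_max * np.exp(-minutes_since_wake / inertia_tau_min)
        eff[i] = 100.0 * reservoir / capacity + circadian + inertia
    return np.clip(eff, 0.0, 100.0)

# Example: 48 h at 1-minute resolution, asleep 23:00-07:00 each night,
# compared with 48 h of continuous wakefulness.
minutes = np.arange(48 * 60)
clock = (minutes / 60.0) % 24.0
normal = simulate_effectiveness((clock >= 23) | (clock < 7))
no_sleep = simulate_effectiveness(np.zeros(48 * 60, dtype=bool))
print(round(normal.min(), 1), round(no_sleep.min(), 1))
```

Comparing the two printed minima is a quick sanity check on the dynamics before you start calibrating. The sleep-deprived run should bottom out well below the normal one.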

2

Generate Synthetic Flight Schedules

Create a schedule generator that produces realistic airline pilot duty schedules. Model common patterns: 4-day trip pairings with 2-3 days off, early morning departures (05:00-07:00), late evening arrivals, red-eye flights, and multi-timezone trips. Encode each schedule as a sequence of events: (duty_start, duty_end, timezone, rest_start, rest_end, sleep_opportunity).
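One way to structure the generator is sketched below. It rests on these assumptions:

- Times are hours since the start of the 14-day window, on the home-base clock.
- Duties fall in early (05:00–07:00), midday, or late bands and last 6–12 h.
- A 10 h minimum rest is enforced between duties.
- The layover timezone is drawn from a small hypothetical list of UTC offsets.
- `sleep_opportunity` is a crude upper bound on sleep time.

`DutyEvent` and `generate_schedule` are illustrative names. Red-eyes and realistic multi-leg timezone sequences would need additional logic.

```python
import random
from dataclasses import dataclass

@dataclass
class DutyEvent:
    duty_start: float          # hours since schedule start, home-base clock
    duty_end: float
    timezone: int              # UTC offset (h) of the layover location (hypothetical)
    rest_start: float
    rest_end: float
    sleep_opportunity: float   # upper bound on sleep hours within the rest period

def generate_schedule(rng, n_days=14, pairing_days=4, min_rest_h=10.0):
    """Toy generator: 4-day pairings separated by 2-3 days off."""
    bands = {"early": (5, 7), "midday": (9, 13), "late": (15, 19)}
    duties, day, prev_end = [], 0, -min_rest_h
    while day < n_days:
        for d in range(day, min(day + pairing_days, n_days)):
            lo, hi = bands[rng.choice(list(bands))]
            # Respect the minimum rest after the previous duty
            start = max(24.0 * d + rng.uniform(lo, hi), prev_end + min_rest_h)
            prev_end = start + rng.uniform(6, 12)           # duty length 6-12 h
            duties.append((start, prev_end, rng.choice([-3, -1, 0, 0, 0, 1, 2, 5])))
        day += pairing_days + rng.randint(2, 3)             # 2-3 days off
    events = []
    for i, (start, end, tz) in enumerate(duties):
        next_start = duties[i + 1][0] if i + 1 < len(duties) else max(24.0 * n_days, end)
        # Assumption: 2 h of every rest period go to commuting and meals
        sleep = max(next_start - end - 2.0, 0.0)
        events.append(DutyEvent(start, end, tz, end, next_start, sleep))
    return events

events = generate_schedule(random.Random(42))
for e in events[:3]:
    print(e)
```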

Generate 10,000+ unique 14-day schedule sequences spanning a range of difficulty. They should run from benign schedules (all daytime flights, same timezone) to challenging ones (mixed early/late duties, 3+ timezone crossings, minimum rest periods). For each schedule, run the SAFTE model to compute a time series of effectiveness scores. From it, extract two labels: peak fatigue (the minimum effectiveness) and the cumulative hours spent below the effectiveness threshold.
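Running the SAFTE model requires a sleep/wake input at each time step. The sketch below uses a deliberately simple, hypothetical sleep rule. The pilot is asleep during a 23:00–07:00 home-base night window, except while on duty or within 1 h of a duty. Under this rule, early reports and late finishes cut sleep short. It does not model daytime recovery sleep after red-eyes.

`fatigue_labels` then extracts the two labels, optionally restricted to on-duty samples, using the 70-point threshold from Step 4. Both function names are illustrative.

```python
import numpy as np

def sleep_mask(duties, total_hours, dt_min=1.0, night=(23.0, 7.0), buffer_h=1.0):
    """Assumed sleep rule: asleep inside the nightly home-base window unless on duty
    or within buffer_h of a duty. duties: list of (start_h, end_h) since schedule start.
    The night window is assumed to wrap midnight (night[0] > night[1])."""
    t = np.arange(int(round(total_hours * 60 / dt_min))) * dt_min / 60.0  # hours
    clock = t % 24.0
    asleep = (clock >= night[0]) | (clock < night[1])
    for start, end in duties:
        # Remove sleep overlapping the duty plus a buffer on either side
        asleep &= ~((t >= start - buffer_h) & (t < end + buffer_h))
    return asleep

def fatigue_labels(effectiveness, dt_min=1.0, threshold=70.0, mask=None):
    """Return (peak fatigue = minimum effectiveness, hours below threshold).
    mask: optional boolean array, e.g. True for on-duty samples only."""
    eff = np.asarray(effectiveness, dtype=float)
    if mask is not None:
        eff = eff[np.asarray(mask, dtype=bool)]
    return float(eff.min()), float(np.count_nonzero(eff < threshold) * dt_min / 60.0)

# A 05:00-13:00 duty: night sleep ends at 04:00 instead of 07:00
print(np.flatnonzero(sleep_mask([(5.0, 13.0)], total_hours=24, dt_min=60)))
print(fatigue_labels([90, 65, 60, 80], dt_min=30))
```

To build a label, pass the mask from `sleep_mask` to your Step 1 simulation. Then feed the resulting effectiveness series to `fatigue_labels`.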

3

Engineer Schedule Features

Design features that capture the fatigue-relevant aspects of each schedule without running the full SAFTE simulation. Candidate features include: average duty start time (early starts are fatiguing), variance of start times (irregular schedules disrupt circadian rhythm), cumulative duty hours in rolling 7-day windows, number of timezone crossings and their direction (eastward is harder), minimum rest period in the schedule, and number of consecutive early-morning duties.
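The sketch below computes the aggregate features. It assumes each schedule is a list of `(start_h, end_h, utc_offset_h)` tuples in hours since schedule start, on the home-base clock.

Start times are summarized with circular statistics because clock time wraps at midnight. The arithmetic mean of 23:00 and 01:00 is 12:00, while the circular mean is midnight.

Other choices in the sketch:

- "Early" means a start before 07:00, matching the early band in Step 2.
- Eastward movement is a positive change in UTC offset.
- The function and key names are hypothetical.

```python
import numpy as np

def schedule_features(duties, n_days=14, early_before=7.0):
    """Aggregate features for one schedule.

    duties: list of (start_h, end_h, utc_offset_h) sorted by start, with times in
    hours since schedule start on the home-base clock (assumed representation).
    """
    starts = np.array([d[0] for d in duties], dtype=float)
    ends = np.array([d[1] for d in duties], dtype=float)
    tz = np.array([d[2] for d in duties], dtype=float)
    clock = starts % 24.0
    # Circular statistics: treat clock times as angles on a 24 h circle
    vec = np.mean(np.exp(2j * np.pi * clock / 24.0))
    mean_start = (np.angle(vec) * 24.0 / (2 * np.pi)) % 24.0
    start_circ_var = 1.0 - np.abs(vec)        # 0 = identical start times, 1 = spread out
    # Duty hours per calendar day, then the worst rolling 7-day window
    daily = np.zeros(n_days)
    for s, e in zip(starts, ends):
        daily[min(int(s // 24), n_days - 1)] += e - s
    max_7day = np.convolve(daily, np.ones(7), mode="valid").max()
    tz_delta = np.diff(tz)                     # positive = moved east
    rests = starts[1:] - ends[:-1]
    run = longest_early = 0
    for c in clock:                            # consecutive early duties, in duty order
        run = run + 1 if c < early_before else 0
        longest_early = max(longest_early, run)
    return {
        "mean_start_hour": float(mean_start),
        "start_circ_var": float(start_circ_var),
        "max_7day_duty_h": float(max_7day),
        "n_tz_changes": int(np.count_nonzero(tz_delta)),
        "eastward_h": float(tz_delta[tz_delta > 0].sum()),
        "min_rest_h": float(rests.min()) if rests.size else float("nan"),
        "max_consec_early": longest_early,
    }

# Three 06:00 starts, then an 08:00 start after a 3 h eastward move
print(schedule_features([(6, 14, 0), (30, 38, 0), (54, 62, 3), (80, 88, 3)]))
```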

Also create sequence features for the LSTM: encode each day as a vector containing duty start/end times (in circadian local time), rest duration, timezone delta from home base, and cumulative duty hours. This per-day encoding lets the LSTM learn temporal patterns.
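A possible per-day encoding is sketched below. It assumes at most one duty per day, given as `(start_h, end_h, utc_offset_h, rest_after_h)` tuples.

Times of day are mapped to sine/cosine pairs so the network sees 23:00 and 01:00 as neighbors. Here they are taken on the home-base clock as a rough proxy for body-clock time, which is an assumption. Days off are all zeros except for the running cumulative duty hours. The column order in `FEATURES` is illustrative.

```python
import numpy as np

FEATURES = ["has_duty", "start_sin", "start_cos", "end_sin", "end_cos",
            "rest_h", "tz_delta_h", "cum_duty_h"]

def encode_days(duties, n_days=14, home_utc_offset=0.0):
    """Return an (n_days, len(FEATURES)) array; assumes at most one duty per day."""
    seq = np.zeros((n_days, len(FEATURES)), dtype=np.float32)
    daily_hours = np.zeros(n_days)
    for start, end, utc_offset, rest_after in duties:
        day = int(start // 24)
        if not 0 <= day < n_days:
            continue
        a_start = 2 * np.pi * (start % 24) / 24   # time of day as an angle
        a_end = 2 * np.pi * (end % 24) / 24
        seq[day, :7] = [1.0, np.sin(a_start), np.cos(a_start),
                        np.sin(a_end), np.cos(a_end), rest_after,
                        utc_offset - home_utc_offset]
        daily_hours[day] += end - start
    seq[:, 7] = np.cumsum(daily_hours)            # carried forward through days off
    return seq

print(encode_days([(6, 14, 0, 16), (30, 38, 3, 18)])[:3])
```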

4

Train a Gradient-Boosted Baseline

Start with a gradient-boosted regression model (XGBoost or scikit-learn's HistGradientBoostingRegressor) trained on the aggregated schedule features, not the day-by-day sequence, to predict the peak fatigue score. Use an 80/10/10 train/validation/test split. This baseline tells you how much of the variation in peak fatigue can be predicted from summary statistics alone.

Evaluate using RMSE and mean absolute error, but also compute the sensitivity at the safety threshold: when the SAFTE model says effectiveness drops below 70 (a commonly used alerting threshold), what fraction does the ML model catch? For a safety-critical application, false negatives (missing truly fatigued pilots) are far more costly than false positives.
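Sensitivity here means the fraction of SAFTE-fatigued schedules (true effectiveness below 70) that the model also flags. The sketch below adds a hypothetical `alert_at` parameter. Raising the model's alerting cutoff above the SAFTE threshold catches more fatigued schedules at the cost of more false alarms, which is usually the right trade in a safety setting.

```python
import numpy as np

def threshold_metrics(y_true, y_pred, threshold=70.0, alert_at=None):
    """Sensitivity and false-positive rate for the 'effectiveness < threshold' event.

    alert_at: cutoff applied to predictions (defaults to threshold); raising it
    trades more false positives for fewer missed fatigued schedules.
    """
    alert_at = threshold if alert_at is None else alert_at
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    fatigued = y_true < threshold            # ground truth from SAFTE
    flagged = y_pred < alert_at              # what the ML model would flag
    tp = np.count_nonzero(fatigued & flagged)
    fn = np.count_nonzero(fatigued & ~flagged)
    fp = np.count_nonzero(~fatigued & flagged)
    tn = np.count_nonzero(~fatigued & ~flagged)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    return {"sensitivity": sensitivity, "false_positive_rate": fpr}

print(threshold_metrics([65, 68, 80, 90], [69, 72, 71, 88]))               # sensitivity 0.5, FPR 0.0
print(threshold_metrics([65, 68, 80, 90], [69, 72, 71, 88], alert_at=75))  # sensitivity 1.0, FPR 0.5
```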

5

Train an LSTM Sequence Model

Build an LSTM network in PyTorch that takes the day-by-day schedule sequence (14 days × the per-day feature vector) and predicts either the full fatigue time series or the peak fatigue value. Use 2 LSTM layers with 128 hidden units, followed by a fully connected head. Train with MSE loss and the Adam optimizer at a learning rate of 1e-3. Reduce the learning rate when the validation loss plateaus, for example with PyTorch's ReduceLROnPlateau scheduler.
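Below is a sketch of the network and training loop in PyTorch. Several details are assumptions:

- the 64-unit hidden layer in the head
- the `per_day` option, which gives one prediction per day as a coarse stand-in for the full time series
- the scheduler's `factor` and `patience` values

For peak-fatigue prediction, the head reads the LSTM output at the last day, which has seen the whole sequence.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

class FatigueLSTM(nn.Module):
    """2-layer LSTM over daily schedule vectors, followed by a fully connected head."""
    def __init__(self, n_features, hidden=128, layers=2, per_day=False):
        super().__init__()
        self.per_day = per_day
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):                          # x: (batch, days, n_features)
        out, _ = self.lstm(x)                      # (batch, days, hidden)
        if self.per_day:
            return self.head(out).squeeze(-1)      # (batch, days): one score per day
        return self.head(out[:, -1]).squeeze(-1)   # (batch,): peak fatigue

def train(model, train_loader, val_loader, epochs=50):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min", factor=0.5, patience=5)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():   # mean validation MSE drives the LR schedule
            val_loss = sum(loss_fn(model(xb), yb).item() * len(xb)
                           for xb, yb in val_loader) / len(val_loader.dataset)
        sched.step(val_loss)
    return val_loss

# Smoke test on random placeholder tensors (14 days x 8 features per day)
X, y = torch.randn(256, 14, 8), torch.randn(256)
train_dl = DataLoader(TensorDataset(X[:200], y[:200]), batch_size=32, shuffle=True)
val_dl = DataLoader(TensorDataset(X[200:], y[200:]), batch_size=64)
model = FatigueLSTM(n_features=8)
print(train(model, train_dl, val_dl, epochs=2))
```

MSE training on raw 0–100 scores is usually better behaved if you standardize the targets first. For example, subtract the training mean and divide by the standard deviation.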

The LSTM should outperform the gradient-boosted baseline because fatigue is fundamentally a sequential process: the same duty on day 5 has a different impact depending on what happened on days 1–4. Compare LSTM performance to a 1D CNN over the same sequence data and to a Transformer encoder with positional embeddings. Document which architecture best captures the temporal compounding effect.
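Below are minimal sketches of the two comparison models, using the same `(batch, days, features)` input as the LSTM. The layer sizes, the pooling choices, and the learned (rather than sinusoidal) positional embedding are assumptions.

One design point worth documenting in the comparison concerns receptive fields. Two convolutions with kernel size 3 give each output a receptive field of only 5 days, so longer-range compounding must come through the pooling or a deeper stack. By contrast, the LSTM carries state across all 14 days, and self-attention lets every day attend to every other day.

```python
import torch
from torch import nn

class FatigueCNN(nn.Module):
    """Two 1D convolutions over days, max-pooled over time."""
    def __init__(self, n_features, channels=64, kernel=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, channels, kernel, padding=kernel // 2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel, padding=kernel // 2), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                        # x: (batch, days, n_features)
        h = self.net(x.transpose(1, 2))          # Conv1d expects (batch, channels, days)
        return self.head(h.squeeze(-1)).squeeze(-1)

class FatigueTransformer(nn.Module):
    """Transformer encoder with learned positional embeddings, mean-pooled over days."""
    def __init__(self, n_features, max_days=14, d_model=64, nhead=4, layers=2):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        self.pos = nn.Parameter(0.02 * torch.randn(1, max_days, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):                        # x: (batch, days, n_features)
        h = self.proj(x) + self.pos[:, : x.size(1)]
        h = self.encoder(h)                      # every day attends to every other day
        return self.head(h.mean(dim=1)).squeeze(-1)

x = torch.randn(8, 14, 8)
print(FatigueCNN(8)(x).shape, FatigueTransformer(8)(x).shape)
```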

6

Sensitivity Analysis

Use the trained model to explore what-if scenarios. Take a high-fatigue schedule and systematically modify it: add 2 hours of rest before the most critical duty, swap an early-morning flight for a midday one, or insert an off-day after a timezone change. Plot how predicted fatigue changes with each intervention.
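Below is a sketch of the what-if loop. It assumes the trained model is wrapped as a `predict` function that maps `(n, 14, features)` arrays to predicted peak effectiveness. It also assumes a hypothetical per-day column layout, given in the comments.

Each intervention edits a copy of the sequence. Derived columns such as cumulative duty hours are not recomputed here; a fuller version should do so. The toy predictor exists only to make the example runnable.

```python
import numpy as np

# Hypothetical per-day column layout (adapt to your own encoding):
# 0 has_duty, 1 start_sin, 2 start_cos, 3 end_sin, 4 end_cos, 5 rest_h, 6 tz_delta_h, 7 cum_duty_h
START_SIN, START_COS, REST = 1, 2, 5

def add_rest_before(day, hours=2.0):
    def edit(s):
        s[day - 1, REST] += hours      # column 5 holds the rest *after* each day's duty
        return s
    return edit

def move_start(day, new_hour):
    def edit(s):
        angle = 2 * np.pi * new_hour / 24.0
        s[day, START_SIN], s[day, START_COS] = np.sin(angle), np.cos(angle)
        return s
    return edit

def insert_off_day(after_day):
    def edit(s):
        s[after_day + 2:] = s[after_day + 1:-1].copy()   # push later days back by one
        s[after_day + 1] = 0.0                            # empty (off) day
        return s
    return edit

def what_if(predict, seq, interventions):
    """Change in predicted peak effectiveness per intervention (positive = less fatigue)."""
    base = float(predict(seq[None])[0])
    return base, {name: float(predict(edit(seq.copy())[None])[0]) - base
                  for name, edit in interventions.items()}

def toy_predict(X):
    # Stand-in for a trained model: more total rest -> higher predicted effectiveness
    return 60.0 + 0.1 * X[:, :, REST].sum(axis=1)

seq = np.zeros((14, 8))
seq[:, REST] = 12.0
base, deltas = what_if(toy_predict, seq, {
    "rest+2h before day 5": add_rest_before(5),
    "day 5 starts at 12:00": move_start(5, 12.0),
    "off day after day 3": insert_off_day(3),
})
print(base, deltas)
```

The model was trained only on generator-style schedules. It is therefore worth re-running the SAFTE simulation on a few edited schedules to confirm that the predicted changes match the simulated ones.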

Perform a SHAP (SHapley Additive exPlanations) analysis on the gradient-boosted model to quantify feature contributions. Which feature has the largest impact on predicted fatigue? Is it cumulative duty hours, circadian disruption, or something else? Compare the SHAP results with published fatigue research — does the model's learned importance match domain knowledge?

7

Ethics, Limitations, and Reporting

Address the ethical dimensions of fatigue prediction. Discuss the tension between safety (predicting who is fatigued) and privacy (monitoring individuals). Note that your model predicts schedule-level risk, not individual pilot state — this is an important distinction for deployment. Discuss how self-reported fatigue, actual sleep data (wearables), and bio-mathematical models each contribute different information.

Document your model's limitations. The simplified SAFTE model you implemented has known biases: it assumes sleep is fully restorative, doesn't model partial sleep, and uses a fixed circadian profile. Your ML model inherits these limitations and adds its own, such as potential overfitting to the schedule generator's patterns. Write up the project as a technical report with clear recommendations for how the approach could be validated with real airline data under appropriate research ethics approval.

Go Further

  • Incorporate wearable data — use actigraphy data from publicly available sleep studies to replace assumed sleep times with measured sleep in the SAFTE model
  • Multi-crew modeling — extend the model to predict fatigue risk for an entire crew complement (captain + first officer) and flag pairings where both crew members are simultaneously impaired
  • Reinforcement learning for scheduling — frame crew scheduling as an RL problem where the agent minimizes fatigue while satisfying operational constraints
  • Compare with FAID and CAS models — implement alternative bio-mathematical fatigue models (Fatigue Audit InterDyne, Circadian Alertness Simulator) and compare their predictions with SAFTE
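For the multi-crew extension, one simple definition of "simultaneously impaired" is the time both pilots spend below the 70-point threshold while on duty together. The sketch assumes both effectiveness series share one time grid and one on-duty mask. The function name is hypothetical.

```python
import numpy as np

def joint_impairment_hours(eff_captain, eff_fo, on_duty, threshold=70.0, dt_min=1.0):
    """Hours during which both pilots are on duty and below the effectiveness threshold."""
    both = ((np.asarray(eff_captain) < threshold)
            & (np.asarray(eff_fo) < threshold)
            & np.asarray(on_duty, dtype=bool))
    return float(np.count_nonzero(both) * dt_min / 60.0)

# Hourly samples: only the first hour has both pilots impaired while on duty
print(joint_impairment_hours([65, 65, 80, 60], [60, 75, 60, 69], [1, 1, 1, 0], dt_min=60))
```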