Build a Word Cloud and Classifier from Aviation Safety Reports

Mine real incident reports to discover what goes wrong in the cockpit

High School | Human Factors | 2–3 weeks

Last reviewed: March 2026

Overview

Every year, pilots, controllers, and mechanics voluntarily submit thousands of safety reports to NASA's Aviation Safety Reporting System (ASRS). These confidential, de-identified reports describe near-misses, mistakes, and unsafe conditions — and they're a goldmine for understanding what actually goes wrong in aviation. In this project, you'll use natural language processing (NLP) to analyze these reports and build a system that automatically categorizes them.

You'll start by downloading ASRS reports and creating word clouds — visual summaries that show the most common words in different types of incidents. Are the words in "runway incursion" reports different from "controlled flight toward terrain" reports? Absolutely — and the differences reveal what matters in each type of incident.

Then you'll build a text classifier using the bag-of-words approach and a Naive Bayes algorithm. This classic NLP pipeline is the foundation of spam filters, sentiment analysis, and medical record classification. By the end, you'll have a working model that can read a new safety narrative and predict what type of incident it describes.

What You'll Learn

✓ Download and process real aviation safety data from NASA ASRS
✓ Create word clouds and frequency analyses to explore text data
✓ Understand the bag-of-words representation and TF-IDF weighting
✓ Train and evaluate a Naive Bayes text classifier
✓ Interpret classification results in an aviation safety context

Step-by-Step Guide

Download ASRS Reports

Go to the NASA ASRS database (asrs.arc.nasa.gov) and use the search interface to download incident reports. Select several categories — such as Airspace Violation, Controlled Flight Toward Terrain, Runway Incursion, and Equipment/Component Malfunction. Download 200–500 reports per category in CSV format.

Each report contains structured fields (date, aircraft type, phase of flight) and — most importantly — a free-text narrative where the reporter describes what happened in their own words. The narrative is what you'll analyze.
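As a rough sketch of how the downloaded files might be combined, the snippet below stacks one CSV per category into a single table and adds a label column. The column names (date, narrative) and the in-memory stand-in files are assumptions for illustration; check the header row of your own ASRS export, which has many more columns and may name them differently.

```python
import io
import pandas as pd

# Stand-ins for two downloaded files. Real ASRS exports have many more columns,
# and their header names may differ from the hypothetical ones used here.
FAKE_FILES = {
    "Runway Incursion": "date,narrative\n2023-01,Crossed the hold short line without clearance.\n",
    "Equipment Malfunction": "date,narrative\n2023-02,Hydraulic pressure warning after takeoff.\n",
}

def load_reports(files):
    """Read one CSV per category and stack them, adding a 'category' label column."""
    frames = []
    for category, csv_text in files.items():
        # For real files on disk, pass the file path to pd.read_csv instead.
        frame = pd.read_csv(io.StringIO(csv_text))
        frame["category"] = category
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

reports = load_reports(FAKE_FILES)
print(reports[["category", "narrative"]])
```

Keeping the category next to each narrative in one table makes the later steps simpler, since the label column becomes the target for the classifier.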

Clean the Text Data

Load the CSVs into pandas. Extract the narrative text column and clean it: lowercase everything, remove punctuation and numbers, and remove common English "stop words" (the, a, is, was) using scikit-learn's built-in stop word list. These words appear in nearly every report and carry little category-specific meaning.

Also remove aviation-specific stop words you identify: terms like "aircraft," "pilot," "flight," and "ATC" appear in nearly every report regardless of category. Removing them makes your word clouds and classifier focus on the distinguishing vocabulary.
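The cleaning steps above could look something like this. It is a minimal sketch: the function name clean_text and the sample sentence are made up for illustration, and the aviation stop word set contains only the four terms mentioned above, so expect to extend it after looking at your own data.

```python
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Aviation-specific stop words named in the text; extend this after inspecting your data.
AVIATION_STOP_WORDS = {"aircraft", "pilot", "flight", "atc"}
STOP_WORDS = set(ENGLISH_STOP_WORDS) | AVIATION_STOP_WORDS

def clean_text(narrative):
    """Lowercase, replace anything that is not a letter with a space, drop stop words."""
    lowered = narrative.lower()
    letters_only = re.sub(r"[^a-z]+", " ", lowered)
    kept = [word for word in letters_only.split() if word not in STOP_WORDS]
    return " ".join(kept)

print(clean_text("The pilot heard ATC clear Flight 123 to taxi; the aircraft held short."))
```

With a pandas table, the same function can be applied to a whole column, for example reports["narrative"].map(clean_text), where "narrative" is whatever your export calls the free-text column.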

Create Word Clouds

Install the wordcloud Python package (pip install wordcloud). For each incident category, generate a word cloud from the cleaned narratives. Display all four word clouds in a 2x2 matplotlib grid for easy comparison.

Study the differences. In runway incursion reports, you'll see words like "taxi," "hold," "clearance," and "runway." In equipment malfunction reports, you'll see "engine," "hydraulic," "warning," and "indication." These visual patterns are your first insight into what NLP can extract from text.
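One possible way to set this up is sketched below. The helper names (top_words, draw_clouds) and the sample narratives are hypothetical. The word counting uses only the standard library, while the drawing function assumes the wordcloud and matplotlib packages are installed and that you pass it exactly the categories you want in the 2x2 grid.

```python
from collections import Counter

def top_words(cleaned_narratives, n=10):
    """Count words across one category's cleaned narratives and return the n most common."""
    counts = Counter()
    for text in cleaned_narratives:
        counts.update(text.split())
    return counts.most_common(n)

def draw_clouds(frequencies_by_category):
    """Draw one word cloud per category in a 2x2 grid.

    Expects a dict like {category_name: {word: count}} with up to four categories.
    """
    # Imported here so the counting code above works without these packages installed.
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, (category, freqs) in zip(axes.flat, frequencies_by_category.items()):
        cloud = WordCloud(width=400, height=300, background_color="white")
        ax.imshow(cloud.generate_from_frequencies(freqs))
        ax.set_title(category)
    for ax in axes.flat:
        ax.axis("off")  # hide tick marks, including on any unused panel
    plt.tight_layout()
    plt.show()

# Made-up cleaned narratives for a single category.
runway = ["taxi hold short runway clearance", "crossed runway hold line taxi"]
print(top_words(runway, n=3))
```

To draw the clouds, build the input from the counts, for example draw_clouds({"Runway Incursion": dict(top_words(runway, n=100))}) with one entry per category. Printing the top words alongside the clouds gives you exact numbers to back up what the picture suggests.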

Build the Bag-of-Words Representation

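Use scikit-learn's CountVectorizer to convert each narrative into a bag-of-words vector: a list of counts, one for every word in the dataset's vocabulary. Then upgrade to TfidfVectorizer, which weights words by term frequency-inverse document frequency (TF-IDF): a word scores higher when it appears often within a narrative but in few narratives overall. Because category-specific vocabulary tends to be concentrated in a subset of the reports, it usually ends up with higher weights than words spread across every report.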

Set max_features=2000 to keep only the 2,000 most frequent words across the dataset. This reduces the risk of the model memorizing rare words that appear in only one report.

Train the Naive Bayes Classifier

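Split your data into training (80%) and test (20%) sets. Fit the TfidfVectorizer on the training narratives only, then use it to transform the test narratives, so the test set stays truly unseen. Train a Multinomial Naive Bayes classifier (from sklearn.naive_bayes import MultinomialNB) on the TF-IDF features. This algorithm is fast, works well with text data, and has a clear probabilistic interpretation.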

Run model.predict(X_test) and compare predictions to actual categories. Print the classification report showing precision, recall, and F1-score for each category. Expect accuracy in the 70–85% range with this simple approach.
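The training and evaluation steps might be wired together as follows. This is a sketch on tiny made-up narratives that are repeated only so the split has enough rows; the stratify and random_state arguments are optional choices added here to keep the class balance and make the split repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up narratives standing in for cleaned ASRS text.
runway = ["taxi hold short runway", "crossed runway clearance taxi", "hold line runway taxi",
          "runway clearance readback taxi", "taxi wrong runway hold"]
equipment = ["engine warning hydraulic", "hydraulic pressure indication",
             "engine indication warning", "warning light hydraulic leak",
             "engine vibration indication"]
texts = runway * 4 + equipment * 4  # repeated only so the split has enough rows
labels = ["Runway Incursion"] * 20 + ["Equipment Malfunction"] * 20

X_train_text, X_test_text, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)

vectorizer = TfidfVectorizer(max_features=2000)
X_train = vectorizer.fit_transform(X_train_text)  # learn vocabulary from training data only
X_test = vectorizer.transform(X_test_text)        # reuse that vocabulary for the test set

model = MultinomialNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
```

The toy categories share no vocabulary, so they are trivially easy to separate. Real narratives overlap far more, which is why scores on actual ASRS data will be lower.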

Analyze Errors and Improve

Look at the reports your model misclassified. What went wrong? Often the narrative describes elements of multiple categories (an equipment failure that leads to an airspace violation). These ambiguous reports are genuinely hard to classify — even for human experts.

Try improving performance by adjusting the TfidfVectorizer parameters: add bigrams (ngram_range=(1,2)) so the model can learn phrases like "runway incursion" as single features in addition to the individual words. Also try LinearSVC (from sklearn.svm import LinearSVC) as an alternative classifier and compare the two on the same test set.

Go Further

Build a report recommender — given a new incident, find the 5 most similar past reports using cosine similarity on TF-IDF vectors
Analyze trends over time — plot incident category frequencies by year to see if certain types of events are increasing or decreasing
Try sentiment analysis — apply a pre-trained sentiment model to ASRS narratives; do more "negative" narratives correlate with more severe incidents?
Expand to NTSB reports — apply the same pipeline to formal NTSB accident investigation narratives, which are longer and more detailed
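
As a starting point for the report recommender idea, here is a minimal sketch of ranking past reports by cosine similarity on TF-IDF vectors. The function name most_similar and the sample reports are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up cleaned narratives standing in for past ASRS reports.
past_reports = [
    "taxi hold short runway clearance",
    "engine hydraulic warning indication",
    "altitude deviation airspace clearance",
    "runway crossing taxi readback",
]

vectorizer = TfidfVectorizer()
report_matrix = vectorizer.fit_transform(past_reports)

def most_similar(new_report, k=5):
    """Return up to k (report, score) pairs, most similar first."""
    new_vector = vectorizer.transform([new_report])
    scores = cosine_similarity(new_vector, report_matrix).ravel()
    ranked = scores.argsort()[::-1][:k]  # indices sorted from highest to lowest score
    return [(past_reports[i], float(scores[i])) for i in ranked]

for report, score in most_similar("hydraulic warning after takeoff", k=2):
    print(round(score, 2), report)
```

A score of 0 means the two reports share no vocabulary, so it is worth filtering those out before showing recommendations.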
