Measure and Classify Sounds Around an Airport with Python

Record real-world sounds and teach a computer to tell them apart

Level: High School · Subject: Acoustics · Duration: 2–3 weeks
Last reviewed: March 2026

Overview

The area around an airport is one of the richest sound environments on Earth. Jet engines roar at takeoff, propeller planes buzz during approach, ground support vehicles hum, and in between, natural sounds — wind, birds, rain — fill the gaps. Acoustic monitoring is a growing field in aviation: airports use sound classification systems to track noise complaints, verify flight path compliance, and distinguish aircraft noise from other sources.

In this project you will build your own airport sound classifier. You will collect or download short audio clips representing 4–6 different sound categories, extract numerical features from each clip using librosa (a Python audio analysis library), and train a scikit-learn classifier to identify the sound type. No prior audio or ML experience is required — the project walks you through every step.

By the end you will understand how computers "hear" — by converting sound waves into numbers that capture pitch, loudness, and timbre. These same techniques power voice assistants, music recommendation engines, and the noise monitoring systems installed at major airports worldwide.

What You'll Learn

  • Record or collect audio samples and organize them into a labeled dataset
  • Extract audio features (MFCCs, spectral centroid, zero-crossing rate) using librosa
  • Understand how digital audio is represented as waveforms and spectrograms
  • Train and evaluate a multi-class classifier with scikit-learn
  • Interpret classification results and identify which sound features distinguish each category

Step-by-Step Guide

Step 1: Collect Your Sound Dataset

You need 20–30 audio clips per category, each 3–5 seconds long. Categories might include: jet takeoff, propeller aircraft, helicopter, ground vehicle, ambient/nature, and human speech. Record clips yourself near an airport (use a smartphone voice memo app) or download from free sound libraries like Freesound.org or the ESC-50 environmental sound dataset.

Organize your clips into folders by category. Ensure consistent format: convert everything to WAV, mono, 22050 Hz sample rate using Audacity (free) or librosa's built-in resampling. Consistency in format prevents data quality issues later.

Step 2: Install Tools and Load Audio

Install the required libraries:

pip install librosa scikit-learn numpy pandas matplotlib

Load an audio file with librosa and plot the waveform (amplitude vs. time) and the spectrogram (frequency vs. time, with color representing intensity). Compare the spectrogram of a jet takeoff — broad energy across all frequencies — to a propeller aircraft — distinct harmonic peaks at the blade passage frequency and its multiples. These visual differences are what your features will capture numerically.

Step 3: Extract Audio Features

For each audio clip, compute the following features using librosa and take their mean and standard deviation across the clip: 13 MFCCs (Mel-Frequency Cepstral Coefficients — the standard audio fingerprint), spectral centroid (center of mass of the spectrum — correlates with perceived brightness), spectral bandwidth, zero-crossing rate (how often the signal crosses zero — high for noisy sounds), and RMS energy (loudness).

This gives you roughly 30–35 features per clip. Store them in a pandas DataFrame with the category label. Each row is one audio clip; each column is one feature. This is your ML-ready dataset.

Step 4: Train the Classifier

Split your dataset into training (75%) and test (25%) with stratification. Standardize the features using StandardScaler. Train a Random Forest classifier as your first model — it handles multi-class problems well and is robust to feature scale:

from sklearn.ensemble import RandomForestClassifier

# 100 trees; a fixed random_state makes results reproducible
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

Check accuracy on the test set. With clean data and well-separated categories, you should achieve 70–90% accuracy. If performance is low, check whether any categories have too few samples or are acoustically very similar.
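The full split-scale-train-score sequence can be sketched end to end. A toy random dataset stands in for your feature DataFrame (the four category names are placeholders); note that the scaler is fitted on the training portion only, then applied to both halves.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for the feature table: 120 clips, 34 features, 4 categories,
# with class means shifted apart so the classes are separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 34)) + np.repeat(np.arange(4), 30)[:, None]
y = np.repeat(["jet", "prop", "vehicle", "ambient"], 30)

# 75/25 split; stratify=y keeps category proportions equal in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit the scaler on training data only, then apply the same transform to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train_s, y_train)
print(f"test accuracy: {clf.score(X_test_s, y_test):.2f}")
```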

Step 5: Evaluate and Visualize Results

Generate a confusion matrix and display it as a heatmap. Which sounds does the model confuse? Jet takeoff and ground vehicle (both have strong low-frequency energy) are often mixed up. Propeller aircraft and helicopter (both have periodic blade sounds) can also be confused.
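A self-contained confusion-matrix sketch with made-up labels illustrating those confusions; in your project, replace y_true and y_pred with y_test and clf.predict(...) from your pipeline:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to view interactively
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hypothetical true vs. predicted labels for 32 test clips.
labels = ["jet", "prop", "heli", "ambient"]
y_true = ["jet"] * 8 + ["prop"] * 8 + ["heli"] * 8 + ["ambient"] * 8
y_pred = (["jet"] * 7 + ["ambient"]       # one jet clip mistaken for ambient
          + ["prop"] * 6 + ["heli"] * 2   # prop/heli confusion, as discussed above
          + ["heli"] * 8
          + ["ambient"] * 8)

cm = confusion_matrix(y_true, y_pred, labels=labels)
disp = ConfusionMatrixDisplay(cm, display_labels=labels)
disp.plot(cmap="Blues")  # rows = true category, columns = predicted
plt.savefig("confusion_matrix.png")
print(cm)
```

Off-diagonal cells are the interesting ones: here, row "prop", column "heli" shows two misclassified propeller clips.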

Plot the feature importances from the Random Forest. You will likely find that certain MFCCs and the spectral centroid are the most discriminative features — these capture the timbral differences between sound categories. Try an SVM (Support Vector Machine) classifier as an alternative and compare accuracy.
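Both follow-ups can be sketched on a toy dataset in which only the first two features carry class information, mimicking a case where a few MFCCs dominate (the dataset and feature indices are illustrative, not your real data):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Toy data: 120 clips, 34 features, 3 classes; only features 0 and 1 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 34))
classes = np.repeat([0, 1, 2], 40)
X[:, 0] += 3 * classes
X[:, 1] -= 2 * classes
X_tr, X_te, y_tr, y_te = train_test_split(
    X, classes, test_size=0.25, stratify=classes, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Bar chart of importances; the informative features should stand out.
plt.bar(range(X.shape[1]), rf.feature_importances_)
plt.xlabel("feature index")
plt.ylabel("importance")
plt.savefig("feature_importances.png")

# SVMs are sensitive to feature scale, so standardize before fitting one.
scaler = StandardScaler().fit(X_tr)
svm = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr)
print(f"RF:  {rf.score(X_te, y_te):.2f}")
print(f"SVM: {svm.score(scaler.transform(X_te), y_te):.2f}")
```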

Step 6: Test on New Sounds

Record 2–3 new sounds that your model hasn't seen and run them through your classification pipeline: load audio, extract features, predict. Does the model get them right? Test with a tricky case — a loud car vs. a distant jet — and discuss why the model might struggle.
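The prediction pipeline can be wrapped in one small helper. extract_features, scaler, and clf stand for whatever feature function, fitted scaler, and fitted classifier you built in the earlier steps (the names and the "mystery_clip.wav" filename are placeholders):

```python
import numpy as np

def classify_clip(y, sr, extract_features, scaler, clf):
    """Classify one new audio clip with the already-fitted pipeline pieces.

    y, sr            -- audio samples and sample rate, e.g. from librosa.load(path, sr=22050)
    extract_features -- the same feature function used to build the training set
    scaler, clf      -- the fitted StandardScaler and classifier from training
    """
    features = np.asarray(extract_features(y, sr)).reshape(1, -1)  # one row, same columns
    return clf.predict(scaler.transform(features))[0]

# Usage with a hypothetical new recording:
#   y, sr = librosa.load("mystery_clip.wav", sr=22050, mono=True)
#   print(classify_clip(y, sr, extract_features, scaler, clf))
```

The key point is consistency: the new clip must go through exactly the same loading format, feature extraction, and scaling as the training data, or the prediction is meaningless.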

Document your project with the spectrogram comparisons, the confusion matrix, feature importance plot, and a discussion of what you learned. This portfolio piece demonstrates data collection, feature engineering, and ML evaluation — exactly the workflow used in professional audio ML projects.

Go Further

  • Expand your dataset to 10+ categories and include harder distinctions like different jet engine types (turbofan vs. turboprop) or aircraft by size class.
  • Replace the hand-crafted features with a mel-spectrogram image fed into a CNN (using PyTorch), and compare accuracy against the feature-based approach.
  • Build a real-time classifier that listens through your microphone and displays the predicted sound category live — using PyAudio for streaming input.
  • Investigate sound level measurement: calibrate your microphone and compute dBA levels for each recording, adding a quantitative noise assessment dimension to the project.