Predict Aircraft Noise Levels with Neural Networks

Use NASA wind tunnel data to predict how loud an airfoil is

Undergraduate · Acoustics · 4–6 weeks
Last reviewed: March 2026

Overview

Aerodynamic noise — the sound generated when air flows over a surface — is a major design constraint for aircraft, wind turbines, and high-speed vehicles. Trailing edge noise, the dominant broadband noise source for airfoils, depends on the airfoil's shape, the flow speed, the angle of attack, and the boundary layer characteristics. Predicting this noise accurately is essential for designing quieter aircraft and meeting increasingly strict community noise regulations.

NASA conducted extensive wind tunnel tests on NACA 0012 airfoils specifically to study self-noise mechanisms, varying chord length, free-stream velocity, angle of attack, and boundary layer displacement thickness. The resulting dataset — 1,503 observations of scaled sound pressure level (dB) as a function of five input variables — is a classic benchmark in the UCI Machine Learning Repository and a perfect testbed for regression with neural networks.

In this project you will build a feedforward neural network in PyTorch to predict sound pressure level from the five aeroacoustic parameters. You will explore network architecture, activation functions, regularization, and learning rate scheduling — core deep learning skills — while gaining physical intuition about what makes airfoils loud. The dataset is small enough to train on a laptop in minutes, but rich enough to reward careful model engineering.

What You'll Learn

  • Load and explore the NASA Airfoil Self-Noise dataset, understanding each input variable's physical meaning
  • Build a feedforward neural network from scratch in PyTorch with custom training loops
  • Apply regularization techniques (dropout, weight decay, batch normalization) to prevent overfitting
  • Tune hyperparameters systematically and track experiments with clear documentation
  • Connect model predictions to aeroacoustic theory — explaining why noise scales with velocity and angle of attack

Step-by-Step Guide

1

Download and Explore the Dataset

Download the Airfoil Self-Noise dataset from the UCI Machine Learning Repository. The five input features are: frequency (Hz), angle of attack (degrees), chord length (meters), free-stream velocity (m/s), and suction side displacement thickness (meters). The target is scaled sound pressure level (dB).

Load the data into pandas and compute summary statistics. Plot the target (SPL) against each feature individually using scatter plots. You should see clear trends: SPL generally increases with velocity (noise scales roughly as V⁵ for trailing edge noise) and shows a complex relationship with angle of attack and frequency. The displacement thickness encodes boundary layer state, which strongly influences noise generation.
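A minimal loading-and-EDA sketch to get started. The filename `airfoil_self_noise.dat`, the column names, and the helper names below are my own choices, assuming the UCI file is tab-separated with no header row:

```python
import pandas as pd

# Column names for the UCI airfoil self-noise file (tab-separated, headerless).
# Last column is the target: scaled sound pressure level in dB.
COLUMNS = ["frequency_hz", "aoa_deg", "chord_m", "velocity_ms",
           "disp_thickness_m", "spl_db"]

def load_airfoil(path="airfoil_self_noise.dat"):
    """Load the dataset into a DataFrame with named columns."""
    return pd.read_csv(path, sep="\t", names=COLUMNS)

def quick_eda(df):
    """Print summary statistics and scatter-plot SPL against each feature."""
    import matplotlib.pyplot as plt
    print(df.describe())
    fig, axes = plt.subplots(1, 5, figsize=(18, 3), sharey=True)
    for ax, col in zip(axes, COLUMNS[:-1]):
        ax.scatter(df[col], df["spl_db"], s=4, alpha=0.4)
        ax.set_xlabel(col)
    axes[0].set_ylabel("SPL (dB)")
    plt.tight_layout()
    plt.show()

# Usage, once the file is downloaded:
# df = load_airfoil()
# quick_eda(df)
```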

2

Preprocess and Split the Data

Normalize all features and the target using StandardScaler — neural networks train much faster when inputs are centered and scaled. Split into train (70%), validation (15%), and test (15%) sets. Convert to PyTorch tensors and wrap in DataLoader objects with a batch size of 32–64.

Create a baseline: train a scikit-learn linear regression on the same splits and record its test RMSE and R². This gives you a floor to beat with the neural network. Also try a polynomial regression (degree 2) — the V⁵ power law suggests non-linear features will help even without deep learning.
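The split, scaling, and linear baseline can be sketched in one helper. Function and variable names here are illustrative; note that the scalers are fit on the training set only, to avoid leaking test statistics:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

def make_splits(X, y, batch_size=64, seed=0):
    """70/15/15 split, standardize using train-set statistics only,
    wrap each split in a DataLoader, and fit a linear baseline."""
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.30, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=seed)
    x_scaler = StandardScaler().fit(X_tr)
    y_scaler = StandardScaler().fit(y_tr.reshape(-1, 1))

    def to_ds(Xs, ys):
        return TensorDataset(
            torch.tensor(x_scaler.transform(Xs), dtype=torch.float32),
            torch.tensor(y_scaler.transform(ys.reshape(-1, 1)), dtype=torch.float32),
        )

    loaders = {
        "train": DataLoader(to_ds(X_tr, y_tr), batch_size=batch_size, shuffle=True),
        "val":   DataLoader(to_ds(X_val, y_val), batch_size=batch_size),
        "test":  DataLoader(to_ds(X_te, y_te), batch_size=batch_size),
    }
    # Linear-regression baseline on the same (unscaled) splits.
    lin = LinearRegression().fit(X_tr, y_tr)
    pred = lin.predict(X_te)
    baseline = {"rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
                "r2": float(r2_score(y_te, pred))}
    return loaders, (x_scaler, y_scaler), baseline
```

The returned scalers are kept so predictions can later be converted back to physical dB.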

3

Build the Neural Network

Define a feedforward network in PyTorch: 5 input features → 64 hidden units → 128 hidden units → 64 hidden units → 1 output. Use ReLU activation between layers. Initialize with the default PyTorch initialization (Kaiming uniform).
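One compact way to express this architecture with `nn.Sequential` (the builder function is my own sketch; PyTorch's default Linear initialization is left untouched):

```python
import torch.nn as nn

def make_mlp(in_features=5, hidden=(64, 128, 64)):
    """Feedforward net: 5 -> 64 -> 128 -> 64 -> 1, ReLU between layers.
    PyTorch's default Linear init (Kaiming uniform) is used as-is."""
    layers, prev = [], in_features
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.ReLU()]
        prev = h
    layers.append(nn.Linear(prev, 1))  # single regression output
    return nn.Sequential(*layers)
```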

Write a custom training loop: for each epoch, iterate over batches from the training DataLoader, compute MSE loss, backpropagate, and update weights with Adam optimizer (lr=1e-3). After each epoch, evaluate on the validation set without gradient computation. Print train and validation loss every 10 epochs. Train for 200–500 epochs.
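The loop described above might look like the following sketch; the function name and defaults are my own, and the essential pieces are `zero_grad`/`backward`/`step` per batch and a `no_grad` validation pass per epoch:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=300, lr=1e-3, log_every=10):
    """Custom training loop: MSE loss, Adam optimizer, per-epoch validation."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for epoch in range(1, epochs + 1):
        model.train()
        running = 0.0
        for xb, yb in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            running += loss.item() * len(xb)
        train_loss = running / len(train_loader.dataset)

        model.eval()
        with torch.no_grad():  # no gradients needed for validation
            val_loss = sum(loss_fn(model(xb), yb).item() * len(xb)
                           for xb, yb in val_loader) / len(val_loader.dataset)
        history.append((train_loss, val_loss))
        if epoch % log_every == 0:
            print(f"epoch {epoch:4d}  train {train_loss:.4f}  val {val_loss:.4f}")
    return history
```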

4

Tune and Regularize

If the validation loss is significantly higher than training loss, you are overfitting. Add dropout (0.1–0.3) after each hidden layer and weight decay (1e-4 to 1e-3) to the Adam optimizer. Try adding batch normalization before the activation functions.
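A regularized variant of the earlier architecture could be sketched as below; BatchNorm is placed before each ReLU and dropout after it, while weight decay belongs on the optimizer rather than in the model:

```python
import torch
import torch.nn as nn

def make_regularized_mlp(in_features=5, hidden=(64, 128, 64), p_drop=0.2):
    """Same architecture, with BatchNorm before each activation
    and Dropout after it."""
    layers, prev = [], in_features
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.BatchNorm1d(h),
                   nn.ReLU(), nn.Dropout(p_drop)]
        prev = h
    layers.append(nn.Linear(prev, 1))
    return nn.Sequential(*layers)

# Weight decay goes on the optimizer, not the model:
# opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```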

Experiment systematically: vary the number of layers (2–5), hidden units (32–256), learning rate (1e-2 to 1e-4), and dropout rate. Use a simple grid or random search — document each experiment's validation RMSE in a table. A good model should achieve R² > 0.90 on the test set; with careful tuning, R² > 0.95 is achievable.
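A minimal scaffold for logging the grid into a table; `run_experiment` here is a hypothetical stub that you would replace with your actual build-train-evaluate call returning validation RMSE:

```python
import itertools
import random
import pandas as pd

GRID = {
    "hidden": [(64,), (64, 64), (64, 128, 64)],
    "lr": [1e-2, 1e-3, 1e-4],
    "dropout": [0.0, 0.1, 0.3],
}

def run_experiment(hidden, lr, dropout):
    """Stand-in for: build the model, train it, return validation RMSE.
    Replace this stub with your real train/evaluate call."""
    return random.random()  # placeholder score

def grid_search(grid=GRID):
    """Run every configuration in the grid and tabulate validation RMSE."""
    rows = []
    for combo in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), combo))
        rows.append({**cfg, "val_rmse": run_experiment(**cfg)})
    return pd.DataFrame(rows).sort_values("val_rmse").reset_index(drop=True)

# results = grid_search(); print(results.head())
```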

5

Analyze Predictions and Residuals

Plot predicted vs. actual SPL on the test set. Identify any systematic biases — does the model under-predict at high SPL or over-predict at low frequencies? Plot residuals (errors) against each input feature to check for patterns. If residuals correlate with a specific feature, consider adding that feature's higher-order terms or transformations.

Compute the model's test RMSE in decibels. In aeroacoustics, a prediction error below 2–3 dB is generally considered useful for design purposes. Compare your neural network's accuracy to the linear and polynomial baselines — how much did the non-linearity of the neural network buy you?
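To report RMSE in physical decibels, predictions must first be un-scaled. A hedged sketch, assuming `y_scaler` is the fitted target `StandardScaler` from the preprocessing step (the function name is my own):

```python
import numpy as np
import torch

def evaluate_db(model, loader, y_scaler):
    """Collect predictions over a DataLoader, undo target scaling,
    and report RMSE in physical dB plus R^2."""
    model.eval()
    preds, actual = [], []
    with torch.no_grad():
        for xb, yb in loader:
            preds.append(model(xb).numpy())
            actual.append(yb.numpy())
    preds = y_scaler.inverse_transform(np.vstack(preds)).ravel()
    actual = y_scaler.inverse_transform(np.vstack(actual)).ravel()
    resid = actual - preds
    rmse_db = float(np.sqrt(np.mean(resid ** 2)))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((actual - actual.mean()) ** 2)
    return preds, actual, rmse_db, float(r2)

# The returned arrays feed directly into the diagnostic plots:
# plt.scatter(actual, preds) for predicted-vs-actual,
# plt.scatter(feature, actual - preds) for per-feature residuals.
```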

6

Physical Interpretation

Use the trained model to explore aeroacoustic trends. Fix all inputs except free-stream velocity and sweep it from 30 to 70 m/s — plot the predicted SPL. Does the model learn the expected velocity power law? Similarly, sweep angle of attack and frequency to check if the predictions match known acoustic behavior.
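The velocity sweep can be sketched as follows. The feature ordering (velocity at index 3, matching the dataset's column order) and the function name are assumptions; since SPL ∝ 10 log₁₀(Vⁿ), fitting SPL against 10 log₁₀(V) recovers the exponent n directly, with n ≈ 5 expected for trailing edge noise:

```python
import numpy as np
import torch

def velocity_sweep(model, x_scaler, y_scaler, base_row, v_index=3,
                   v_range=(30.0, 70.0), n=50):
    """Predict SPL over a velocity sweep with all other inputs held fixed
    at base_row (a length-5 array in physical units). Returns velocities,
    SPL in dB, and the fitted slope of SPL vs 10*log10(V); a pure V^5
    power law would give a slope near 5."""
    V = np.linspace(*v_range, n)
    X = np.tile(np.asarray(base_row, dtype=float), (n, 1))
    X[:, v_index] = V
    Xt = torch.tensor(x_scaler.transform(X), dtype=torch.float32)
    model.eval()
    with torch.no_grad():
        spl = y_scaler.inverse_transform(model(Xt).numpy()).ravel()
    slope = np.polyfit(10.0 * np.log10(V), spl, 1)[0]
    return V, spl, float(slope)
```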

Try feature ablation: retrain the model with one feature removed at a time and observe which omission hurts accuracy the most. This tells you which physical parameter is most important for noise prediction — you should find that velocity and displacement thickness dominate, matching aeroacoustic theory. Write up a discussion connecting the ML results to the physics of trailing edge noise generation.
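The ablation loop can be written generically; `fit_score` is a hypothetical callable that wraps your own training-and-evaluation code (train a fresh model on the given matrix, return its test RMSE), so the loop itself stays model-agnostic:

```python
import numpy as np

def ablation_scores(X, y, fit_score, feature_names):
    """For each feature, drop that column, refit from scratch via fit_score,
    and record the resulting error next to the full-feature score."""
    results = {"full": fit_score(X, y)}
    for i, name in enumerate(feature_names):
        results[f"minus_{name}"] = fit_score(np.delete(X, i, axis=1), y)
    return results

# Usage sketch: the feature whose removal raises RMSE the most matters most.
# scores = ablation_scores(X, y, my_train_and_eval,
#                          ["freq", "aoa", "chord", "velocity", "delta"])
```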

Go Further

  • Replace the feedforward network with a 1D-CNN that takes spectral data as input (group the data by frequency) and predicts the full noise spectrum in one pass.
  • Implement a physics-informed loss that penalizes predictions violating known scaling laws (e.g., SPL ∝ V⁵ for trailing edge noise) and evaluate whether it improves generalization.
  • Apply your model to wind turbine airfoil noise prediction — the same physical mechanisms apply but at different Reynolds numbers and in a rotating reference frame.
  • Compare your PyTorch model against the BPM (Brooks, Pope, and Marcolini) semi-empirical noise prediction method and discuss where each approach excels.