Engine Sound Anomaly Detection with Autoencoders
Teach a neural network what normal sounds like — then catch everything else
Last reviewed: March 2026

Overview
In industrial and aerospace applications, labeled fault data is scarce — engines rarely fail, and when they do, each failure is unique. Traditional supervised classifiers require examples of every fault type, but you cannot train on failures you have never seen. Anomaly detection flips the problem: train a model only on normal operation, then flag anything that deviates from learned normalcy. This is fundamentally the right formulation for safety-critical systems where the cost of missing an anomaly far exceeds the cost of a false alarm.
An autoencoder is an ideal architecture for this task. Trained to compress and reconstruct mel-spectrograms of normal engine audio, it learns a compact representation of what "normal" sounds like. When presented with an anomalous sound — a bearing grinding, a blade rubbing, an unusual vibration — the autoencoder fails to reconstruct it accurately, producing a high reconstruction error that serves as the anomaly score.
In this project you will build a convolutional autoencoder in PyTorch, train it exclusively on spectrograms of healthy engine operation, and evaluate its ability to detect unseen anomalies. You will work with the MIMII (Malfunctioning Industrial Machine Investigation and Inspection) dataset or similar machine sound datasets, explore threshold selection strategies, and address the real-world challenges of noise contamination, varying operating conditions, and the precision-recall trade-off in safety-critical anomaly detection.
What You'll Learn
- ✓ Understand the autoencoder architecture and its application to unsupervised anomaly detection
- ✓ Convert raw audio to mel-spectrograms and prepare them as image-like inputs for convolutional networks
- ✓ Design and train a convolutional autoencoder with appropriate bottleneck dimensionality
- ✓ Select anomaly thresholds using reconstruction error distributions and evaluate with ROC and PR curves
- ✓ Address practical deployment challenges: varying operating conditions, noise floors, and false alarm management
Step-by-Step Guide
Acquire and Understand the Dataset
Download the MIMII dataset from Zenodo. It contains recordings of four machine types (fans, pumps, sliders, valves) under normal and anomalous conditions at multiple SNR levels. For this project, use the fan or pump subset — these are most analogous to turbomachinery. Alternatively, use the ToyADMOS dataset or collect your own recordings from a running engine or motor.
Split the normal recordings into train (80%) and validation (20%). Set aside all anomalous recordings for testing only — the model must never see anomalous data during training. This is the core constraint of anomaly detection: you learn normalcy, not faults.
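The split above can be sketched in a few lines; the file names here are hypothetical stand-ins for the actual MIMII recordings:

```python
import random

def split_normal_files(normal_files, val_fraction=0.2, seed=42):
    """Shuffle and split normal recordings into train/validation lists.
    Anomalous recordings are never passed in here -- they are reserved
    exclusively for the test set."""
    files = list(normal_files)
    random.Random(seed).shuffle(files)  # fixed seed for reproducibility
    n_val = int(len(files) * val_fraction)
    return files[n_val:], files[:n_val]  # train, validation

# Hypothetical file names standing in for real MIMII recordings
normal = [f"normal_{i:04d}.wav" for i in range(1000)]
train_files, val_files = split_normal_files(normal)
print(len(train_files), len(val_files))  # 800 200
```

Keeping the anomalous files out of this function entirely — rather than filtering them inside it — makes the train/never-train boundary explicit in the code.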
Convert Audio to Mel-Spectrograms
Segment each recording into fixed-length clips (e.g., 2 seconds). For each clip, compute a 128-band mel-spectrogram using librosa: librosa.feature.melspectrogram(y=audio, sr=16000, n_mels=128, n_fft=1024, hop_length=512). Convert to log scale (dB) and normalize to the [0, 1] range.
The result is a 2D image-like representation (128 frequency bins × T time frames) for each clip. Visualize several spectrograms from normal and anomalous samples side by side. Anomalies often appear as unusual spectral peaks, broadband energy changes, or periodic patterns at unexpected frequencies. These visual differences are what the autoencoder's reconstruction error will capture automatically.
Design the Convolutional Autoencoder
Build a symmetric encoder-decoder architecture in PyTorch. The encoder: 4 convolutional layers with increasing channel counts (1→32→64→128→256), kernel size 3×3, stride 2, with ReLU and batch normalization after each. This progressively compresses the spectrogram to a small latent representation. The decoder: 4 transposed convolutional layers that mirror the encoder, reconstructing the spectrogram from the bottleneck.
The bottleneck dimensionality is a critical design choice. Too large, and the autoencoder memorizes everything (including anomalies). Too small, and it cannot reconstruct normal sounds well enough to distinguish them from anomalies. Start with a bottleneck that compresses the input by 16–32×. Use sigmoid on the output layer to match the [0, 1] normalized spectrograms.
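One possible PyTorch sketch of this architecture. The padding and output_padding values are assumptions chosen so each layer cleanly halves (encoder) or doubles (decoder) the spatial dimensions, and inputs are assumed padded or cropped to 64 time frames so the four stride-2 layers divide evenly:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Symmetric 4-layer convolutional autoencoder for 128xT spectrograms,
    following the channel progression 1->32->64->128->256."""
    def __init__(self):
        super().__init__()
        def down(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU())
        def up(c_in, c_out, final=False):
            layers = [nn.ConvTranspose2d(c_in, c_out, kernel_size=3, stride=2,
                                         padding=1, output_padding=1)]
            if not final:
                layers += [nn.BatchNorm2d(c_out), nn.ReLU()]
            return nn.Sequential(*layers)
        self.encoder = nn.Sequential(down(1, 32), down(32, 64),
                                     down(64, 128), down(128, 256))
        self.decoder = nn.Sequential(up(256, 128), up(128, 64),
                                     up(64, 32), up(32, 1, final=True))

    def forward(self, x):
        # Sigmoid output matches the [0, 1] normalized spectrograms
        return torch.sigmoid(self.decoder(self.encoder(x)))

x = torch.rand(2, 1, 128, 64)  # batch of two 128x64 spectrograms
model = ConvAutoencoder()
print(model(x).shape)  # torch.Size([2, 1, 128, 64])
```

Note that with these channel counts the 256×8×4 latent holds as many elements as the 128×64 input, so to reach the 16–32× compression target you could reduce the final channel count or flatten the latent through a small linear bottleneck — a worthwhile design experiment.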
Train on Normal Data Only
Train the autoencoder using MSE reconstruction loss on the training set (normal data only). Use the Adam optimizer with a learning rate of 1e-3 and a cosine annealing schedule. Train for 100–200 epochs with a batch size of 32. Monitor the validation loss (also computed on normal data) for overfitting.
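A compact training loop implementing this recipe. The tiny stand-in model and random tensors at the bottom are for illustration only; substitute the convolutional autoencoder and real spectrogram loaders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_autoencoder(model, train_loader, val_loader,
                      epochs=100, lr=1e-3, device="cpu"):
    """MSE reconstruction training on normal data only,
    with Adam and cosine annealing as described above."""
    model.to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs)
    loss_fn = nn.MSELoss()
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            batch = batch.to(device)
            loss = loss_fn(model(batch), batch)  # reconstruct the input
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
        model.eval()
        with torch.no_grad():  # validation loss, also on normal data
            val_loss = sum(loss_fn(model(b.to(device)), b.to(device)).item()
                           for b in val_loader) / max(len(val_loader), 1)
        print(f"epoch {epoch + 1}: val MSE {val_loss:.5f}")
    return model

# Tiny stand-in model and random data for illustration only
model = nn.Sequential(nn.Conv2d(1, 4, 3, stride=2, padding=1), nn.ReLU(),
                      nn.ConvTranspose2d(4, 1, 3, stride=2, padding=1,
                                         output_padding=1), nn.Sigmoid())
data = torch.rand(8, 1, 16, 16)
loader = DataLoader(data, batch_size=4)
train_autoencoder(model, loader, loader, epochs=2)
```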
After training, compute the reconstruction error (MSE per spectrogram) for all training samples and all validation samples. Plot the distribution of reconstruction errors — it should be a tight, roughly Gaussian distribution centered at a low value. This distribution is your model of "normal" — anything with reconstruction error in the tail is potentially anomalous.
Detect Anomalies and Set Thresholds
Run the trained autoencoder on the test set containing both normal and anomalous samples. Compute reconstruction error for each sample. Plot the error distributions for normal and anomalous samples separately — the anomalous distribution should be shifted to higher errors, though overlap is common.
Select a threshold using the validation set error distribution: set the threshold at the 95th or 99th percentile of normal validation errors. Compute precision, recall, and F1 score at this threshold. Plot the full ROC curve and precision-recall curve by sweeping the threshold. Report the AUC-ROC — values above 0.85 are good; above 0.95 is excellent. The PR curve is often more informative because the class balance is skewed.
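The percentile threshold and point metrics can be sketched in plain NumPy; the synthetic error distributions below are placeholders for real model outputs:

```python
import numpy as np

def threshold_from_validation(val_errors, percentile=99):
    """Anomaly threshold from the normal-only validation error distribution."""
    return np.percentile(val_errors, percentile)

def precision_recall_f1(errors, labels, threshold):
    """labels: 1 = anomalous, 0 = normal; flag samples whose
    reconstruction error exceeds the threshold."""
    pred = errors > threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f1

# Synthetic error distributions standing in for real model outputs
rng = np.random.default_rng(0)
val_errors = rng.normal(0.010, 0.002, 200)                     # normal val
test_errors = np.concatenate([rng.normal(0.010, 0.002, 100),   # normal
                              rng.normal(0.030, 0.005, 100)])  # anomalous
labels = np.concatenate([np.zeros(100, int), np.ones(100, int)])
thr = threshold_from_validation(val_errors)
p, r, f1 = precision_recall_f1(test_errors, labels, thr)
print(f"threshold={thr:.4f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

For the full curves, scikit-learn's roc_curve, precision_recall_curve, and roc_auc_score perform the threshold sweep for you.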
Visualize Reconstruction Failures
For several correctly detected anomalies, plot three images side by side: the input spectrogram, the reconstructed spectrogram, and the residual (absolute difference). The residual map highlights exactly which time-frequency regions the autoencoder failed to reconstruct — these are the spectral signatures of the anomaly.
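A plotting sketch for the three-panel figure; the injected narrowband "fault" in the synthetic example is purely illustrative:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def plot_residual(input_spec, recon_spec, path="residual.png"):
    """Save input, reconstruction, and |residual| panels side by side,
    and return the residual map."""
    residual = np.abs(input_spec - recon_spec)
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    panels = [(input_spec, "Input"), (recon_spec, "Reconstruction"),
              (residual, "Residual")]
    for ax, (img, title) in zip(axes, panels):
        ax.imshow(img, origin="lower", aspect="auto", cmap="magma")
        ax.set_title(title)
        ax.set_xlabel("Time frame")
        ax.set_ylabel("Mel bin")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return residual

# Synthetic example: a narrowband "fault" the reconstruction misses
spec = np.random.rand(128, 63) * 0.1
recon = spec.copy()
spec[40:44, :] += 0.8  # injected anomalous band (illustrative only)
res = plot_residual(spec, recon)
```

In the real pipeline, input_spec is a test spectrogram and recon_spec is the autoencoder's output; the residual then localizes the energy the model could not explain.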
This visualization is powerful for engineering interpretation: a bearing fault might appear as elevated energy in a specific frequency band, while a blade rub might show broadband impulse events at the rotation frequency. The autoencoder doesn't need to know what these faults are — it just needs to know they're not normal. Compare the residual patterns across different anomaly types to see if the autoencoder implicitly provides diagnostic information.
Robustness and Deployment Analysis
Test the model's robustness to varying signal-to-noise ratios. The MIMII dataset provides recordings at different SNR levels (6 dB, 0 dB, -6 dB). Evaluate detection performance at each SNR — performance typically degrades gracefully but may collapse below a critical SNR where background noise masks the anomaly signature.
Discuss the architecture's suitability for real-time deployment: measure inference time per spectrogram, estimate the computational requirements for continuous monitoring, and consider how you would handle concept drift (gradual changes in "normal" as the engine ages). A well-engineered anomaly detection system periodically retrains or uses adaptive thresholds to account for slow baseline shifts. Write a comprehensive report covering architecture decisions, training methodology, threshold selection rationale, and deployment considerations.
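A simple latency measurement for the deployment analysis, shown here with a tiny stand-in model; substitute the trained autoencoder in practice:

```python
import time
import torch
import torch.nn as nn

def time_inference(model, spec, n_runs=50):
    """Average inference latency per spectrogram, in milliseconds."""
    model.eval()
    with torch.no_grad():
        model(spec)  # warm-up pass, excluded from timing
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(spec)
    return (time.perf_counter() - t0) / n_runs * 1000

# Tiny stand-in model; substitute the trained autoencoder in practice
model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
ms = time_inference(model, torch.rand(1, 1, 128, 64))
print(f"{ms:.3f} ms per spectrogram")
```

With 2-second clips, continuous monitoring needs a per-clip latency comfortably under 2000 ms; the margin between the two is your headroom for running on embedded hardware.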
Career Connection
See how this project connects to real aerospace careers.
Aviation Maintenance →
Condition-based maintenance uses exactly this approach — monitoring acoustic and vibration signatures to detect incipient faults before they cause failures, reducing unscheduled maintenance and improving safety.
Aerospace Engineer →
Engine health monitoring systems on modern turbofans increasingly incorporate ML-based anomaly detection alongside traditional exceedance monitoring.
Aerospace Manufacturing →
Acoustic emission monitoring during manufacturing processes (machining, composite layup, welding) uses autoencoder-based anomaly detection to identify defective parts in real-time.
Avionics Technician →
HUMS (Health and Usage Monitoring Systems) on rotorcraft use vibration anomaly detection to flag gearbox, bearing, and rotor faults — directly analogous to the audio approach in this project.
Go Further
- Replace the convolutional autoencoder with a variational autoencoder (VAE) and use the ELBO loss — VAEs often produce smoother latent spaces and can improve anomaly detection by using the KL divergence term as an additional anomaly signal.
- Implement multi-condition training: include normal data from multiple operating speeds/loads and add the operating condition as a conditioning input to the autoencoder, so a single model covers the full operating envelope.
- Apply your pipeline to real aircraft engine audio — the Rolls-Royce Open Acoustic Test Data or publicly available airport recordings provide a more challenging and realistic test case.
- Explore self-supervised contrastive learning (e.g., SimCLR on spectrograms) as an alternative to the autoencoder and compare anomaly detection performance.