Detect Surface Defects in Aerospace Parts with CNN

Train a deep learning model on real industrial defect data

Undergraduate · Manufacturing · 4–6 weeks
Last reviewed: March 2026

Overview

Surface defects in metal parts — cracks, inclusions, pitting, scratches — are a critical quality concern in aerospace manufacturing. A single undetected crack in a turbine blade or fuselage skin panel can lead to catastrophic failure. Traditional inspection relies on trained human inspectors, but deep learning is rapidly proving it can match or exceed human performance at detecting these defects, at much higher throughput.

In this project, you'll train a convolutional neural network (CNN) from scratch using PyTorch on the NEU Surface Defect Database — a widely used academic benchmark containing 1,800 grayscale images of hot-rolled steel surfaces across six defect classes: crazing, inclusion, patches, pitted surface, rolled-in scale, and scratches. These defect types are directly relevant to aerospace: the same surface finish requirements apply to aircraft skin panels, landing gear components, and engine parts.

Unlike the high-school project that used transfer learning as a black box, here you'll design the CNN architecture yourself, implement custom training loops, apply advanced techniques like learning rate scheduling and data augmentation, and rigorously evaluate performance. This project produces a portfolio-ready deep learning implementation and teaches the engineering skills needed for any industrial computer vision role.

What You'll Learn

  • Design and implement a CNN architecture from scratch using PyTorch for image classification
  • Work with the NEU Surface Defect Database and understand industrial defect taxonomies
  • Implement a complete training pipeline: data loading, augmentation, training loop, validation, and checkpointing
  • Apply techniques to improve small-dataset performance: data augmentation, regularization, learning rate scheduling
  • Evaluate multi-class classifier performance with per-class metrics and confusion matrix analysis

Step-by-Step Guide

1. Obtain the NEU Surface Defect Dataset

Download the NEU Surface Defect Database from the Northeastern University (China) website or Kaggle. The dataset contains 1,800 grayscale images (200×200 pixels), 300 per class, across six defect categories: crazing (fine network of cracks), inclusion (foreign material embedded in surface), patches (irregular surface regions), pitted surface (small cavities), rolled-in scale (oxide pressed into surface), and scratches (linear surface damage).

Explore the dataset visually. Some classes are easy to distinguish (scratches vs. patches), while others are subtle (crazing vs. pitted surface). This variation in difficulty is realistic — in actual manufacturing inspection, some defect types are much harder to detect than others.
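Before going further, it's worth verifying the download is complete. A small sanity-check sketch (it assumes the archive unpacks into one subfolder per class, e.g. `NEU/crazing/*.bmp` — the Kaggle mirror uses a slightly different layout, so adjust the path accordingly):

```python
from pathlib import Path
from collections import Counter

def class_counts(root):
    """Count image files per class subdirectory.

    For the full NEU dataset, each of the six classes should report 300.
    """
    counts = Counter()
    for p in Path(root).rglob("*"):
        if p.suffix.lower() in {".bmp", ".jpg", ".png"}:
            counts[p.parent.name] += 1
    return dict(counts)
```

If any class reports fewer than 300 images, re-download before training — a silently truncated class will skew both training and your per-class metrics.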

2. Build the Data Pipeline

Create a PyTorch Dataset class that loads images and labels. Implement a robust data augmentation pipeline using torchvision.transforms: random horizontal/vertical flips, random rotation (±15°), random crop and resize, and brightness/contrast jitter. These augmentations are physically reasonable — a defect looks the same regardless of orientation.

Split the data: 70% training, 15% validation, 15% test. Use stratified splitting to ensure each class is proportionally represented in all splits. Create DataLoaders with batch size 32 and shuffle enabled for training.

3. Design the CNN Architecture

Build a CNN with 4–5 convolutional blocks, each containing Conv2d → BatchNorm2d → ReLU → MaxPool2d. Start with 32 filters in the first layer and double at each block (32 → 64 → 128 → 256). Follow with a global average pooling layer (AdaptiveAvgPool2d in PyTorch) and two fully connected layers (256 → 128 → 6 classes).

Add dropout (0.3–0.5) before the fully connected layers to prevent overfitting on this small dataset. Print the model summary to verify the parameter count — aim for under 2 million parameters. A model that's too large will overfit badly on only 1,260 training images.
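One way to sketch this architecture (the block count, filter widths, and dropout rate follow the numbers above; this four-block variant lands well under the 2-million-parameter budget):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One block as described above: Conv2d -> BatchNorm2d -> ReLU -> MaxPool2d
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class DefectCNN(nn.Module):
    def __init__(self, num_classes=6, dropout=0.4):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32),    # 200x200 -> 100x100
            conv_block(32, 64),   # 100x100 -> 50x50
            conv_block(64, 128),  # 50x50  -> 25x25
            conv_block(128, 256), # 25x25  -> 12x12
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (256, 1, 1)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(dropout),
            nn.Linear(256, 128),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)))
```

Counting parameters with `sum(p.numel() for p in model.parameters())` gives roughly 420k here — comfortably inside the budget, leaving room to add a fifth block if validation accuracy plateaus.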

4. Implement the Training Loop

Write a custom training loop (don't use a high-level trainer — understanding the loop is the point). Use CrossEntropyLoss and the Adam optimizer with an initial learning rate of 1e-3. Implement a cosine annealing learning rate scheduler that decays the LR smoothly over training.

Train for 50–100 epochs. At each epoch, compute training loss, validation loss, and validation accuracy. Save the model checkpoint with the best validation accuracy. Implement early stopping: if validation loss doesn't improve for 15 epochs, stop training. Plot training and validation loss curves after training — this diagnostic plot reveals overfitting, underfitting, and learning rate issues at a glance.
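The loop above can be sketched as follows (checkpointing to memory rather than disk to keep the example short; in practice you'd `torch.save` the best state dict):

```python
import copy
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=100, patience=15,
          lr=1e-3, device="cpu"):
    """Adam + cosine annealing, best-accuracy checkpointing, early stopping
    on validation loss. Returns per-epoch history for the diagnostic plots."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    best_acc, best_state = 0.0, None
    best_val_loss, stale = float("inf"), 0
    history = {"train_loss": [], "val_loss": [], "val_acc": []}

    for epoch in range(epochs):
        model.train()
        running = 0.0
        for xb, yb in train_loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            running += loss.item() * xb.size(0)
        scheduler.step()  # cosine decay, once per epoch
        train_loss = running / len(train_loader.dataset)

        model.eval()
        val_loss, correct = 0.0, 0
        with torch.no_grad():
            for xb, yb in val_loader:
                xb, yb = xb.to(device), yb.to(device)
                logits = model(xb)
                val_loss += criterion(logits, yb).item() * xb.size(0)
                correct += (logits.argmax(1) == yb).sum().item()
        n = len(val_loader.dataset)
        val_loss, val_acc = val_loss / n, correct / n
        history["train_loss"].append(train_loss)
        history["val_loss"].append(val_loss)
        history["val_acc"].append(val_acc)

        if val_acc > best_acc:                     # checkpoint on best accuracy
            best_acc, best_state = val_acc, copy.deepcopy(model.state_dict())
        if val_loss < best_val_loss:               # early stopping on val loss
            best_val_loss, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break

    if best_state is not None:
        model.load_state_dict(best_state)
    return history
```

Plotting `history["train_loss"]` against `history["val_loss"]` gives the diagnostic curves mentioned above: diverging curves signal overfitting, while two flat high curves signal underfitting or a learning rate problem.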

5. Evaluate and Analyze Per-Class Performance

Load the best checkpoint and evaluate on the held-out test set. Generate a 6×6 confusion matrix — this is the single most informative diagnostic for a multi-class classifier. Which classes does the model confuse? Crazing and pitted surface are commonly confused because both involve small-scale surface texture changes.

Report per-class precision, recall, and F1 score. In manufacturing, recall (sensitivity) for each defect type is critical: a missed defect reaches the aircraft. Calculate the overall accuracy and macro-averaged F1. With good augmentation and architecture, 95%+ accuracy is achievable on NEU. Compare your CNN against a simpler model — train an SVM on flattened pixel features to see how much the CNN's learned features help.
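A sketch of the evaluation step, leaning on scikit-learn for the metrics (the `evaluate` helper name is an assumption; `classification_report` covers per-class precision, recall, F1, and the macro average in one call):

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

def evaluate(model, loader, class_names, device="cpu"):
    """Collect test-set predictions, then build the 6x6 confusion matrix
    and the per-class precision/recall/F1 report."""
    model.to(device).eval()
    y_true, y_pred = [], []
    with torch.no_grad():
        for xb, yb in loader:
            logits = model(xb.to(device))
            y_pred.extend(logits.argmax(1).cpu().tolist())
            y_true.extend(yb.tolist())
    cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
    report = classification_report(y_true, y_pred, target_names=class_names,
                                   zero_division=0)
    return cm, report
```

Read the confusion matrix row by row: row *i* shows where true examples of class *i* ended up, so off-diagonal mass in the crazing row landing in the pitted-surface column is exactly the confusion discussed above.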

6. Compare Against Transfer Learning

Fine-tune a pre-trained ResNet-18 on the same dataset for comparison. Replace the final FC layer with a 6-class output, freeze the early layers, and train only the last few blocks. ResNet-18 will likely outperform your custom CNN because it was pre-trained on millions of images — but understanding why transfer learning helps and when you might still want a custom architecture is key engineering judgment.

Compare: custom CNN vs. fine-tuned ResNet-18 on accuracy, inference speed, and model size. For edge deployment in a factory (on embedded hardware), a smaller custom model may actually be preferable despite lower accuracy.

7. Document for a Portfolio

Write up your project as a portfolio piece. Include: problem statement, dataset description, architecture diagram, training curves, confusion matrix, per-class metrics, and comparison with the transfer learning baseline. Show sample predictions — both correct and incorrect — with the images displayed alongside the model's predicted probabilities for each class.
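For the sample-prediction figures, a small plotting helper like the following works (the function name and layout are hypothetical; `probs` is the softmax output for one image):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the figure also renders without a display
import matplotlib.pyplot as plt
import torch

def plot_prediction(img, probs, class_names, true_label, path="prediction.png"):
    """Save one figure: the grayscale image beside a horizontal bar chart
    of the model's predicted probability for each class."""
    fig, (ax_img, ax_bar) = plt.subplots(1, 2, figsize=(7, 3))
    ax_img.imshow(img.squeeze(), cmap="gray")
    ax_img.set_title(f"true: {class_names[true_label]}")
    ax_img.axis("off")
    ax_bar.barh(class_names, probs)
    ax_bar.set_xlim(0, 1)
    ax_bar.set_xlabel("predicted probability")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Pairing a confident wrong prediction with its probability bars is especially effective in a portfolio write-up: it shows you examined failure modes, not just headline accuracy.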

Discuss how this approach would scale to a real aerospace factory: what changes for production (continuous data pipeline, monitoring for data drift, integration with PLCs and MES systems)? This systems-thinking perspective is what separates a class project from a production-ready solution.

Go Further

Push your manufacturing ML skills further:

  • Object detection with YOLO — instead of classifying whole images, train a detector to localize defects within larger surface images with bounding boxes
  • Semantic segmentation — use U-Net to produce pixel-level defect masks that precisely outline each defect's boundary
  • Anomaly detection approach — train only on "good" parts and detect defects as anomalies, which is more realistic when defect examples are rare
  • Explainability with Grad-CAM — generate heatmaps showing which image regions drive each classification decision, building trust for human inspectors