Graph Neural Network for Microstructure-Property Prediction

Turn microscope images into graphs and predict how strong the metal is

Advanced · Materials · 6–8 weeks
PyTorch · PyTorch Geometric
Last reviewed: March 2026

Overview

The mechanical properties of a metal or composite — yield strength, ultimate tensile strength, fatigue life, fracture toughness — are determined by its microstructure: the size, shape, orientation, and connectivity of grains, phases, and defects visible under a microscope. Traditional materials characterization extracts hand-crafted features from micrographs (average grain size, aspect ratio, phase fraction), but these summaries discard the rich spatial and topological information encoded in the microstructure.

Graph neural networks (GNNs) offer a fundamentally richer representation. By segmenting a micrograph into individual grains and encoding each grain as a node (with features like area, orientation, phase) and each grain boundary as an edge (with features like boundary length, misorientation angle), you create a graph that preserves the full topology of the microstructure. A GNN can then learn to predict bulk mechanical properties from this graph — capturing effects like grain boundary networks and clustering that traditional features miss.

This project is at the frontier of materials informatics research. You will work with open metallography datasets, implement graph construction from segmented micrographs, build a GNN in PyTorch Geometric, and evaluate whether the graph representation outperforms traditional feature-based models. The workflow generalizes to any material system where microstructure governs performance.

What You'll Learn

  • Segment microstructure images into individual grains using watershed or Voronoi-based methods
  • Construct graph representations of microstructures with physically meaningful node and edge features
  • Implement graph convolutional layers (GCN, GraphSAGE, or GAT) using PyTorch Geometric
  • Train a graph-level regression model to predict mechanical properties from microstructure graphs
  • Compare GNN performance against baseline models using traditional hand-crafted microstructure descriptors

Step-by-Step Guide

Step 1: Acquire and Explore Microstructure Data

Obtain micrograph images and associated mechanical property measurements from an open dataset. Good sources include the UHCSDB (Ultra-High Carbon Steel Database from Carnegie Mellon), the NIST microstructure datasets, or the MatBench benchmark suite. Each sample should have a micrograph image and at least one target property (yield strength, hardness, or fatigue life).

Visualize several micrographs and their corresponding property values. Look for visible differences between high-strength and low-strength samples — grain size is usually the most obvious, but boundary character and phase distribution also matter. If you cannot find a dataset with paired images and properties, you can generate synthetic microstructures using Dream3D or a Voronoi tessellation generator and assign properties using Hall-Petch or mixture-rule relationships.
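If you take the synthetic route, the labeling step can be as simple as applying a Hall-Petch relation to each generated microstructure's mean grain diameter. A minimal sketch — the constants below are illustrative order-of-magnitude values for a generic steel, not fitted to any dataset:

```python
import numpy as np

def hall_petch_strength(mean_grain_diameter_um, sigma0_mpa=70.0, k_mpa=600.0):
    """Hall-Petch relation: sigma_y = sigma_0 + k / sqrt(d).
    Finer grains mean more boundary area impeding dislocation motion,
    hence higher yield strength. sigma0_mpa (friction stress) and k_mpa
    (strengthening coefficient, MPa*sqrt(um)) are illustrative values only."""
    d = np.asarray(mean_grain_diameter_um, dtype=float)
    return sigma0_mpa + k_mpa / np.sqrt(d)

# Coarse (100 um), medium (25 um), and fine (4 um) grain structures
print(hall_petch_strength([100.0, 25.0, 4.0]))  # → [130. 190. 370.]
```

Adding a small noise term to these labels makes the synthetic regression task less trivially learnable and closer to experimental scatter.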

Step 2: Segment Micrographs into Grain Maps

Apply image segmentation to identify individual grains. For well-etched optical micrographs, a combination of Otsu thresholding, morphological operations, and watershed transform in scikit-image works well. For EBSD (electron backscatter diffraction) data, grains are already segmented by orientation — you can directly use the grain map.

The output is a labeled image where each pixel is assigned a grain ID. Compute per-grain properties: centroid position, area, perimeter, aspect ratio, and (if available from EBSD) crystallographic orientation. Quality-check the segmentation by overlaying grain boundaries on the original image — over-segmentation is preferable to under-segmentation for GNN purposes.
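The thresholding-plus-watershed recipe can be sketched as follows. Parameter values (structuring-element radius, minimum peak distance) are placeholders to tune per dataset, and the bright-phase-equals-grain assumption may need inverting for your etch:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.morphology import opening, disk
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def segment_grains(gray, speckle_radius=2, min_peak_dist=5):
    """Otsu threshold -> morphological clean-up -> marker-based watershed.
    Returns a labeled array: 0 outside grains, 1..N for individual grains."""
    binary = gray > threshold_otsu(gray)
    binary = opening(binary, disk(speckle_radius))   # remove etch speckle
    distance = ndi.distance_transform_edt(binary)    # peaks near grain centers
    lbl, _ = ndi.label(binary)
    coords = peak_local_max(distance, labels=lbl, min_distance=min_peak_dist)
    markers = np.zeros(gray.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(-distance, markers, mask=binary)  # flood from markers
```

From the returned label image, skimage.measure.regionprops gives the per-grain centroid, area, perimeter, and aspect ratio directly.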

Step 3: Build Graph Representations

Convert each segmented micrograph into a PyTorch Geometric Data object. Each grain becomes a node with a feature vector: [area, perimeter, aspect_ratio, centroid_x, centroid_y, orientation]. Each shared grain boundary becomes an undirected edge with features: [boundary_length, misorientation_angle, distance_between_centroids].

Use the Region Adjacency Graph (RAG) from scikit-image to determine which grains share boundaries. Normalize all features to zero mean and unit variance across the dataset. A typical micrograph might yield 50–500 nodes — small enough for efficient training, large enough for the GNN to learn meaningful patterns. Assign the target mechanical property as the graph-level label (data.y).
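If you prefer not to depend on scikit-image's RAG utilities, grain adjacency can also be read straight off the label image by comparing 4-connected neighbours; the resulting pairs map directly onto a PyTorch Geometric edge_index. A dependency-light sketch:

```python
import numpy as np

def grain_adjacency(labels):
    """Pairs of grain IDs that share a boundary, from a labeled image
    (0 = boundary/background, 1..N = grains), using 4-connectivity.
    Equivalent in spirit to scikit-image's Region Adjacency Graph."""
    pairs = set()
    # Compare each pixel with its right neighbour, then its lower neighbour
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        touching = (a != b) & (a > 0) & (b > 0)
        for i, j in zip(a[touching], b[touching]):
            pairs.add((int(min(i, j)), int(max(i, j))))
    return sorted(pairs)

def to_edge_index(pairs):
    """Both directions of every undirected pair, as the (2, E) array that
    torch.tensor(...) turns into a PyTorch Geometric edge_index."""
    src = [p for i, j in pairs for p in (i, j)]
    dst = [p for i, j in pairs for p in (j, i)]
    return np.array([src, dst])
```

Emitting both directions matters: PyTorch Geometric treats edge_index as directed, so an undirected grain boundary must appear twice.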

Step 4: Implement the GNN Architecture

Build a graph-level regression model in PyTorch Geometric. A proven architecture: 3–4 GraphSAGE or GATConv layers with 64–128 hidden channels, followed by a global mean/max pooling layer to produce a fixed-size graph embedding, then 2 fully connected layers to predict the target property.

Use ReLU activations and batch normalization between graph convolution layers. For the pooling step, try both global_mean_pool and global_add_pool — addition preserves information about graph size (number of grains), which is physically relevant since more grains generally means smaller grain size and higher strength.
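To make the message-passing mechanics concrete, here is a minimal sketch in plain PyTorch of a GraphSAGE-style layer and the pooled regression head, handling one graph at a time. It mimics what torch_geometric.nn.SAGEConv and global_mean_pool do; in the actual project, use those, which add batching plus the normalization and dropout discussed here:

```python
import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    """GraphSAGE-style convolution: combine each node's own features with
    the mean of its neighbours' features, then apply a nonlinearity."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin_self = nn.Linear(in_dim, out_dim)
        self.lin_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        src, dst = edge_index                      # (2, E), both directions
        agg = torch.zeros_like(x)
        agg.index_add_(0, dst, x[src])             # sum neighbour features
        deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
        agg = agg / deg.clamp(min=1).unsqueeze(1)  # mean aggregation
        return torch.relu(self.lin_self(x) + self.lin_neigh(agg))

class GrainGNN(nn.Module):
    """3 conv layers -> global mean pool -> 2-layer MLP regressor."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.convs = nn.ModuleList([SAGELayer(in_dim, hidden),
                                    SAGELayer(hidden, hidden),
                                    SAGELayer(hidden, hidden)])
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = conv(x, edge_index)
        return self.head(x.mean(dim=0))            # global mean pool
```

Swapping the mean pool for a sum (global_add_pool) only changes the last line, which makes the two pooling variants easy to compare.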

Step 5: Train and Tune the Model

Split your dataset into train/validation/test sets (70/15/15) at the sample level; if several micrographs come from the same physical specimen, keep them all in one split to avoid leakage. Use MSE loss for regression. Train with the Adam optimizer, an initial learning rate of 1e-3, and a ReduceLROnPlateau scheduler watching the validation loss.

Monitor training and validation loss curves for overfitting. With small datasets (common in materials science, where samples are expensive), regularization is critical: use dropout (0.2–0.4) after each graph conv layer and consider early stopping on validation loss. If you have fewer than 200 samples, k-fold cross-validation (k=5) gives more reliable performance estimates than a single train/test split.
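A minimal loop matching these settings might look as follows, assuming each sample is an (x, edge_index, y) triple and the model maps one graph to one predicted value; in practice you would use PyTorch Geometric's DataLoader for mini-batching:

```python
import torch
import torch.nn as nn

def train(model, train_set, val_set, epochs=100, lr=1e-3, patience=10):
    """Per-graph training loop (batch size 1 for clarity). Each item in a
    set is an (x, edge_index, y) triple; model(x, edge_index) returns a
    1-element prediction."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5,
                                                       patience=patience)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        model.train()
        for x, edge_index, y in train_set:
            opt.zero_grad()
            loss = loss_fn(model(x, edge_index), y)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x, ei), y).item()
                      for x, ei, y in val_set) / len(val_set)
        sched.step(val)  # scheduler watches validation loss
    return val
```

Early stopping fits naturally into this loop: track the best validation loss seen so far and break when it has not improved for a fixed number of epochs.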

Step 6: Compare Against Baselines

Train two baseline models on the same data: (1) a Random Forest using hand-crafted features (mean grain size, grain size standard deviation, phase fraction, mean aspect ratio — computed from the same segmentations), and (2) a CNN that takes the raw micrograph image as input. Compare all three approaches (GNN, Random Forest, CNN) on test-set MAE and R².
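The hand-crafted-feature baseline is only a few lines with scikit-learn. The descriptor set below (equivalent-diameter mean and standard deviation, grain count) is a minimal illustrative subset; append phase fraction and mean aspect ratio from your own segmentations in the same way:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

def handcrafted_features(grain_areas_um2):
    """Classical descriptors for one micrograph from its per-grain areas."""
    d = 2.0 * np.sqrt(np.asarray(grain_areas_um2) / np.pi)  # equiv. diameters
    return [d.mean(), d.std(), float(len(d))]

def rf_baseline(X, y, seed=0):
    """Random Forest on hand-crafted descriptors; returns test MAE and R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                              random_state=seed)
    rf = RandomForestRegressor(n_estimators=300, random_state=seed)
    rf.fit(X_tr, y_tr)
    pred = rf.predict(X_te)
    return mean_absolute_error(y_te, pred), r2_score(y_te, pred)
```

For a fair comparison, compute these descriptors from the same segmentations the GNN graphs were built from, and evaluate all models on an identical test split.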

The GNN should outperform the hand-crafted features by capturing topological information. Whether it beats the CNN depends on dataset size and image quality — for small datasets, the GNN's inductive bias (graph structure) often provides a useful advantage over the CNN's brute-force pixel approach.

Step 7: Interpret and Document

Use GNN explainability techniques to understand what the model learned. PyTorch Geometric's GNNExplainer highlights which nodes and edges most influenced the prediction. Do the important nodes correspond to unusually large grains or specific phases? Do the important edges correspond to high-angle grain boundaries known to control strength?

Write up your methodology, results, and comparisons as a research-style report. Include a clear description of the graph construction pipeline (reproducibility is critical in materials informatics), the architecture diagram, training curves, and the baseline comparison table. This is a publishable-quality workflow — consider submitting to the Integrating Materials and Manufacturing Innovation (IMMI) journal or presenting at a TMS conference.

Go Further

  • Extend to 3D microstructures using serial-sectioning or X-ray tomography data — 3D graphs capture through-thickness grain connectivity that 2D sections miss.
  • Implement a generative GNN (e.g., a graph variational autoencoder) that can propose novel microstructure graphs with desired mechanical properties — inverse design.
  • Apply transfer learning: pretrain on a large synthetic microstructure dataset and fine-tune on a small experimental dataset to overcome limited real data.
  • Incorporate physics-informed loss terms (e.g., Hall-Petch scaling) to constrain the GNN predictions to be physically consistent.