Reinforcement Learning for Spacecraft Docking
Train an AI agent to autonomously dock with a space station
Last reviewed: March 2026
Overview
Autonomous spacecraft docking is one of the grand challenges in space engineering. SpaceX's Dragon capsule already does it, and future cislunar and Mars missions will require even more capable autonomous systems operating with significant communication delays.
In this project, you'll build a custom Gymnasium environment that simulates spacecraft proximity operations in 2D (planar orbital mechanics). Then you'll train a reinforcement learning agent using Stable-Baselines3 to learn a docking policy from scratch — controlling thrusters to approach a target, match velocities, and dock within tolerance.
This combines orbital mechanics, control theory, and modern RL — exactly the intersection that NASA and SpaceX are investing in for next-generation autonomous missions.
What You'll Learn
- ✓ Build a custom Gymnasium environment (the maintained successor to OpenAI Gym) with continuous state and action spaces
- ✓ Implement simplified Clohessy-Wiltshire relative motion dynamics for proximity operations
- ✓ Design reward functions that shape safe, fuel-efficient docking behavior
- ✓ Train RL agents (PPO, SAC) using Stable-Baselines3 and tune hyperparameters
- ✓ Evaluate learned policies for safety, fuel efficiency, and robustness
- ✓ Compare RL-based control against classical linear feedback controllers
Step-by-Step Guide
Study the Dynamics
Review the Clohessy-Wiltshire (CW) equations — the linearized relative motion equations for two spacecraft in nearby circular orbits. These give you the state dynamics: relative position (x, y) and velocity (vx, vy) in the LVLH (Local Vertical Local Horizontal) frame.
Implement the CW equations in Python and simulate free-drift trajectories to build intuition. Orbital mechanics produces non-intuitive behavior: a prograde ("forward") burn raises your altitude, which lengthens your orbital period, so you end up drifting backward relative to the target.
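As a starting point, here is one way to sketch the free-drift simulation: the planar CW equations written as a linear system and integrated with RK4. The orbit altitude and step sizes are illustrative choices, not values from the text.

```python
import numpy as np

def cw_matrix(n):
    """Continuous-time CW system matrix for state [x, y, vx, vy].

    x = radial offset, y = along-track offset, n = mean motion (rad/s).
    """
    return np.array([
        [0.0,       0.0,  1.0,    0.0],
        [0.0,       0.0,  0.0,    1.0],
        [3 * n**2,  0.0,  0.0,  2 * n],
        [0.0,       0.0, -2 * n,  0.0],
    ])

def propagate(state, n, dt, steps):
    """RK4 integration of the linear CW dynamics (free drift, no thrust)."""
    A = cw_matrix(n)
    f = lambda s: A @ s
    traj = [np.asarray(state, dtype=float)]
    for _ in range(steps):
        s = traj[-1]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj.append(s + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.array(traj)

# Mean motion for a ~400 km LEO orbit (ISS-like): n = sqrt(mu / a^3)
n = np.sqrt(3.986004418e14 / 6.778e6 ** 3)

# A pure along-track offset with zero relative velocity is an equilibrium:
# the chaser simply holds station 100 m behind the target.
traj = propagate([0.0, 100.0, 0.0, 0.0], n, dt=1.0, steps=600)
```

Plotting trajectories from other initial conditions (e.g. a pure radial offset, which drifts) is a quick way to see the counterintuitive coupling between radial and along-track motion.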
Build the Gymnasium Environment
Create a custom gymnasium.Env class with:
- State: [x, y, vx, vy, fuel_remaining] — 5D continuous
- Action: [thrust_x, thrust_y] — 2D continuous, bounded by max thrust
- Dynamics: CW equations with discrete time step (1–10 seconds)
- Initial conditions: randomized start position 100–500m from target
- Success: position within 0.5m and velocity within 0.05 m/s of target
Design the Reward Function
The reward function is the most critical design decision. Start with:
- Distance penalty: -k1 × distance_to_target (encourages approach)
- Velocity penalty: -k2 × relative_velocity when close (encourages gentle approach)
- Fuel penalty: -k3 × thrust_magnitude (encourages efficiency)
- Success bonus: +1000 for achieving docking conditions
- Crash penalty: -500 for excessive approach velocity
Tuning these weights is where the real engineering happens. Too much fuel penalty and the agent won't dock. Too little and it wastes propellant.
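One way to package the terms above is a standalone reward function so the weights (the k1, k2, k3 from the list, here given illustrative values) can be tuned without touching the dynamics:

```python
import numpy as np

# Illustrative starting weights -- tune these empirically
K_DIST, K_VEL, K_FUEL = 1e-3, 0.1, 0.05
CLOSE_RANGE = 20.0                 # m: where the velocity penalty kicks in
DOCK_BONUS, CRASH_PENALTY = 1000.0, 500.0

def docking_reward(pos, vel, thrust, docked, crashed):
    """Shaped docking reward (one sketch among many possible designs)."""
    r = -K_DIST * np.linalg.norm(pos)          # always pull toward the target
    if np.linalg.norm(pos) < CLOSE_RANGE:      # penalize speed on final approach
        r -= K_VEL * np.linalg.norm(vel)
    r -= K_FUEL * np.linalg.norm(thrust)       # propellant cost per step
    if docked:
        r += DOCK_BONUS
    if crashed:
        r -= CRASH_PENALTY
    return float(r)
```

Gating the velocity penalty on `CLOSE_RANGE` avoids punishing the fast transit phase far from the target; whether to gate, and at what range, is one of the design choices worth sweeping.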
Train with PPO
Start with PPO (Proximal Policy Optimization) from Stable-Baselines3 — it's robust and works well for continuous control:
```python
from stable_baselines3 import PPO
```
Train for 1–5 million timesteps. Monitor episode reward, success rate, and fuel usage. Use TensorBoard for visualization. Training should take 30–60 minutes on a modern CPU.
Evaluate and Visualize
Run 1,000 evaluation episodes with the trained agent. Calculate: success rate, mean fuel consumption, mean docking velocity, and mean time to dock. Plot example trajectories showing the approach path.
Create an animation of the docking maneuver — seeing the agent learn to perform a V-bar approach (approach from the velocity direction) is very satisfying.
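The evaluation loop can be sketched generically. This version assumes the environment reports `docked` and `fuel_used` in the step `info` dict; those keys are hypothetical, so adapt them to whatever your environment actually exposes.

```python
import numpy as np

def evaluate(env, model, n_episodes=1000, seed=0):
    """Roll out a trained policy and collect docking statistics.

    Assumes `info` carries "docked" and "fuel_used" (hypothetical keys).
    """
    successes, fuel, steps_to_dock = [], [], []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)
        done, t, info = False, 0, {}
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, _, terminated, truncated, info = env.step(action)
            done, t = terminated or truncated, t + 1
        successes.append(bool(info.get("docked", False)))
        fuel.append(info.get("fuel_used", np.nan))
        if info.get("docked", False):
            steps_to_dock.append(t)
    return {
        "success_rate": float(np.mean(successes)),
        "mean_fuel": float(np.nanmean(fuel)),
        "mean_steps_to_dock": float(np.mean(steps_to_dock)) if steps_to_dock else float("nan"),
    }
```

Seeding each episode (`seed + ep`) makes the evaluation reproducible and lets you replay exactly the failure cases when debugging the policy.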
Compare Against Classical Control
Implement a linear quadratic regulator (LQR) for the same docking problem. The CW equations are linear, so LQR gives the optimal linear feedback solution. Compare the RL agent against LQR in terms of success rate, fuel usage, and robustness to initial conditions.
Where does RL shine? Typically in handling constraints (fuel limits), non-linear scenarios (large relative distances where CW breaks down), and uncertainty (sensor noise, thruster failures).
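Because the CW dynamics are linear, the LQR baseline is a few lines with SciPy's continuous algebraic Riccati solver. The Q/R weights below are illustrative; heavier `r_thrust` trades tracking speed for fuel.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def cw_system(n):
    """Planar CW dynamics as a linear system ds/dt = A s + B u."""
    A = np.array([[0.0,       0.0,  1.0,    0.0],
                  [0.0,       0.0,  0.0,    1.0],
                  [3 * n**2,  0.0,  0.0,  2 * n],
                  [0.0,       0.0, -2 * n,  0.0]])
    B = np.array([[0.0, 0.0],
                  [0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
    return A, B

def cw_lqr_gain(n, q_pos=1.0, q_vel=10.0, r_thrust=100.0):
    """LQR gain for u = -K s; Q/R weights are illustrative, not tuned."""
    A, B = cw_system(n)
    Q = np.diag([q_pos, q_pos, q_vel, q_vel])
    R = r_thrust * np.eye(2)
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)
```

Running the LQR controller through the same evaluation harness as the RL agent gives an apples-to-apples comparison on success rate and fuel use.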
Add Complexity
Extend the environment with realistic challenges: sensor noise on position/velocity measurements, thruster dead-bands (minimum impulse), communication delays, or keep-out zones (regions the agent must avoid). Retrain and evaluate how the RL agent adapts.
Career Connection
See how this project connects to real aerospace careers.
Space Operations →
Autonomous proximity operations are the future of satellite servicing, debris removal, and space station resupply
Aerospace Engineer →
RL for control is an active research area in GNC (Guidance, Navigation, and Control) departments at every major space company
Drone & UAV Ops →
The same RL techniques apply to autonomous drone landing, formation flying, and GPS-denied navigation
Astronaut →
Understanding autonomous docking systems — their capabilities and limitations — is essential for astronauts who will supervise these systems
Go Further
Extend your RL + space research:
- 3D dynamics — extend to full 3D CW equations with 6-DOF (position + attitude) control
- Multi-agent — train two agents to dock with each other, or one agent servicing multiple targets
- Sim-to-real transfer — add domain randomization to make the policy robust enough for real hardware
- Publish — RL for spacecraft autonomy is actively published at AIAA SciTech, AAS GN&C, and RL conferences