Track Objects in Drone Video
Write a Python script that follows moving objects through aerial footage automatically.
Last reviewed: March 2026
Overview
Drone cameras generate enormous amounts of video every day—for infrastructure inspection, wildlife surveys, precision agriculture, and border surveillance. Manually reviewing hours of footage to find moving objects is slow and error-prone. Computer vision algorithms can process video automatically, flagging and tracking objects of interest in real time. In this project you will build exactly that kind of system using OpenCV, the world's most widely used computer vision library.
You will start with publicly available aerial drone footage (several research datasets are freely downloadable) and work through progressively more sophisticated techniques. First you will use background subtraction to isolate moving objects from a static background. Then you will apply contour detection to draw bounding boxes around detected objects. Finally you will use OpenCV's CSRT tracker, a robust single-object tracker (you run one instance per object to follow several at once), to follow selected objects across frames and to detect when tracking fails because an object is occluded or leaves the field of view.
The techniques in this project underpin real aerospace applications: the U.S. Air Force uses similar algorithms in wide-area surveillance systems, wildlife biologists use them to count animal populations from drone surveys, and search-and-rescue teams use them to spot survivors in disaster footage. You do not need any prior programming experience—the guide walks you through every line of code.
What You'll Learn
- ✓ Load, display, and write video files using OpenCV's VideoCapture and VideoWriter.
- ✓ Apply background subtraction (MOG2) to isolate moving objects in aerial video.
- ✓ Use contour detection and bounding-box filtering to identify candidate objects.
- ✓ Initialize and update an OpenCV object tracker (CSRT) across video frames.
- ✓ Evaluate tracker performance by computing Intersection over Union (IoU) on annotated frames.
Step-by-Step Guide
Set up OpenCV and acquire test footage
Install OpenCV with pip install opencv-contrib-python numpy matplotlib (the contrib build includes the CSRT tracker used later; on most versions the plain opencv-python package does not). Download a sample aerial drone video from the VisDrone dataset (github.com/VisDrone) or film your own short clip from a park. Load the video with cap = cv2.VideoCapture("drone_video.mp4") and display the first frame with cv2.imshow. Print the frame dimensions, frame rate, and total frame count using cap.get() with the CAP_PROP_* constants to understand what you are working with.
Apply background subtraction to find motion
Create a background subtractor: fgbg = cv2.createBackgroundSubtractorMOG2(history=100, varThreshold=50). In a loop, read each frame and apply the subtractor: fgmask = fgbg.apply(frame). Display the foreground mask alongside the original frame. Moving objects appear white against a black background. Apply morphological operations to clean up noise: kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5)) followed by cv2.morphologyEx(fgmask, cv2.MORPH_OPEN, kernel).
Detect objects with contour analysis
Find contours in the cleaned mask with contours, _ = cv2.findContours(fgmask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE). Filter contours by area to remove tiny noise blobs—only keep those with area > 500 pixels. For each surviving contour, compute the bounding rectangle with cv2.boundingRect(contour) and draw it on the original frame in green. Count detections per frame and plot this count over time to see when activity peaks in your footage.
Initialize a tracker on a selected object
Pause video on frame 1 and let the user draw a bounding box around one object using roi = cv2.selectROI("Select Object", frame). Create a CSRT tracker: tracker = cv2.TrackerCSRT_create() and initialize it: tracker.init(frame, roi). In subsequent frames, call success, box = tracker.update(frame). If success is True, draw the bounding box at the new position. Print a message if tracking fails (object left frame or became occluded).
Track multiple objects simultaneously
Use OpenCV's MultiTracker API (cv2.legacy.MultiTracker_create in OpenCV 4.5 and later) or, more portably, maintain a Python list of CSRT trackers yourself to track 3–5 objects at once. Let the user click to select each object in turn on the first frame. Update all trackers each frame and draw each bounding box in a different color. Save the output as a new video file using VideoWriter. This multi-object tracking output is what a real surveillance system would hand off to a human analyst.
Evaluate tracker accuracy with IoU
Manually annotate the ground-truth bounding box for your tracked object in 20 evenly spaced frames by pausing the video and recording coordinates. Compute Intersection over Union (IoU) for each frame: IoU = area of overlap / area of union. A value above 0.5 is typically considered a successful track. Plot IoU over time and note where it drops—these are usually moments of occlusion, fast motion, or lighting change. Discuss what improvements (e.g., deep learning trackers) could address these weaknesses.
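The IoU computation for two (x, y, w, h) boxes is a small standalone function:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(ax, bx)
    y1 = max(ay, by)
    x2 = min(ax + aw, bx + bw)
    y2 = min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # half-overlapping boxes
```

Apply it to each (tracker box, ground-truth box) pair across your 20 annotated frames and plot the results.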
Career Connection
See how this project connects to real aerospace careers.
Drone & UAV Ops →
UAV operators running surveillance, inspection, or wildlife surveys use object tracking software extensively; building one from scratch makes you a far more effective operator.
Aerospace Engineer →
Computer vision is increasingly embedded in aerospace sensor systems, from runway foreign object detection to in-space proximity operations cameras on the ISS.
Air Traffic Control →
Surface movement radar at major airports increasingly incorporates computer vision to track vehicles on taxiways—the same background subtraction and tracking methods used here.
Space Operations →
Object detection and tracking algorithms are used to identify space debris in telescope images and track spacecraft during rendezvous operations.
Go Further
- Replace CSRT with a deep learning tracker like SiamRPN++ (available through the MMTracking library) and compare accuracy on fast-moving objects.
- Add a speed estimator: using known altitude, camera focal length, and pixel displacement per frame, estimate vehicle speed in km/h.
- Implement automatic tracker re-initialization when IoU drops below 0.3 by using the background subtraction detections as a re-detection mechanism.
- Run your tracking pipeline on a live webcam feed to see performance in real time—hold up objects and move them around to test edge cases.