A laptop-camera based object tracking prototype that detects a red target using OpenCV, sends pan and tilt angles to an ESP32 over serial, moves two 9g servos, and displays live tracking values on a 128×64 OLED screen.
This project is a pan-tilt tracking prototype built using a laptop webcam, Python/OpenCV, an ESP32, two micro servos, and a small OLED display. The laptop webcam detects a red dot in the video frame and calculates how far it is from the center of the image. Python converts that target position into pan and tilt servo angles, then sends the data to the ESP32 over USB serial.
The ESP32 receives the latest target values, displays them on the OLED screen, and moves the servos smoothly toward the desired pan and tilt angles. The current setup uses a fixed laptop webcam, meaning the camera itself does not move. The servos act as a separate pointing mechanism that follows the red target based on where it appears in the webcam frame.
The goal was to build a simple but visually impressive tracking demo: when a red dot moves in front of a laptop webcam, a pan-tilt servo mechanism should follow it in real time. The project started with basic red-dot detection, then evolved into serial communication, OLED debugging, servo control, two-axis movement, calibration offsets, and jitter reduction.
The main challenge was not just detecting the red dot. The harder part was making the servo movement look fast, smooth, and stable. Too many serial updates caused jitter, while too few updates made the movement look steppy. The final approach uses slower serial updates from Python and smooth local interpolation on the ESP32.
The project intentionally uses common, low-cost components. The laptop handles the heavy computer vision work, while the ESP32 handles servo control and OLED display updates.
Receives serial data from the laptop, parses target coordinates and angles, controls the servos, and updates the OLED screen.
Provides the camera feed for red-dot detection. In this prototype, the webcam is fixed and does not physically move with the servos.
One servo controls pan movement left and right, while the second servo controls tilt movement up and down.
Displays target X/Y position, error from center, distance from center, and current pan/tilt values. Most modules use SSD1306 over I2C.
Powers the servos separately from the ESP32. A stable 5V supply is important because fast servo movement can draw high current and cause resets or jitter.
A red dot, red marker cap, sticker, or small red object is used as the visual target for OpenCV tracking.
The system is split into two parts. The laptop performs image processing and target-angle calculation. The ESP32 performs real-time servo output and OLED display updates. This division keeps the ESP32 workload simple and makes the project easier to debug.
The red dot is detected using HSV color segmentation. Red is slightly tricky because hue wraps around in HSV, so the code uses two red ranges: one near 0 degrees and one near 180 degrees. The two masks are combined, cleaned using erosion and dilation, then the largest contour is selected as the target.
Because the laptop webcam is fixed, the servo movement does not change what the webcam sees. This means the system should not repeatedly add or subtract from the servo angle based on error. Instead, the camera position is mapped directly to servo angles.
Python sends target angles at a controlled rate. The ESP32 then moves the physical servos smoothly toward those target angles. This reduces jitter while still making the system feel responsive.
The servos should be powered from an external 5V supply, not from the ESP32 3.3V pin. Two 9g servos can draw enough current during fast movement to cause voltage dips, jitter, or ESP32 resets.
A servo angle of 90 degrees does not always mean the pan-tilt mechanism physically points straight ahead. This depends on how the servo horn and bracket are mounted. The code includes pan and tilt offsets so the center position can be corrected in software.
These are the latest working versions of the Python and ESP32 programs. The Python code handles camera tracking and serial output. The ESP32 code handles OLED display and pan-tilt servo movement.
import cv2
import numpy as np
import serial
import time
CAMERA_INDEX = 0
MIN_AREA = 300
# Change this to your ESP32 serial port
# Windows example: "COM3"
# macOS example: "/dev/tty.SLAB_USBtoUART"
# Linux example: "/dev/ttyUSB0"
SERIAL_PORT = "COM3"
BAUD_RATE = 115200
# Serial send tuning
SEND_INTERVAL = 0.05 # 50ms, avoids sending too many commands
ANGLE_CHANGE_THRESHOLD = 4 # ignore tiny angle changes
# Position smoothing
# 1.0 = no smoothing
# 0.65 = fast but reduces jitter
# 0.25 = smoother but slower
POSITION_SMOOTHING = 0.65
# Small dead zone around center
X_DEAD_ZONE_PX = 8
Y_DEAD_ZONE_PX = 8
# Limits sudden noisy angle jumps
MAX_ANGLE_JUMP = 12
# Distance tuning
# This is manual tuning. It does not measure real distance.
# Smaller value = smaller servo movement
# Larger value = bigger servo movement
TARGET_DISTANCE_CM = 100
REFERENCE_DISTANCE_CM = 100
# Servo angle limits
PAN_MIN_ANGLE = 10
PAN_MAX_ANGLE = 170
PAN_CENTER_ANGLE = 90
TILT_MIN_ANGLE = 30
TILT_MAX_ANGLE = 150
TILT_CENTER_ANGLE = 90
# Physical mounting offsets
# Use these if the servos do not point to your desired center at 90 degrees
PAN_OFFSET = 0
TILT_OFFSET = 0
# Sensitivity tuning
# Larger value = more dramatic servo movement
BASE_PAN_RANGE = 100
BASE_TILT_RANGE = 80
# Direction tuning
# Change signs if the servo moves the wrong way
PAN_DIRECTION = -1
TILT_DIRECTION = 1
last_send_time = 0
last_sent_pan_angle = None
last_sent_tilt_angle = None
smoothed_target_x = None
smoothed_target_y = None
last_output_pan_angle = None
last_output_tilt_angle = None
def clamp(value, minimum, maximum):
return max(minimum, min(maximum, value))
def limit_angle_jump(new_angle, last_angle, max_jump):
if last_angle is None:
return new_angle
if new_angle > last_angle + max_jump:
return last_angle + max_jump
if new_angle < last_angle - max_jump:
return last_angle - max_jump
return new_angle
def smooth_target_position(raw_x, raw_y):
global smoothed_target_x, smoothed_target_y
if smoothed_target_x is None:
smoothed_target_x = raw_x
smoothed_target_y = raw_y
else:
smoothed_target_x = (
smoothed_target_x * (1 - POSITION_SMOOTHING)
+ raw_x * POSITION_SMOOTHING
)
smoothed_target_y = (
smoothed_target_y * (1 - POSITION_SMOOTHING)
+ raw_y * POSITION_SMOOTHING
)
return int(smoothed_target_x), int(smoothed_target_y)
def calculate_pan_tilt_angles(
target_center_x,
target_center_y,
frame_w,
frame_h,
):
global last_output_pan_angle, last_output_tilt_angle
frame_center_x = frame_w // 2
frame_center_y = frame_h // 2
error_x = target_center_x - frame_center_x
error_y = target_center_y - frame_center_y
# Tiny dead zone to reduce micro jitter
if abs(error_x) < X_DEAD_ZONE_PX:
error_x = 0
if abs(error_y) < Y_DEAD_ZONE_PX:
error_y = 0
distance_from_center = int((error_x ** 2 + error_y ** 2) ** 0.5)
# Convert pixel error to -1.0 to +1.0
normalized_x = error_x / (frame_w / 2)
normalized_y = error_y / (frame_h / 2)
# Manual distance compensation
distance_scale = TARGET_DISTANCE_CM / REFERENCE_DISTANCE_CM
pan_range = BASE_PAN_RANGE * distance_scale
tilt_range = BASE_TILT_RANGE * distance_scale
pan_angle = (
PAN_CENTER_ANGLE
+ PAN_OFFSET
+ int(PAN_DIRECTION * normalized_x * pan_range)
)
tilt_angle = (
TILT_CENTER_ANGLE
+ TILT_OFFSET
+ int(TILT_DIRECTION * normalized_y * tilt_range)
)
pan_angle = clamp(pan_angle, PAN_MIN_ANGLE, PAN_MAX_ANGLE)
tilt_angle = clamp(tilt_angle, TILT_MIN_ANGLE, TILT_MAX_ANGLE)
# Limit sudden noisy jumps while keeping movement snappy
pan_angle = limit_angle_jump(
pan_angle,
last_output_pan_angle,
MAX_ANGLE_JUMP,
)
tilt_angle = limit_angle_jump(
tilt_angle,
last_output_tilt_angle,
MAX_ANGLE_JUMP,
)
last_output_pan_angle = pan_angle
last_output_tilt_angle = tilt_angle
return error_x, error_y, distance_from_center, pan_angle, tilt_angle
def should_send_angles(pan_angle, tilt_angle):
global last_send_time, last_sent_pan_angle, last_sent_tilt_angle
current_time = time.time()
if current_time - last_send_time < SEND_INTERVAL:
return False
pan_changed = (
last_sent_pan_angle is None
or abs(pan_angle - last_sent_pan_angle) >= ANGLE_CHANGE_THRESHOLD
)
tilt_changed = (
last_sent_tilt_angle is None
or abs(tilt_angle - last_sent_tilt_angle) >= ANGLE_CHANGE_THRESHOLD
)
return pan_changed or tilt_changed
def mark_angles_sent(pan_angle, tilt_angle):
global last_send_time, last_sent_pan_angle, last_sent_tilt_angle
last_send_time = time.time()
last_sent_pan_angle = pan_angle
last_sent_tilt_angle = tilt_angle
def main():
global smoothed_target_x, smoothed_target_y
global last_output_pan_angle, last_output_tilt_angle
ser = serial.Serial(SERIAL_PORT, BAUD_RATE, timeout=1)
time.sleep(2)
cap = cv2.VideoCapture(CAMERA_INDEX)
if not cap.isOpened():
print("Could not open webcam")
ser.close()
return
try:
while True:
ret, frame = cap.read()
if not ret:
print("Could not read from webcam")
break
frame_h, frame_w = frame.shape[:2]
frame_center_x = frame_w // 2
frame_center_y = frame_h // 2
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
# Red needs two HSV ranges because hue wraps around
lower_red_1 = np.array([0, 100, 100])
upper_red_1 = np.array([10, 255, 255])
lower_red_2 = np.array([170, 100, 100])
upper_red_2 = np.array([179, 255, 255])
mask1 = cv2.inRange(hsv, lower_red_1, upper_red_1)
mask2 = cv2.inRange(hsv, lower_red_2, upper_red_2)
mask = mask1 + mask2
# Reduce small noisy dots
kernel = np.ones((5, 5), np.uint8)
mask = cv2.erode(mask, kernel, iterations=1)
mask = cv2.dilate(mask, kernel, iterations=2)
contours, _ = cv2.findContours(
mask,
cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE,
)
status_text = "No red target"
if contours:
largest_contour = max(contours, key=cv2.contourArea)
area = cv2.contourArea(largest_contour)
if area > MIN_AREA:
x, y, w, h = cv2.boundingRect(largest_contour)
raw_target_center_x = x + w // 2
raw_target_center_y = y + h // 2
target_center_x, target_center_y = smooth_target_position(
raw_target_center_x,
raw_target_center_y,
)
(
error_x,
error_y,
distance_from_center,
pan_angle,
tilt_angle,
) = calculate_pan_tilt_angles(
target_center_x,
target_center_y,
frame_w,
frame_h,
)
if should_send_angles(pan_angle, tilt_angle):
data_to_send = (
f"{target_center_x},"
f"{target_center_y},"
f"{error_x},"
f"{error_y},"
f"{distance_from_center},"
f"{pan_angle},"
f"{tilt_angle}\n"
)
ser.write(data_to_send.encode())
mark_angles_sent(pan_angle, tilt_angle)
print("Sent:", data_to_send.strip())
status_text = f"Red found | Pan:{pan_angle} Tilt:{tilt_angle}"
# Draw bounding box
cv2.rectangle(
frame,
(x, y),
(x + w, y + h),
(0, 255, 0),
2,
)
# Draw raw detected center
cv2.circle(
frame,
(raw_target_center_x, raw_target_center_y),
4,
(0, 0, 255),
-1,
)
# Draw smoothed target center
cv2.circle(
frame,
(target_center_x, target_center_y),
7,
(255, 0, 0),
-1,
)
# Draw camera center
cv2.circle(
frame,
(frame_center_x, frame_center_y),
6,
(0, 255, 255),
-1,
)
# Draw line from camera center to smoothed target
cv2.line(
frame,
(frame_center_x, frame_center_y),
(target_center_x, target_center_y),
(255, 0, 255),
2,
)
cv2.putText(
frame,
f"X error: {error_x}px",
(20, 60),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.putText(
frame,
f"Y error: {error_y}px",
(20, 90),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.putText(
frame,
f"Distance from center: {distance_from_center}px",
(20, 120),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.putText(
frame,
f"Pan: {pan_angle} Tilt: {tilt_angle}",
(20, 150),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.putText(
frame,
f"Target distance: {TARGET_DISTANCE_CM} cm",
(20, 180),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0),
2,
)
cv2.putText(
frame,
"Raw center: red | Smoothed center: blue",
(20, 210),
cv2.FONT_HERSHEY_SIMPLEX,
0.6,
(255, 255, 255),
2,
)
else:
status_text = "Red detected but too small"
else:
# Reset smoothing when target disappears
smoothed_target_x = None
smoothed_target_y = None
last_output_pan_angle = None
last_output_tilt_angle = None
cv2.putText(
frame,
status_text,
(20, 30),
cv2.FONT_HERSHEY_SIMPLEX,
0.7,
(0, 255, 0) if "found" in status_text else (0, 0, 255),
2,
)
# Draw camera center lines
cv2.line(
frame,
(frame_center_x, 0),
(frame_center_x, frame_h),
(255, 255, 0),
2,
)
cv2.line(
frame,
(0, frame_center_y),
(frame_w, frame_center_y),
(255, 255, 0),
2,
)
cv2.imshow("Red Dot Tracking", frame)
cv2.imshow("Red Mask", mask)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
finally:
cap.release()
ser.close()
cv2.destroyAllWindows()
if __name__ == "__main__":
main()
#include <Wire.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <ESP32Servo.h>
#define SCREEN_WIDTH 128
#define SCREEN_HEIGHT 64
#define OLED_RESET -1
#define OLED_ADDRESS 0x3C
#define SDA_PIN 21
#define SCL_PIN 22
#define PAN_SERVO_PIN 18
#define TILT_SERVO_PIN 19
Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);
Servo panServo;
Servo tiltServo;
// Data received from Python
int targetX = 0;
int targetY = 0;
int errorX = 0;
int errorY = 0;
int distanceFromCenter = 0;
// Target angles received from Python
int panAngle = 90;
int tiltAngle = 90;
// Current smoothed servo angles
float currentPanAngle = 90;
float currentTiltAngle = 90;
// Servo limits
const int PAN_MIN_ANGLE = 10;
const int PAN_MAX_ANGLE = 170;
const int TILT_MIN_ANGLE = 30;
const int TILT_MAX_ANGLE = 150;
// Smoothing settings
// Higher = faster/snappier, lower = smoother/slower
const float PAN_SMOOTHING = 0.45;
const float TILT_SMOOTHING = 0.45;
// Servo updates happen locally on ESP32
// even when Python sends less frequently
const int SERVO_UPDATE_INTERVAL_MS = 10;
// OLED updates slower so it does not slow tracking
const int DISPLAY_UPDATE_INTERVAL_MS = 100;
unsigned long lastServoUpdate = 0;
unsigned long lastDisplayUpdate = 0;
bool hasReceivedData = false;
void showWaitingScreen() {
display.clearDisplay();
display.setTextSize(1);
display.setTextColor(SSD1306_WHITE);
display.setCursor(0, 0);
display.println("Red Dot Tracker");
display.setCursor(0, 16);
display.println("Waiting data...");
display.setCursor(0, 32);
display.println("Pan+Tilt smooth");
display.display();
}
void showTrackingData() {
display.clearDisplay();
display.setTextSize(1);
display.setTextColor(SSD1306_WHITE);
display.setCursor(0, 0);
display.println("SMOOTH TRACKING");
display.setCursor(0, 10);
display.print("X:");
display.print(targetX);
display.print(" Y:");
display.println(targetY);
display.setCursor(0, 22);
display.print("ErrX:");
display.print(errorX);
display.setCursor(68, 22);
display.print("ErrY:");
display.print(errorY);
display.setCursor(0, 34);
display.print("Dist:");
display.print(distanceFromCenter);
display.setCursor(0, 46);
display.print("P:");
display.print(panAngle);
display.print("/");
display.print((int)currentPanAngle);
display.setCursor(64, 46);
display.print("T:");
display.print(tiltAngle);
display.print("/");
display.print((int)currentTiltAngle);
display.display();
}
bool parseTrackingData(String data) {
data.trim();
int c1 = data.indexOf(',');
int c2 = data.indexOf(',', c1 + 1);
int c3 = data.indexOf(',', c2 + 1);
int c4 = data.indexOf(',', c3 + 1);
int c5 = data.indexOf(',', c4 + 1);
int c6 = data.indexOf(',', c5 + 1);
if (
c1 <= 0 ||
c2 <= c1 ||
c3 <= c2 ||
c4 <= c3 ||
c5 <= c4 ||
c6 <= c5
) {
return false;
}
targetX = data.substring(0, c1).toInt();
targetY = data.substring(c1 + 1, c2).toInt();
errorX = data.substring(c2 + 1, c3).toInt();
errorY = data.substring(c3 + 1, c4).toInt();
distanceFromCenter = data.substring(c4 + 1, c5).toInt();
panAngle = data.substring(c5 + 1, c6).toInt();
tiltAngle = data.substring(c6 + 1).toInt();
panAngle = constrain(panAngle, PAN_MIN_ANGLE, PAN_MAX_ANGLE);
tiltAngle = constrain(tiltAngle, TILT_MIN_ANGLE, TILT_MAX_ANGLE);
return true;
}
void updateServosSmoothly() {
unsigned long now = millis();
if (now - lastServoUpdate < SERVO_UPDATE_INTERVAL_MS) {
return;
}
lastServoUpdate = now;
float panDifference = panAngle - currentPanAngle;
float tiltDifference = tiltAngle - currentTiltAngle;
if (abs(panDifference) < 0.5) {
currentPanAngle = panAngle;
} else {
currentPanAngle = currentPanAngle + (panDifference * PAN_SMOOTHING);
}
if (abs(tiltDifference) < 0.5) {
currentTiltAngle = tiltAngle;
} else {
currentTiltAngle = currentTiltAngle + (tiltDifference * TILT_SMOOTHING);
}
currentPanAngle = constrain(currentPanAngle, PAN_MIN_ANGLE, PAN_MAX_ANGLE);
currentTiltAngle = constrain(currentTiltAngle, TILT_MIN_ANGLE, TILT_MAX_ANGLE);
panServo.write((int)currentPanAngle);
tiltServo.write((int)currentTiltAngle);
}
void updateDisplaySlowly() {
unsigned long now = millis();
if (now - lastDisplayUpdate >= DISPLAY_UPDATE_INTERVAL_MS) {
lastDisplayUpdate = now;
if (hasReceivedData) {
showTrackingData();
} else {
showWaitingScreen();
}
}
}
void setup() {
Serial.begin(115200);
Serial.setTimeout(2);
Wire.begin(SDA_PIN, SCL_PIN);
if (!display.begin(SSD1306_SWITCHCAPVCC, OLED_ADDRESS)) {
while (true) {
delay(100);
}
}
panServo.setPeriodHertz(50);
tiltServo.setPeriodHertz(50);
panServo.attach(PAN_SERVO_PIN, 500, 2400);
tiltServo.attach(TILT_SERVO_PIN, 500, 2400);
panServo.write((int)currentPanAngle);
tiltServo.write((int)currentTiltAngle);
showWaitingScreen();
}
void loop() {
if (Serial.available()) {
String data = Serial.readStringUntil('\n');
if (parseTrackingData(data)) {
hasReceivedData = true;
}
}
updateServosSmoothly();
updateDisplaySlowly();
}
The current prototype successfully detects a red target, calculates its offset from the camera center, converts that offset into pan and tilt angles, sends those values to the ESP32, displays live values on an OLED, and moves both servos in response.