
BoT-SORT

Overview

BoT-SORT extends ByteTrack with camera motion compensation (CMC) to handle moving cameras and dynamic scenes. It keeps ByteTrack's two-stage association strategy (high-confidence matching followed by low-confidence recovery), but first applies a frame-to-frame geometric transform estimated from optical flow so predictions are compared in the correct camera coordinate frame. This reduces missed matches and ID-switches when camera ego-motion causes apparent object jumps. BoT-SORT also combines IoU similarity with detection confidence during association and uses stricter track confirmation logic for more stable identities.

How does BoT-SORT compare to other trackers?

For comparisons with other trackers, plus dataset context and evaluation details, see the tracker comparison page.

| Dataset   | HOTA | IDF1 | MOTA |
|-----------|------|------|------|
| MOT17     | 63.7 | 78.7 | 79.2 |
| SportsMOT | 73.8 | 73.4 | 96.9 |
| SoccerNet | 84.5 | 79.3 | 96.6 |

Algorithm

BoT-SORT keeps the same tracking-by-detection backbone as ByteTrack but adds camera-motion-aware prediction and confidence-aware association.

CMC (Camera Motion Compensation). Before data association, BoT-SORT estimates global camera motion between consecutive frames (typically from sparse optical flow) and warps each track's Kalman-predicted box into the current frame. Without this step, a panning or moving camera can make stationary or slow-moving targets appear to jump, degrading IoU overlap and causing false unmatched tracks.
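
The warping step can be sketched in a few lines. This is an illustrative sketch, not the library's internals: in practice the 2x3 affine matrix would come from an estimator such as `cv2.estimateAffinePartial2D` over sparse optical-flow correspondences, while here it is a hypothetical pan of (-12, +5) pixels.

```python
def warp_box(box, affine):
    """Apply a 2x3 affine transform [[a, b, tx], [c, d, ty]] to a box (x1, y1, x2, y2)."""
    (a, b, tx), (c, d, ty) = affine
    x1, y1, x2, y2 = box
    # Warp all four corners, then take the axis-aligned bounding box of the result.
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    warped = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical camera pan: every pixel shifted by (-12, +5) between frames.
camera_motion = [(1.0, 0.0, -12.0), (0.0, 1.0, 5.0)]
predicted_box = (100.0, 50.0, 140.0, 110.0)
compensated = warp_box(predicted_box, camera_motion)
print(compensated)  # (88.0, 55.0, 128.0, 115.0)
```

After this correction, the predicted box overlaps the detection where the object actually appears in the current frame, so IoU-based association works as if the camera had been static.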

Two-stage association. BoT-SORT performs ByteTrack-style matching in two passes: high-confidence detections first, then lower-confidence detections for unmatched tracks. This recovers objects that are briefly weakly scored due to blur, occlusion, or scale change.
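
The two passes can be sketched as follows. This is illustrative, not the library's implementation: it uses greedy IoU matching for brevity (the real tracker solves a linear assignment problem), and the threshold values are examples.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def two_stage_match(tracks, detections, high_conf=0.6, min_iou=0.3):
    """Stage 1 matches confident detections; stage 2 recovers weak ones."""
    high = [d for d in detections if d["score"] >= high_conf]
    low = [d for d in detections if d["score"] < high_conf]
    matches, unmatched = [], list(tracks)
    for stage in (high, low):
        for det in stage:
            best = max(unmatched, key=lambda t: iou(t["box"], det["box"]), default=None)
            if best is not None and iou(best["box"], det["box"]) >= min_iou:
                matches.append((best["id"], det["box"]))
                unmatched.remove(best)
    return matches, unmatched

tracks = [{"id": 1, "box": (0, 0, 10, 10)}, {"id": 2, "box": (20, 0, 30, 10)}]
detections = [
    {"box": (1, 0, 11, 10), "score": 0.9},   # confident: matched in stage 1
    {"box": (21, 0, 31, 10), "score": 0.3},  # weak (e.g. blurred): recovered in stage 2
]
matches, unmatched = two_stage_match(tracks, detections)
print(matches)  # [(1, (1, 0, 11, 10)), (2, (21, 0, 31, 10))]
```

A single-pass tracker with a 0.6 confidence cutoff would have dropped the second detection entirely; the second pass keeps track 2 alive.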

Confidence-aware matching. Association costs blend geometric overlap (IoU) with detection confidence so that stronger detections are preferred when multiple matches are plausible.
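
One simple way to blend the two signals, shown only to illustrate the idea (the exact weighting in the library may differ): scale the IoU similarity by the detection confidence, so at equal overlap the more confident detection has the lower cost.

```python
def association_cost(iou_value, det_confidence):
    """Lower is better: IoU similarity scaled by detection confidence."""
    return 1.0 - iou_value * det_confidence

# Same overlap (0.8), different confidence: the stronger detection wins.
print(round(association_cost(0.8, 0.9), 2))  # 0.28
print(round(association_cost(0.8, 0.5), 2))  # 0.6
```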

Track lifecycle. New tracks are initiated and confirmed with a conservative policy (minimum_consecutive_frames) to reduce one-frame false positives. Tracks that remain unmatched longer than lost_track_buffer are removed.
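
The lifecycle reduces to two counters. This is a minimal sketch of the policy described above, not the library's state machine; the parameter names mirror those documented in the table that follows.

```python
class TrackState:
    """Toy track lifecycle: confirm after N consecutive hits, remove after M misses."""

    def __init__(self, minimum_consecutive_frames=3, lost_track_buffer=30):
        self.min_hits = minimum_consecutive_frames
        self.max_lost = lost_track_buffer
        self.hits = 0          # consecutive matched frames
        self.lost = 0          # consecutive unmatched frames
        self.confirmed = False
        self.removed = False

    def mark_matched(self):
        self.hits += 1
        self.lost = 0
        if self.hits >= self.min_hits:
            self.confirmed = True

    def mark_unmatched(self):
        self.hits = 0
        self.lost += 1
        if self.lost > self.max_lost:
            self.removed = True

track = TrackState(minimum_consecutive_frames=2, lost_track_buffer=3)
track.mark_matched()
track.mark_matched()        # confirmed after 2 consecutive matches
print(track.confirmed)      # True
for _ in range(4):
    track.mark_unmatched()  # unmatched for 4 > 3 frames -> removed
print(track.removed)        # True
```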

Key Parameters

| Parameter | Purpose | Tuning guidance |
|-----------|---------|-----------------|
| `lost_track_buffer` | Frames to keep an unmatched track alive before deletion. | Higher tolerates longer occlusions/camera shake but can increase false re-association. 10-30 common; up to 60 for long gaps. |
| `track_activation_threshold` | Minimum detection confidence required to start a new track. | Higher reduces noisy track creation; lower retains harder objects. 0.5-0.9 typical depending on detector quality. This does not control low-confidence association, which still discards detections below a fixed 0.1 confidence floor. |
| `minimum_consecutive_frames` | Consecutive matches required before confirming a new track. | 1 for immediate activation; 2-3 improves robustness against flicker and false positives. |
| `minimum_iou_threshold_first_assoc` | Minimum IoU for the first association pass with high-confidence detections. | Lower helps maintain matches under fast motion or imperfect compensation; higher is stricter and reduces risky matches. |
| `minimum_iou_threshold_second_assoc` | Minimum IoU for the second association pass with lower-confidence detections. | Usually set lower than the first-pass threshold to recover weak detections without over-matching. |
| `minimum_iou_threshold_unconfirmed_assoc` | Minimum IoU when associating unconfirmed tracks. | Higher values make tentative tracks harder to confirm spuriously; lower values help short-lived or noisy objects survive. |
| `high_conf_det_threshold` | Confidence split between stage-1 and stage-2 detections. | 0.5-0.7 common. Higher shifts more detections to the recovery stage; lower gives stage 1 broader coverage. |
| `enable_cmc` | Enables camera motion compensation before association. | Keep enabled for moving-camera footage (sports, drone, handheld). Disable mainly for static cameras if you need maximal speed. |
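
Putting the parameters together, a tracker tuned for sports footage with a panning camera might be configured as below. The values are illustrative, not recommendations, and the keyword names should be verified against your installed version of the `trackers` package.

```python
from trackers import BoTSORTTracker

# Illustrative configuration: long occlusions, reliable detector, moving camera.
tracker = BoTSORTTracker(
    lost_track_buffer=60,            # survive long occlusions
    track_activation_threshold=0.6,  # detector is fairly reliable
    minimum_consecutive_frames=2,    # suppress one-frame flicker
    high_conf_det_threshold=0.6,     # stage-1 / stage-2 confidence split
    enable_cmc=True,                 # camera moves constantly
)
```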

Run on video, webcam, or RTSP stream

These examples use opencv-python for decoding and display. Replace <SOURCE_VIDEO_PATH>, <WEBCAM_INDEX>, and <RTSP_STREAM_URL> with your inputs. <WEBCAM_INDEX> is usually 0 for the default camera.

Tip

Pass the current video frame as tracker.update(detections, frame=frame_bgr) to enable Camera Motion Compensation.

Video file:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import BoTSORTTracker

tracker = BoTSORTTracker()
model = RFDETRMedium()

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<SOURCE_VIDEO_PATH>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open video source")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Passing the frame enables camera motion compensation.
    detections = tracker.update(detections, frame=frame_bgr)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame,
        detections,
        labels=[str(tracker_id) for tracker_id in detections.tracker_id],
    )

    cv2.imshow("RF-DETR + BoT-SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
Webcam:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import BoTSORTTracker

tracker = BoTSORTTracker()
model = RFDETRMedium()

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# The webcam index is an integer (usually 0), not a string.
video_capture = cv2.VideoCapture(<WEBCAM_INDEX>)
if not video_capture.isOpened():
    raise RuntimeError("Failed to open webcam")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Passing the frame enables camera motion compensation.
    detections = tracker.update(detections, frame=frame_bgr)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame,
        detections,
        labels=[str(tracker_id) for tracker_id in detections.tracker_id],
    )

    cv2.imshow("RF-DETR + BoT-SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
RTSP stream:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import BoTSORTTracker

tracker = BoTSORTTracker()
model = RFDETRMedium()

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<RTSP_STREAM_URL>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open RTSP stream")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Passing the frame enables camera motion compensation.
    detections = tracker.update(detections, frame=frame_bgr)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame,
        detections,
        labels=[str(tracker_id) for tracker_id in detections.tracker_id],
    )

    cv2.imshow("RF-DETR + BoT-SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```

Reference

Aharon, N., Orfaig, R., and Bobrovsky, B.-Z. (2023). BoT-SORT: Robust Associations Multi-Pedestrian Tracking. arXiv:2206.14651
