# SORT

## Overview
SORT is a classic online tracking-by-detection method that predicts object motion with a Kalman filter and matches predicted tracks to detections with the Hungarian algorithm, using Intersection over Union (IoU) as the association cost. The tracker relies only on geometric cues from bounding boxes, without appearance features, so it runs extremely fast and scales to hundreds of frames per second on typical hardware. Detections from a strong CNN detector feed SORT, which updates each track's state with a constant-velocity motion model and prunes stale tracks. Because SORT has no explicit re-identification or appearance cues, it is prone to identity switches and fragmented tracks under long occlusions or heavy crowding.
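The core per-frame step is an assignment problem: each predicted track box is matched to at most one detection so that total IoU is maximized, and pairs below a minimum IoU stay unmatched. The sketch below illustrates that step in isolation; it is not the library's implementation, and the `iou_matrix` and `associate` helpers, the example boxes, and the 0.3 threshold are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou_matrix(tracks: np.ndarray, detections: np.ndarray) -> np.ndarray:
    """Pairwise IoU between predicted track boxes and detections (xyxy)."""
    ious = np.zeros((len(tracks), len(detections)))
    for t, (tx1, ty1, tx2, ty2) in enumerate(tracks):
        for d, (dx1, dy1, dx2, dy2) in enumerate(detections):
            ix1, iy1 = max(tx1, dx1), max(ty1, dy1)
            ix2, iy2 = min(tx2, dx2), min(ty2, dy2)
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            union = (tx2 - tx1) * (ty2 - ty1) + (dx2 - dx1) * (dy2 - dy1) - inter
            ious[t, d] = inter / union if union > 0 else 0.0
    return ious


def associate(tracks, detections, iou_threshold=0.3):
    """Hungarian matching on IoU; pairs below the threshold remain unmatched."""
    ious = iou_matrix(tracks, detections)
    track_idx, det_idx = linear_sum_assignment(-ious)  # maximize total IoU
    matches = [(t, d) for t, d in zip(track_idx, det_idx) if ious[t, d] >= iou_threshold]
    unmatched_tracks = set(range(len(tracks))) - {t for t, _ in matches}
    unmatched_dets = set(range(len(detections))) - {d for _, d in matches}
    return matches, unmatched_tracks, unmatched_dets


# Example: the first track clearly overlaps the first detection; the rest stay unmatched.
predicted = np.array([[10, 10, 50, 50], [200, 200, 240, 240]], dtype=float)
observed = np.array([[12, 11, 52, 49], [400, 400, 440, 440]], dtype=float)
print(associate(predicted, observed, iou_threshold=0.3))
```

Unmatched detections become candidate new tracks, while tracks that stay unmatched for too long are pruned.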
## Benchmarks
For comparisons with other trackers, plus full details on the datasets and evaluation metrics used, see the benchmarks page.
| Dataset | HOTA | IDF1 | MOTA |
|---|---|---|---|
| MOT17 | 58.4 | 69.9 | 67.2 |
| SportsMOT | 70.9 | 68.9 | 95.7 |
| SoccerNet | 81.6 | 76.2 | 95.1 |
## Run on video, webcam, or RTSP stream
These examples use OpenCV for decoding and display. Replace <SOURCE_VIDEO_PATH>, <WEBCAM_INDEX>, and <RTSP_STREAM_URL> with your inputs. <WEBCAM_INDEX> is an integer (not a string), usually 0 for the default camera.
Video file:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import SORTTracker

tracker = SORTTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<SOURCE_VIDEO_PATH>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open video source")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    detections = tracker.update(detections)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame, detections, labels=detections.tracker_id
    )

    cv2.imshow("RF-DETR + SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
Webcam:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import SORTTracker

tracker = SORTTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Pass the camera index as an integer, e.g. cv2.VideoCapture(0).
video_capture = cv2.VideoCapture(<WEBCAM_INDEX>)
if not video_capture.isOpened():
    raise RuntimeError("Failed to open webcam")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    detections = tracker.update(detections)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame, detections, labels=detections.tracker_id
    )

    cv2.imshow("RF-DETR + SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
RTSP stream:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import SORTTracker

tracker = SORTTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<RTSP_STREAM_URL>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open RTSP stream")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # RF-DETR expects RGB input; OpenCV decodes frames as BGR.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    detections = tracker.update(detections)

    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(
        annotated_frame, detections, labels=detections.tracker_id
    )

    cv2.imshow("RF-DETR + SORT", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
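If you want to save the tracked output instead of displaying it, the same detector, tracker, and annotators can be driven by supervision's `process_video` helper, which reads a source video and writes an annotated copy. This is a sketch rather than part of the examples above; `<TARGET_VIDEO_PATH>` is a placeholder for the output file path.

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import SORTTracker

tracker = SORTTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()


def callback(frame_bgr, _index):
    # process_video supplies BGR frames; RF-DETR expects RGB.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    detections = tracker.update(detections)
    annotated = box_annotator.annotate(frame_bgr.copy(), detections)
    return label_annotator.annotate(annotated, detections, labels=detections.tracker_id)


sv.process_video(
    source_path="<SOURCE_VIDEO_PATH>",
    target_path="<TARGET_VIDEO_PATH>",
    callback=callback,
)
```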
## API
### trackers.core.sort.tracker.SORTTracker

Bases: `BaseTracker`
Implements SORT (Simple Online and Realtime Tracking).
SORT is a pragmatic approach to multiple object tracking with a focus on simplicity and speed. It uses a Kalman filter for motion prediction and the Hungarian algorithm or simple IOU matching for data association.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lost_track_buffer` | `int` | Number of frames to buffer when a track is lost. Increasing `lost_track_buffer` enhances occlusion handling, significantly improving tracking through occlusions, but may increase the possibility of ID switching for objects with similar appearance. | `30` |
| `frame_rate` | `float` | Frame rate of the video (frames per second). Used to calculate the maximum time a track can be lost. | `30.0` |
| `track_activation_threshold` | `float` | Detection confidence threshold for track activation. Only detections with confidence above this threshold will create new tracks. Increasing this threshold reduces false positives but may miss real objects with low confidence. | `0.25` |
| `minimum_consecutive_frames` | `int` | Number of consecutive frames that an object must be tracked before it is considered a 'valid' track. Increasing this value suppresses accidental tracks created by false or duplicate detections, but risks missing short-lived objects. | `3` |
| `minimum_iou_threshold` | `float` | IOU threshold for associating detections to existing tracks. | `0.3` |
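For reference, the parameters above map one-to-one onto keyword arguments of the constructor. The sketch below simply spells out the documented defaults as a starting point for tuning:

```python
from trackers import SORTTracker

# Documented defaults, written out explicitly for clarity.
tracker = SORTTracker(
    lost_track_buffer=30,
    frame_rate=30.0,
    track_activation_threshold=0.25,
    minimum_consecutive_frames=3,
    minimum_iou_threshold=0.3,
)
```

Raising `track_activation_threshold` and `minimum_consecutive_frames` trades recall for fewer spurious tracks, while a larger `lost_track_buffer` keeps identities alive through longer occlusions.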
#### update(detections)
Updates the tracker state with new detections.
Performs Kalman filter prediction, associates detections with existing trackers based on IOU, updates matched trackers, and initializes new trackers for unmatched high-confidence detections.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `detections` | `Detections` | The latest set of object detections from a frame. | *required* |
Returns:

| Type | Description |
|---|---|
| `Detections` | A copy of the input detections, augmented with assigned tracker IDs. |
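As a small, self-contained illustration of what `update` returns, the sketch below feeds a couple of synthetic detections through a tracker for several frames and reads the assigned IDs back from the returned `Detections`. The boxes, confidences, and class IDs are made up, and when an ID first appears depends on `track_activation_threshold` and `minimum_consecutive_frames`.

```python
import numpy as np
import supervision as sv
from trackers import SORTTracker

tracker = SORTTracker()

# Two synthetic detections standing in for real detector output.
detections = sv.Detections(
    xyxy=np.array([[10.0, 10.0, 50.0, 50.0], [200.0, 200.0, 260.0, 260.0]]),
    confidence=np.array([0.9, 0.8]),
    class_id=np.array([0, 0]),
)

# Feed the same detections for a few "frames" so the tracks get confirmed.
for _ in range(5):
    tracked = tracker.update(detections)

for xyxy, tracker_id in zip(tracked.xyxy, tracked.tracker_id):
    print(f"track {tracker_id}: {xyxy}")
```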
#### reset()
Resets the tracker's internal state.
Clears all active tracks and resets the track ID counter.
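A typical use of `reset()` is between independent clips or streams processed by the same tracker instance, so that stale tracks are discarded and ID numbering restarts. A minimal sketch, with `clip_a.mp4` and `clip_b.mp4` as placeholder paths:

```python
from trackers import SORTTracker

tracker = SORTTracker()

for video_path in ["clip_a.mp4", "clip_b.mp4"]:  # placeholder paths
    # ... run the per-frame detect -> tracker.update() loop on video_path ...
    tracker.reset()  # drop all active tracks and restart the track ID counter
```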