# SORT

## Overview
SORT (Simple Online and Realtime Tracking) is a lean tracking-by-detection method that combines a Kalman filter for motion prediction with the Hungarian algorithm for data association. It uses object detections—commonly from a high-performing CNN-based detector—as its input, updating each tracked object's bounding box based on linear velocity estimates. Because SORT relies on minimal appearance modeling (only bounding box geometry is used), it is extremely fast and can run comfortably at hundreds of frames per second. This speed and simplicity make it well suited for real-time applications in robotics or surveillance, where rapid, approximate solutions are essential. However, its reliance on frame-to-frame matching makes SORT susceptible to ID switches and less robust during long occlusions, since there is no built-in re-identification module.
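The association step at the heart of SORT can be sketched in a few lines of NumPy and SciPy: boxes predicted by the Kalman filter are matched to new detections by maximizing IoU with the Hungarian algorithm, and low-overlap pairs are rejected. The function names below are illustrative, not part of the `trackers` API:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou_matrix(boxes_a, boxes_b):
    """Pairwise IoU between two sets of [x1, y1, x2, y2] boxes."""
    a = boxes_a[:, None, :]  # shape (N, 1, 4)
    b = boxes_b[None, :, :]  # shape (1, M, 4)
    x1 = np.maximum(a[..., 0], b[..., 0])
    y1 = np.maximum(a[..., 1], b[..., 1])
    x2 = np.minimum(a[..., 2], b[..., 2])
    y2 = np.minimum(a[..., 3], b[..., 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def associate(predicted, detected, iou_threshold=0.3):
    """Match predicted track boxes to detections; drop low-IoU pairs."""
    iou = iou_matrix(predicted, detected)
    rows, cols = linear_sum_assignment(-iou)  # negate to maximize total IoU
    return [(r, c) for r, c in zip(rows, cols) if iou[r, c] >= iou_threshold]

predicted = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
detected = np.array([[21, 21, 31, 31], [1, 1, 11, 11]], dtype=float)
print(associate(predicted, detected))  # [(0, 1), (1, 0)]
```

Unmatched detections would then seed new tracks, and unmatched tracks would be marked as lost.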
## Examples

**Roboflow Inference**

```python
import supervision as sv
from trackers import SORTTracker
from inference import get_model

tracker = SORTTracker()
model = get_model(model_id="yolov11m-640")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    result = model.infer(frame)[0]
    detections = sv.Detections.from_inference(result)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)
```
**RF-DETR**

```python
import supervision as sv
from trackers import SORTTracker
from rfdetr import RFDETRBase

tracker = SORTTracker()
model = RFDETRBase()
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    detections = model.predict(frame)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)
```
**Ultralytics**

```python
import supervision as sv
from trackers import SORTTracker
from ultralytics import YOLO

tracker = SORTTracker()
model = YOLO("yolo11m.pt")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    result = model(frame)[0]
    detections = sv.Detections.from_ultralytics(result)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)
```
**Transformers (RT-DETRv2)**

```python
import torch
import supervision as sv
from trackers import SORTTracker
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

tracker = SORTTracker()
processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    inputs = processor(images=frame, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    h, w, _ = frame.shape
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([(h, w)]),
        threshold=0.5
    )[0]

    detections = sv.Detections.from_transformers(
        transformers_results=results,
        id2label=model.config.id2label
    )
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)
```
## API

### trackers.core.sort.tracker.SORTTracker

Bases: `BaseTracker`
Implements SORT (Simple Online and Realtime Tracking).
SORT is a pragmatic approach to multiple object tracking with a focus on simplicity and speed. It uses a Kalman filter for motion prediction and the Hungarian algorithm or simple IOU matching for data association.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lost_track_buffer` | `int` | Number of frames to buffer when a track is lost. Increasing `lost_track_buffer` enhances occlusion handling, significantly improving tracking through occlusions, but may increase the possibility of ID switching for objects with similar appearance. | `30` |
| `frame_rate` | `float` | Frame rate of the video (frames per second). Used to calculate the maximum time a track can be lost. | `30.0` |
| `track_activation_threshold` | `float` | Detection confidence threshold for track activation. Only detections with confidence above this threshold will create new tracks. Increasing this threshold reduces false positives but may miss real objects with low confidence. | `0.25` |
| `minimum_consecutive_frames` | `int` | Number of consecutive frames that an object must be tracked before it is considered a 'valid' track. Increasing this value prevents spurious tracks from transient false detections, but delays confirmation of genuine new tracks. | `3` |
| `minimum_iou_threshold` | `float` | IOU threshold for associating detections to existing tracks. | `0.3` |
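The interaction between `lost_track_buffer` and `frame_rate` can be illustrated with a toy model. The scaling rule and the helper names below are assumptions for illustration, not the library's internals: the sketch assumes the buffer is specified at a 30 FPS reference, so the frame budget grows with the actual video frame rate.

```python
# Toy model of lost-track bookkeeping (illustrative only, not SORTTracker internals).
# Assumption: lost_track_buffer is defined at a 30 FPS reference, so the
# allowed number of missed frames scales with the actual frame rate.
def max_frames_lost(lost_track_buffer=30, frame_rate=30.0):
    return int(lost_track_buffer * frame_rate / 30.0)

class ToyTrack:
    """Keeps a track alive for a fixed number of unmatched frames."""
    def __init__(self, budget):
        self.budget = budget
        self.frames_since_update = 0

    def mark_missed(self):
        self.frames_since_update += 1
        return self.frames_since_update <= self.budget  # False -> drop track

track = ToyTrack(max_frames_lost(lost_track_buffer=2, frame_rate=30.0))
print([track.mark_missed() for _ in range(3)])  # [True, True, False]
```

At 60 FPS the same buffer setting would tolerate twice as many missed frames, which keeps the tolerated *wall-clock* gap roughly constant.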
#### update(detections)
Updates the tracker state with new detections.
Performs Kalman filter prediction, associates detections with existing trackers based on IOU, updates matched trackers, and initializes new trackers for unmatched high-confidence detections.
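The prediction half of this step can be sketched with a constant-velocity Kalman model over the classic SORT state `[x, y, s, r]` (box center, area, aspect ratio) plus velocities. This is an illustration of the motion model, not the internal implementation of `SORTTracker`:

```python
import numpy as np

# State: [x, y, s, r, vx, vy, vs] -- center, area, aspect ratio, velocities.
# Constant-velocity transition: position and scale advance by their velocity.
dim = 7
F = np.eye(dim)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0

def predict(x, P, Q=None):
    """One Kalman predict step: x' = F x, P' = F P F^T + Q."""
    Q = np.eye(dim) * 1e-2 if Q is None else Q
    return F @ x, F @ P @ F.T + Q

x = np.array([50.0, 50.0, 400.0, 1.0, 2.0, -1.0, 0.0])  # moving right and up
P = np.eye(dim)
x_pred, P_pred = predict(x, P)
print(x_pred[:2])  # center advanced by its velocity: [52. 49.]
```

The predicted boxes are then compared against the new detections by IoU, and matched tracks are corrected with the standard Kalman update.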
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `detections` | `Detections` | The latest set of object detections from a frame. | required |

Returns:

| Type | Description |
|---|---|
| `Detections` | `sv.Detections`: A copy of the input detections, augmented with assigned `tracker_id` values. |
#### reset()
Resets the tracker's internal state.
Clears all active tracks and resets the track ID counter.