
SORT


Overview

SORT (Simple Online and Realtime Tracking) is a lean tracking-by-detection method that combines a Kalman filter for motion prediction with the Hungarian algorithm for data association. It takes object detections (commonly from a high-performing CNN-based detector) as input and updates each tracked object's bounding box using linear velocity estimates. Because SORT uses essentially no appearance model (only bounding-box geometry is considered), it is extremely fast and can run comfortably at hundreds of frames per second. This speed and simplicity make it well suited to real-time applications in robotics or surveillance, where rapid, approximate solutions are essential. However, its reliance on frame-to-frame matching makes SORT susceptible to ID switches and less robust during long occlusions, since there is no built-in re-identification module.
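
To make the data-association step concrete, here is a minimal sketch of IOU-based matching with the Hungarian algorithm. This is illustrative only, not the library's internals: iou and associate are hypothetical helpers, and SciPy's linear_sum_assignment solves the assignment.

import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(box_a, box_b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(predicted_boxes, detected_boxes, iou_threshold=0.3):
    # Build a cost matrix of negated IOUs (the solver minimizes cost),
    # solve the assignment, then reject matches below the IOU gate.
    cost = np.zeros((len(predicted_boxes), len(detected_boxes)))
    for i, p in enumerate(predicted_boxes):
        for j, d in enumerate(detected_boxes):
            cost[i, j] = -iou(p, d)
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if -cost[r, c] >= iou_threshold]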

Examples

Using Roboflow Inference:

import supervision as sv
from trackers import SORTTracker
from inference import get_model

tracker = SORTTracker()
model = get_model(model_id="yolov11m-640")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    result = model.infer(frame)[0]
    detections = sv.Detections.from_inference(result)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)

Using RF-DETR:

import supervision as sv
from trackers import SORTTracker
from rfdetr import RFDETRBase

tracker = SORTTracker()
model = RFDETRBase()
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    detections = model.predict(frame)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)

Using Ultralytics YOLO:

import supervision as sv
from trackers import SORTTracker
from ultralytics import YOLO

tracker = SORTTracker()
model = YOLO("yolo11m.pt")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    result = model(frame)[0]
    detections = sv.Detections.from_ultralytics(result)
    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)

Using Hugging Face Transformers (RT-DETRv2):

import torch
import supervision as sv
from trackers import SORTTracker
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor

tracker = SORTTracker()
processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd")
annotator = sv.LabelAnnotator(text_position=sv.Position.CENTER)

def callback(frame, _):
    # Preprocess the frame and run RT-DETRv2 inference.
    inputs = processor(images=frame, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Rescale the raw outputs to the frame's resolution and filter by confidence.
    h, w, _ = frame.shape
    results = processor.post_process_object_detection(
        outputs,
        target_sizes=torch.tensor([(h, w)]),
        threshold=0.5
    )[0]

    detections = sv.Detections.from_transformers(
        transformers_results=results,
        id2label=model.config.id2label
    )

    detections = tracker.update(detections)
    return annotator.annotate(frame, detections, labels=detections.tracker_id)

sv.process_video(
    source_path="<INPUT_VIDEO_PATH>",
    target_path="<OUTPUT_VIDEO_PATH>",
    callback=callback,
)

API

trackers.core.sort.tracker.SORTTracker

Bases: BaseTracker

Implements SORT (Simple Online and Realtime Tracking).

SORT is a pragmatic approach to multiple object tracking with a focus on simplicity and speed. It uses a Kalman filter for motion prediction and the Hungarian algorithm or simple IOU matching for data association.

Parameters:

- lost_track_buffer (int, default 30): Number of frames to buffer when a track is lost. Increasing lost_track_buffer improves tracking through occlusions, but raises the chance of ID switches between objects with similar appearance.
- frame_rate (float, default 30.0): Frame rate of the video in frames per second. Used to calculate the maximum time a track can remain lost.
- track_activation_threshold (float, default 0.25): Detection confidence threshold for track activation. Only detections with confidence above this threshold create new tracks. Raising it reduces false positives but may miss genuine objects detected with low confidence.
- minimum_consecutive_frames (int, default 3): Number of consecutive frames an object must be tracked before the track is considered valid. Raising it prevents accidental tracks from false or duplicate detections, but risks missing short-lived tracks. Until a track is confirmed, it is assigned -1 as its tracker_id.
- minimum_iou_threshold (float, default 0.3): IOU threshold for associating detections with existing tracks.
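
For example, a tracker tuned for longer occlusions could be configured as follows. The keyword names follow the parameter list above; the values are illustrative, not recommendations.

from trackers import SORTTracker

tracker = SORTTracker(
    lost_track_buffer=60,            # tolerate longer occlusions
    frame_rate=25.0,                 # match the source video's FPS
    track_activation_threshold=0.4,  # require more confident detections
    minimum_consecutive_frames=3,    # confirm a track after 3 consecutive frames
    minimum_iou_threshold=0.3,       # IOU gate for association
)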

update(detections)

Updates the tracker state with new detections.

Performs Kalman filter prediction, associates detections with existing trackers based on IOU, updates matched trackers, and initializes new trackers for unmatched high-confidence detections.

Parameters:

- detections (sv.Detections, required): The latest set of object detections from a frame.

Returns:

- sv.Detections: A copy of the input detections, augmented with an assigned tracker_id for each successfully tracked object. Detections not associated with a track will not have a tracker_id.
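
Because unconfirmed tracks carry -1 as their tracker_id (see minimum_consecutive_frames above), callers may want to drop them before annotating. A minimal sketch, assuming supervision's boolean-mask indexing:

detections = tracker.update(detections)
# Keep only confirmed tracks; unconfirmed ones carry tracker_id == -1.
confirmed = detections[detections.tracker_id != -1]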

reset()

Resets the tracker's internal state.

Clears all active tracks and resets the track ID counter.
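
This is useful when one tracker instance is reused across several videos, so IDs from one clip do not leak into the next. A sketch reusing the callback from the examples above (the file names are hypothetical):

for path in ["clip_a.mp4", "clip_b.mp4"]:
    sv.process_video(
        source_path=path,
        target_path=path.replace(".mp4", "_tracked.mp4"),
        callback=callback,
    )
    tracker.reset()  # clear all tracks and restart the ID counter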
