ByteTrack
Overview
ByteTrack builds on the same Kalman filter plus Hungarian algorithm framework as SORT, but changes the data association strategy to use almost every detection box regardless of its confidence score. Matching runs in two stages: high-confidence detections are first matched to tracks, and low-confidence detections are then matched to any still-unmatched tracks using IoU. This reduces missed tracks and fragmentation for occluded or weakly detected objects while keeping the method simple and fast. ByteTrack reported state-of-the-art results on standard MOT benchmarks at real-time speeds, largely because it recovers valid low-score detections instead of discarding them.
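The two-stage matching described above can be sketched in plain Python. This is an illustration, not the `trackers` implementation: boxes are assumed to be `(x1, y1, x2, y2)` tuples, tracks are assumed to be a dict from track ID to its Kalman-predicted box, and the helper names `iou`, `greedy_match`, and `byte_associate` are hypothetical. To stay dependency-free, the sketch uses greedy highest-IoU-first matching as a stand-in for the Hungarian algorithm; a real implementation would typically use an optimal assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(
    tracks: Dict[int, Box], dets: Dict[int, Box], min_iou: float
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Greedy highest-IoU-first matching (a stand-in for the Hungarian step)."""
    pairs = sorted(
        ((iou(tb, db), t, d) for t, tb in tracks.items() for d, db in dets.items()),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_dets = set()
    for score, t, d in pairs:
        if score < min_iou:
            break  # pairs are sorted, so every remaining pair is below threshold too
        if t in matches or d in used_dets:
            continue
        matches[t] = d
        used_dets.add(d)
    unmatched_tracks = [t for t in tracks if t not in matches]
    unmatched_dets = [d for d in dets if d not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


def byte_associate(
    tracks: Dict[int, Box],
    det_boxes: Sequence[Box],
    det_scores: Sequence[float],
    high_thresh: float = 0.6,
    min_iou: float = 0.1,
) -> Tuple[Dict[int, int], List[int], List[int]]:
    """Return (track -> detection matches, unmatched tracks, unmatched high-score detections)."""
    scored = list(enumerate(zip(det_boxes, det_scores)))
    high = {i: box for i, (box, s) in scored if s > high_thresh}
    low = {i: box for i, (box, s) in scored if s <= high_thresh}

    # Stage 1: every track competes for the high-confidence detections.
    matches, leftover, unmatched_high = greedy_match(tracks, high, min_iou)

    # Stage 2: tracks still unmatched get a chance at the low-confidence detections.
    low_matches, unmatched_tracks, _ = greedy_match(
        {t: tracks[t] for t in leftover}, low, min_iou
    )
    matches.update(low_matches)

    # Unmatched low-confidence detections are discarded; unmatched
    # high-confidence ones are the candidates for new tracks.
    return matches, unmatched_tracks, unmatched_high
```

For example, with tracks `{1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}`, detection boxes `[(1, 1, 11, 11), (51, 51, 61, 61), (100, 100, 110, 110)]`, and scores `[0.9, 0.3, 0.8]`, track 1 matches detection 0 in stage 1, track 2 recovers the low-score detection 1 in stage 2, and detection 2 is left over as a new-track candidate. A single-stage matcher restricted to high-score detections would have lost track 2 in this frame.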
Benchmarks
For comparisons with other trackers, plus full details on the datasets and evaluation metrics used, see the benchmarks page.
| Dataset | HOTA | IDF1 | MOTA |
|---|---|---|---|
| MOT17 | 60.1 | 73.2 | 74.1 |
| SportsMOT | 73.0 | 72.5 | 96.4 |
| SoccerNet | 84.0 | 78.1 | 97.8 |
Run on video, webcam, or RTSP stream
These examples use OpenCV for decoding and display. Replace <SOURCE_VIDEO_PATH>, <WEBCAM_INDEX>, and <RTSP_STREAM_URL> with your inputs. <WEBCAM_INDEX> is an integer camera index, usually 0 for the default camera. Press q in the display window to stop.
Video file:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker

tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<SOURCE_VIDEO_PATH>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open video source")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # OpenCV decodes frames as BGR; convert to RGB before running the model.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Assign persistent track IDs to this frame's detections.
    detections = tracker.update(detections)

    # LabelAnnotator expects one string label per detection.
    labels = [str(tracker_id) for tracker_id in detections.tracker_id]
    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(annotated_frame, detections, labels=labels)

    cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
Webcam:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker

tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Replace 0 with your <WEBCAM_INDEX>; pass it as an int, not a string.
video_capture = cv2.VideoCapture(0)
if not video_capture.isOpened():
    raise RuntimeError("Failed to open webcam")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # OpenCV decodes frames as BGR; convert to RGB before running the model.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Assign persistent track IDs to this frame's detections.
    detections = tracker.update(detections)

    # LabelAnnotator expects one string label per detection.
    labels = [str(tracker_id) for tracker_id in detections.tracker_id]
    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(annotated_frame, detections, labels=labels)

    cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
RTSP stream:

```python
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker

tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

video_capture = cv2.VideoCapture("<RTSP_STREAM_URL>")
if not video_capture.isOpened():
    raise RuntimeError("Failed to open RTSP stream")

while True:
    success, frame_bgr = video_capture.read()
    if not success:
        break

    # OpenCV decodes frames as BGR; convert to RGB before running the model.
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    detections = model.predict(frame_rgb)
    # Assign persistent track IDs to this frame's detections.
    detections = tracker.update(detections)

    # LabelAnnotator expects one string label per detection.
    labels = [str(tracker_id) for tracker_id in detections.tracker_id]
    annotated_frame = box_annotator.annotate(frame_bgr, detections)
    annotated_frame = label_annotator.annotate(annotated_frame, detections, labels=labels)

    cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

video_capture.release()
cv2.destroyAllWindows()
```
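When the source comes from a command-line argument or a config file, it arrives as a string. `cv2.VideoCapture` treats an `int` as a camera index and a `str` as a file path or stream URL, so a small helper can convert digit-only strings before opening the capture. The function `parse_video_source` below is a hypothetical convenience, not part of `trackers` or OpenCV.

```python
from typing import Union


def parse_video_source(source: str) -> Union[int, str]:
    """Turn a user-supplied source string into a cv2.VideoCapture argument.

    Digit-only strings (for example "0") become integer camera indices;
    anything else, such as a file path or an rtsp:// URL, is returned unchanged.
    """
    source = source.strip()
    return int(source) if source.isdigit() else source
```

For example, `cv2.VideoCapture(parse_video_source("0"))` opens camera 0, while a video path or `rtsp://` URL is passed through as a string.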
API
trackers.core.bytetrack.tracker.ByteTrackTracker
Bases: BaseTracker
Implements ByteTrack.
ByteTrack is a simple, effective, and generic multi-object tracking method that improves upon tracking-by-detection by associating every detection box instead of discarding low-score ones. This makes it more robust to occlusions. It uses a two-stage association process and builds on established techniques like the Kalman Filter for motion prediction and the Hungarian algorithm for data association.
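The Kalman filter motion model mentioned above can be illustrated for a single box coordinate. The sketch below is a minimal constant-velocity filter on a (position, velocity) state, written out with explicit 2x2 arithmetic; the class name `ConstantVelocityKF`, the noise values, and the initial covariance are assumptions chosen for illustration. ByteTrack applies the same predict-then-correct idea to a full box state (the original implementation tracks box center, aspect ratio, height, and their velocities), and this library's state layout may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConstantVelocityKF:
    """1D constant-velocity Kalman filter with state (position, velocity)."""

    p: float = 0.0  # position estimate
    v: float = 0.0  # velocity estimate, in units per frame
    P: List[List[float]] = field(default_factory=lambda: [[10.0, 0.0], [0.0, 10.0]])
    q: float = 0.01  # process noise added to each state variance per step (assumed)
    r: float = 1.0  # measurement noise variance (assumed)

    def predict(self, dt: float = 1.0) -> float:
        """Propagate the state one step: x' = F x with F = [[1, dt], [0, 1]]."""
        self.p += self.v * dt
        (a, b), (c, d) = self.P
        # P' = F P F^T + Q, expanded for the 2x2 case with Q = q * I.
        self.P = [
            [a + dt * c + dt * (b + dt * d) + self.q, b + dt * d],
            [c + dt * d, d + self.q],
        ]
        return self.p

    def update(self, z: float) -> None:
        """Correct the state with a measured position (H = [1, 0])."""
        (a, b), (c, d) = self.P
        y = z - self.p  # innovation: measurement minus prediction
        s = a + self.r  # innovation variance
        k0, k1 = a / s, c / s  # Kalman gain
        self.p += k0 * y
        self.v += k1 * y
        # P = (I - K H) P
        self.P = [
            [(1 - k0) * a, (1 - k0) * b],
            [c - k1 * a, d - k1 * b],
        ]
```

In a tracker, `predict()` supplies the box used for IoU matching each frame and `update()` corrects it with the matched detection. A track that goes unmatched only receives `predict()`, which is how it coasts through a short occlusion.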
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lost_track_buffer` | `int` | Number of frames to keep a lost track before discarding it. Increasing `lost_track_buffer` improves tracking through occlusions, but may increase the chance of ID switches for objects that disappear. | `30` |
| `frame_rate` | `float` | Frame rate of the video in frames per second. Used to calculate the maximum time a track can be lost. | `30.0` |
| `track_activation_threshold` | `float` | Detection confidence threshold for track activation. Only detections with confidence above this threshold create new tracks. Increasing this threshold may reduce false positives but may miss real objects with low confidence. | `0.7` |
| `minimum_consecutive_frames` | `int` | Number of consecutive frames an object must be tracked before it is considered a valid (active) track. Increasing this value suppresses spurious tracks from brief false detections, but delays confirmation of real tracks and can miss short-lived ones. | `2` |
| `minimum_iou_threshold` | `float` | IoU threshold for associating detections with existing tracks. Detection-track pairs whose IoU is below this threshold are not associated, so a higher value only associates boxes that overlap more. | `0.1` |
| `high_conf_det_threshold` | `float` | Confidence threshold that separates high-confidence detections from low-confidence ones. Only detections above this value are treated as high-confidence and used in the first association stage of ByteTrack; the remaining detections are used in the second stage. A higher value makes fewer detections count as high-confidence. | `0.6` |
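The sketch below shows how some of these defaults interact. Per the table, `high_conf_det_threshold` decides which association stage a detection joins, while `track_activation_threshold` decides whether it may start a new track, so with the defaults a detection scored 0.65 can extend an existing track but cannot create one. The helper names are hypothetical, and two points are assumptions: that a new track requires clearing both thresholds (the `update()` description says new tracks come from unmatched high-confidence detections), and that `frame_rate` rescales `lost_track_buffer` by `frame_rate / 30`, as the original ByteTrack reference implementation does.

```python
from typing import Tuple


def max_frames_lost(lost_track_buffer: int = 30, frame_rate: float = 30.0) -> int:
    """Frames a lost track survives before removal.

    Assumption: the buffer is defined at 30 FPS and rescaled to the actual
    frame rate, as in the original ByteTrack reference code, so it covers
    roughly the same wall-clock time (about one second with the defaults).
    """
    return int(frame_rate / 30.0 * lost_track_buffer)


def detection_role(
    score: float,
    high_conf_det_threshold: float = 0.6,
    track_activation_threshold: float = 0.7,
) -> Tuple[str, bool]:
    """Return (association stage, whether the detection may start a new track)."""
    stage = "stage 1" if score > high_conf_det_threshold else "stage 2"
    # Assumption: only high-confidence detections that also clear the
    # activation threshold can spawn a new track.
    can_start = score > high_conf_det_threshold and score > track_activation_threshold
    return stage, can_start
```

For example, `max_frames_lost(30, 60.0)` returns `60`, and `detection_role(0.65)` returns `("stage 1", False)`.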
update(detections)
Updates the tracker state with new detections.
Performs Kalman Filter prediction, associates detections with existing tracks based on IoU, updates matched tracks, and initializes new tracks for unmatched high-confidence detections.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `detections` | `Detections` | The latest set of object detections from a frame. | required |
Returns:

| Type | Description |
|---|---|
| `Detections` | A copy of the input detections, augmented with assigned track IDs (`tracker_id`). |
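Across successive `update()` calls, each track moves through a small lifecycle governed by `minimum_consecutive_frames` and the lost-track buffer. The toy state machine below sketches that bookkeeping for one track, given only whether it was matched in each frame. The class `TrackLifecycle` is hypothetical and simplified; in particular, dropping an unconfirmed track as soon as it misses a frame mirrors the original ByteTrack reference implementation and is an assumption about this library.

```python
from dataclasses import dataclass


@dataclass
class TrackLifecycle:
    """Toy per-track bookkeeping driven by per-frame match results."""

    minimum_consecutive_frames: int = 2
    max_frames_lost: int = 30
    hits: int = 0  # consecutive frames with a matched detection
    frames_since_match: int = 0
    active: bool = False  # becomes True once the track is confirmed

    def step(self, matched: bool) -> str:
        """Advance one frame and return the track's state."""
        if matched:
            self.hits += 1
            self.frames_since_match = 0
            if self.hits >= self.minimum_consecutive_frames:
                self.active = True
            return "active" if self.active else "tentative"

        self.hits = 0
        self.frames_since_match += 1
        if not self.active:
            # Assumption: an unconfirmed track that misses a frame is dropped.
            return "removed"
        if self.frames_since_match > self.max_frames_lost:
            return "removed"
        # Confirmed tracks keep their ID while lost, until the buffer runs out.
        return "lost"
```

With `minimum_consecutive_frames=2` and `max_frames_lost=3`, two matched frames confirm the track, and it is then reported as lost for three unmatched frames before being removed on the fourth.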
reset()
Resets the tracker's internal state.
Clears all active tracks and resets the track ID counter.