ByteTrack
What is ByteTrack?
ByteTrack builds on the same Kalman filter plus Hungarian algorithm framework as SORT but changes the data association strategy to use almost every detection box regardless of confidence score. It runs a two-stage matching: first match high-confidence detections to tracks, then match low-confidence detections to any unmatched tracks using IoU. This reduces missed tracks and fragmentation for occluded or weak detections while retaining simplicity and high frame rates. ByteTrack has set state-of-the-art results on standard MOT benchmarks with real-time performance, because it recovers valid low-score detections instead of discarding them.
How does ByteTrack compare to other trackers?
For comparisons with other trackers, plus dataset context and evaluation details, see the tracker comparison page.
| Dataset | HOTA | IDF1 | MOTA |
|---|---|---|---|
| MOT17 | 60.1 | 73.2 | 74.1 |
| SportsMOT | 73.0 | 72.5 | 96.4 |
| SoccerNet | 84.0 | 78.1 | 97.8 |
How does ByteTrack work?
ByteTrack builds on the same Kalman filter and Hungarian algorithm framework as SORT but changes how detections are associated to tracks. Instead of discarding low-confidence detections, ByteTrack uses a two-stage matching strategy that recovers valid objects the detector scored low due to occlusion, blur, or partial visibility.
Stage 1 -- high-confidence matching. Detections with confidence above high_conf_det_threshold are matched to confirmed tracks using IoU-based Hungarian assignment, identical to SORT. Unmatched tracks and unmatched high-confidence detections pass to the next stage.
Stage 2 -- low-confidence matching. Detections with confidence between track_activation_threshold and high_conf_det_threshold are matched to the remaining unmatched tracks using IoU. This second pass associates weak detections to already-established tracks, recovering objects that would otherwise be lost. Detections below track_activation_threshold are discarded entirely and never start new tracks.
Track lifecycle. New tracks are initialized only from unmatched high-confidence detections (stage 1). A new track is promoted to confirmed status after minimum_consecutive_frames consecutive matches. Tracks that go unmatched for more than lost_track_buffer frames are deleted.
Key insight. Discarding low-confidence detections outright loses genuinely valid objects that happen to have a low score in one or a few frames. ByteTrack recaptures these by associating them with tracks that already have an established identity and motion history, rather than treating them as new objects. This produces fewer missed tracks and fewer ID switches with almost no additional computation over SORT.
Key Parameters
| Parameter | Purpose | Tuning guidance |
|---|---|---|
lost_track_buffer |
Frames to keep an unmatched track alive before deletion. | Higher tolerates longer occlusions but risks false re-association. 10-30 for most scenes; up to 60 for very long occlusions. |
track_activation_threshold |
Minimum detection confidence to use in any matching stage. | Higher reduces spurious tracks; lower catches weak detections. 0.5-0.9 typical. |
minimum_consecutive_frames |
Consecutive detections required to confirm a new track. | 1 confirms immediately; 2-3 filters out single-frame false positives. |
minimum_iou_threshold |
Minimum IoU to accept a track-detection match. | Lower associates through more displacement between frames. 0.1-0.3 typical. |
high_conf_det_threshold |
Confidence threshold separating stage-1 from stage-2 detections. | 0.5-0.7 typical. Lower sends more detections to stage 1; higher relies more on stage-2 recovery. |
Run on video, webcam, or RTSP stream
These examples use opencv-python for decoding and display. Replace <SOURCE_VIDEO_PATH>, <WEBCAM_INDEX>, and <RTSP_STREAM_URL> with your inputs. <WEBCAM_INDEX> is usually 0 for the default camera.
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker
tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
video_capture = cv2.VideoCapture("<SOURCE_VIDEO_PATH>")
if not video_capture.isOpened():
raise RuntimeError("Failed to open video source")
while True:
success, frame_bgr = video_capture.read()
if not success:
break
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
detections = model.predict(frame_rgb)
detections = tracker.update(detections)
annotated_frame = box_annotator.annotate(frame_bgr, detections)
annotated_frame = label_annotator.annotate(
annotated_frame,
detections,
labels=detections.tracker_id,
)
cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
video_capture.release()
cv2.destroyAllWindows()
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker
tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
video_capture = cv2.VideoCapture("<WEBCAM_INDEX>")
if not video_capture.isOpened():
raise RuntimeError("Failed to open webcam")
while True:
success, frame_bgr = video_capture.read()
if not success:
break
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
detections = model.predict(frame_rgb)
detections = tracker.update(detections)
annotated_frame = box_annotator.annotate(frame_bgr, detections)
annotated_frame = label_annotator.annotate(
annotated_frame,
detections,
labels=detections.tracker_id,
)
cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
video_capture.release()
cv2.destroyAllWindows()
import cv2
import supervision as sv
from rfdetr import RFDETRMedium
from trackers import ByteTrackTracker
tracker = ByteTrackTracker()
model = RFDETRMedium()
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
video_capture = cv2.VideoCapture("<RTSP_STREAM_URL>")
if not video_capture.isOpened():
raise RuntimeError("Failed to open RTSP stream")
while True:
success, frame_bgr = video_capture.read()
if not success:
break
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
detections = model.predict(frame_rgb)
detections = tracker.update(detections)
annotated_frame = box_annotator.annotate(frame_bgr, detections)
annotated_frame = label_annotator.annotate(
annotated_frame,
detections,
labels=detections.tracker_id,
)
cv2.imshow("RF-DETR + ByteTrack", annotated_frame)
if cv2.waitKey(1) & 0xFF == ord("q"):
break
video_capture.release()
cv2.destroyAllWindows()
Reference
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., and Wang, X. (2022). ByteTrack: Multi-Object Tracking by Associating Every Detection Box. ECCV. arXiv:2110.06864