Tracker Comparison

This page shows head-to-head performance of SORT, ByteTrack, OC-SORT, and BoT-SORT on standard MOT benchmarks. Results are shown with default parameters and with parameter-tuned configurations found via grid search.

Benchmark version

Results use trackers v2.3.0 (released 2026-03-16). Detections are from YOLOX (MOT17, SportsMOT) or ground-truth oracle boxes (SoccerNet, DanceTrack). Parameters were tuned via grid search on held-out splits. See Methodology for details.

Benchmark methodology

Results were measured with YOLOX detections (MOT17, SportsMOT) or oracle ground-truth boxes (SoccerNet, DanceTrack), using both default and grid-searched parameters. Performance varies across detectors; see Detection Quality Matters for the impact of detector quality on tracking metrics.

MOT17

Pedestrian tracking with crowded scenes and frequent occlusions. Strongly tests re-identification and identity stability.

Visualization of ground-truth annotations for MOT17.

Info

Parameters were tuned on the validation set. Results are reported on the test set via Codabench submission. Detections come from a YOLOX model.

Results using default tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 58.4 69.9 67.2
ByteTrack 60.1 73.2 74.1
OC-SORT 61.9 76.4 76.0
BoT-SORT 63.7 78.7 79.2

Results after grid search over tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 60.4 72.5 75.8
ByteTrack 60.5 72.7 76.1
OC-SORT 62.0 76.5 77.3
BoT-SORT 63.8 78.7 79.4

Tuned configuration for each tracker.

SORT:
  lost_track_buffer: 10
  track_activation_threshold: 0.75
  minimum_consecutive_frames: 2
  minimum_iou_threshold: 0.3

ByteTrack:
  lost_track_buffer: 10
  track_activation_threshold: 0.7
  minimum_consecutive_frames: 1
  minimum_iou_threshold: 0.3
  high_conf_det_threshold: 0.5

OC-SORT:
  lost_track_buffer: 30
  minimum_iou_threshold: 0.3
  minimum_consecutive_frames: 3
  direction_consistency_weight: 0.2
  high_conf_det_threshold: 0.4
  delta_t: 1

BoT-SORT:
  lost_track_buffer: 30
  minimum_consecutive_frames: 2
  minimum_iou_threshold_first_assoc: 0.2
  minimum_iou_threshold_second_assoc: 0.5
  minimum_iou_threshold_unconfirmed_assoc: 0.2
  high_conf_det_threshold: 0.5
  track_activation_threshold: 0.6
  enable_cmc: true
  cmc_method: sparseOptFlow

SportsMOT

Sports broadcast tracking with fast motion, camera pans, and similar-looking targets. Tests association under speed and appearance ambiguity.

Visualization of ground-truth annotations for SportsMOT.

Info

Parameters were tuned on the validation set. Results are reported on the test set via Codabench submission. Detections come from a YOLOX model.

Results using default tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 70.8 68.9 95.5
ByteTrack 73.0 72.5 96.4
OC-SORT 71.7 71.4 95.0
BoT-SORT 73.8 73.4 96.9

Results after grid search over tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 72.9 73.0 95.8
ByteTrack 73.3 73.5 95.9
OC-SORT 74.0 75.4 95.6
BoT-SORT 74.1 74.0 96.9

Tuned configuration for each tracker.

SORT:
  lost_track_buffer: 60
  track_activation_threshold: 0.9
  minimum_consecutive_frames: 2
  minimum_iou_threshold: 0.05

ByteTrack:
  lost_track_buffer: 10
  track_activation_threshold: 0.9
  minimum_consecutive_frames: 1
  minimum_iou_threshold: 0.05
  high_conf_det_threshold: 0.7

OC-SORT:
  lost_track_buffer: 60
  minimum_iou_threshold: 0.1
  minimum_consecutive_frames: 3
  direction_consistency_weight: 0.2
  high_conf_det_threshold: 0.6
  delta_t: 3

BoT-SORT:
  lost_track_buffer: 30
  minimum_consecutive_frames: 2
  minimum_iou_threshold_first_assoc: 0.1
  minimum_iou_threshold_second_assoc: 0.5
  minimum_iou_threshold_unconfirmed_assoc: 0.3
  high_conf_det_threshold: 0.7
  track_activation_threshold: 0.8
  enable_cmc: true
  cmc_method: sparseOptFlow

SoccerNet-tracking

Long sequences with dense interactions and partial occlusions. Tests long-term ID consistency.

Visualization of ground-truth annotations for SoccerNet.

Info

Parameters were tuned on the train set. Results are reported on the test set. SoccerNet-tracking has no validation split. This dataset provides oracle (ground-truth) detections.

Results using default tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 81.6 76.2 95.1
ByteTrack 84.0 78.1 97.8
OC-SORT 78.4 72.6 94.1
BoT-SORT 84.5 79.3 96.6

Results after grid search over tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 84.2 78.2 98.2
ByteTrack 84.0 78.1 98.2
OC-SORT 82.9 77.9 96.8
BoT-SORT 85.0 79.7 97.2

Tuned configuration for each tracker.

SORT:
  lost_track_buffer: 30
  track_activation_threshold: 0.25
  minimum_consecutive_frames: 2
  minimum_iou_threshold: 0.05

ByteTrack:
  lost_track_buffer: 30
  track_activation_threshold: 0.2
  minimum_consecutive_frames: 1
  minimum_iou_threshold: 0.05
  high_conf_det_threshold: 0.5

OC-SORT:
  lost_track_buffer: 60
  minimum_iou_threshold: 0.1
  minimum_consecutive_frames: 3
  direction_consistency_weight: 0.2
  high_conf_det_threshold: 0.4
  delta_t: 1

BoT-SORT:
  lost_track_buffer: 60
  minimum_consecutive_frames: 2
  minimum_iou_threshold_first_assoc: 0.1
  minimum_iou_threshold_second_assoc: 0.6
  minimum_iou_threshold_unconfirmed_assoc: 0.2
  high_conf_det_threshold: 0.6
  track_activation_threshold: 0.7
  enable_cmc: true
  cmc_method: sparseOptFlow

DanceTrack

Group dance tracking with uniform appearance, diverse motions, and extreme articulation. Tests motion-based association without relying on visual discrimination.

Visualization of ground-truth annotations for DanceTrack.

Warning

DanceTrack test set evaluation is currently unavailable because CodaLab, which hosted the benchmark, has been discontinued. Migration to Codabench is in progress. Results below use the validation set instead.

Info

Parameters were tuned on the train set. Results are reported on the validation set. This dataset provides oracle (ground-truth) detections.

Results using default tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 45.0 39.0 80.6
ByteTrack 50.2 49.9 86.2
OC-SORT 51.8 50.9 87.3
BoT-SORT 50.5 49.2 85.1

Results after grid search over tracker parameters.

Tracker HOTA IDF1 MOTA
SORT 50.6 49.6 84.3
ByteTrack 53.2 54.6 86.8
OC-SORT 52.0 51.8 87.2
BoT-SORT 53.5 54.0 86.5

Tuned configuration for each tracker.

SORT:
  lost_track_buffer: 10
  track_activation_threshold: 0.9
  minimum_consecutive_frames: 2
  minimum_iou_threshold: 0.05

ByteTrack:
  lost_track_buffer: 60
  track_activation_threshold: 0.9
  minimum_consecutive_frames: 1
  minimum_iou_threshold: 0.1
  high_conf_det_threshold: 0.5

OC-SORT:
  lost_track_buffer: 30
  minimum_iou_threshold: 0.1
  minimum_consecutive_frames: 3
  direction_consistency_weight: 0.2
  high_conf_det_threshold: 0.6
  delta_t: 1

BoT-SORT:
  lost_track_buffer: 60
  minimum_consecutive_frames: 2
  minimum_iou_threshold_first_assoc: 0.1
  minimum_iou_threshold_second_assoc: 0.5
  minimum_iou_threshold_unconfirmed_assoc: 0.2
  high_conf_det_threshold: 0.6
  track_activation_threshold: 0.7
  enable_cmc: true
  cmc_method: sparseOptFlow

Methodology

Detections

Each dataset uses one of two detection sources: oracle detections (ground-truth bounding boxes provided by the dataset) or model detections (produced by a YOLOX detector following the ByteTrack procedure). The source is noted per dataset above.

Tuning

Best parameters per tracker and dataset were found via grid search, selecting the configuration with the highest HOTA. Tuning and evaluation always use separate data splits to reflect real-world usage:

  • Train, validation, and test splits available: tune on validation, report on test.
  • Only train and validation splits available: tune on train, report on validation.
  • Only train and test splits available: tune on train, report on test.
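The split rules and the HOTA-based selection described above can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual tuning code: `choose_splits`, `grid_search`, and `toy_hota` are hypothetical names, and `toy_hota` is a stand-in for an evaluator that would run a tracker on the tuning split and score it.

```python
from itertools import product


def choose_splits(available):
    """Return (tuning split, reporting split) for the splits that can be evaluated."""
    splits = set(available)
    if {"train", "val", "test"} <= splits:
        return "val", "test"
    if {"train", "val"} <= splits:
        return "train", "val"
    if {"train", "test"} <= splits:
        return "train", "test"
    raise ValueError(f"unsupported split combination: {sorted(splits)}")


def grid_search(grid, evaluate_hota):
    """Try every parameter combination and keep the one with the highest HOTA."""
    names = list(grid)
    best_params, best_hota = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        hota = evaluate_hota(params)
        if hota > best_hota:
            best_params, best_hota = params, hota
    return best_params, best_hota


def toy_hota(params):
    """Hypothetical evaluator: peaks at lost_track_buffer=30, minimum_iou_threshold=0.1."""
    return (
        60.0
        - abs(params["lost_track_buffer"] - 30) / 10
        - abs(params["minimum_iou_threshold"] - 0.1) * 10
    )


grid = {
    "lost_track_buffer": [10, 30, 60],
    "minimum_iou_threshold": [0.05, 0.1, 0.3],
}
best_params, best_hota = grid_search(grid, toy_hota)
tune_split, report_split = choose_splits(["train", "val"])
```

Passing only the splits that can currently be evaluated covers the DanceTrack case: with the test server unavailable, `choose_splits(["train", "val"])` tunes on train and reports on validation.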

When to Use Each Tracker

SORT is the right choice when speed is the primary constraint and scenes are not heavily occluded. Its Kalman filter plus Hungarian matching runs at hundreds of frames per second and produces clean, easy-to-debug results. Use SORT as a baseline before adding more complex trackers, or when deploying on edge devices with tight compute budgets.
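The matching step can be illustrated with a small sketch. It assumes the Kalman filter has already produced a predicted box for each track (not shown) and uses a brute-force search over permutations as a stand-in for the Hungarian algorithm. On small inputs both find the same optimal assignment, but the brute-force version does not scale. The function names are hypothetical; `minimum_iou_threshold` mirrors the parameter in the tuned configurations.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(predicted, detections, minimum_iou_threshold=0.3):
    """Return sorted (track_index, detection_index) pairs that maximize total IoU."""
    n_tracks, n_dets = len(predicted), len(detections)
    if n_tracks <= n_dets:
        candidates = (
            list(zip(range(n_tracks), p))
            for p in permutations(range(n_dets), n_tracks)
        )
    else:
        candidates = (
            list(zip(p, range(n_dets)))
            for p in permutations(range(n_tracks), n_dets)
        )
    best = max(
        candidates,
        key=lambda pairs: sum(iou(predicted[t], detections[d]) for t, d in pairs),
    )
    # Drop pairs whose overlap is too small to be a plausible match.
    return sorted(
        (t, d)
        for t, d in best
        if iou(predicted[t], detections[d]) >= minimum_iou_threshold
    )


predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11), (100, 100, 110, 110)]
pairs = match(predicted, detections)
```

Here track 0 pairs with detection 1 and track 1 with detection 0; the far-away third detection stays unmatched and would seed a new track.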

ByteTrack is the default recommendation for most applications. With default parameters it outperforms SORT on every metric across all four benchmarks by recovering low-confidence detections that SORT discards. After tuning the gap narrows, and tuned SORT edges ahead on SoccerNet (84.2 vs. 84.0 HOTA). The two-stage association adds almost no extra compute and consistently reduces missed tracks and identity switches. Use ByteTrack when your detector produces noisy or variable-confidence outputs; sports video, aerial footage, and crowded retail scenes all benefit.
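A rough sketch of the two-stage idea follows. It is simplified in two ways that should be read as assumptions: it uses greedy IoU matching instead of solving an assignment problem, and it treats every detection below `high_conf_det_threshold` as a second-stage candidate, with no lower confidence floor. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, dets, min_iou):
    """Greedily pair tracks and detections, highest IoU first."""
    candidates = sorted(
        ((iou(t_box, d_box), t_id, d_id)
         for t_id, t_box in tracks.items()
         for d_id, d_box in dets.items()),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, t_id, d_id in candidates:
        if score < min_iou:
            break
        if t_id in used_tracks or d_id in used_dets:
            continue
        matches[t_id] = d_id
        used_tracks.add(t_id)
        used_dets.add(d_id)
    return matches


def two_stage_associate(tracks, detections, high_conf_det_threshold=0.5, min_iou=0.3):
    """tracks: {track_id: box}; detections: list of (box, confidence)."""
    high = {i: box for i, (box, conf) in enumerate(detections)
            if conf >= high_conf_det_threshold}
    low = {i: box for i, (box, conf) in enumerate(detections)
           if conf < high_conf_det_threshold}
    # Stage 1: confident detections get first pick of the tracks.
    first = greedy_match(tracks, high, min_iou)
    # Stage 2: leftover tracks may still claim a low-confidence detection.
    remaining = {t_id: box for t_id, box in tracks.items() if t_id not in first}
    second = greedy_match(remaining, low, min_iou)
    return {**first, **second}


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [((1, 1, 11, 11), 0.9), ((51, 51, 61, 61), 0.3)]
matches = two_stage_associate(tracks, detections)
```

Track 2 survives only because of the second stage: its detection has confidence 0.3, which a tracker that discards low-confidence detections would never see.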

OC-SORT is best when camera motion is significant or objects follow non-linear paths. Its observation-centric re-update mechanism and direction consistency cost reduce the drift caused by the linear motion assumption. Use OC-SORT when SORT or ByteTrack loses tracks on fast turns, camera pans, or erratic motion. Its HOTA edge over both on MOT17, and on DanceTrack with default parameters (51.8 vs. 50.2 for ByteTrack), reflects these conditions; after tuning, ByteTrack overtakes it on DanceTrack (53.2 vs. 52.0).
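One way to picture the direction consistency term is as a bonus or penalty added to the IoU score, depending on whether a detection continues the track's recent heading. The sketch below is an assumption-laden illustration rather than the library's implementation: the function names are hypothetical, the heading is estimated from two observations `delta_t` steps apart, and the detection-confidence weighting used in some implementations is left out.

```python
import math


def center(box):
    """Center point of an (x1, y1, x2, y2) box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def unit_direction(src, dst):
    """Unit vector from src to dst, or (0, 0) if the points coincide."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)


def direction_consistency(observations, detection, delta_t=1,
                          direction_consistency_weight=0.2):
    """Score in [-weight/2, +weight/2]: positive if the detection keeps the heading.

    observations: past observed boxes for one track, oldest first.
    """
    if len(observations) <= delta_t:
        return 0.0  # not enough history to estimate a heading
    earlier = center(observations[-1 - delta_t])
    latest = center(observations[-1])
    track_dir = unit_direction(earlier, latest)
    det_dir = unit_direction(earlier, center(detection))
    if track_dir == (0.0, 0.0) or det_dir == (0.0, 0.0):
        return 0.0
    cos = track_dir[0] * det_dir[0] + track_dir[1] * det_dir[1]
    angle = math.acos(max(-1.0, min(1.0, cos)))  # 0 = same heading, pi = reversed
    return direction_consistency_weight * (math.pi / 2 - angle) / math.pi


history = [(0, 0, 10, 10), (10, 0, 20, 10)]  # object moving right
forward = direction_consistency(history, (20, 0, 30, 10))
backward = direction_consistency(history, (-10, 0, 0, 10))
```

A detection that continues the rightward motion gets a positive score and one in the opposite direction gets a negative score, so the association prefers candidates consistent with observed motion.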

BoT-SORT is the choice when camera ego-motion is strong and you need the most stable identities. It extends ByteTrack with camera motion compensation (CMC) and confidence-aware association, which reduces ID switches on panning or handheld footage. Use BoT-SORT for sports broadcasts, drone video, or any scene where the camera moves frequently. The CMC overhead is small relative to the detector, so the trade-off favors identity stability over raw speed.
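The effect of camera motion compensation can be sketched as warping each predicted box by the frame-to-frame camera transform before association. The example assumes a 2x3 affine matrix has already been estimated elsewhere, for instance from sparse optical flow on background points. `warp_box` is a hypothetical helper, and a full implementation would typically correct the Kalman filter state rather than only the box corners.

```python
def warp_box(box, affine):
    """Apply a 2x3 affine transform to an (x1, y1, x2, y2) box.

    Returns the axis-aligned box that encloses the four warped corners.
    """
    (a, b, tx), (c, d, ty) = affine
    corners = [
        (box[0], box[1]),
        (box[2], box[1]),
        (box[0], box[3]),
        (box[2], box[3]),
    ]
    warped = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]
    xs, ys = zip(*warped)
    return (min(xs), min(ys), max(xs), max(ys))


# Assumed camera motion: the image content shifts 15 px to the left between frames.
pan_left = ((1.0, 0.0, -15.0), (0.0, 1.0, 0.0))
predicted_box = (100.0, 50.0, 140.0, 130.0)
compensated_box = warp_box(predicted_box, pan_left)
```

Without this correction, the predicted box would sit 15 px away from where the object appears in the new frame, which lowers IoU and can cause a missed match or an identity switch.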

Metric Definitions

HOTA (Higher Order Tracking Accuracy): the primary benchmark metric. HOTA decomposes tracking quality into detection accuracy (DetA) and association accuracy (AssA), takes their geometric mean at each localization threshold, and averages the result over thresholds. It weights identity consistency equally with detection recall and precision, unlike older metrics that under-penalize fragmented tracks. A high HOTA requires both good detection and stable long-term identity.
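As a small worked example of the geometric mean, the sketch below combines per-threshold DetA and AssA values into a single HOTA score. The function name and the input numbers are illustrative only; computing DetA and AssA themselves requires matching predictions to ground truth, which is not shown.

```python
import math


def hota(det_a_per_threshold, ass_a_per_threshold):
    """Average, over localization thresholds, of sqrt(DetA * AssA)."""
    scores = [
        math.sqrt(det_a * ass_a)
        for det_a, ass_a in zip(det_a_per_threshold, ass_a_per_threshold)
    ]
    return sum(scores) / len(scores)


# Single threshold: strong detection (0.81) but weak association (0.49).
single = hota([0.81], [0.49])

# Two thresholds: per-threshold scores of 0.64 and 0.36 average to 0.5.
averaged = hota([0.64, 0.36], [0.64, 0.36])
```

The first case gives 0.63, lower than the arithmetic mean of 0.65, which shows how the geometric mean penalizes an imbalance between detection and association.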

IDF1 (Identity F1): measures how consistently the system assigns the correct identity to each ground-truth object over its lifetime. IDF1 is the harmonic mean of identification precision and identification recall. High IDF1 means tracks stay on the correct identity; low IDF1 means frequent identity switches.
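The harmonic-mean relationship can be checked with a short calculation. The counts below are made up for illustration; IDTP, IDFP, and IDFN denote identity true positives, false positives, and false negatives.

```python
def idf1(idtp, idfp, idfn):
    """IDF1 from identity true positives, false positives, and false negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


def harmonic_mean(x, y):
    return 2 * x * y / (x + y)


idtp, idfp, idfn = 60, 20, 40
id_precision = idtp / (idtp + idfp)  # 0.75
id_recall = idtp / (idtp + idfn)     # 0.6
score = idf1(idtp, idfp, idfn)
```

Both routes agree: `idf1(60, 20, 40)` and `harmonic_mean(0.75, 0.6)` each evaluate to 2/3.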

MOTA (Multiple Object Tracking Accuracy): combines the counts of false positives, missed detections, and identity switches into a single score relative to the total number of ground-truth objects. MOTA is dominated by detection recall and precision; a detector with near-perfect recall and few false positives produces high MOTA even when identity switches are frequent.
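A quick calculation shows why identity switches barely move MOTA when detection is strong. The counts are invented for illustration.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the number of ground-truth boxes."""
    return 1 - (false_negatives + false_positives + id_switches) / num_gt


# 10,000 ground-truth boxes, 1% missed, 1% false positives.
few_switches = mota(100, 100, 10, 10_000)
many_switches = mota(100, 100, 300, 10_000)
```

Raising the identity-switch count from 10 to 300 only lowers MOTA from 0.979 to 0.95, even though identity quality is far worse. This is the gap that HOTA and IDF1 are meant to expose.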