Evaluate Trackers
Measure how well your multi-object tracker performs using standard MOT metrics (CLEAR, HOTA, Identity). Get clear, reproducible scores for development, comparison, and publication.
What you'll learn:
- Evaluate single and multi-sequence tracking results
- Interpret HOTA, MOTA, and IDF1 scores
- Prepare datasets in MOT Challenge format
Installation
For alternative methods, see the Install guide.
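The library is imported as `trackers` in the examples on this page, so assuming the distribution shares that name, a standard pip install looks like:

```shell
pip install trackers
```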
Quickstart
Evaluate a single sequence by pointing to your ground-truth and tracker files, then view the results as a formatted table.
Data Format
Ground truth and tracker files use MOT Challenge text format — a simple comma-separated .txt file where each line describes one detection.
Example:
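A representative line (values illustrative), and a sketch of reading it with plain Python using the field order defined under Fields below:

```python
# One MOT Challenge line: frame, id, bb_left, bb_top,
# bb_width, bb_height, conf, x, y, z (values illustrative).
line = "1,1,912.0,484.0,97.0,109.0,1,-1,-1,-1"

fields = ["frame", "id", "bb_left", "bb_top", "bb_width",
          "bb_height", "conf", "x", "y", "z"]
record = dict(zip(fields, (float(v) for v in line.split(","))))

print(record["frame"])     # 1.0
print(record["bb_width"])  # 97.0
```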
Fields:
- `frame` — Frame number (1-indexed)
- `id` — Unique object ID per track
- `bb_left`, `bb_top` — Top-left bounding box corner
- `bb_width`, `bb_height` — Bounding box dimensions
- `conf` — Confidence score (1 for ground truth)
- `x`, `y`, `z` — 3D coordinates (-1 if unused)
Directory Layouts
The evaluator automatically detects whether you're using a flat or MOT-style structure. It also tries to infer benchmark name, split, and tracker name from folder names.
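The exact inference rules are internal to the evaluator; as an illustration of the idea, a folder name like `MOT17-train` can be split into a benchmark and a split. The helper below is hypothetical, not the library's API:

```python
def infer_benchmark_and_split(folder_name: str) -> tuple[str, str]:
    """Split a folder name like 'MOT17-train' into ('MOT17', 'train')."""
    # Split on the last hyphen so benchmark names containing
    # hyphens would still be handled.
    benchmark, _, split = folder_name.rpartition("-")
    return benchmark, split

print(infer_benchmark_and_split("MOT17-train"))  # ('MOT17', 'train')
```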
MOT-style: the standard MOT Challenge nested structure.
data/
├── MOT17-train/
│   ├── MOT17-02-FRCNN/
│   │   └── gt/gt.txt
│   ├── MOT17-04-FRCNN/
│   │   └── gt/gt.txt
│   └── MOT17-05-FRCNN/
│       └── gt/gt.txt
└── trackers/
    └── MOT17-train/
        └── ByteTrack/
            └── data/
                ├── MOT17-02-FRCNN.txt
                ├── MOT17-04-FRCNN.txt
                └── MOT17-05-FRCNN.txt
Python
from trackers.eval import evaluate_mot_sequences

result = evaluate_mot_sequences(
    gt_dir="data",
    tracker_dir="data/trackers",
    benchmark="MOT17",
    split="train",
    tracker_name="ByteTrack",
)
CLI
Flat: one `.txt` file per sequence, placed directly in the ground-truth and tracker directories.
data/
├── gt/
│   ├── MOT17-02-FRCNN.txt
│   ├── MOT17-04-FRCNN.txt
│   └── MOT17-05-FRCNN.txt
└── trackers/
    ├── MOT17-02-FRCNN.txt
    ├── MOT17-04-FRCNN.txt
    └── MOT17-05-FRCNN.txt
Python
from trackers.eval import evaluate_mot_sequences

result = evaluate_mot_sequences(
    gt_dir="data/gt",
    tracker_dir="data/trackers",
)
CLI
Multi-Sequence Evaluation
Run evaluation across many sequences and get both per-sequence results and a combined aggregate.
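The combined aggregate is not simply an average of per-sequence scores. As an illustration, assuming the standard CLEAR-style definition of MOTA, where error counts are summed over all sequences before normalizing (the counts below are made up):

```python
# Per-sequence CLEAR counts: (false negatives, false positives,
# ID switches, ground-truth boxes). Values are illustrative.
sequences = {
    "MOT17-02": (100, 50, 10, 2000),
    "MOT17-04": (400, 300, 40, 10000),
}

def mota(fn: int, fp: int, idsw: int, gt: int) -> float:
    # MOTA = 1 - (FN + FP + IDSW) / GT
    return 1.0 - (fn + fp + idsw) / gt

per_seq = {name: mota(*counts) for name, counts in sequences.items()}

# The combined score sums the raw error counts across sequences,
# so longer sequences carry more weight than in a plain average.
totals = [sum(column) for column in zip(*sequences.values())]
combined = mota(*totals)

print(per_seq)               # {'MOT17-02': 0.92, 'MOT17-04': 0.926}
print(round(combined, 4))    # 0.925
```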
Output: