Evals API
trackers.eval.evaluate.evaluate_mot_sequence(gt_path, tracker_path, metrics=None, threshold=0.5)
Evaluate a single multi-object tracking result against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) for one sequence by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union).
TrackEval parity
This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `gt_path` | `str \| Path` | Path to the ground-truth MOT file. | required |
| `tracker_path` | `str \| Path` | Path to the tracker MOT file. | required |
| `metrics` | `list[str] \| None` | Metric families to compute. Supported values are `"CLEAR"`, `"HOTA"`, and `"Identity"`. | `None` |
| `threshold` | `float` | IoU threshold for matching tracker detections to ground-truth detections. | `0.5` |
Returns:

| Type | Description |
|---|---|
| `SequenceResult` | Per-sequence results for the requested metric families. |
Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If `gt_path` or `tracker_path` does not exist. |
| `ValueError` | If an unsupported metric family is requested. |
Examples:
>>> from trackers.eval import evaluate_mot_sequence
>>> result = evaluate_mot_sequence(
... gt_path="data/gt/MOT17-02/gt.txt",
... tracker_path="data/trackers/MOT17-02.txt",
... metrics=["CLEAR", "HOTA", "Identity"],
... )
>>> print(result.CLEAR.MOTA)
# 0.756
>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence MOTA HOTA IDF1 IDSW
# -------------------------------------------------------------
# MOT17-02 75.600 62.300 72.100 42
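The returned `SequenceResult` carries one object per requested metric family (see the result classes below), so individual values can be read off directly. A short sketch continuing the example above; the attribute names match the documented dataclasses, but the printed numbers are illustrative:
>>> print(result.HOTA.DetA, result.HOTA.AssA)
# 0.648 0.599
>>> print(result.Identity.IDF1, result.CLEAR.IDSW)
# 0.721 42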
trackers.eval.evaluate.evaluate_mot_sequences(gt_dir, tracker_dir, seqmap=None, metrics=None, threshold=0.5, benchmark=None, split=None, tracker_name=None)
Evaluate multiple multi-object tracking results against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) across one or more sequences by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union). Returns both per-sequence and aggregated (combined) results.
TrackEval parity
This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.
Supported dataset layouts
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `gt_dir` | `str \| Path` | Directory with ground-truth files. | required |
| `tracker_dir` | `str \| Path` | Directory with tracker prediction files. | required |
| `seqmap` | `str \| Path \| None` | Optional sequence map. If provided, only those sequences are evaluated. | `None` |
| `metrics` | `list[str] \| None` | Metric families to compute. Supported values are `"CLEAR"`, `"HOTA"`, and `"Identity"`. | `None` |
| `threshold` | `float` | IoU threshold for matching tracker detections to ground-truth detections. | `0.5` |
| `benchmark` | `str \| None` | Override the auto-detected benchmark name (e.g., `"MOT17"`). | `None` |
| `split` | `str \| None` | Override the auto-detected split name (e.g., `"train"`). | `None` |
| `tracker_name` | `str \| None` | Override the auto-detected tracker name. | `None` |
Returns:

| Type | Description |
|---|---|
| `BenchmarkResult` | Per-sequence and combined (aggregated) results. |
Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If `gt_dir` or `tracker_dir` does not exist. |
| `ValueError` | If auto-detection finds multiple valid options. |
Examples:
Auto-detect layout and evaluate all sequences:
>>> from trackers.eval import evaluate_mot_sequences
>>> result = evaluate_mot_sequences(
... gt_dir="data/gt/",
... tracker_dir="data/trackers/",
... metrics=["CLEAR", "HOTA", "Identity"],
... )
>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence MOTA HOTA IDF1 IDSW
# -------------------------------------------------------------
# sequence1 74.800 60.900 71.200 37
# sequence2 76.100 63.200 72.500 45
# -------------------------------------------------------------
# COMBINED 75.450 62.050 71.850 82
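In addition to the combined table, `BenchmarkResult` exposes the per-sequence results and the aggregate through the attributes documented below. A short sketch continuing the example above; the sequence keys and printed numbers are illustrative, and the output path is hypothetical:
>>> seq = result.sequences["sequence1"]
>>> print(seq.CLEAR.MOTA)
# 0.748
>>> print(result.aggregate.Identity.IDF1)
# 0.7185
>>> result.save("results/benchmark.json")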
trackers.eval.results.SequenceResult
dataclass
Result for a single sequence evaluation.
Attributes:

| Name | Type | Description |
|---|---|---|
| `sequence` | `str` | Name of the sequence. |
| `CLEAR` | `CLEARMetrics \| None` | CLEAR metrics for this sequence, or `None` if not computed. |
| `HOTA` | `HOTAMetrics \| None` | HOTA metrics for this sequence, or `None` if not computed. |
| `Identity` | `IdentityMetrics \| None` | Identity metrics for this sequence, or `None` if not computed. |
from_dict(data)
classmethod
Create SequenceResult from a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with sequence name and metrics. | required |

Returns:

| Type | Description |
|---|---|
| `SequenceResult` | The reconstructed `SequenceResult`. |
to_dict()
Convert to dictionary representation.
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with all metric values. |
json(indent=2)
Serialize to JSON string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `indent` | `int` | Indentation level for formatting. Defaults to `2`. | `2` |

Returns:

| Type | Description |
|---|---|
| `str` | JSON string representation. |
table(columns=None)
Format as a table string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `list[str] \| None` | Metric columns to include. If `None`, a default set of columns is used. | `None` |

Returns:

| Type | Description |
|---|---|
| `str` | Formatted table string. |
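The helpers above compose naturally. A brief sketch, assuming `result` is the `SequenceResult` returned by `evaluate_mot_sequence` and that the class is imported from `trackers.eval.results` as in the heading above:
>>> from trackers.eval.results import SequenceResult
>>> data = result.to_dict()                      # plain dict of metric values
>>> restored = SequenceResult.from_dict(data)    # round-trip through a dictionary
>>> text = result.json(indent=2)                 # JSON string
>>> print(result.table(columns=["MOTA", "IDF1"]))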
trackers.eval.results.BenchmarkResult
dataclass
Result for multi-sequence evaluation.
Attributes:

| Name | Type | Description |
|---|---|---|
| `sequences` | `dict[str, SequenceResult]` | Dictionary mapping sequence names to their results. |
| `aggregate` | `SequenceResult` | Combined metrics across all sequences. |
from_dict(data)
classmethod
Create BenchmarkResult from a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with sequences and aggregate results. | required |

Returns:

| Type | Description |
|---|---|
| `BenchmarkResult` | The reconstructed `BenchmarkResult`. |
to_dict()
Convert to dictionary representation.
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with all metric values. |
json(indent=2)
Serialize to JSON string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `indent` | `int` | Indentation level for formatting. Defaults to `2`. | `2` |

Returns:

| Type | Description |
|---|---|
| `str` | JSON string representation. |
table(columns=None)
Format as a table string.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `list[str] \| None` | Metric columns to include. If `None`, a default set of columns is used. | `None` |

Returns:

| Type | Description |
|---|---|
| `str` | Formatted table string. |
save(path)
Save to a JSON file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Destination file path. | required |
load(path)
classmethod
Load from a JSON file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Source file path. | required |

Returns:

| Type | Description |
|---|---|
| `BenchmarkResult` | The loaded `BenchmarkResult`. |
Raises:

| Type | Description |
|---|---|
| `FileNotFoundError` | If the file does not exist. |
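A round-trip sketch for persisting results, assuming `result` is a `BenchmarkResult` from `evaluate_mot_sequences` and that the class is imported from `trackers.eval.results` as in the heading above; the file path and printed sequence names are illustrative:
>>> from trackers.eval.results import BenchmarkResult
>>> result.save("results/mot17_train.json")
>>> reloaded = BenchmarkResult.load("results/mot17_train.json")
>>> print(sorted(reloaded.sequences))
# ['sequence1', 'sequence2']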
trackers.eval.results.CLEARMetrics
dataclass
CLEAR metrics with TrackEval-compatible field names. Float metrics are stored as fractions (0-1 range), not percentages. The values follow the original CLEAR MOT definitions.
Attributes:

| Name | Type | Description |
|---|---|---|
| `MOTA` | `float` | Multiple Object Tracking Accuracy. Penalizes false negatives, false positives, and ID switches: `MOTA = 1 - (FN + FP + IDSW) / (TP + FN)`. |
| `MOTP` | `float` | Multiple Object Tracking Precision. Mean IoU of matched pairs. Measures localization quality only. |
| `MODA` | `float` | Multiple Object Detection Accuracy. Like MOTA but ignores ID switches: `MODA = 1 - (FN + FP) / (TP + FN)`. |
| `CLR_Re` | `float` | CLEAR recall. Fraction of GT detections matched: `TP / (TP + FN)`. |
| `CLR_Pr` | `float` | CLEAR precision. Fraction of tracker detections correct: `TP / (TP + FP)`. |
| `MTR` | `float` | Mostly tracked ratio. Fraction of GT tracks tracked for >80% of their lifespan. |
| `PTR` | `float` | Partially tracked ratio. Fraction of GT tracks tracked for 20-80% of their lifespan. |
| `MLR` | `float` | Mostly lost ratio. Fraction of GT tracks tracked for <20% of their lifespan. |
| `sMOTA` | `float` | Summed MOTA. Replaces the TP count with the matched IoU sum: `sMOTA = (IoU_sum - FP - IDSW) / (TP + FN)`. |
| `CLR_TP` | `int` | True positives. Number of correct matches. |
| `CLR_FN` | `int` | False negatives. Number of missed GT detections. |
| `CLR_FP` | `int` | False positives. Number of spurious tracker detections. |
| `IDSW` | `int` | ID switches. Times a GT track changes its matched tracker ID. |
| `MT` | `int` | Mostly tracked count. Number of GT tracks tracked >80%. |
| `PT` | `int` | Partially tracked count. Number of GT tracks tracked 20-80%. |
| `ML` | `int` | Mostly lost count. Number of GT tracks tracked <20%. |
| `Frag` | `int` | Fragmentations. Times a tracked GT becomes untracked and then tracked again. |
| `MOTP_sum` | `float` | Raw IoU sum for aggregation across sequences. |
| `CLR_Frames` | `int` | Number of frames evaluated. |
from_dict(data)
classmethod
Create CLEARMetrics from a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with metric values. | required |

Returns:

| Type | Description |
|---|---|
| `CLEARMetrics` | The reconstructed `CLEARMetrics`. |
to_dict()
Convert to dictionary representation.
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with all metric values. |
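As a sanity check on the conventions above (counts stored as raw integers, rate metrics as 0-1 fractions), the rate metrics can be recomputed from the count fields. A minimal sketch with illustrative numbers; this is not the library's internal code:
>>> TP, FN, FP, IDSW = 9450, 2100, 950, 42    # illustrative counts, not real data
>>> num_gt = TP + FN                          # total ground-truth detections
>>> round(1 - (FN + FP + IDSW) / num_gt, 3)   # MOTA
# 0.732
>>> round(1 - (FN + FP) / num_gt, 3)          # MODA
# 0.736
>>> round(TP / (TP + FN), 3)                  # CLR_Re
# 0.818
>>> round(TP / (TP + FP), 3)                  # CLR_Pr
# 0.909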
trackers.eval.results.HOTAMetrics
dataclass
HOTA metrics with TrackEval-compatible field names. HOTA evaluates both detection quality and association quality. Float metrics are stored as fractions (0-1 range).
Attributes:

| Name | Type | Description |
|---|---|---|
| `HOTA` | `float` | Higher Order Tracking Accuracy. Geometric mean of DetA and AssA, averaged over 19 IoU thresholds (0.05 to 0.95). |
| `DetA` | `float` | Detection accuracy: `TP / (TP + FN + FP)`, averaged over thresholds. |
| `AssA` | `float` | Association accuracy for matched detections over time. |
| `DetRe` | `float` | Detection recall: `TP / (TP + FN)`. |
| `DetPr` | `float` | Detection precision: `TP / (TP + FP)`. |
| `AssRe` | `float` | Association recall. For each GT ID, measures how consistently it maps to a single tracker ID across time. |
| `AssPr` | `float` | Association precision. For each tracker ID, measures how consistently it maps to a single GT ID across time. |
| `LocA` | `float` | Localization accuracy. Mean IoU for matched pairs. |
| `OWTA` | `float` | Open World Tracking Accuracy. Geometric mean of DetRe and AssA. |
| `HOTA_TP` | `int` | True positive count summed over all 19 thresholds. |
| `HOTA_FN` | `int` | False negative count summed over all 19 thresholds. |
| `HOTA_FP` | `int` | False positive count summed over all 19 thresholds. |
from_dict(data)
classmethod
Create HOTAMetrics from a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with metric values. | required |

Returns:

| Type | Description |
|---|---|
| `HOTAMetrics` | The reconstructed `HOTAMetrics`. |
to_dict(include_arrays=False, arrays_as_list=True)
Convert to dictionary representation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `include_arrays` | `bool` | Whether to include per-alpha arrays. Defaults to `False`. | `False` |
| `arrays_as_list` | `bool` | Whether to convert arrays to lists for JSON serialization. Defaults to `True`. | `True` |

Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with all metric values. |
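To make the per-threshold averaging in the HOTA definition concrete, the final combination step can be sketched with numpy; the per-threshold DetA/AssA arrays below are illustrative placeholders, not fields of this class:
>>> import numpy as np
>>> det_a = np.linspace(0.75, 0.45, 19)   # illustrative DetA at alpha = 0.05, 0.10, ..., 0.95
>>> ass_a = np.linspace(0.70, 0.50, 19)   # illustrative AssA at the same thresholds
>>> hota = float(np.mean(np.sqrt(det_a * ass_a)))   # geometric mean per threshold, then averaged
>>> round(hota, 3)
# 0.6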
trackers.eval.results.IdentityMetrics
dataclass
Identity metrics with TrackEval-compatible field names. Identity metrics measure global ID consistency using an optimal one-to-one assignment between GT and tracker IDs across the full sequence.
Attributes:

| Name | Type | Description |
|---|---|---|
| `IDF1` | `float` | ID F1 score. Harmonic mean of IDR and IDP, the primary identity metric. |
| `IDR` | `float` | ID recall: `IDTP / (IDTP + IDFN)`. |
| `IDP` | `float` | ID precision: `IDTP / (IDTP + IDFP)`. |
| `IDTP` | `int` | ID true positives. Detections matched with globally consistent IDs. |
| `IDFN` | `int` | ID false negatives. GT detections not matched or matched to the wrong global ID. |
| `IDFP` | `int` | ID false positives. Tracker detections not matched or matched to the wrong global ID. |
from_dict(data)
classmethod
Create IdentityMetrics from a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict[str, Any]` | Dictionary with metric values. | required |

Returns:

| Type | Description |
|---|---|
| `IdentityMetrics` | The reconstructed `IdentityMetrics`. |
to_dict()
Convert to dictionary representation.
Returns:

| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with all metric values. |
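To make the relationship between the counts and the ratio fields concrete, IDF1 can be recomputed from `IDTP`, `IDFN`, and `IDFP`. A minimal sketch with illustrative numbers; this is not the library's internal code:
>>> IDTP, IDFN, IDFP = 8200, 3350, 1700    # illustrative counts, not real data
>>> idr = IDTP / (IDTP + IDFN)             # IDR
>>> idp = IDTP / (IDTP + IDFP)             # IDP
>>> round(2 * idr * idp / (idr + idp), 3)  # IDF1 = harmonic mean of IDR and IDP
# 0.765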