Evals API

trackers.eval.evaluate.evaluate_mot_sequence(gt_path, tracker_path, metrics=None, threshold=0.5)

Evaluate a single multi-object tracking result against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) for one sequence by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union).

TrackEval parity

This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.

Parameters:

gt_path (str | Path, required):
    Path to the ground-truth MOT file.

tracker_path (str | Path, required):
    Path to the tracker MOT file.

metrics (list[str] | None, default: None):
    Metric families to compute. Supported values are ["CLEAR", "HOTA", "Identity"]. Defaults to ["CLEAR"].

threshold (float, default: 0.5):
    IoU threshold for CLEAR and Identity matching. HOTA evaluates across multiple thresholds internally.

Returns:

SequenceResult:
    SequenceResult with CLEAR, HOTA, and/or Identity populated based on metrics.

Raises:

FileNotFoundError:
    If gt_path or tracker_path does not exist.

ValueError:
    If an unsupported metric family is requested.

Examples:

>>> from trackers.eval import evaluate_mot_sequence

>>> result = evaluate_mot_sequence(
...     gt_path="data/gt/MOT17-02/gt.txt",
...     tracker_path="data/trackers/MOT17-02.txt",
...     metrics=["CLEAR", "HOTA", "Identity"],
... )

>>> print(result.CLEAR.MOTA)
# 0.756

>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence                           MOTA    HOTA    IDF1  IDSW
# -------------------------------------------------------------
# MOT17-02                         75.600  62.300  72.100    42

trackers.eval.evaluate.evaluate_mot_sequences(gt_dir, tracker_dir, seqmap=None, metrics=None, threshold=0.5, benchmark=None, split=None, tracker_name=None)

Evaluate multiple multi-object tracking results against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) across one or more sequences by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union). Returns both per-sequence and aggregated (combined) results.

TrackEval parity

This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.

Supported dataset layouts

MOTChallenge-style nested layout:

gt_dir/
└── MOT17-train/
    ├── MOT17-02-FRCNN/
    │   └── gt/gt.txt
    ├── MOT17-04-FRCNN/
    │   └── gt/gt.txt
    ├── MOT17-05-FRCNN/
    │   └── gt/gt.txt
    └── ...

tracker_dir/
└── MOT17-train/
    └── ByteTrack/
        └── data/
            ├── MOT17-02-FRCNN.txt
            ├── MOT17-04-FRCNN.txt
            ├── MOT17-05-FRCNN.txt
            └── ...

Flat layout:

gt_dir/
├── MOT17-02.txt
├── MOT17-04.txt
├── MOT17-05.txt
└── ...

tracker_dir/
├── MOT17-02.txt
├── MOT17-04.txt
├── MOT17-05.txt
└── ...

Parameters:

gt_dir (str | Path, required):
    Directory with ground-truth files.

tracker_dir (str | Path, required):
    Directory with tracker prediction files.

seqmap (str | Path | None, default: None):
    Optional sequence map. If provided, only those sequences are evaluated.

metrics (list[str] | None, default: None):
    Metric families to compute. Supported values are ["CLEAR", "HOTA", "Identity"]. Defaults to ["CLEAR"].

threshold (float, default: 0.5):
    IoU threshold for CLEAR and Identity matching. HOTA evaluates across multiple thresholds internally.

benchmark (str | None, default: None):
    Override the auto-detected benchmark name (e.g., "MOT17").

split (str | None, default: None):
    Override the auto-detected split name (e.g., "train", "val").

tracker_name (str | None, default: None):
    Override the auto-detected tracker name.

Returns:

BenchmarkResult:
    BenchmarkResult with per-sequence results and a COMBINED aggregate.

Raises:

FileNotFoundError:
    If gt_dir or tracker_dir does not exist.

ValueError:
    If auto-detection finds multiple valid options.

Examples:

Auto-detect layout and evaluate all sequences:

>>> from trackers.eval import evaluate_mot_sequences

>>> result = evaluate_mot_sequences(
...     gt_dir="data/gt/",
...     tracker_dir="data/trackers/",
...     metrics=["CLEAR", "HOTA", "Identity"],
... )

>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence                           MOTA    HOTA    IDF1  IDSW
# -------------------------------------------------------------
# sequence1                        74.800  60.900  71.200    37
# sequence2                        76.100  63.200  72.500    45
# -------------------------------------------------------------
# COMBINED                         75.450  62.050  71.850    82
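
Restrict evaluation to a subset of sequences with a seqmap. A minimal sketch; the seqmap path below is hypothetical, and the file is assumed to follow the MOTChallenge convention of one sequence name per line under a "name" header:

>>> result = evaluate_mot_sequences(
...     gt_dir="data/gt/",
...     tracker_dir="data/trackers/",
...     seqmap="data/seqmaps/MOT17-train.txt",
...     metrics=["CLEAR"],
... )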

trackers.eval.results.SequenceResult dataclass

Result for a single sequence evaluation.

Attributes:

sequence (str):
    Name of the sequence.

CLEAR (CLEARMetrics | None):
    CLEAR metrics for this sequence, or None if not requested.

HOTA (HOTAMetrics | None):
    HOTA metrics for this sequence, or None if not requested.

Identity (IdentityMetrics | None):
    Identity metrics for this sequence, or None if not requested.

from_dict(data) classmethod

Create SequenceResult from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with sequence name and metrics.

Returns:

SequenceResult:
    SequenceResult instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

json(indent=2)

Serialize to JSON string.

Parameters:

indent (int, default: 2):
    Indentation level for formatting.

Returns:

str:
    JSON string representation.

table(columns=None)

Format as a table string.

Parameters:

columns (list[str] | None, default: None):
    Metric columns to include. If None, includes all available metrics.

Returns:

str:
    Formatted table string.
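
The serialization helpers compose into a simple round trip. A minimal sketch, reusing the result from the evaluate_mot_sequence example above:

>>> from trackers.eval.results import SequenceResult

>>> payload = result.to_dict()
>>> restored = SequenceResult.from_dict(payload)
>>> print(restored.sequence == result.sequence)
# True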

trackers.eval.results.BenchmarkResult dataclass

Result for multi-sequence evaluation.

Attributes:

sequences (dict[str, SequenceResult]):
    Dictionary mapping sequence names to their results.

aggregate (SequenceResult):
    Combined metrics across all sequences.
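
A minimal access sketch, assuming a result from the evaluate_mot_sequences example above with CLEAR requested:

>>> for name, seq in result.sequences.items():
...     print(name, seq.CLEAR.MOTA)

>>> print(result.aggregate.CLEAR.MOTA)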

from_dict(data) classmethod

Create BenchmarkResult from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with sequences and aggregate results.

Returns:

BenchmarkResult:
    BenchmarkResult instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

json(indent=2)

Serialize to JSON string.

Parameters:

indent (int, default: 2):
    Indentation level for formatting.

Returns:

str:
    JSON string representation.

table(columns=None)

Format as a table string.

Parameters:

columns (list[str] | None, default: None):
    Metric columns to include. If None, includes all available metrics.

Returns:

str:
    Formatted table string.

save(path)

Save to a JSON file.

Parameters:

path (str | Path, required):
    Destination file path.

load(path) classmethod

Load from a JSON file.

Parameters:

path (str | Path, required):
    Source file path.

Returns:

BenchmarkResult:
    BenchmarkResult instance.

Raises:

FileNotFoundError:
    If the file does not exist.
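
Together, save and load give a simple persistence round trip. A minimal sketch; the output path is hypothetical:

>>> from trackers.eval.results import BenchmarkResult

>>> result.save("results/mot17_train.json")
>>> reloaded = BenchmarkResult.load("results/mot17_train.json")
>>> print(reloaded.aggregate.CLEAR.MOTA)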

trackers.eval.results.CLEARMetrics dataclass

CLEAR metrics with TrackEval-compatible field names. Float metrics are stored as fractions (0-1 range), not percentages. The values follow the original CLEAR MOT definitions.

Attributes:

MOTA (float):
    Multiple Object Tracking Accuracy. Penalizes false negatives, false positives, and ID switches: (TP - FP - IDSW) / (TP + FN). Can be negative when errors exceed matches.

MOTP (float):
    Multiple Object Tracking Precision. Mean IoU of matched pairs. Measures localization quality only.

MODA (float):
    Multiple Object Detection Accuracy. Like MOTA but ignores ID switches: (TP - FP) / (TP + FN).

CLR_Re (float):
    CLEAR recall. Fraction of GT detections matched: TP / (TP + FN).

CLR_Pr (float):
    CLEAR precision. Fraction of tracker detections correct: TP / (TP + FP).

MTR (float):
    Mostly tracked ratio. Fraction of GT tracks tracked for >80% of their lifespan.

PTR (float):
    Partially tracked ratio. Fraction of GT tracks tracked for 20-80% of their lifespan.

MLR (float):
    Mostly lost ratio. Fraction of GT tracks tracked for <20% of their lifespan.

sMOTA (float):
    Summed MOTA. Replaces the TP count with the IoU sum: (MOTP_sum - FP - IDSW) / (TP + FN).

CLR_TP (int):
    True positives. Number of correct matches.

CLR_FN (int):
    False negatives. Number of missed GT detections.

CLR_FP (int):
    False positives. Number of spurious tracker detections.

IDSW (int):
    ID switches. Times a GT track changes its matched tracker ID.

MT (int):
    Mostly tracked count. Number of GT tracks tracked for >80% of their lifespan.

PT (int):
    Partially tracked count. Number of GT tracks tracked for 20-80% of their lifespan.

ML (int):
    Mostly lost count. Number of GT tracks tracked for <20% of their lifespan.

Frag (int):
    Fragmentations. Times a tracked GT track becomes untracked and then tracked again.

MOTP_sum (float):
    Raw IoU sum for aggregation across sequences.

CLR_Frames (int):
    Number of frames evaluated.
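
The float metrics can be re-derived from the integer counts. A worked sketch with hypothetical counts, following the MOTA and MODA formulas above:

>>> tp, fn, fp, idsw = 900, 100, 50, 10  # hypothetical counts
>>> mota = (tp - fp - idsw) / (tp + fn)
>>> print(mota)
# 0.84

>>> moda = (tp - fp) / (tp + fn)  # same, but ignoring ID switches
>>> print(moda)
# 0.85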

from_dict(data) classmethod

Create CLEARMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

CLEARMetrics:
    CLEARMetrics instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

trackers.eval.results.HOTAMetrics dataclass

HOTA metrics with TrackEval-compatible field names. HOTA evaluates both detection quality and association quality. Float metrics are stored as fractions (0-1 range).

Attributes:

HOTA (float):
    Higher Order Tracking Accuracy. Geometric mean of DetA and AssA, averaged over 19 IoU thresholds (0.05 to 0.95).

DetA (float):
    Detection accuracy: TP / (TP + FN + FP).

AssA (float):
    Association accuracy for matched detections over time.

DetRe (float):
    Detection recall: TP / (TP + FN).

DetPr (float):
    Detection precision: TP / (TP + FP).

AssRe (float):
    Association recall. For each GT ID, measures how consistently it maps to a single tracker ID across time.

AssPr (float):
    Association precision. For each tracker ID, measures how consistently it maps to a single GT ID across time.

LocA (float):
    Localization accuracy. Mean IoU for matched pairs.

OWTA (float):
    Open World Tracking Accuracy. sqrt(DetRe * AssA), useful when precision is less meaningful.

HOTA_TP (int):
    True positive count summed over all 19 thresholds.

HOTA_FN (int):
    False negative count summed over all 19 thresholds.

HOTA_FP (int):
    False positive count summed over all 19 thresholds.
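
The headline HOTA score is the per-threshold geometric mean sqrt(DetA * AssA), averaged over the 19 thresholds. A worked sketch with hypothetical per-alpha values:

>>> import numpy as np

>>> alphas = np.linspace(0.05, 0.95, 19)  # the 19 IoU thresholds
>>> det_a = np.linspace(0.80, 0.40, 19)   # hypothetical per-alpha detection accuracy
>>> ass_a = np.linspace(0.70, 0.50, 19)   # hypothetical per-alpha association accuracy
>>> hota = float(np.mean(np.sqrt(det_a * ass_a)))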

from_dict(data) classmethod

Create HOTAMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

HOTAMetrics:
    HOTAMetrics instance.

to_dict(include_arrays=False, arrays_as_list=True)

Convert to dictionary representation.

Parameters:

include_arrays (bool, default: False):
    Whether to include per-alpha arrays.

arrays_as_list (bool, default: True):
    Whether to convert arrays to lists for JSON serialization. Set to False to keep numpy arrays.

Returns:

dict[str, Any]:
    Dictionary with all metric values.
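
A minimal usage sketch, assuming a result with HOTA computed:

>>> summary = result.HOTA.to_dict()                   # scalar metrics only
>>> full = result.HOTA.to_dict(include_arrays=True)   # adds per-alpha arrays as lists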

trackers.eval.results.IdentityMetrics dataclass

Identity metrics with TrackEval-compatible field names. Identity metrics measure global ID consistency using an optimal one-to-one assignment between GT and tracker IDs across the full sequence.

Attributes:

IDF1 (float):
    ID F1 score. Harmonic mean of IDR and IDP; the primary identity metric.

IDR (float):
    ID recall. IDTP / (IDTP + IDFN), the fraction of GT detections with a correct global ID assignment.

IDP (float):
    ID precision. IDTP / (IDTP + IDFP), the fraction of tracker detections with a correct global ID assignment.

IDTP (int):
    ID true positives. Detections matched with globally consistent IDs.

IDFN (int):
    ID false negatives. GT detections not matched, or matched to the wrong global ID.

IDFP (int):
    ID false positives. Tracker detections not matched, or matched to the wrong global ID.

from_dict(data) classmethod

Create IdentityMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

IdentityMetrics:
    IdentityMetrics instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.
Dictionary with all metric values.