Evals API

trackers.eval.evaluate.evaluate_mot_sequence(gt_path, tracker_path, metrics=None, threshold=0.5)

Evaluate a single multi-object tracking result against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) for one sequence by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union).

TrackEval parity

This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.

Parameters:

gt_path (str | Path, required):
    Path to the ground-truth MOT file.

tracker_path (str | Path, required):
    Path to the tracker MOT file.

metrics (list[str] | None, default: None):
    Metric families to compute. Supported values are ["CLEAR", "HOTA", "Identity"]. Defaults to ["CLEAR"].

threshold (float, default: 0.5):
    IoU threshold for CLEAR and Identity matching. HOTA evaluates across multiple thresholds internally.

Returns:

SequenceResult:
    SequenceResult with CLEAR, HOTA, and/or Identity populated based on metrics.

Raises:

FileNotFoundError:
    If gt_path or tracker_path does not exist.

ValueError:
    If an unsupported metric family is requested.

Examples:

>>> from trackers.eval import evaluate_mot_sequence

>>> result = evaluate_mot_sequence(
...     gt_path="data/gt/MOT17-02/gt.txt",
...     tracker_path="data/trackers/MOT17-02.txt",
...     metrics=["CLEAR", "HOTA", "Identity"],
... )

>>> print(result.CLEAR.MOTA)
# 0.756

>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence                           MOTA    HOTA    IDF1  IDSW
# -------------------------------------------------------------
# MOT17-02                         75.600  62.300  72.100    42

trackers.eval.evaluate.evaluate_mot_sequences(gt_dir, tracker_dir, seqmap=None, metrics=None, threshold=0.5, benchmark=None, split=None, tracker_name=None)

Evaluate multiple multi-object tracking results against ground truth. Computes standard multi-object tracking metrics (CLEAR MOT, HOTA, Identity) across one or more sequences by matching predicted tracks to ground-truth tracks using per-frame IoU (Intersection over Union). Returns both per-sequence and aggregated (combined) results.

TrackEval parity

This evaluation code is intentionally designed to match the core matching logic and metric calculations of TrackEval.

Supported dataset layouts

MOTChallenge-style nested layout:

gt_dir/
└── MOT17-train/
    ├── MOT17-02-FRCNN/
    │   └── gt/gt.txt
    ├── MOT17-04-FRCNN/
    │   └── gt/gt.txt
    ├── MOT17-05-FRCNN/
    │   └── gt/gt.txt
    └── ...

tracker_dir/
└── MOT17-train/
    └── ByteTrack/
        └── data/
            ├── MOT17-02-FRCNN.txt
            ├── MOT17-04-FRCNN.txt
            ├── MOT17-05-FRCNN.txt
            └── ...

Flat layout:

gt_dir/
├── MOT17-02.txt
├── MOT17-04.txt
├── MOT17-05.txt
└── ...

tracker_dir/
├── MOT17-02.txt
├── MOT17-04.txt
├── MOT17-05.txt
└── ...

Parameters:

gt_dir (str | Path, required):
    Directory with ground-truth files.

tracker_dir (str | Path, required):
    Directory with tracker prediction files.

seqmap (str | Path | None, default: None):
    Optional sequence map. If provided, only those sequences are evaluated.

metrics (list[str] | None, default: None):
    Metric families to compute. Supported values are ["CLEAR", "HOTA", "Identity"]. Defaults to ["CLEAR"].

threshold (float, default: 0.5):
    IoU threshold for CLEAR and Identity matching. HOTA evaluates across multiple thresholds internally.

benchmark (str | None, default: None):
    Override the auto-detected benchmark name (e.g., "MOT17").

split (str | None, default: None):
    Override the auto-detected split name (e.g., "train", "val").

tracker_name (str | None, default: None):
    Override the auto-detected tracker name.

Returns:

BenchmarkResult:
    BenchmarkResult with per-sequence results and a COMBINED aggregate.

Raises:

FileNotFoundError:
    If gt_dir or tracker_dir does not exist.

ValueError:
    If auto-detection finds multiple valid options.

Examples:

Auto-detect layout and evaluate all sequences:

>>> from trackers.eval import evaluate_mot_sequences

>>> result = evaluate_mot_sequences(
...     gt_dir="data/gt/",
...     tracker_dir="data/trackers/",
...     metrics=["CLEAR", "HOTA", "Identity"],
... )

>>> print(result.table(columns=["MOTA", "HOTA", "IDF1", "IDSW"]))
# Sequence                           MOTA    HOTA    IDF1  IDSW
# -------------------------------------------------------------
# sequence1                        74.800  60.900  71.200    37
# sequence2                        76.100  63.200  72.500    45
# -------------------------------------------------------------
# COMBINED                         75.450  62.050  71.850    82
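
Restrict evaluation to a subset of sequences with a seqmap. A minimal sketch; the seqmap path below is hypothetical, and the file is assumed to follow the MOTChallenge convention of one sequence name per line under a "name" header:

>>> result = evaluate_mot_sequences(
...     gt_dir="data/gt/",
...     tracker_dir="data/trackers/",
...     seqmap="data/seqmaps/MOT17-train.txt",
...     metrics=["CLEAR"],
... )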

trackers.eval.results.SequenceResult dataclass

Result for a single sequence evaluation.

Attributes:

sequence (str):
    Name of the sequence.

CLEAR (CLEARMetrics | None):
    CLEAR metrics for this sequence, or None if not requested.

HOTA (HOTAMetrics | None):
    HOTA metrics for this sequence, or None if not requested.

Identity (IdentityMetrics | None):
    Identity metrics for this sequence, or None if not requested.

from_dict(data) classmethod

Create SequenceResult from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with sequence name and metrics.

Returns:

SequenceResult:
    SequenceResult instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

json(indent=2)

Serialize to JSON string.

Parameters:

indent (int, default: 2):
    Indentation level for formatting.

Returns:

str:
    JSON string representation.

table(columns=None)

Format as a table string.

Parameters:

columns (list[str] | None, default: None):
    Metric columns to include. If None, includes all available metrics.

Returns:

str:
    Formatted table string.
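
The serialization helpers compose into a simple round trip. A minimal sketch, reusing the result from the evaluate_mot_sequence example above:

>>> from trackers.eval.results import SequenceResult

>>> payload = result.to_dict()
>>> restored = SequenceResult.from_dict(payload)
>>> print(restored.sequence == result.sequence)
# True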

trackers.eval.results.BenchmarkResult dataclass

Result for multi-sequence evaluation.

Attributes:

sequences (dict[str, SequenceResult]):
    Dictionary mapping sequence names to their results.

aggregate (SequenceResult):
    Combined metrics across all sequences.
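
A minimal access sketch, assuming a result from the evaluate_mot_sequences example above with CLEAR requested:

>>> for name, seq in result.sequences.items():
...     print(name, seq.CLEAR.MOTA)

>>> print(result.aggregate.CLEAR.MOTA)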

from_dict(data) classmethod

Create BenchmarkResult from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with sequences and aggregate results.

Returns:

BenchmarkResult:
    BenchmarkResult instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

json(indent=2)

Serialize to JSON string.

Parameters:

indent (int, default: 2):
    Indentation level for formatting.

Returns:

str:
    JSON string representation.

table(columns=None)

Format as a table string.

Parameters:

columns (list[str] | None, default: None):
    Metric columns to include. If None, includes all available metrics.

Returns:

str:
    Formatted table string.

save(path)

Save to a JSON file.

Parameters:

path (str | Path, required):
    Destination file path.

load(path) classmethod

Load from a JSON file.

Parameters:

path (str | Path, required):
    Source file path.

Returns:

BenchmarkResult:
    BenchmarkResult instance.

Raises:

FileNotFoundError:
    If the file does not exist.
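
Together, save and load give a simple persistence round trip. A minimal sketch; the output path is hypothetical:

>>> from trackers.eval.results import BenchmarkResult

>>> result.save("results/mot17_train.json")
>>> reloaded = BenchmarkResult.load("results/mot17_train.json")
>>> print(reloaded.aggregate.CLEAR.MOTA)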

trackers.eval.results.CLEARMetrics dataclass

CLEAR metrics with TrackEval-compatible field names. Float metrics are stored as fractions (0-1 range), not percentages. The values follow the original CLEAR MOT definitions.

Attributes:

MOTA (float):
    Multiple Object Tracking Accuracy. Penalizes false negatives, false positives, and ID switches: (TP - FP - IDSW) / (TP + FN). Can be negative when errors exceed matches.

MOTP (float):
    Multiple Object Tracking Precision. Mean IoU of matched pairs. Measures localization quality only.

MODA (float):
    Multiple Object Detection Accuracy. Like MOTA but ignores ID switches: (TP - FP) / (TP + FN).

CLR_Re (float):
    CLEAR recall. Fraction of GT detections matched: TP / (TP + FN).

CLR_Pr (float):
    CLEAR precision. Fraction of tracker detections correct: TP / (TP + FP).

MTR (float):
    Mostly tracked ratio. Fraction of GT tracks tracked for >80% of their lifespan.

PTR (float):
    Partially tracked ratio. Fraction of GT tracks tracked for 20-80% of their lifespan.

MLR (float):
    Mostly lost ratio. Fraction of GT tracks tracked for <20% of their lifespan.

sMOTA (float):
    Summed MOTA. Replaces the TP count with the IoU sum: (MOTP_sum - FP - IDSW) / (TP + FN).

CLR_TP (int):
    True positives. Number of correct matches.

CLR_FN (int):
    False negatives. Number of missed GT detections.

CLR_FP (int):
    False positives. Number of spurious tracker detections.

IDSW (int):
    ID switches. Times a GT track changes its matched tracker ID.

MT (int):
    Mostly tracked count. Number of GT tracks tracked for >80% of their lifespan.

PT (int):
    Partially tracked count. Number of GT tracks tracked for 20-80% of their lifespan.

ML (int):
    Mostly lost count. Number of GT tracks tracked for <20% of their lifespan.

Frag (int):
    Fragmentations. Times a tracked GT track becomes untracked and then tracked again.

MOTP_sum (float):
    Raw IoU sum for aggregation across sequences.

CLR_Frames (int):
    Number of frames evaluated.
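
The float metrics can be re-derived from the integer counts. A worked sketch with hypothetical counts, following the MOTA and MODA formulas above:

>>> tp, fn, fp, idsw = 900, 100, 50, 10  # hypothetical counts
>>> mota = (tp - fp - idsw) / (tp + fn)
>>> print(mota)
# 0.84

>>> moda = (tp - fp) / (tp + fn)  # same, but ignoring ID switches
>>> print(moda)
# 0.85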

from_dict(data) classmethod

Create CLEARMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

CLEARMetrics:
    CLEARMetrics instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.

trackers.eval.results.HOTAMetrics dataclass

HOTA metrics with TrackEval-compatible field names. HOTA evaluates both detection quality and association quality. Float metrics are stored as fractions (0-1 range).

Attributes:

HOTA (float):
    Higher Order Tracking Accuracy. Geometric mean of DetA and AssA, averaged over 19 IoU thresholds (0.05 to 0.95).

DetA (float):
    Detection accuracy: TP / (TP + FN + FP).

AssA (float):
    Association accuracy for matched detections over time.

DetRe (float):
    Detection recall: TP / (TP + FN).

DetPr (float):
    Detection precision: TP / (TP + FP).

AssRe (float):
    Association recall. For each GT ID, measures how consistently it maps to a single tracker ID across time.

AssPr (float):
    Association precision. For each tracker ID, measures how consistently it maps to a single GT ID across time.

LocA (float):
    Localization accuracy. Mean IoU for matched pairs.

OWTA (float):
    Open World Tracking Accuracy. sqrt(DetRe * AssA), useful when precision is less meaningful.

HOTA_TP (int):
    True positive count summed over all 19 thresholds.

HOTA_FN (int):
    False negative count summed over all 19 thresholds.

HOTA_FP (int):
    False positive count summed over all 19 thresholds.
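
The headline HOTA score is the per-threshold geometric mean sqrt(DetA * AssA), averaged over the 19 thresholds. A worked sketch with hypothetical per-alpha values:

>>> import numpy as np

>>> alphas = np.linspace(0.05, 0.95, 19)  # the 19 IoU thresholds
>>> det_a = np.linspace(0.80, 0.40, 19)   # hypothetical per-alpha detection accuracy
>>> ass_a = np.linspace(0.70, 0.50, 19)   # hypothetical per-alpha association accuracy
>>> hota = float(np.mean(np.sqrt(det_a * ass_a)))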

from_dict(data) classmethod

Create HOTAMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

HOTAMetrics:
    HOTAMetrics instance.

to_dict(include_arrays=False, arrays_as_list=True)

Convert to dictionary representation.

Parameters:

include_arrays (bool, default: False):
    Whether to include per-alpha arrays.

arrays_as_list (bool, default: True):
    Whether to convert arrays to lists for JSON serialization. Set to False to keep numpy arrays.

Returns:

dict[str, Any]:
    Dictionary with all metric values.
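
A minimal usage sketch, assuming a result with HOTA computed:

>>> summary = result.HOTA.to_dict()                   # scalar metrics only
>>> full = result.HOTA.to_dict(include_arrays=True)   # adds per-alpha arrays as lists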

trackers.eval.results.IdentityMetrics dataclass

Identity metrics with TrackEval-compatible field names. Identity metrics measure global ID consistency using an optimal one-to-one assignment between GT and tracker IDs across the full sequence.

Attributes:

IDF1 (float):
    ID F1 score. Harmonic mean of IDR and IDP; the primary identity metric.

IDR (float):
    ID recall. IDTP / (IDTP + IDFN), the fraction of GT detections with a correct global ID assignment.

IDP (float):
    ID precision. IDTP / (IDTP + IDFP), the fraction of tracker detections with a correct global ID assignment.

IDTP (int):
    ID true positives. Detections matched with globally consistent IDs.

IDFN (int):
    ID false negatives. GT detections not matched, or matched to the wrong global ID.

IDFP (int):
    ID false positives. Tracker detections not matched, or matched to the wrong global ID.

from_dict(data) classmethod

Create IdentityMetrics from a dictionary.

Parameters:

data (dict[str, Any], required):
    Dictionary with metric values.

Returns:

IdentityMetrics:
    IdentityMetrics instance.

to_dict()

Convert to dictionary representation.

Returns:

dict[str, Any]:
    Dictionary with all metric values.
Dictionary with all metric values.