Datasets API

trackers.datasets.manifest.Dataset

Bases: str, Enum

Supported benchmark tracking datasets.

Attributes:

MOT17: Pedestrian tracking with crowded scenes and frequent occlusions. Strongly tests re-identification and identity stability.

SPORTSMOT: Sports broadcast tracking with fast motion, camera pans, and similar-looking targets. Tests association under speed and appearance ambiguity.

trackers.datasets.manifest.DatasetSplit

Bases: str, Enum

Available dataset splits.

Attributes:

TRAIN: Training split.

VAL: Validation split.

TEST: Test split.

trackers.datasets.manifest.DatasetAsset

Bases: str, Enum

Downloadable asset types within a dataset split.

Attributes:

FRAMES: Raw video frames as individual image files.

ANNOTATIONS: Ground-truth bounding box and identity labels.

DETECTIONS: Pre-computed object detection results.
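Dataset, DatasetSplit, and DatasetAsset all derive from both str and Enum, which is why the functions in this module can accept either enum members or plain strings. The following minimal sketch shows what that base-class combination gives you. ExampleAsset is a hypothetical stand-in, and its lowercase member values are an assumption; the real values in trackers.datasets.manifest are not documented here.

```python
from enum import Enum


class ExampleAsset(str, Enum):
    """Hypothetical stand-in for DatasetAsset; the member values are assumed."""

    FRAMES = "frames"
    ANNOTATIONS = "annotations"
    DETECTIONS = "detections"


# Members of a (str, Enum) class are real strings, so they compare equal
# to their values and can be passed wherever a plain string is accepted.
assert ExampleAsset.FRAMES == "frames"
assert isinstance(ExampleAsset.FRAMES, str)

# Looking a member up by value returns the existing member object.
assert ExampleAsset("annotations") is ExampleAsset.ANNOTATIONS

# Iterating the class yields every member in definition order.
asset_names = [member.name for member in ExampleAsset]
```

Because members are strings, they can also be used directly in path building or serialized to JSON by value without a custom encoder.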

trackers.datasets.download.download_dataset(*, dataset, split=None, asset=None, output=_DEFAULT_OUTPUT_DIR, cache_dir=_DEFAULT_CACHE_DIR)

Download benchmark tracking datasets from the official GCP bucket.

Downloads ZIP files into a persistent cache directory and extracts them into the output directory. Cached ZIPs are reused across runs so that re-extraction after deleting the output directory does not require re-downloading.

Parameters:

dataset (str | Dataset, required): Dataset to download, as a Dataset enum or string name. Case-insensitive.

split (DatasetSplit | str | list[DatasetSplit | str] | None, default None): Splits to download. If None, all available splits are downloaded.

asset (DatasetAsset | str | list[DatasetAsset | str] | None, default None): Asset types to download. If None, all available assets for each split are downloaded.

output (str, default _DEFAULT_OUTPUT_DIR): Directory where dataset files will be extracted. Defaults to the current working directory.

cache_dir (str, default _DEFAULT_CACHE_DIR): Directory for caching downloaded ZIP files. Cached ZIPs are verified by MD5 and reused when valid.
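The cache_dir description says cached ZIPs are verified by MD5 before being reused. A check of that kind might look like the sketch below, which uses only the standard library. The helper names md5_of_file and cached_zip_is_valid are hypothetical, and where the expected checksum comes from is not documented here, so it is passed in as a plain argument.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_zip_is_valid(zip_path: Path, expected_md5: str) -> bool:
    """Hypothetical helper: True when the cached ZIP exists and its MD5 matches."""
    return zip_path.is_file() and md5_of_file(zip_path) == expected_md5.lower()
```

A downloader built on this idea would fetch the ZIP again only when cached_zip_is_valid returns False, then extract the archive into the output directory.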

Raises:

ValueError: If dataset, split, or asset contains an unrecognized value.
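The split and asset parameters accept None, a single value, or a list, and unrecognized values raise ValueError. One way such arguments could be normalized is sketched below. ExampleSplit and normalize_splits are hypothetical names, the member values are assumed, and lowercasing string input is an assumption as well (the documentation only states case-insensitivity for dataset). This is not the library's actual implementation.

```python
from enum import Enum
from typing import Sequence, Union


class ExampleSplit(str, Enum):
    """Hypothetical stand-in for DatasetSplit; the member values are assumed."""

    TRAIN = "train"
    VAL = "val"
    TEST = "test"


SplitArg = Union[ExampleSplit, str, Sequence[Union[ExampleSplit, str]], None]


def normalize_splits(value: SplitArg) -> list[ExampleSplit]:
    """Turn None, a single value, or a list into a list of ExampleSplit members."""
    if value is None:
        # None selects every available split.
        return list(ExampleSplit)
    if isinstance(value, str):
        # Enum members are str instances too, so this wraps both cases.
        value = [value]
    members = []
    for item in value:
        if isinstance(item, ExampleSplit):
            members.append(item)
            continue
        try:
            # Assumed behavior: match string names case-insensitively.
            members.append(ExampleSplit(item.lower()))
        except ValueError:
            valid = ", ".join(member.value for member in ExampleSplit)
            raise ValueError(
                f"Unrecognized split {item!r}; expected one of: {valid}"
            ) from None
    return members
```

Normalizing up front means the rest of a downloader only ever deals with a list of enum members, and a typo in a string argument fails before any network access.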

Examples:

Using enums for type-safe dataset, split, and asset selection:

>>> from trackers import Dataset, DatasetAsset, DatasetSplit, download_dataset
>>> download_dataset(
...     dataset=Dataset.MOT17,
...     split=[DatasetSplit.TRAIN, DatasetSplit.VAL],
...     asset=[DatasetAsset.ANNOTATIONS],
... )

Using plain strings for quick, interactive use:

>>> from trackers import download_dataset
>>> download_dataset(
...     dataset="mot17",
...     split=["train"],
...     asset=["frames", "annotations"],
...     output="./datasets",
... )