Datasets API

`trackers.datasets.manifest.Dataset`

Bases: str, Enum

Supported benchmark tracking datasets.

Attributes:

Name	Description
`MOT17`	Pedestrian tracking in crowded scenes with frequent occlusions. A demanding test of re-identification and identity stability.
`SPORTSMOT`	Sports broadcast tracking with fast motion, camera pans, and similar-looking targets. Tests association under high speed and appearance ambiguity.
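Because `Dataset` subclasses both `str` and `Enum`, its members compare equal to plain strings, and `download_dataset` documents that string names are matched case-insensitively. The sketch below is a hypothetical stand-in, not the library's source: the member values `"mot17"` and `"sportsmot"` are assumptions, and the `_missing_` hook is one standard Python way to obtain case-insensitive lookup on a `str` enum.

```python
from enum import Enum


class Dataset(str, Enum):
    """Hypothetical mirror of trackers' Dataset enum; member values are assumed."""

    MOT17 = "mot17"
    SPORTSMOT = "sportsmot"

    @classmethod
    def _missing_(cls, value):
        # Enum calls this only when Dataset(value) finds no exact match.
        # Retry with the lowercased string so lookups ignore case.
        if isinstance(value, str):
            lowered = value.lower()
            for member in cls:
                if member.value == lowered:
                    return member
        # Returning None makes Enum raise ValueError for unknown names.
        return None


print(Dataset("MOT17") is Dataset.MOT17)  # True: case-insensitive lookup
print(Dataset.SPORTSMOT == "sportsmot")  # True: members are str instances
```

The same pattern would apply to `DatasetSplit` and `DatasetAsset`, which share the `str, Enum` bases.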

`trackers.datasets.manifest.DatasetSplit`

Bases: str, Enum

Available dataset splits.

Attributes:

Name	Description
`TRAIN`	Training split.
`VAL`	Validation split.
`TEST`	Test split.

`trackers.datasets.manifest.DatasetAsset`

Bases: str, Enum

Downloadable asset types within a dataset split.

Attributes:

Name	Description
`FRAMES`	Raw video frames as individual image files.
`ANNOTATIONS`	Ground-truth bounding box and identity labels.
`DETECTIONS`	Pre-computed object detection results.

`trackers.datasets.download.download_dataset(*, dataset, split=None, asset=None, output=_DEFAULT_OUTPUT_DIR, cache_dir=_DEFAULT_CACHE_DIR)`

Download benchmark tracking datasets from the official GCP bucket.

Downloads ZIP files into a persistent cache directory and extracts them into the output directory. Cached ZIPs are reused across runs so that re-extraction after deleting the output directory does not require re-downloading.
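The cache-then-extract behavior can be illustrated with the standard library. This is a minimal sketch, not the library's implementation: `fetch_or_reuse`, `fake_fetch`, the file name `mot17_train.zip`, and the archive layout are all hypothetical, and the real download step (from the GCP bucket) is replaced by a stub that writes bytes to disk. It shows why deleting the output directory does not force a second download: the MD5-verified ZIP is still in the cache.

```python
import hashlib
import io
import os
import shutil
import tempfile
import zipfile
from typing import Callable, List


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_or_reuse(
    cache_dir: str, filename: str, expected_md5: str, fetch: Callable[[str], None]
) -> str:
    """Return a verified cached ZIP, calling the hypothetical fetch() only on a miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, filename)
    # Reuse the cached file when it exists and its checksum matches.
    if os.path.exists(path) and md5_of(path) == expected_md5:
        return path
    # Missing or corrupt: download again, then verify before trusting it.
    fetch(path)
    if md5_of(path) != expected_md5:
        raise RuntimeError(f"MD5 mismatch for {filename}")
    return path


def extract(zip_path: str, output: str) -> None:
    """Extract every member of the ZIP into the output directory."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(output)


# Build a small in-memory ZIP to stand in for a real dataset archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("MOT17/train/gt.txt", "1,1,10,10,50,80\n")
payload = buffer.getvalue()
expected = hashlib.md5(payload).hexdigest()
fetch_calls: List[str] = []


def fake_fetch(dest: str) -> None:
    # Stand-in for a network download: record the call and write the bytes.
    fetch_calls.append(dest)
    with open(dest, "wb") as fh:
        fh.write(payload)


with tempfile.TemporaryDirectory() as root:
    cache = os.path.join(root, "cache")
    output = os.path.join(root, "output")
    for _ in range(2):
        zip_path = fetch_or_reuse(cache, "mot17_train.zip", expected, fake_fetch)
        extract(zip_path, output)
        extracted = os.path.exists(os.path.join(output, "MOT17", "train", "gt.txt"))
        shutil.rmtree(output)  # simulate the user deleting the output directory

print(len(fetch_calls))  # 1: the second run re-extracted from the cache
```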

Parameters:

Name	Type	Description	Default
`dataset`	`str | Dataset`	Dataset to download, as a `Dataset` enum or string name. Case-insensitive.	required
`split`	`DatasetSplit | str | list[DatasetSplit | str] | None`	Splits to download. If `None`, all available splits are downloaded.	`None`
`asset`	`DatasetAsset | str | list[DatasetAsset | str] | None`	Asset types to download. If `None`, all available assets for each split are downloaded.	`None`
`output`	`str`	Directory where dataset files will be extracted. Defaults to the current working directory.	`_DEFAULT_OUTPUT_DIR`
`cache_dir`	`str`	Directory for caching downloaded ZIP files. Cached ZIPs are verified by MD5 and reused when valid.	`_DEFAULT_CACHE_DIR`

Raises:

Type	Description
`ValueError`	If `dataset`, `split`, or `asset` contains an unrecognized value.
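The `split` and `asset` parameters accept `None`, a single enum member or string, or a list mixing both, and unrecognized values raise `ValueError`. A minimal sketch of how such an argument could be normalized into a deduplicated list of enum members follows. The `normalize` helper is hypothetical, the enum classes are stand-ins whose lowercase member values are assumed, and the `available` argument approximates "all available splits/assets", which in the real library may depend on the dataset.

```python
from enum import Enum
from typing import List, Optional, Sequence, Type, TypeVar, Union

E = TypeVar("E", bound=Enum)


class DatasetSplit(str, Enum):
    """Hypothetical mirror of trackers' DatasetSplit; values are assumed."""

    TRAIN = "train"
    VAL = "val"
    TEST = "test"


class DatasetAsset(str, Enum):
    """Hypothetical mirror of trackers' DatasetAsset; values are assumed."""

    FRAMES = "frames"
    ANNOTATIONS = "annotations"
    DETECTIONS = "detections"


def normalize(
    value: Optional[Union[str, Enum, List[Union[str, Enum]]]],
    enum_cls: Type[E],
    available: Sequence[E],
) -> List[E]:
    """Turn None, one member/string, or a list of them into unique enum members."""
    if value is None:
        return list(available)  # None means "everything available"
    items = value if isinstance(value, list) else [value]
    result: List[E] = []
    for item in items:
        try:
            # str-mixin members are str instances too, so lowering works for both.
            member = enum_cls(item.lower() if isinstance(item, str) else item)
        except ValueError:
            raise ValueError(f"Unrecognized {enum_cls.__name__}: {item!r}") from None
        if member not in result:  # keep first-seen order, drop duplicates
            result.append(member)
    return result


splits = normalize(["TRAIN", DatasetSplit.VAL], DatasetSplit, list(DatasetSplit))
assets = normalize(None, DatasetAsset, list(DatasetAsset))
print([s.value for s in splits])  # ['train', 'val']
print([a.value for a in assets])  # ['frames', 'annotations', 'detections']
```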

Examples:

Using enums for type-safe dataset, split, and asset selection:

>>> from trackers import Dataset, DatasetAsset, DatasetSplit, download_dataset
>>> download_dataset(
...     dataset=Dataset.MOT17,
...     split=[DatasetSplit.TRAIN, DatasetSplit.VAL],
...     asset=[DatasetAsset.ANNOTATIONS],
... )

Using plain strings for quick, interactive use:

>>> from trackers import download_dataset
>>> download_dataset(
...     dataset="mot17",
...     split=["train"],
...     asset=["frames", "annotations"],
...     output="./datasets",
... )