cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDataset

class cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDataset(*args, **kwargs)

Bases: torch.utils.data.IterableDataset

An HDF5 dataset processor that loads data from HDF5 files.

Parameters

config (cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig) – The configuration object for the dataset

Methods

set_state

Sets the dataloader's samples_seen state, which controls how many samples are skipped for a deterministic restart.

Attributes

samples_seen

set_state(samples_seen, shard_index)

Sets the dataloader's samples_seen state, which controls how many samples are skipped for a deterministic restart. This method is called by the load_state_dict method of the RestartableDataLoader.

Parameters
  • samples_seen (int) – number of samples streamed by the dataloader

  • shard_index (int) – the index of the shard of data that this worker is responsible for streaming
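The skip-on-restart behavior described for set_state can be illustrated in plain Python. This is a minimal, hypothetical sketch and not the Cerebras implementation. The class name RestartableShardStream, the in-memory list-of-lists standing in for HDF5 shards, and the assumption that samples_seen counts samples within the worker's own shard are all illustrative choices. The sketch deliberately avoids torch and h5py so it runs standalone.

```python
from itertools import islice
from typing import Iterator, Sequence


class RestartableShardStream:
    """Hypothetical stand-in for the restart logic (NOT the Cerebras class).

    Assumption: each worker streams exactly one shard, and samples_seen
    counts samples already produced from that shard.
    """

    def __init__(self, shards: Sequence[Sequence[int]]):
        self.shards = shards      # in-memory placeholders for HDF5 shards
        self.samples_seen = 0
        self.shard_index = 0

    def set_state(self, samples_seen: int, shard_index: int) -> None:
        # Record progress and shard ownership so the next pass over the
        # data resumes exactly where the previous run stopped.
        self.samples_seen = samples_seen
        self.shard_index = shard_index

    def __iter__(self) -> Iterator[int]:
        shard = self.shards[self.shard_index]
        # Skip the samples consumed before the restart, then keep counting
        # so the state can be checkpointed again later.
        for sample in islice(shard, self.samples_seen, None):
            self.samples_seen += 1
            yield sample


shards = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
stream = RestartableShardStream(shards)
stream.set_state(samples_seen=2, shard_index=1)
print(list(stream))  # [12, 13, 14]
```

Skipping with itertools.islice keeps the resume deterministic as long as each shard is always read in the same order; a real HDF5-backed loader would additionally need a fixed file and chunk ordering for the same guarantee to hold.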