cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDataset
- class cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDataset(*args, **kwargs)
Bases: torch.utils.data.IterableDataset
An HDF5 dataset processor that loads data from HDF5 files.
- Parameters
config (cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig) – The configuration object for the dataset
Methods
set_state(samples_seen, shard_index) – Sets the dataloader's samples_seen variable, which controls how many samples are skipped for a deterministic restart.
Attributes
samples_seen
- set_state(samples_seen, shard_index)
Sets the dataloader’s samples_seen variable, which controls how many samples are skipped for a deterministic restart. It is called by the load_state_dict method of the RestartableDataLoader.
- Parameters
samples_seen (int) – number of samples streamed by the dataloader
shard_index (int) – the index of the shard of data that this worker is responsible for streaming
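The skip-on-restart idea behind set_state can be illustrated with a minimal sketch. ToyShardedDataset below is a hypothetical stand-in written in plain Python, not the Cerebras implementation: it assumes samples_seen counts samples already streamed from a single shard and that restarting simply skips that many samples. The real class reads HDF5 files and may track state differently.

```python
class ToyShardedDataset:
    """Hypothetical stand-in for a restartable, sharded iterable dataset."""

    def __init__(self, shards):
        # shards: list of lists, one list of samples per worker shard (assumption)
        self.shards = shards
        self.samples_seen = 0
        self.shard_index = 0

    def set_state(self, samples_seen, shard_index):
        # Record how many samples to skip and which shard this worker streams.
        self.samples_seen = samples_seen
        self.shard_index = shard_index

    def __iter__(self):
        shard = self.shards[self.shard_index]
        # Skip the samples that were already streamed before the restart.
        for sample in shard[self.samples_seen:]:
            self.samples_seen += 1
            yield sample


dataset = ToyShardedDataset([[0, 1, 2, 3], [10, 11, 12, 13]])
# Simulate restoring a checkpoint taken after two samples of shard 1.
dataset.set_state(samples_seen=2, shard_index=1)
resumed = list(dataset)
```

In this toy setup, iteration resumes at the third sample of the selected shard, so a run restored from a checkpoint sees the same remaining samples as an uninterrupted run would.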