cerebras.modelzoo.data.common.h5_map_dataset.dataset
Classes
Dynamically reads samples from disk for use with the map-style dataset paradigm.
Specialized HDF5 dataset class that handles image preprocessing in multimodal datasets. Functionality is largely the same as HDF5Dataset, with added image loading and preprocessing.

params (dict): a dictionary containing the following added fields:
- "img_data_dir" (str): the path to the directory containing the images.
- "fp16_type" (str): the half-precision dtype the image is cast to.
- "image_data_size" (list[int]): the final C x H x W shape of the image.
- "transforms" (list[dict]): a specification of the torchvision transforms.
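As a rough illustration of the four added fields, the sketch below builds a `params` dictionary and checks its shape. Only the field names and types come from the description above. The values, the per-transform schema (`name`, `size`), and the helper `check_multimodal_params` are hypothetical and are not part of the library.

```python
# Illustrative values only: the documentation above gives field names and
# types, not the accepted values.
params = {
    "img_data_dir": "/path/to/images",   # placeholder path
    "fp16_type": "bfloat16",             # assumed value; check the accepted names
    "image_data_size": [3, 224, 224],    # C x H x W
    "transforms": [                      # hypothetical per-transform schema
        {"name": "resize", "size": [224, 224]},
    ],
}


def check_multimodal_params(params):
    """Hypothetical helper: check the four documented fields and their types."""
    expected = {
        "img_data_dir": str,
        "fp16_type": str,
        "image_data_size": list,
        "transforms": list,
    }
    problems = []
    for key, typ in expected.items():
        if key not in params:
            problems.append(f"missing field: {key}")
        elif not isinstance(params[key], typ):
            problems.append(f"{key} should be {typ.__name__}")

    # image_data_size is documented as the final C x H x W shape.
    size = params.get("image_data_size")
    if isinstance(size, list) and (
        len(size) != 3 or not all(isinstance(d, int) for d in size)
    ):
        problems.append("image_data_size should be three ints (C, H, W)")

    # transforms is documented as list[dict].
    transforms = params.get("transforms")
    if isinstance(transforms, list) and not all(
        isinstance(t, dict) for t in transforms
    ):
        problems.append("transforms should be a list of dicts")
    return problems


problems = check_multimodal_params(params)
```

An empty `problems` list means the dictionary matches the documented field names and types; it says nothing about whether the dataset class accepts the specific values.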
The state needed for deterministic restart of HDF5Dataset instances is the total number of samples streamed globally, which is consumed by the sampler.
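The idea that a single counter is enough to resume deterministically can be sketched with a toy sampler. This is not the library's sampler: `sample_order`, `stream`, and the `samples_streamed` argument are hypothetical names, and the sketch assumes a single process with a per-epoch shuffle seeded by `seed + epoch`.

```python
import random
from itertools import islice


def sample_order(num_samples, seed, epoch):
    """Deterministic shuffled order for one epoch (toy stand-in for a sampler)."""
    order = list(range(num_samples))
    random.Random(seed + epoch).shuffle(order)
    return order


def stream(num_samples, seed, samples_streamed=0):
    """Yield sample indices indefinitely, skipping the first `samples_streamed`."""
    # The counter alone determines which epoch and which offset to resume at.
    epoch, offset = divmod(samples_streamed, num_samples)
    while True:
        order = sample_order(num_samples, seed, epoch)
        for idx in order[offset:]:
            yield idx
        epoch += 1
        offset = 0


# Uninterrupted run: 25 samples from a dataset of 8.
full = list(islice(stream(8, seed=0), 25))

# Restarted run: resume after 10 samples were already streamed.
resumed = list(islice(stream(8, seed=0, samples_streamed=10), 15))
```

Here `resumed` matches `full[10:]`, because the epoch and in-epoch offset are both recovered from the one counter. A multi-worker setup would also need to account for how samples are split across workers, which this sketch does not model.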