cerebras.modelzoo.data.common.h5_map_dataset.dataset.MultimodalSimpleHDF5DatasetConfig
- class cerebras.modelzoo.data.common.h5_map_dataset.dataset.MultimodalSimpleHDF5DatasetConfig(*args, **kwargs)
Bases: cerebras.modelzoo.data.common.h5_map_dataset.dataset.MultiModalHDF5DatasetConfig
Methods
check_for_deprecated_fields
check_mutual_exclusivity
copy
get_orig_class
get_orig_class_args
model_copy
model_post_init
post_init
Attributes
batch_size: The batch size.
data_dir: The path to the HDF5 files.
data_subset: An optional specification to only consider a subset of the full dataset, useful for sequence length scheduling and multi-epoch testing.
drop_last: Similar to the PyTorch drop_last setting, except that when set to True, samples that would have been dropped at the end of one epoch are yielded at the start of the next epoch so that there is no data loss.
image_data_size: The final C x H x W shape of the image.
img_data_dir: The path to the directory containing the images.
max_num_img: The maximum number of images.
max_sequence_length: The sequence length of samples produced by the dataloader.
mixture: An optional specification of multiple datasets to mix over to create one single weighted combination.
model_config
num_patches: The number of patches.
num_samples: The number of samples to shuffle over (if shuffling is enabled).
pad_last: Flag to enable padding of the last batch so that the last batch has the same batch size as the rest of the batches.
shuffle: Whether or not to shuffle the dataset.
shuffle_seed: The seed used for deterministic shuffling.
sort_files: Whether or not the reader should sort the input files.
transforms: A specification of the torchvision transforms.
use_vsl: Flag to enable variable sequence length training.
use_worker_cache: Whether or not to copy data to storage that is directly attached to each individual worker node.
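The carry-over behavior described for drop_last can be easier to see with a small example. The following is a minimal sketch in plain Python, not the library's implementation: the function name `batches_with_carry_over` is hypothetical, sample indices stand in for real samples, and shuffling is left out. It only illustrates the idea that leftovers from one epoch are yielded at the start of the next instead of being discarded.

```python
from typing import Iterator, List


def batches_with_carry_over(
    num_samples: int, batch_size: int, num_epochs: int
) -> Iterator[List[int]]:
    """Yield only full batches; leftover samples roll into the next epoch.

    Hypothetical helper for illustration only. It is not part of
    cerebras.modelzoo and does not model shuffling or multiple workers.
    """
    carry: List[int] = []  # samples left over from the previous epoch
    for _ in range(num_epochs):
        # Leftovers come first, followed by this epoch's samples.
        pending = carry + list(range(num_samples))
        # Number of samples that fit into full batches.
        full = len(pending) - len(pending) % batch_size
        for start in range(0, full, batch_size):
            yield pending[start : start + batch_size]
        # Whatever does not fill a batch is kept for the next epoch.
        carry = pending[full:]


batches = list(batches_with_carry_over(num_samples=5, batch_size=2, num_epochs=2))
for batch in batches:
    print(batch)
```

With 5 samples and a batch size of 2, sample 4 does not fit into a full batch in the first epoch, so in this sketch it becomes the first element of the second epoch's first batch.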
- __call__(**kwargs)
Construct the original class with the current config.
By original class, we mean the class that this config class is associated with.
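The pattern of a config object that constructs its associated class when called can be sketched as follows. This is an illustrative toy under stated assumptions, not the Cerebras implementation: `ToyDataset` and `ToyDatasetConfig` are hypothetical names, and the way the config and keyword arguments are passed to the constructor is an assumption. Only the method names `get_orig_class` and `__call__` and the two field names are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional


class ToyDataset:
    """Hypothetical stand-in for the class a config is associated with."""

    def __init__(self, config: "ToyDatasetConfig", **kwargs):
        self.config = config
        self.extra = kwargs


@dataclass
class ToyDatasetConfig:
    """Hypothetical config; the field names mirror two attributes on this page."""

    max_num_img: int = 1
    num_patches: Optional[int] = None

    def get_orig_class(self):
        # Return the class this config is associated with.
        return ToyDataset

    def __call__(self, **kwargs):
        # Assumption: the original class receives the config plus any kwargs.
        return self.get_orig_class()(self, **kwargs)


config = ToyDatasetConfig(num_patches=256)
dataset = config()  # constructs a ToyDataset from the current config
print(type(dataset).__name__, dataset.config.num_patches)
```

In this toy version, calling the config is equivalent to looking up the associated class and instantiating it with the current field values, which is the relationship the description of `__call__` refers to.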
- max_num_img = 1
The maximum number of images.
- num_patches = None
The number of patches.