cerebras.modelzoo.data.common.HDF5IterableDataProcessor.HDF5IterableDataProcessorConfig
- class cerebras.modelzoo.data.common.HDF5IterableDataProcessor.HDF5IterableDataProcessorConfig(*args, **kwargs)
Bases: cerebras.modelzoo.data.common.config.GenericDataProcessorConfig, cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig

Methods
check_for_deprecated_fields
check_literal_discriminator_field
copy
get_orig_class
get_orig_class_args
model_copy
model_post_init
post_init

Attributes
batch_size: Batch size.
data_dir: Path to dataset HDF5 files.
discriminator
discriminator_value
drop_last: Similar to the PyTorch drop_last setting, except that when set to True, samples that would have been dropped at the end of one epoch are yielded at the start of the next epoch, so that no data is lost.
features_list: List of features to include in the batch.
model_config
num_workers: How many subprocesses to use for data loading.
persistent_workers: If True, the data loader will not shut down the worker processes after a dataset has been consumed once.
prefetch_factor: Number of batches loaded in advance by each worker.
shuffle: Flag to enable data shuffling.
shuffle_buffer: Size of the shuffle buffer, in samples.
shuffle_seed: Shuffle seed.
use_vsl: Flag to enable variable sequence length training.
vocab_size
data_processor
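
The drop_last description above differs from the plain PyTorch behavior, so a small illustration may help. The following is a minimal pure-Python sketch of the carry-over idea only. It does not call the Cerebras API, and the function name batches_with_carry_over and its parameters are hypothetical names chosen for this example, not part of the library.

```python
from typing import Iterable, Iterator, List


def batches_with_carry_over(
    samples: Iterable[int], batch_size: int, num_epochs: int
) -> Iterator[List[int]]:
    """Yield only full batches; leftovers from one epoch open the next epoch.

    Hypothetical helper for illustration, not the library's implementation.
    """
    samples = list(samples)
    carry: List[int] = []
    for _ in range(num_epochs):
        # Start the epoch with whatever was left over from the previous one.
        batch = carry
        carry = []
        for sample in samples:
            batch.append(sample)
            if len(batch) == batch_size:
                yield batch
                batch = []
        # The incomplete batch is held back rather than discarded.
        carry = batch


batches = list(batches_with_carry_over(range(5), batch_size=2, num_epochs=2))
print(batches)
```

With 5 samples, a batch size of 2, and 2 epochs, the sample left over at the end of the first epoch becomes the first element of the second epoch's first batch, so every batch is full and all 10 samples are yielded.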