cerebras.modelzoo.data_preparation.huggingface.HuggingFaceDataProcessor.HuggingFaceDataProcessorConfig
- class cerebras.modelzoo.data_preparation.huggingface.HuggingFaceDataProcessor.HuggingFaceDataProcessorConfig(*args, **kwargs)
Bases: cerebras.modelzoo.config.data_config.DataConfig
Methods
check_for_deprecated_fields
check_literal_discriminator_field
copy
get_orig_class
get_orig_class_args
model_copy
model_post_init
post_init
Attributes
batch_size: Batch size.
discriminator
discriminator_value
drop_last: If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
model_config
num_workers: How many subprocesses to use for data loading.
persistent_workers: If True, the data loader will not shut down the worker processes after a dataset has been consumed once.
prefetch_factor: Number of batches loaded in advance by each worker.
shuffle: Flag to enable data shuffling.
shuffle_buffer: Size of shuffle buffer in samples.
shuffle_seed: Shuffle seed.
data_processor
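- batch_size = Ellipsis
Batch size. Required: the Ellipsis default means no default value is provided, so the batch size must be set explicitly.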
- shuffle = False
Flag to enable data shuffling.
- shuffle_seed = None
Shuffle seed.
- shuffle_buffer = None
Size of shuffle buffer in samples.
- num_workers = 0
How many subprocesses to use for data loading.
- drop_last = True
If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
- prefetch_factor = 10
Number of batches loaded in advance by each worker.
- persistent_workers = True
If True, the data loader will not shut down the worker processes after a dataset has been consumed once.
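To see the documented fields and defaults in one place, the following stdlib-only sketch mirrors them in a plain dataclass. It is not the Cerebras class: the name LoaderSettings is hypothetical, the sanity checks in __post_init__ are illustrative assumptions rather than confirmed validation rules, and batch_size = Ellipsis is read as "required, no default", which is how pydantic renders a field declared with `...`.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LoaderSettings:
    """Hypothetical stand-in that restates the documented fields and defaults.

    This is NOT cerebras.modelzoo's HuggingFaceDataProcessorConfig.
    """

    batch_size: int  # no default: must be supplied (documented as Ellipsis)
    shuffle: bool = False
    shuffle_seed: Optional[int] = None
    shuffle_buffer: Optional[int] = None
    num_workers: int = 0
    drop_last: bool = True
    prefetch_factor: int = 10
    persistent_workers: bool = True

    def __post_init__(self) -> None:
        # Illustrative checks only (assumption), not the real class's rules.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        if self.num_workers < 0:
            raise ValueError("num_workers must be >= 0")


settings = LoaderSettings(batch_size=32, shuffle=True, shuffle_seed=1234)
print(asdict(settings))
```

Omitting batch_size (LoaderSettings()) raises a TypeError, which is the closest stdlib analogue to a required field.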
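The effect of drop_last on the number of batches per pass is simple arithmetic. The helper below is hypothetical and assumes standard batching, where every batch except possibly the last holds exactly batch_size samples.

```python
def batches_per_epoch(dataset_size: int, batch_size: int, drop_last: bool = True) -> int:
    """Number of batches produced from dataset_size samples (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full, remainder = divmod(dataset_size, batch_size)
    # With drop_last=True the trailing partial batch is discarded;
    # otherwise it is emitted as one smaller batch.
    return full if drop_last or remainder == 0 else full + 1


print(batches_per_epoch(10, 4, drop_last=True))   # 2 (two batches of 4; 2 samples dropped)
print(batches_per_epoch(10, 4, drop_last=False))  # 3 (two batches of 4 plus one batch of 2)
```

When the dataset size is divisible by the batch size, both settings give the same count.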
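shuffle_buffer is documented only as a size in samples. A common buffer-based shuffle, the technique used for example by Hugging Face datasets' IterableDataset.shuffle(buffer_size=...), keeps a fixed-size pool, emits a randomly chosen element each time a new one arrives, and drains the pool at the end. The sketch below implements that technique with the seed driving a local RNG. Whether this class shuffles exactly this way is an assumption, and buffered_shuffle is a hypothetical name.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: Optional[int] = None) -> Iterator[T]:
    """Approximate shuffle using a bounded buffer (hypothetical helper)."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    rng = random.Random(seed)  # same seed -> same output order
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining buffered items in random order.
    rng.shuffle(buffer)
    yield from buffer


print(list(buffered_shuffle(range(10), buffer_size=4, seed=0)))
```

A larger buffer gives a closer approximation to a full shuffle at the cost of holding more samples in memory. A buffer of size 1 leaves the order unchanged.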
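The prefetch_factor description matches the wording of PyTorch's DataLoader, where the count is per worker, so the total number of batches loaded ahead scales with num_workers. Assuming those semantics carry over to this class (not confirmed by this page), the arithmetic looks like this. prefetched_batches is a hypothetical helper.

```python
def prefetched_batches(num_workers: int, prefetch_factor: int) -> int:
    """Upper bound on batches loaded ahead across all workers (hypothetical helper).

    Assumes PyTorch DataLoader semantics: prefetch_factor is counted per
    worker, and with num_workers == 0 batches are loaded synchronously in
    the main process, so nothing is prefetched.
    """
    if num_workers < 0 or prefetch_factor < 0:
        raise ValueError("num_workers and prefetch_factor must be non-negative")
    if num_workers == 0:
        return 0
    return num_workers * prefetch_factor


print(prefetched_batches(num_workers=4, prefetch_factor=10))  # 40
print(prefetched_batches(num_workers=0, prefetch_factor=10))  # 0
```

Under this assumption, the documented defaults (num_workers = 0, prefetch_factor = 10) prefetch nothing. Raising num_workers to 4 would allow up to 40 batches, or 40 × batch_size samples, in flight at once.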