cerebras.modelzoo.data.nlp.gpt.HuggingFaceIterableDataProcessorEli5.HuggingFaceIterableDataProcessorEli5Config

class cerebras.modelzoo.data.nlp.gpt.HuggingFaceIterableDataProcessorEli5.HuggingFaceIterableDataProcessorEli5Config(*args, **kwargs)

Bases: cerebras.modelzoo.data_preparation.huggingface.HuggingFaceDataProcessor.HuggingFaceDataProcessorConfig

Methods

check_for_deprecated_fields

check_literal_discriminator_field

copy

get_orig_class

get_orig_class_args

model_copy

model_post_init

post_init

Attributes

batch_size

Batch size.

discriminator

discriminator_value

drop_last

If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.

model_config

num_workers

Number of subprocesses to use for data loading.

persistent_workers

If True, the data loader will not shut down the worker processes after a dataset has been consumed once.

prefetch_factor

Number of batches loaded in advance by each worker.

shuffle

Flag to enable data shuffling.

shuffle_buffer

Size of shuffle buffer in samples.

shuffle_seed

Shuffle seed.

split

data_processor
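
Example

The interaction between batch_size and drop_last, and the role of shuffle_buffer and shuffle_seed for an iterable dataset, can be illustrated with a minimal standalone sketch. It uses only the Python standard library and is not the Cerebras implementation: the helper names num_batches and buffered_shuffle are hypothetical, and the buffer-replacement strategy shown is one common way to shuffle a stream, assumed here for illustration only.

```python
import math
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Count the batches produced from num_samples samples (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if drop_last:
        # The trailing incomplete batch, if any, is discarded.
        return num_samples // batch_size
    # The trailing incomplete batch is kept as a smaller final batch.
    return math.ceil(num_samples / batch_size)


def buffered_shuffle(
    samples: Iterable[T], shuffle_buffer: int, shuffle_seed: int
) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer (hypothetical helper)."""
    if shuffle_buffer < 1:
        raise ValueError("shuffle_buffer must be at least 1")
    rng = random.Random(shuffle_seed)  # same seed gives the same order
    buffer: List[T] = []
    for sample in samples:
        if len(buffer) < shuffle_buffer:
            # Fill the buffer before emitting anything.
            buffer.append(sample)
            continue
        # Buffer is full: emit a random element and store the new sample in its slot.
        idx = rng.randrange(shuffle_buffer)
        yield buffer[idx]
        buffer[idx] = sample
    # Stream exhausted: drain the remaining samples in random order.
    rng.shuffle(buffer)
    yield from buffer
```

With 10 samples and a batch size of 4, num_batches returns 2 when drop_last is True and 3 otherwise. In buffered_shuffle, a larger shuffle_buffer mixes samples from further apart in the stream at the cost of holding more samples in memory, and a fixed shuffle_seed makes the resulting order reproducible.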