cerebras.modelzoo.data.nlp.dpo.DpoHDF5DataProcessor.DpoHDF5DataProcessorConfig
- class cerebras.modelzoo.data.nlp.dpo.DpoHDF5DataProcessor.DpoHDF5DataProcessorConfig(*args, **kwargs)
Bases:
cerebras.modelzoo.data.common.HDF5IterableDataProcessor.HDF5IterableDataProcessorConfig
Methods
check_for_deprecated_fields
check_literal_discriminator_field
copy
get_orig_class
get_orig_class_args
model_copy
model_post_init
post_init
Attributes
batch_size
Batch size.
data_dir
Path to dataset HDF5 files.
discriminator
discriminator_value
drop_last
Similar to the PyTorch drop_last setting, except that when set to True, samples that would have been dropped at the end of one epoch are yielded at the start of the next epoch, so no data is lost.
features_list
List of features to include in the batch.
model_config
num_workers
How many subprocesses to use for data loading.
persistent_workers
If True, the data loader will not shutdown the worker processes after a dataset has been consumed once.
prefetch_factor
Number of batches loaded in advance by each worker.
shuffle
Flag to enable data shuffling.
shuffle_buffer
Size of shuffle buffer in samples.
shuffle_seed
Shuffle seed.
use_vsl
Flag to enable variable sequence length training.
vocab_size
data_processor
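The drop_last behavior described above differs from the plain PyTorch setting in that leftover samples are carried into the next epoch instead of being discarded. The following is a minimal pure-Python sketch of that idea only; it is not the Cerebras implementation, and the function name batches_with_carry_over is hypothetical.

```python
from typing import Iterator, List


def batches_with_carry_over(
    samples: List[int], batch_size: int, num_epochs: int
) -> Iterator[List[int]]:
    """Yield only full batches; leftovers roll over into the next epoch.

    Hypothetical helper illustrating the documented drop_last semantics.
    """
    carry: List[int] = []
    for _ in range(num_epochs):
        # Samples left over from the previous epoch come first.
        pending = carry + list(samples)
        # Number of samples that fit into complete batches.
        full = len(pending) - len(pending) % batch_size
        for start in range(0, full, batch_size):
            yield pending[start:start + batch_size]
        # Keep the incomplete tail for the next epoch instead of dropping it.
        carry = pending[full:]


batches = list(batches_with_carry_over([0, 1, 2, 3, 4], batch_size=2, num_epochs=2))
print(batches)
```

With five samples and a batch size of two, the fifth sample is not emitted in the first epoch but leads the first batch of the second epoch.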
- features_list = ['chosen_input_ids', 'chosen_attention_mask', 'chosen_labels', 'rejected_input_ids', 'rejected_attention_mask', 'rejected_labels']
List of features to include in the batch.
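To illustrate what a feature list of this kind implies for each sample, here is a small sketch that keeps only the six default fields from a sample dictionary. The helper select_features and the toy sample are assumptions made for illustration; the actual data processor reads these fields from HDF5 files and may handle them differently.

```python
from typing import Dict, List

# Default feature names as listed in the attribute above.
FEATURES_LIST = [
    "chosen_input_ids",
    "chosen_attention_mask",
    "chosen_labels",
    "rejected_input_ids",
    "rejected_attention_mask",
    "rejected_labels",
]


def select_features(sample: Dict[str, list], features: List[str]) -> Dict[str, list]:
    """Return only the requested features, in the requested order.

    Hypothetical helper; raises KeyError if a requested feature is absent.
    """
    missing = [name for name in features if name not in sample]
    if missing:
        raise KeyError(f"sample is missing features: {missing}")
    return {name: sample[name] for name in features}


# Toy sample: every expected feature plus one field that should be ignored.
sample = {name: [0, 1, 2] for name in FEATURES_LIST}
sample["extra_field"] = [9]

batch_fields = select_features(sample, FEATURES_LIST)
print(list(batch_fields))
```

Each DPO sample pairs a chosen and a rejected sequence, which is why every field appears once with each prefix.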