cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.BertSumCSVDataProcessorConfig

class cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.BertSumCSVDataProcessorConfig(*args, **kwargs)

Bases: cerebras.modelzoo.config.data_config.DataConfig

Methods

check_for_deprecated_fields

check_literal_discriminator_field

copy

get_orig_class

get_orig_class_args

get_vocab_file

model_copy

model_post_init

post_init

Attributes

batch_size

The batch size.

data_dir

Path to the data files to use.

discriminator

discriminator_value

do_lower

Flag indicating whether to lowercase the text.

drop_last

Whether to drop the last batch of an epoch if it is incomplete.

mask_whole_word

Flag indicating whether to mask the entire word.

max_cls_tokens

max_sequence_length

model_config

num_workers

The number of PyTorch processes used in the dataloader.

pad_id

persistent_workers

Whether to keep worker processes alive between epochs.

prefetch_factor

The number of batches to prefetch in the dataloader.

shuffle

Whether or not to shuffle the dataset.

shuffle_buffer

Buffer size to shuffle samples across.

shuffle_seed

The seed used for deterministic shuffling.

vocab_file

Path to the vocabulary file.

data_processor

data_dir = Ellipsis

Path to the data files to use.

vocab_file = None

Path to the vocabulary file.

batch_size = Ellipsis

The batch size.
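Both data_dir and batch_size default to Ellipsis. Since the class exposes pydantic-style methods such as model_copy and model_post_init, this presumably follows the pydantic convention in which `...` marks a field as required. The plain-Python sketch below mirrors only the documented defaults to show what a minimal configuration must supply; `build_params`, `DEFAULTS`, `REQUIRED_FIELDS`, and the `./bertsum_csv` path are hypothetical and are not part of the Cerebras Model Zoo API.

```python
from typing import Any, Dict

# Hypothetical mirror of the documented fields; NOT the Cerebras API.
REQUIRED_FIELDS = ("data_dir", "batch_size")  # documented default: Ellipsis
DEFAULTS: Dict[str, Any] = {
    "vocab_file": None,
    "mask_whole_word": False,
    "do_lower": False,
    "shuffle": True,
    "shuffle_seed": None,
    "shuffle_buffer": None,
    "num_workers": 0,
    "prefetch_factor": 10,
    "persistent_workers": True,
    "drop_last": True,
}


def build_params(**overrides: Any) -> Dict[str, Any]:
    """Merge user overrides onto the documented defaults, enforcing required fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in overrides]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    params = dict(DEFAULTS)
    params.update(overrides)
    return params


params = build_params(data_dir="./bertsum_csv", batch_size=32)
print(params["shuffle"], params["drop_last"], params["num_workers"])  # True True 0
```

Omitting either required field, for example `build_params(data_dir="./bertsum_csv")`, raises a `ValueError` naming the missing field.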

mask_whole_word = False

Flag indicating whether to mask the entire word.
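The processor's exact masking procedure is not described here. As a general illustration of whole-word masking, the sketch below assumes WordPiece tokens in which a `##` prefix marks a continuation piece, as in BERT vocabularies. When the flag is on, choosing any piece of a word masks every piece of that word. The helpers `word_spans` and `mask_at` and the token list are hypothetical.

```python
from typing import List

MASK = "[MASK]"


def word_spans(tokens: List[str]) -> List[List[int]]:
    """Group token indices into whole words; a '##' prefix continues the previous word."""
    spans: List[List[int]] = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and spans:
            spans[-1].append(i)
        else:
            spans.append([i])
    return spans


def mask_at(tokens: List[str], index: int, whole_word: bool) -> List[str]:
    """Mask the token at `index`; with whole_word=True, mask every piece of its word."""
    out = list(tokens)
    if whole_word:
        targets = next(span for span in word_spans(tokens) if index in span)
    else:
        targets = [index]
    for i in targets:
        out[i] = MASK
    return out


tokens = ["the", "sum", "##mar", "##izer", "works"]  # invented tokenization
print(mask_at(tokens, 2, whole_word=False))
# ['the', 'sum', '[MASK]', '##izer', 'works']
print(mask_at(tokens, 2, whole_word=True))
# ['the', '[MASK]', '[MASK]', '[MASK]', 'works']
```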

do_lower = False

Flag indicating whether to lowercase the text.

shuffle = True

Whether or not to shuffle the dataset.

shuffle_seed = None

The seed used for deterministic shuffling.

shuffle_buffer = None

Buffer size to shuffle samples across. If None and shuffle is enabled, 10*batch_size is used.
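To illustrate how shuffle, shuffle_seed, and shuffle_buffer interact, the sketch below implements a common streaming buffer shuffle together with the documented default of 10 * batch_size when shuffle_buffer is None. This is a generic technique, not confirmed to be the processor's own implementation. The helpers `resolve_shuffle_buffer` and `buffered_shuffle` are hypothetical.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def resolve_shuffle_buffer(shuffle_buffer: Optional[int], batch_size: int) -> int:
    """Documented default: if shuffle_buffer is None, use 10 * batch_size."""
    return 10 * batch_size if shuffle_buffer is None else shuffle_buffer


def buffered_shuffle(samples: Iterable[T], buffer_size: int,
                     seed: Optional[int]) -> Iterator[T]:
    """Stream samples through a fixed-size buffer, emitting a random buffered item per step."""
    rng = random.Random(seed)  # same seed -> same output order
    buffer: List[T] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end, then pop it (O(1) removal).
            j = rng.randrange(len(buffer))
            buffer[j], buffer[-1] = buffer[-1], buffer[j]
            yield buffer.pop()
    rng.shuffle(buffer)  # drain the remaining samples in random order
    yield from buffer


size = resolve_shuffle_buffer(None, batch_size=4)
order_a = list(buffered_shuffle(range(100), size, seed=1234))
order_b = list(buffered_shuffle(range(100), size, seed=1234))
print(size, order_a == order_b, sorted(order_a) == list(range(100)))  # 40 True True
```

A larger buffer mixes samples more thoroughly at the cost of memory. In this scheme, a sample can move earlier in the stream by at most buffer_size - 1 positions.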

num_workers = 0

The number of PyTorch processes used in the dataloader.

prefetch_factor = 10

The number of batches to prefetch in the dataloader.

persistent_workers = True

Whether to keep worker processes alive between epochs.
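The documented defaults combine num_workers = 0 with prefetch_factor = 10 and persistent_workers = True. Recent versions of PyTorch's `torch.utils.data.DataLoader` raise a `ValueError` if prefetch_factor is set or persistent_workers is True while num_workers is 0. A wrapper therefore presumably forwards these two options only when worker processes exist. This page does not say how the Cerebras processor handles this case. The pure-Python helper `dataloader_worker_kwargs` below is a hypothetical sketch of that guard. In PyTorch, prefetch_factor counts batches loaded in advance per worker, so the total in flight is prefetch_factor * num_workers.

```python
from typing import Any, Dict


def dataloader_worker_kwargs(num_workers: int, prefetch_factor: int,
                             persistent_workers: bool) -> Dict[str, Any]:
    """Hypothetical helper: worker-related kwargs for torch.utils.data.DataLoader.

    prefetch_factor and persistent_workers are only valid with num_workers > 0,
    so they are omitted when the data is loaded in the main process.
    """
    kwargs: Dict[str, Any] = {"num_workers": num_workers}
    if num_workers > 0:
        kwargs["prefetch_factor"] = prefetch_factor  # batches loaded ahead per worker
        kwargs["persistent_workers"] = persistent_workers
    return kwargs


print(dataloader_worker_kwargs(0, 10, True))
# {'num_workers': 0}
print(dataloader_worker_kwargs(4, 10, True))
# {'num_workers': 4, 'prefetch_factor': 10, 'persistent_workers': True}
```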

drop_last = True

Whether to drop the last batch of an epoch if it is incomplete.
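As a worked example of drop_last, the sketch below computes the number of batches per epoch. It assumes the processor batches samples the way `torch.utils.data.DataLoader` does: floor division when the incomplete final batch is dropped, ceiling division otherwise. The helper `batches_per_epoch` and the sample counts are hypothetical.

```python
def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Batches per epoch under DataLoader-style batching (assumption)."""
    if drop_last:
        return num_samples // batch_size  # incomplete final batch is discarded
    return (num_samples + batch_size - 1) // batch_size  # final batch may be smaller


print(batches_per_epoch(1000, 32, drop_last=True))   # 31 (the last 8 samples are dropped)
print(batches_per_epoch(1000, 32, drop_last=False))  # 32 (the last batch has 8 samples)
```

Keeping drop_last = True guarantees that every batch has exactly batch_size samples, which matters when downstream code expects a fixed batch shape.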