cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataProcessorConfig

class cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataProcessorConfig(*args, **kwargs)

Bases: cerebras.modelzoo.config.data_config.DataConfig

Methods

check_for_deprecated_fields

check_literal_discriminator_field

copy

get_orig_class

get_orig_class_args

get_vocab_file

model_copy

model_post_init

post_init

Attributes

attn_mask_pad_id

batch_size

The batch size.

data_dir

Path to the data files to use.

discriminator

discriminator_value

do_lower

Whether to lowercase the input text.

drop_last

Whether to drop the last batch of an epoch if it is incomplete.

input_pad_id

is_training

Whether the data processor is used for training or validation.

labels_pad_id

max_sequence_length

model_config

num_workers

The number of PyTorch processes used in the dataloader.

persistent_workers

Whether or not to keep workers persistent between epochs.

prefetch_factor

The number of batches to prefetch in the dataloader.

shuffle

Whether or not to shuffle the dataset.

shuffle_seed

The seed used for deterministic shuffling.

vocab_file

Path to the vocabulary file.

is_training = Ellipsis

Whether the data processor is used for training or validation.

data_dir = Ellipsis

Path to the data files to use.

batch_size = Ellipsis

The batch size.

vocab_file = Ellipsis

Path to the vocabulary file.

do_lower = False

Whether to lowercase the input text.

shuffle = True

Whether or not to shuffle the dataset.

shuffle_seed = None

The seed used for deterministic shuffling.
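
To illustrate what deterministic shuffling means here, the following is a minimal sketch, not the data processor's actual implementation. The helper name `shuffled_indices` is hypothetical, and it assumes that a fixed `shuffle_seed` should reproduce the same sample order on every run while `None` leaves the order unseeded.

```python
import random


def shuffled_indices(num_samples, shuffle=True, shuffle_seed=None):
    """Return sample indices, optionally shuffled with a fixed seed.

    Hypothetical helper for illustration only.
    """
    indices = list(range(num_samples))
    if shuffle:
        # A dedicated Random instance leaves the global RNG state untouched.
        random.Random(shuffle_seed).shuffle(indices)
    return indices


first = shuffled_indices(8, shuffle_seed=1234)
second = shuffled_indices(8, shuffle_seed=1234)
print(first == second)  # True: the same seed gives the same order
```

With `shuffle_seed=None` the order is seeded from system entropy, so two runs will generally differ.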

num_workers = 0

The number of PyTorch processes used in the dataloader.

prefetch_factor = 10

The number of batches to prefetch in the dataloader.

persistent_workers = True

Whether or not to keep workers persistent between epochs.
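
The defaults above combine `num_workers = 0` with `prefetch_factor = 10` and `persistent_workers = True`. PyTorch's `torch.utils.data.DataLoader` rejects a prefetch factor or persistent workers when `num_workers` is 0, so code that forwards these fields typically has to drop them in that case. The sketch below shows one way to do that; `dataloader_kwargs` is a hypothetical helper, and whether the data processor applies the same guard is an assumption.

```python
def dataloader_kwargs(batch_size, num_workers=0, prefetch_factor=10,
                      persistent_workers=True):
    """Build keyword arguments suitable for torch.utils.data.DataLoader.

    Hypothetical helper: the worker-only options are forwarded only when
    worker processes are actually in use.
    """
    kwargs = {"batch_size": batch_size, "num_workers": num_workers}
    if num_workers > 0:
        # These two options are only valid with multiprocess loading.
        kwargs["prefetch_factor"] = prefetch_factor
        kwargs["persistent_workers"] = persistent_workers
    return kwargs


print(dataloader_kwargs(batch_size=32))
print(dataloader_kwargs(batch_size=32, num_workers=4))
```

The resulting dictionary could then be unpacked into the loader, for example `DataLoader(dataset, **dataloader_kwargs(batch_size=32, num_workers=4))`.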

drop_last = True

Whether to drop the last batch of an epoch if it is incomplete.
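
As a quick summary of the fields and defaults listed above, here is a stand-in sketch built on the standard library's `dataclass`. It is not the real `ClassifierDataProcessorConfig`: the class name `ClassifierConfigSketch`, the field types, and the example paths are assumptions. The four fields documented with an `Ellipsis` default are treated as required, following the common convention that `...` marks a field with no default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassifierConfigSketch:
    """Hypothetical stand-in mirroring the documented fields; not the real class."""

    # Documented with an Ellipsis default, treated here as required.
    is_training: bool
    data_dir: str          # assumed type
    batch_size: int
    vocab_file: str        # assumed type
    # Defaults copied from the attribute reference above.
    do_lower: bool = False
    shuffle: bool = True
    shuffle_seed: Optional[int] = None
    num_workers: int = 0
    prefetch_factor: int = 10
    persistent_workers: bool = True
    drop_last: bool = True


cfg = ClassifierConfigSketch(
    is_training=True,
    data_dir="./data/train",   # placeholder path
    batch_size=32,
    vocab_file="./vocab.txt",  # placeholder path
)
print(cfg.shuffle, cfg.prefetch_factor)  # True 10
```

Omitting any of the four required arguments raises a `TypeError` in this sketch, which mirrors the idea that those fields must always be supplied.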