cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.BertSumCSVDataProcessorConfig
- class cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.BertSumCSVDataProcessorConfig(*args, **kwargs)
Bases: cerebras.modelzoo.config.data_config.DataConfig

Methods
- check_for_deprecated_fields
- check_literal_discriminator_field
- copy
- get_orig_class
- get_orig_class_args
- get_vocab_file
- model_copy
- model_post_init
- post_init

Attributes
- batch_size: The batch size.
- data_dir: Path to the data files to use.
- discriminator
- discriminator_value
- do_lower: Whether to lowercase the text.
- drop_last: Whether to drop the last batch of an epoch if it is incomplete.
- mask_whole_word: Whether to mask entire words.
- max_cls_tokens
- max_sequence_length
- model_config
- num_workers: The number of PyTorch processes used in the dataloader.
- pad_id
- persistent_workers: Whether to keep workers persistent between epochs.
- prefetch_factor: The number of batches to prefetch in the dataloader.
- shuffle: Whether to shuffle the dataset.
- shuffle_buffer: Buffer size to shuffle samples across.
- shuffle_seed: The seed used for deterministic shuffling.
- vocab_file: Path to the vocabulary file.
- data_processor

- data_dir = Ellipsis
Path to the data files to use.
- vocab_file = None
Path to the vocabulary file.
- batch_size = Ellipsis
The batch size.
- mask_whole_word = False
Whether to mask entire words.
- do_lower = False
Whether to lowercase the text.
- shuffle = True
Whether to shuffle the dataset.
- shuffle_seed = None
The seed used for deterministic shuffling.
- shuffle_buffer = None
Size of the buffer across which samples are shuffled. If None and shuffle is enabled, 10 * batch_size is used.
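
The fallback rule for shuffle_buffer can be sketched as a small helper. This is only an illustration of the documented behavior, not the Model Zoo implementation: `resolve_shuffle_buffer` is a hypothetical name, and passing the value through unchanged when shuffling is disabled is an assumption.

```python
from typing import Optional


def resolve_shuffle_buffer(
    shuffle: bool, shuffle_buffer: Optional[int], batch_size: int
) -> Optional[int]:
    """Return the effective shuffle buffer size (hypothetical helper)."""
    # Documented rule: with shuffling enabled and no explicit buffer size,
    # fall back to ten times the batch size.
    if shuffle and shuffle_buffer is None:
        return 10 * batch_size
    # Assumption: otherwise the configured value is used as given.
    return shuffle_buffer


print(resolve_shuffle_buffer(True, None, 32))   # 320
print(resolve_shuffle_buffer(True, 1000, 32))   # 1000
```

With a batch size of 32 and no explicit buffer, the sketch yields a buffer of 320 samples; an explicit value always takes precedence.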
- num_workers = 0
The number of PyTorch processes used in the dataloader.
- prefetch_factor = 10
The number of batches to prefetch in the dataloader.
- persistent_workers = True
Whether to keep workers persistent between epochs.
- drop_last = True
Whether to drop the last batch of an epoch if it is incomplete.
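
The loader-related fields above share names with arguments of PyTorch's `torch.utils.data.DataLoader`. In recent PyTorch versions, `DataLoader` raises a `ValueError` if `prefetch_factor` or `persistent_workers=True` is passed while `num_workers` is 0, which is the combination these defaults describe. The sketch below shows one way such fields might be turned into keyword arguments. It assumes the fields are forwarded to `DataLoader`, which this reference does not confirm, and `build_loader_kwargs` is a hypothetical helper that is not part of the Model Zoo.

```python
from typing import Any, Dict


def build_loader_kwargs(
    batch_size: int,
    num_workers: int = 0,
    prefetch_factor: int = 10,
    persistent_workers: bool = True,
    drop_last: bool = True,
) -> Dict[str, Any]:
    """Build DataLoader keyword arguments (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "batch_size": batch_size,
        "num_workers": num_workers,
        "drop_last": drop_last,
    }
    # prefetch_factor and persistent_workers only apply to worker
    # processes, so they are forwarded only when num_workers > 0.
    if num_workers > 0:
        kwargs["prefetch_factor"] = prefetch_factor
        kwargs["persistent_workers"] = persistent_workers
    return kwargs


print(build_loader_kwargs(batch_size=32))
print(build_loader_kwargs(batch_size=32, num_workers=4))
```

The resulting dictionary could then be unpacked into a loader, for example `DataLoader(dataset, **build_loader_kwargs(batch_size=32, num_workers=4))`. PyTorch documents `prefetch_factor` as the number of batches loaded in advance by each worker, so the total number of prefetched batches scales with `num_workers`.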