cerebras.modelzoo.data.nlp.bert.BertCSVDataProcessor.BertCSVDataProcessor

class cerebras.modelzoo.data.nlp.bert.BertCSVDataProcessor.BertCSVDataProcessor(*args, **kwargs)

Reads CSV files containing the input text tokens and MLM features.

Methods

`create_dataloader`	Classmethod to create the dataloader object.
`get_single_item`	Iterates over the data to construct input features.
`load_buffer`	Generator that reads the data in chunks of size data_buffer.

load_buffer()

Generator that reads the data in chunks of size data_buffer.
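As a rough illustration of chunked reading with a generator, the sketch below uses only the Python standard library. It is not the Model Zoo implementation: the helper name `read_in_chunks`, the `chunk_size` parameter, and the CSV column names are all hypothetical stand-ins chosen for this example.

```python
import csv
import io
from itertools import islice


def read_in_chunks(csv_file, chunk_size):
    """Yield lists of at most `chunk_size` rows from an open CSV file.

    Hypothetical helper: it only illustrates the generator pattern and
    is not taken from the library.
    """
    reader = csv.DictReader(csv_file)
    while True:
        # islice pulls up to chunk_size rows without loading the whole file.
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


# In-memory CSV with made-up column names, standing in for a real data file.
sample = io.StringIO("tokens,label\na b,0\nc d,1\ne f,0\n")
chunks = list(read_in_chunks(sample, chunk_size=2))
chunk_sizes = [len(chunk) for chunk in chunks]
```

Because each chunk is produced only when the consumer asks for it, memory use is bounded by the chunk size rather than by the file size.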

get_single_item()

Iterates over the data to construct input features.

Returns

A tuple with training features:

* np.array[int32] input_ids: Numpy array with input token indices. Shape: (max_sequence_length).
* np.array[int32] labels: Numpy array with labels. Shape: (max_sequence_length).
* np.array[int32] attention_mask. Shape: (max_sequence_length).
* np.array[int32] token_type_ids: Numpy array with segment indices. Shape: (max_sequence_length).
* np.array[int32] next_sentence_label: Numpy array with labels for NSP task. Shape: (1).
* np.array[int32] masked_lm_mask: Numpy array with a mask of predicted tokens. Shape: (max_predictions). 0 indicates a non-masked token, and 1 indicates a masked token.
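To make the documented shapes concrete, here is a minimal NumPy sketch that builds fixed-length int32 arrays of the kind listed above. It is an illustration only, not the processor's code: the helper `pad_to_length`, the padding value 0, the convention that attention_mask is 1 for real tokens, and the sample token indices are all assumptions.

```python
import numpy as np


def pad_to_length(values, length, pad_value=0):
    """Return an int32 array of exactly `length` entries.

    Hypothetical helper: truncates `values` if too long, otherwise pads
    on the right with `pad_value` (an assumed padding convention).
    """
    padded = np.full(length, pad_value, dtype=np.int32)
    trimmed = values[:length]
    padded[: len(trimmed)] = trimmed
    return padded


max_sequence_length = 8  # illustrative value
max_predictions = 3      # illustrative value

token_ids = [101, 7592, 103, 102]  # made-up token indices

input_ids = pad_to_length(token_ids, max_sequence_length)
# Assumed convention: 1 marks a real token, 0 marks padding.
attention_mask = pad_to_length([1] * len(token_ids), max_sequence_length)
token_type_ids = pad_to_length([0] * len(token_ids), max_sequence_length)
# One masked token in this example, so one slot is set to 1.
masked_lm_mask = pad_to_length([1], max_predictions)
next_sentence_label = np.array([0], dtype=np.int32)
```

Every sequence-level array ends up with shape (max_sequence_length,), while masked_lm_mask has shape (max_predictions,) and next_sentence_label has shape (1,), matching the shapes listed in the Returns section.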
