cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataset#
- class cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor.ClassifierDataset[source]#
- Bases: torch.utils.data.Dataset
- Base class for datasets that load their raw data from TSV files. Child classes must provide read_tsv (see the subclassing sketch below).
- Parameters
- params (dict) – Dictionary of training input parameters for creating the dataset.
- is_training (bool) – Indicator for training or validation dataset.
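The required read_tsv hook is not documented on this page, so the following is only a minimal subclassing sketch. It assumes read_tsv receives a path to a TSV file of (text, label) rows and returns the raw examples for the parent class to tokenize; the subclass name, the method signature, and the return format are illustrative assumptions, not the library's documented contract.

```python
import csv

from cerebras.modelzoo.data.nlp.bert.BertClassifierDataProcessor import (
    ClassifierDataset,
)


class MyTSVClassifierDataset(ClassifierDataset):
    """Hypothetical subclass that reads (text, label) rows from a TSV file."""

    def read_tsv(self, tsv_path):
        # Assumed contract: return raw examples for the parent class to
        # tokenize via encode_sequence(); the exact signature and return
        # format of read_tsv are not documented on this page.
        examples = []
        with open(tsv_path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                text, label = row[0], row[1]
                examples.append((text, label))
        return examples
```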
 
- Methods
- encode_sequence – Tokenizes a single text (if text2 is None) or a pair of texts.
- read_tsv

- encode_sequence(text1, text2=None)[source]#
- Tokenizes a single text (if text2 is None) or a pair of texts. Truncates and adds special tokens as needed.
- Parameters
- text1 (str) – First text to encode. 
- text2 (str) – Second text to encode or None. 
 
- Returns
- A list of input_ids, segment_ids, and attention_mask.
- input_ids (np.array[int32]): Numpy array with input token indices. Shape: (max_sequence_length).
- segment_ids (np.array[int32]): Numpy array with segment indices. Shape: (max_sequence_length).
- attention_mask (np.array[int32]): Numpy array with input masks. Shape: (max_sequence_length).
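A usage sketch for encode_sequence, building on the hypothetical subclass above. The constructor call follows the documented params/is_training parameters, but the specific keys inside params are illustrative assumptions.

```python
# Hypothetical params; the exact keys ClassifierDataset reads from this
# dictionary are not shown on this page.
params = {
    "vocab_file": "./vocab.txt",
    "max_sequence_length": 128,
}
dataset = MyTSVClassifierDataset(params, is_training=True)

# Single-sentence encoding (text2 defaults to None).
input_ids, segment_ids, attention_mask = dataset.encode_sequence("A great movie.")

# Sentence-pair encoding, e.g. for entailment-style tasks.
input_ids, segment_ids, attention_mask = dataset.encode_sequence(
    "A man is playing guitar.",
    "Someone is making music.",
)

# Per the documented return, each array has shape (max_sequence_length).
print(input_ids.shape, segment_ids.shape, attention_mask.shape)
```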
 
 
 
- __call__(*args: Any, **kwargs: Any) → Any#
- Call self as a function. 
- static __new__(cls, *args: Any, **kwargs: Any) → Any#