cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature
- cerebras.modelzoo.data.nlp.bert.BertSumCSVDataProcessor.create_bertsum_feature(input_ids, segment_ids, cls_indices, labels, max_sequence_length, max_cls_tokens, pad_id)
- Creates the feature dict for the BertSum model after applying padding.
 
- Parameters
- input_ids (list) – Token ids to pad. 
- segment_ids (list) – Segment ids to pad. 
- cls_indices (list) – Class ids to pad. 
- labels (list) – Labels to pad. 
- max_sequence_length (int) – Maximum sequence length. 
- max_cls_tokens (int) – Max class tokens. 
- pad_id (int) – Padding id. 
- tokenize (callable) – Method to tokenize the input sequence. Note that this parameter does not appear in the signature above.
 
- Returns
- Feature dict that includes the following keys:
- 'input_tokens': Numpy array with input token indices.
- shape: (max_sequence_length), dtype: int32.
 
- 'attention_mask': Numpy array with attention mask.
- shape: (max_sequence_length), dtype: int32.
 
- 'token_type_ids': Numpy array with segment ids.
- shape: (max_sequence_length), dtype: int32. 
 
- 'labels': Numpy array with labels.
- shape: (max_cls_tokens), dtype: int32. 
 
- 'cls_indices': Numpy array with class indices.
- shape: (max_cls_tokens).
 
- 'cls_weights': Numpy array with class weights.
- shape: (max_cls_tokens).
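
The padding behavior described above can be illustrated with a minimal sketch. This is not the library's implementation: `pad_to_length`, `real_entry_mask`, and `sketch_bertsum_feature` are hypothetical names, and the sketch assumes right-padding, truncation of over-long inputs, and masks that are 1 on real entries and 0 on padded positions. The sample values also assume that `cls_indices` holds positions of [CLS] tokens in the input. The output keys, shapes, and int32 dtypes follow the Returns list where it specifies them.

```python
import numpy as np


def pad_to_length(values, pad_id, length):
    """Truncate or right-pad `values` with `pad_id` to exactly `length` items."""
    values = list(values)[:length]
    padded = values + [pad_id] * (length - len(values))
    return np.array(padded, dtype=np.int32)


def real_entry_mask(num_real, length):
    """Return an int32 mask: 1 for the first `num_real` positions, 0 after."""
    return (np.arange(length) < num_real).astype(np.int32)


def sketch_bertsum_feature(
    input_ids,
    segment_ids,
    cls_indices,
    labels,
    max_sequence_length,
    max_cls_tokens,
    pad_id,
):
    """Hypothetical stand-in for create_bertsum_feature, for illustration only."""
    return {
        # Sequence-level arrays, shape (max_sequence_length,).
        "input_tokens": pad_to_length(input_ids, pad_id, max_sequence_length),
        "attention_mask": real_entry_mask(len(input_ids), max_sequence_length),
        "token_type_ids": pad_to_length(segment_ids, pad_id, max_sequence_length),
        # Per-sentence arrays, shape (max_cls_tokens,).
        "labels": pad_to_length(labels, pad_id, max_cls_tokens),
        "cls_indices": pad_to_length(cls_indices, pad_id, max_cls_tokens),
        # Assumption: weights mark which cls slots are real rather than padding.
        "cls_weights": real_entry_mask(len(cls_indices), max_cls_tokens),
    }


# Made-up sample input: two short "sentences", each starting with token id 101.
feature = sketch_bertsum_feature(
    input_ids=[101, 7592, 102, 101, 2088, 102],
    segment_ids=[0, 0, 0, 1, 1, 1],
    cls_indices=[0, 3],
    labels=[1, 0],
    max_sequence_length=8,
    max_cls_tokens=3,
    pad_id=0,
)

for key, value in feature.items():
    print(key, value.shape, value.dtype, value.tolist())
```

In this sketch the masks are derived from the input lengths rather than by comparing against `pad_id`, so that a real value equal to `pad_id` (such as a first index of 0 when `pad_id` is 0) is not mistaken for padding. Whether the library makes the same choice is not stated in the docstring.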