cerebras.modelzoo.common.utils.model.transformer_utils.make_sparse_mask_broadcastable
- cerebras.modelzoo.common.utils.model.transformer_utils.make_sparse_mask_broadcastable(sparse_mask, key_padding_mask, dtype=None, device=None, revert_mask=True, use_neg_inf=True)
- Create a broadcastable sparse mask so that masked positions are ignored.
- Parameters
- sparse_mask (torch.Tensor) – Sparse mask with shape [src_seq_len, target_seq_len]. 
- key_padding_mask (torch.Tensor) – Key padding mask with 2, 3, or 4 dimensions.
- dtype (torch.dtype) – Dtype of the resulting mask. 
- device (torch.device) – The device to move the sparse mask to.
- revert_mask (bool) – Whether to flip the 1s and 0s of the attention mask; defaults to True.
- use_neg_inf (bool) – Use negative infinity instead of one in the resulting mask; defaults to True.
 
- Returns
- The attention mask of shape [batch_size, num_heads, src_seq_len, target_seq_len], with broadcast dimensions set to 1.
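A minimal usage sketch, assuming the package is installed and that the key padding mask follows a [batch_size, target_seq_len] convention with 1 marking real tokens and 0 marking padding; the concrete sizes and mask value conventions below are illustrative assumptions, not part of the documented API.

```python
import torch

from cerebras.modelzoo.common.utils.model.transformer_utils import (
    make_sparse_mask_broadcastable,
)

src_seq_len, target_seq_len, batch_size = 8, 8, 2

# Sparsity pattern over [src_seq_len, target_seq_len]; here a lower-triangular
# (causal-style) pattern is used purely as an example.
sparse_mask = torch.tril(torch.ones(src_seq_len, target_seq_len))

# Key padding mask (assumed convention: 1 = real token, 0 = padding), with the
# last two positions of every sequence treated as padding.
key_padding_mask = torch.ones(batch_size, target_seq_len)
key_padding_mask[:, -2:] = 0

attention_mask = make_sparse_mask_broadcastable(
    sparse_mask,
    key_padding_mask,
    dtype=torch.float32,
    revert_mask=True,   # flip 1s and 0s of the attention mask
    use_neg_inf=True,    # fill masked positions with -inf instead of 1
)

# Broadcastable to [batch_size, num_heads, src_seq_len, target_seq_len],
# with broadcast dimensions set to 1.
print(attention_mask.shape)
```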