cerebras.modelzoo.common.utils.model.transformer_utils.make_key_padding_mask_broadcastable

cerebras.modelzoo.common.utils.model.transformer_utils.make_key_padding_mask_broadcastable(key_padding_mask: torch.Tensor, dtype=None, revert_mask: bool = True, multiply_neg_inf: bool = True)

Converts a key padding mask into a broadcastable form so that padding tokens are ignored.

Parameters

key_padding_mask (torch.Tensor) – key padding mask with 2, 3, or 4 dimensions, whose entries are either 1 or 0.
dtype (torch.dtype) – Dtype of the resulting mask.
revert_mask (bool) – whether to flip the 1's and 0's of the mask. Defaults to True.
multiply_neg_inf (bool) – whether to multiply the resulting mask by a negative infinity constant. Defaults to True.

Returns

The key padding mask of shape [batch_size, num_heads, src_seq_len, target_seq_len], with broadcast dimensions set to 1.
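
The behavior described above can be illustrated with a minimal sketch. This is not the Model Zoo implementation: the function name `sketch_broadcastable_key_padding_mask` is hypothetical, it handles only the 2-D case, and it assumes that 1 marks real tokens and 0 marks padding, that the output shape is [batch_size, 1, 1, seq_len], and that the "negative infinity constant" is the most negative finite value of the dtype.

```python
import torch


def sketch_broadcastable_key_padding_mask(
    key_padding_mask: torch.Tensor,
    dtype: torch.dtype = torch.float32,
    revert_mask: bool = True,
    multiply_neg_inf: bool = True,
) -> torch.Tensor:
    """Hypothetical sketch for a 2-D mask of shape [batch_size, seq_len]."""
    if key_padding_mask.dim() != 2:
        raise ValueError("this sketch only handles 2-D masks")

    mask = key_padding_mask.to(dtype)

    # Assumption: the input uses 1 for real tokens and 0 for padding,
    # so flipping makes padding positions 1.
    if revert_mask:
        mask = 1.0 - mask

    # Assumption: a large negative finite value stands in for negative
    # infinity, which avoids NaN from 0 * inf.
    if multiply_neg_inf:
        mask = mask * torch.finfo(dtype).min

    # Insert size-1 dimensions for heads and query positions so the mask
    # broadcasts against scores of shape [batch, heads, query, key].
    return mask[:, None, None, :]


# Two sequences of length 3; trailing zeros mark padding.
padding_mask = torch.tensor([[1, 1, 0], [1, 0, 0]])
additive_mask = sketch_broadcastable_key_padding_mask(padding_mask)

# Uniform scores for 4 heads; adding the mask removes padded keys.
scores = torch.zeros(2, 4, 3, 3)
probs = torch.softmax(scores + additive_mask, dim=-1)
```

Adding the result to the attention scores before the softmax drives the probability of padded key positions to zero, while the size-1 dimensions let one mask serve every head and query position.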
