cerebras.modelzoo.common.utils.model.transformer_utils.create_broadcasted_autoregressive_mask#
- cerebras.modelzoo.common.utils.model.transformer_utils.create_broadcasted_autoregressive_mask(batch_size: int, num_heads: int, tgt_seq_length: int, attention_span: Optional[torch.Tensor] = None, sliding_window_length: Optional[int] = None, device: Optional[torch.device] = None, dtype: torch.dtype = torch.float16, multiply_neg_inf: bool = True)[source]#
Create a broadcasted causal attention mask, optionally with VSL (variable sequence length) masking.
For VSL, attention_span is required, and past tokens outside the current packed sequence are additionally masked.
- Parameters
batch_size (int) – Batch size.
num_heads (int) – Number of heads.
tgt_seq_length (int) – Target sequence length.
attention_span (torch.Tensor, optional) – Attention span of keys for VSL; has shape [batch_size, tgt_seq_length].
sliding_window_length (int, optional) – If specified, the current token only attends to the current token and the sliding_window_length previous tokens.
device (torch.device) – The device of the input to the model, used for causal mask creation.
dtype (torch.dtype) – Dtype of the resulting mask; defaults to torch.float16.
multiply_neg_inf (bool) – Whether to multiply the resulting mask by a negative infinity constant; defaults to True.
- Returns
The attention mask of shape [batch_size, num_heads, tgt_seq_length, tgt_seq_length].
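To make the parameter semantics concrete, here is a minimal NumPy sketch of the masking logic described above. This is a hypothetical re-implementation, not the library code: the VSL semantics (attention_span[b, t] bounding how far back query t may look) and the choice of the dtype's most negative finite value as the "negative infinity constant" are assumptions for illustration.

```python
import numpy as np

def causal_mask_sketch(
    batch_size,
    num_heads,
    tgt_seq_length,
    attention_span=None,
    sliding_window_length=None,
    dtype=np.float16,
    multiply_neg_inf=True,
):
    """Hypothetical sketch of broadcasted causal/VSL masking (not the library code)."""
    # Query positions i (rows) and key positions j (columns).
    i = np.arange(tgt_seq_length)[:, None]
    j = np.arange(tgt_seq_length)[None, :]
    # Causal constraint: a query must not attend to future keys.
    disallowed = j > i  # [tgt_seq_length, tgt_seq_length]; True = masked
    if sliding_window_length is not None:
        # Sliding window: also mask keys more than
        # sliding_window_length positions in the past.
        disallowed = disallowed | (i - j > sliding_window_length)
    # Broadcast to [batch_size, num_heads, T, T]; copy to make it writable.
    mask = np.broadcast_to(
        disallowed, (batch_size, num_heads, tgt_seq_length, tgt_seq_length)
    ).copy()
    if attention_span is not None:
        # Assumed VSL semantics: attention_span[b, t] bounds how far back
        # query t may look, masking keys outside the current packed sequence.
        span = np.asarray(attention_span)[:, None, :, None]  # [B, 1, T, 1]
        mask |= (i - j) > span
    mask = mask.astype(dtype)
    if multiply_neg_inf:
        # Stand-in for the "negative infinity constant": the most negative
        # finite value of the dtype (avoids 0 * -inf producing NaN).
        mask = mask * np.finfo(dtype).min
    return mask
```

For example, `causal_mask_sketch(2, 3, 4, sliding_window_length=2, multiply_neg_inf=False)` returns a `[2, 3, 4, 4]` mask where entry `[b, h, 3, 0]` is masked (key 0 lies more than 2 positions behind query 3) while `[b, h, 3, 1]` is not.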