cerebras.modelzoo.common.utils.model.transformer_utils.create_broadcasted_autoregressive_mask
- cerebras.modelzoo.common.utils.model.transformer_utils.create_broadcasted_autoregressive_mask(batch_size, num_heads, tgt_seq_length, attention_span=None, attention_sliding_window_length=None, attention_sink_tokens=None, attention_vertical_column_spacing=None, attention_vertical_column_width=None, attention_chunk_size=None, device=None, dtype=torch.float16, use_neg_inf=True)
Create a broadcasted causal attention mask, optionally with VSL masking.
For VSL, attention_span is required, and past tokens outside the current sequence are additionally masked.
- Parameters
batch_size (int) – Batch size.
num_heads (int) – Number of heads.
tgt_seq_length (int) – Target sequence length.
attention_span (torch.Tensor) – Attention span of keys for VSL, has shape [batch_size, tgt_seq_length].
attention_sliding_window_length (int) – If specified, the current token only attends to the current token and the attention_sliding_window_length previous tokens.
attention_sink_tokens (int) – Number of attention sink tokens to be used for StreamingLLM-style inference.
attention_chunk_size (int) – If specified, the attention mask will have a chunked pattern with windows of length attention_chunk_size.
device (torch.device) – The device of the input to the model, used for causal mask creation.
dtype (torch.dtype) – Dtype of the resulting mask, defaults to torch.float16.
use_neg_inf (bool) – Use negative infinity instead of one in the resulting mask, defaults to True.
- Returns
The attention mask of shape [batch_size, num_heads, tgt_seq_len, tgt_seq_len].
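A minimal usage sketch based on the signature above. It assumes cerebras.modelzoo is installed; the argument values are illustrative only, and the exact content of the masked entries depends on dtype and use_neg_inf.

import torch

from cerebras.modelzoo.common.utils.model.transformer_utils import (
    create_broadcasted_autoregressive_mask,
)

# Plain causal mask: each position may attend to itself and all earlier positions.
causal_mask = create_broadcasted_autoregressive_mask(
    batch_size=2,
    num_heads=4,
    tgt_seq_length=8,
    device=torch.device("cpu"),
    dtype=torch.float32,
)
print(causal_mask.shape)  # expected torch.Size([2, 4, 8, 8]) per the Returns shape

# Sliding-window variant: each token attends to itself and up to 3 previous tokens,
# plus 1 attention sink token for StreamingLLM-style inference.
windowed_mask = create_broadcasted_autoregressive_mask(
    batch_size=2,
    num_heads=4,
    tgt_seq_length=8,
    attention_sliding_window_length=3,
    attention_sink_tokens=1,
    device=torch.device("cpu"),
    dtype=torch.float32,
)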