cerebras.modelzoo.common.utils.model.transformer_utils.create_chunked_attention_span

cerebras.modelzoo.common.utils.model.transformer_utils.create_chunked_attention_span(batch_size, target_seq_len, chunk_size, device=None)[source]

Create an attention span tensor that defines a chunked attention mask pattern, similar to VSL masking. For a batch size of 1, a sequence length of 10, and a chunk size of 3, the attention span tensor is:

```
[
    [2, 1, 0, 2, 1, 0, 2, 1, 0, 2],
]
```
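A minimal sketch of how such a span can be produced (an illustration of the pattern only, not the library's implementation): tile the per-chunk countdown `[chunk_size - 1, ..., 0]` across the sequence, truncate to the target length, and broadcast over the batch.

```python
import torch

def chunked_span_sketch(batch_size, target_seq_len, chunk_size, device=None):
    # Countdown within one chunk, e.g. chunk_size=3 -> [2, 1, 0].
    countdown = torch.arange(chunk_size - 1, -1, -1, device=device)
    # Tile enough chunks to cover the sequence, then truncate.
    num_chunks = -(-target_seq_len // chunk_size)  # ceiling division
    span = countdown.repeat(num_chunks)[:target_seq_len]
    # Broadcast the same pattern to every batch row.
    return span.expand(batch_size, target_seq_len)
```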

Parameters
  • batch_size (int) – Input batch size.

  • target_seq_len (int) – Input sequence length.

  • chunk_size (int) – Size of local attention chunk window.

  • device (Optional[torch.device]) – The device of the input to the model.

Returns

Attention span tensor of shape [batch_size, target_seq_len].

Return type

torch.Tensor
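A short usage sketch, assuming the function is importable at the documented path; the printed output follows the example above:

```python
import torch
from cerebras.modelzoo.common.utils.model.transformer_utils import (
    create_chunked_attention_span,
)

span = create_chunked_attention_span(
    batch_size=1,
    target_seq_len=10,
    chunk_size=3,
    device=torch.device("cpu"),
)
print(span)        # expected: tensor([[2, 1, 0, 2, 1, 0, 2, 1, 0, 2]])
print(span.shape)  # torch.Size([1, 10])
```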