cerebras.modelzoo.common.utils.model.transformer_utils.create_chunked_attention_span

cerebras.modelzoo.common.utils.model.transformer_utils.create_chunked_attention_span(batch_size, target_seq_len, chunk_size, device=None)[source]

Create an attention span tensor that defines a chunked attention mask pattern, similar to VSL masking. For a batch size of 1, a sequence length of 10, and a chunk size of 3, the attention span tensor is:

```
[
    [2, 1, 0, 2, 1, 0, 2, 1, 0, 2],
]
```
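A minimal sketch of how such a span can be produced (an illustration of the pattern only, not the library's implementation): tile the per-chunk countdown `[chunk_size - 1, ..., 0]` across the sequence, truncate to the target length, and broadcast over the batch.

```python
import torch

def chunked_span_sketch(batch_size, target_seq_len, chunk_size, device=None):
    # Countdown within one chunk, e.g. chunk_size=3 -> [2, 1, 0].
    countdown = torch.arange(chunk_size - 1, -1, -1, device=device)
    # Tile enough chunks to cover the sequence, then truncate.
    num_chunks = -(-target_seq_len // chunk_size)  # ceiling division
    span = countdown.repeat(num_chunks)[:target_seq_len]
    # Broadcast the same pattern to every batch row.
    return span.expand(batch_size, target_seq_len)
```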

Parameters
  • batch_size (int) – Input batch size.

  • target_seq_len (int) – Input sequence length.

  • chunk_size (int) – Size of local attention chunk window.

  • device (Optional[torch.device]) – The device of the input to the model.

Returns

Attention span tensor of shape [batch_size, target_seq_len].

Return type

torch.Tensor
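A short usage sketch, assuming the function is importable at the documented path; the printed output follows the example above:

```python
import torch
from cerebras.modelzoo.common.utils.model.transformer_utils import (
    create_chunked_attention_span,
)

span = create_chunked_attention_span(
    batch_size=1,
    target_seq_len=10,
    chunk_size=3,
    device=torch.device("cpu"),
)
print(span)        # expected: tensor([[2, 1, 0, 2, 1, 0, 2, 1, 0, 2]])
print(span.shape)  # torch.Size([1, 10])
```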