cerebras.modelzoo.common.utils.model.transformer_utils
Functions
- Creates a reverted autoregressive (upper triangular) mask, where 0s mark the tokens to attend to.
- Creates an autoregressive (triangular) mask.
- Creates a broadcasted causal attention mask, optionally with VSL masking.
- Creates an attention span tensor used to build a chunked attention mask pattern, similar to VSL masking (see the sketch below for the case of batch size 1, sequence length 10, and chunk size 3).
- Returns two boolean masks: a sliding-window causal mask and its complement, chosen so that together they form a full lower-triangular causal mask (sketched below).
- Creates a VSL attention mask.
- Makes broadcastable attention and causal masks so that future and masked tokens are ignored; the attention mask uses ones for tokens to attend to and zeros for tokens to ignore.
- Makes broadcastable key-padding masks so that padding tokens are ignored.
- Creates a broadcastable sparse mask so that masked positions are ignored.
- Replaces the values in a mask tensor with 0 and -inf.
- Adds label smoothing to the loss function; this is a workaround method for label smoothing in our system.
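Hedged sketches of several of these patterns follow; all function names, signatures, and tensor conventions in them are illustrative assumptions, not the module's actual API. First, the two triangular-mask conventions and their broadcasted additive form, assuming PyTorch:

```python
import torch

def autoregressive_mask(seq_len: int) -> torch.Tensor:
    # Lower-triangular mask: 1 means "attend"; query i may see keys j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def reverted_autoregressive_mask(seq_len: int) -> torch.Tensor:
    # "Reverted" convention: upper triangular with the diagonal excluded,
    # so 0 marks tokens to attend to and 1 marks masked (future) tokens.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

def broadcasted_causal_mask(batch_size: int, seq_len: int) -> torch.Tensor:
    # Additive mask broadcastable over [batch, heads, query, key]:
    # 0.0 where attention is allowed, -inf where it is blocked.
    blocked = reverted_autoregressive_mask(seq_len)
    additive = torch.zeros(seq_len, seq_len).masked_fill(blocked, float("-inf"))
    return additive.expand(batch_size, 1, seq_len, seq_len)

print(autoregressive_mask(4).int())
```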
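The chunked attention span is sketched below under the assumption that each position stores its offset within its chunk; the encoding the module actually uses may differ:

```python
import torch

def chunked_attention_span(batch_size: int, seq_len: int, chunk_size: int) -> torch.Tensor:
    # Assumed encoding: each position's offset within its chunk,
    # repeated across the batch.
    offsets = torch.arange(seq_len) % chunk_size
    return offsets.expand(batch_size, seq_len)

# batch_size=1, seq_len=10, chunk_size=3 -> tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2, 0]])
print(chunked_attention_span(1, 10, 3))
```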
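A sketch of the sliding-window mask and its complement; the final assertion checks the property stated above, namely that the two masks combine into the full lower-triangular causal mask:

```python
import torch

def sliding_window_masks(seq_len: int, window: int):
    # Full causal mask: True where query i may attend key j (j <= i).
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Sliding-window part: only the last `window` keys, i - window < j <= i.
    recent = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=-(window - 1)
    )
    sliding = causal & recent
    # Complement: causal positions that fall outside the window.
    complement = causal & ~sliding
    return sliding, complement

sliding, complement = sliding_window_masks(6, 3)
assert torch.equal(
    sliding | complement, torch.tril(torch.ones(6, 6, dtype=torch.bool))
)
```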
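VSL (variable sequence length) masking restricts attention to tokens of the same packed document, so that sequences packed together do not attend to each other. A sketch under that assumption, with a hypothetical `doc_ids` input marking document membership:

```python
import torch

def vsl_attention_mask(doc_ids: torch.Tensor) -> torch.Tensor:
    # doc_ids: [batch, seq] tensor assigning each token to a packed document.
    # A token may attend causally only to earlier tokens of the same document,
    # yielding a block-diagonal lower-triangular mask per sequence.
    seq_len = doc_ids.size(-1)
    same_doc = doc_ids.unsqueeze(-1) == doc_ids.unsqueeze(-2)  # [batch, seq, seq]
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=doc_ids.device)
    )
    return same_doc & causal

# Two packed documents of lengths 3 and 2.
print(vsl_attention_mask(torch.tensor([[0, 0, 0, 1, 1]])).int())
```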
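A sketch of making a key-padding mask broadcastable, together with the 0/-inf replacement, assuming the [batch, seq] convention stated above (ones for tokens to attend to, zeros for tokens to ignore):

```python
import torch

def replace_with_zero_and_neg_inf(mask: torch.Tensor) -> torch.Tensor:
    # Turn a binary mask (1 = ignore, 0 = attend) into an additive mask:
    # 0.0 where attention is allowed, -inf where it is blocked.
    return torch.zeros_like(mask, dtype=torch.float32).masked_fill(
        mask.bool(), float("-inf")
    )

def broadcastable_key_padding_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    # attention_mask: [batch, seq] with 1 for real tokens, 0 for padding.
    # Invert to "1 = ignore", lift to [batch, 1, 1, seq] so it broadcasts
    # over heads and query positions, then convert to 0/-inf.
    ignore = (1 - attention_mask)[:, None, None, :]
    return replace_with_zero_and_neg_inf(ignore)

print(broadcastable_key_padding_mask(torch.tensor([[1, 1, 1, 0, 0]])))
```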
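Finally, a sketch of standard label-smoothed cross entropy, which mixes the one-hot target with a uniform distribution over the vocabulary; the module's actual workaround may be implemented differently:

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(
    logits: torch.Tensor, labels: torch.Tensor, smoothing: float = 0.1
) -> torch.Tensor:
    # Blend the negative log-likelihood of the true label with the mean
    # negative log-probability over all classes (the uniform-target term).
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, labels.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()

loss = smoothed_cross_entropy(torch.randn(4, 10), torch.randint(0, 10, (4,)))
```

Recent PyTorch also exposes this directly via the `label_smoothing` argument of torch.nn.functional.cross_entropy.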