cerebras.modelzoo.common.utils.model.transformer_utils

Functions

create_2D_autoregressive_mask

Creates a reverted autoregressive (upper triangular) mask in which 0s mark the tokens to attend to.
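
As a rough illustration, a mask of this shape can be built with torch.triu; the helper below is a minimal sketch with an illustrative name and signature, not the module's actual API.

```python
import torch

def autoregressive_mask_sketch(seq_len: int, dtype=torch.float16) -> torch.Tensor:
    # Sketch only: 1s in the strict upper triangle mark future (blocked) tokens,
    # 0s elsewhere mark the tokens that may be attended to.
    return torch.triu(torch.ones(seq_len, seq_len, dtype=dtype), diagonal=1)

# autoregressive_mask_sketch(4) ->
# [[0., 1., 1., 1.],
#  [0., 0., 1., 1.],
#  [0., 0., 0., 1.],
#  [0., 0., 0., 0.]]
```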

create_2D_full_mask

Create a full 2D attention mask.

create_broadcasted_autoregressive_mask

Create a broadcasted causal attention mask, optionally with VSL masking.

create_chunked_attention_span

Create an attention span tensor used to build a chunked attention mask pattern, similar to VSL masking. For a batch size of 1, a sequence length of 10, and a chunk size of 3, the attention span tensor is:

    [
        [2, 1, 0, 2, 1, 0, 2, 1, 0, 2],
    ]
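
The example above follows directly from the chunk layout; a minimal sketch of how such a span tensor could be computed is shown below (the helper name and signature are illustrative, not the module's actual API).

```python
import torch

def chunked_attention_span_sketch(batch_size: int, seq_len: int, chunk_size: int) -> torch.Tensor:
    # Sketch only: span[i] counts the positions remaining in the current chunk
    # after position i, so each chunk yields the pattern chunk_size-1, ..., 1, 0.
    positions = torch.arange(seq_len)
    span = chunk_size - 1 - (positions % chunk_size)
    return span.unsqueeze(0).expand(batch_size, -1)

# chunked_attention_span_sketch(1, 10, 3) -> tensor([[2, 1, 0, 2, 1, 0, 2, 1, 0, 2]])
```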

create_sliding_window_mask_with_complement

Returns two boolean masks: the first is a sliding-window causal mask, and the second is its complement, so that together they form a lower-triangular causal mask. That is, the sliding window mask would look like:

    [
        [True,  False, False, False, False],
        [True,  True,  False, False, False],
        [False, True,  True,  False, False],
        [False, False, True,  True,  False],
        [False, False, False, True,  True],
    ]

whereas the complement mask is:

    [
        [False, False, False, False, False],
        [False, False, False, False, False],
        [True,  False, False, False, False],
        [True,  True,  False, False, False],
        [True,  True,  True,  False, False],
    ]
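
A minimal sketch of how these two masks could be constructed is below, assuming a window size of 2 to reproduce the example above (the function name and signature are illustrative, not the module's actual API).

```python
import torch

def sliding_window_masks_sketch(seq_len: int, window_size: int):
    # Sketch only: position j is visible from position i in the sliding-window
    # mask when 0 <= i - j < window_size; the complement covers the remaining
    # causal positions (i - j >= window_size), so the union of the two masks is
    # the full lower-triangular causal mask.
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    diff = i - j
    window_mask = (diff >= 0) & (diff < window_size)
    complement_mask = diff >= window_size
    return window_mask, complement_mask

window_mask, complement_mask = sliding_window_masks_sketch(5, 2)
```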

create_vsl_mask

Creates a VSL attention mask.

get_embedding_dtype

get_extended_attention_mask

Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

Parameters:
    attention_mask (torch.Tensor): Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
    input_shape (Tuple[int]): The shape of the input to the model (required for causal masks).
    causal (bool): If enabled, the returned mask will be causal.
    device (torch.device): The device of the input to the model.
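
The usual recipe for such a broadcastable additive mask is sketched below; this shows the common pattern, not necessarily this module's exact implementation, and the helper name is illustrative.

```python
import torch

def extended_attention_mask_sketch(attention_mask: torch.Tensor, dtype=torch.float32) -> torch.Tensor:
    # Sketch only: expand a [batch, seq] keep/ignore mask (1 = attend, 0 = ignore)
    # to [batch, 1, 1, seq] so it broadcasts over heads and query positions, then
    # convert it into an additive bias: 0 for kept tokens, a large negative value
    # for ignored ones.
    mask = attention_mask[:, None, None, :].to(dtype)
    return (1.0 - mask) * torch.finfo(dtype).min
```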

make_key_padding_mask_broadcastable

Makes broadcastable key_padding masks so that padding tokens are ignored.

make_sparse_mask_broadcastable

Create a broadcastable sparse mask so that masked positions are ignored.

replace_with_zero_and_neg_inf

Replace the values in the mask tensor with 0 and -inf.
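
A minimal sketch of that transformation, assuming nonzero entries mark positions to be masked (the helper name is illustrative, not the module's actual API):

```python
import torch

def to_additive_mask_sketch(mask: torch.Tensor) -> torch.Tensor:
    # Sketch only: nonzero entries in `mask` become -inf (dropped by softmax),
    # all other entries become 0.
    return torch.zeros_like(mask, dtype=torch.float32).masked_fill(
        mask.bool(), float("-inf")
    )
```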

smooth_loss

Add label smoothing to the loss function; this is a workaround implementation of label smoothing in our system.
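
A standard formulation of label smoothing is sketched below for reference; it illustrates the general technique rather than the exact workaround used in this module, and the names are illustrative.

```python
import torch

def smoothed_loss_sketch(log_probs: torch.Tensor, target: torch.Tensor, epsilon: float) -> torch.Tensor:
    # Sketch only: blend the negative log-likelihood of the target class with the
    # mean negative log-likelihood over all classes.
    # log_probs: [batch, vocab] log-probabilities, target: [batch] class indices.
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(-1)).squeeze(-1)
    smooth = -log_probs.mean(dim=-1)
    return ((1.0 - epsilon) * nll + epsilon * smooth).mean()
```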