cerebras.modelzoo.common.utils.model.transformer_utils
Functions
Creates a reverted autoregressive (upper triangular) mask where the 0s refer to the tokens
Creates an autoregressive (triangular) mask.
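To illustrate what an autoregressive (triangular) mask looks like, here is a minimal sketch. It assumes NumPy rather than the library's own tensor type, and `causal_mask` is a hypothetical name, not the function documented on this page. It also assumes the convention that `True` means "may attend"; the library may use the opposite polarity.

```python
import numpy as np


def causal_mask(seq_len: int) -> np.ndarray:
    """Hypothetical helper: True where a query position may attend to a key position."""
    # Keep the lower triangle, diagonal included: position i sees positions 0..i only.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))


mask = causal_mask(4)
print(mask.astype(int))
```

Row `i` of the result marks which earlier positions token `i` can see, so everything above the diagonal (the future) is excluded.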
Creates a broadcasted causal attention mask, optionally with VSL masking.
Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
attention_mask: mask with ones indicating tokens to attend to and zeros for tokens to ignore.
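The combination of a padding-style attention mask with a causal mask can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `combined_mask` is a hypothetical name, NumPy stands in for the tensor library, the output is assumed to be an additive mask (0 where attention is allowed, a large negative value elsewhere), and the `[batch, 1, seq_len, seq_len]` output shape is an assumption chosen so it broadcasts over attention heads.

```python
import numpy as np


def combined_mask(attention_mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper.

    attention_mask: [batch, seq_len], 1 = attend, 0 = ignore (as described above).
    Returns an additive mask of shape [batch, 1, seq_len, seq_len] (assumed layout).
    """
    _, seq_len = attention_mask.shape
    # [seq_len, seq_len]: query i may look at keys 0..i.
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # [batch, 1, 1, seq_len]: which key positions hold real tokens.
    keep = attention_mask.astype(bool)[:, None, None, :]
    # A position is allowed only if it is both non-future and non-masked.
    allowed = causal[None, None, :, :] & keep
    return np.where(allowed, 0.0, -1e9).astype(np.float32)


attention_mask = np.array([[1, 1, 1, 0]])
extended = combined_mask(attention_mask)

# The mask broadcasts over a (hypothetical) head dimension of size 2.
scores = np.zeros((1, 2, 4, 4), dtype=np.float32)
masked_scores = scores + extended
```

Adding the result to raw attention scores before the softmax drives the weights of future and masked positions toward zero.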
Makes broadcastable key_padding masks so that padding tokens are ignored.
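A minimal sketch of making a key-padding mask broadcastable, assuming NumPy in place of the library's tensors. The name `broadcastable_key_padding_mask`, the `True` = padding convention, and the `[batch, 1, 1, src_len]` target shape are all assumptions made for illustration.

```python
import numpy as np


def broadcastable_key_padding_mask(key_padding_mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: [batch, src_len] -> [batch, 1, 1, src_len]."""
    # Two singleton axes let the mask broadcast over heads and query positions.
    return key_padding_mask[:, None, None, :]


# True marks a padding token (assumed convention).
key_padding_mask = np.array([[False, False, True],
                             [False, True, True]])
scores = np.zeros((2, 4, 3, 3), dtype=np.float32)  # [batch, heads, tgt_len, src_len]

# Padding keys get -inf so they receive zero weight after a softmax.
masked_scores = np.where(broadcastable_key_padding_mask(key_padding_mask), -np.inf, scores)
```

Because the mask depends only on the key position, a single row per batch element is enough; broadcasting repeats it for every head and every query.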
Creates a broadcastable sparse mask so that masked positions are ignored.
Adds label smoothing to the loss function. This is a workaround for applying label smoothing in our system.
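For readers unfamiliar with label smoothing, the sketch below shows the standard textbook formulation, in which the one-hot target is mixed with a uniform distribution over the vocabulary. It is not claimed to match the workaround this function uses; `label_smoothed_nll` and its `epsilon` parameter are hypothetical names, and NumPy stands in for the tensor library.

```python
import numpy as np


def label_smoothed_nll(logits: np.ndarray, labels: np.ndarray, epsilon: float = 0.1) -> float:
    """Hypothetical helper. logits: [N, V]; labels: [N] integer class ids."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Ordinary negative log-likelihood of the true class.
    nll = -log_probs[np.arange(len(labels)), labels]
    # Cross-entropy against a uniform target over all V classes.
    uniform = -log_probs.mean(axis=-1)
    # Smoothed target = (1 - epsilon) * one_hot + epsilon / V.
    return float(((1.0 - epsilon) * nll + epsilon * uniform).mean())


logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])
plain = label_smoothed_nll(logits, labels, epsilon=0.0)
smoothed = label_smoothed_nll(logits, labels, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to plain cross-entropy; a positive `epsilon` penalizes over-confident predictions, so the loss on confidently correct examples goes up.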