modelzoo.transformers.pytorch.transformer_utils
Functions
- Create a broadcastable attention mask (full or causal) so that masked positions are ignored.
- Creates an inverted autoregressive (upper-triangular) mask where 0s mark the tokens to attend to and 1s the tokens to ignore.
- Create an autoregressive (triangular) mask; see the causal-mask sketch after this list.
- Create an autoregressive (triangular) mask for variable sequence length; see the VSL sketch after this list.
- Makes broadcastable attention and causal masks so that future and masked tokens are ignored. The attention_mask argument carries ones for tokens to attend to and zeros for tokens to ignore; see the extended-mask sketch after this list.
- Makes broadcastable key_padding masks so that padding tokens are ignored.
- Create a broadcastable sparse mask so that masked positions are ignored; see the sliding-window sketch after this list.
- Add label smoothing to the loss function; this is a workaround implementation of label smoothing in our system. See the label-smoothing sketch after this list.
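For the autoregressive (triangular) masks above, a minimal PyTorch sketch of the underlying idea follows. The name `causal_mask` and its signature are illustrative, not this module's API; the mask is built in additive form so it can simply be added to the attention scores.

```python
import torch

def causal_mask(seq_len: int, dtype=torch.float32) -> torch.Tensor:
    # Hypothetical helper: 1s above the diagonal mark future positions.
    future = torch.triu(torch.ones(seq_len, seq_len, dtype=dtype), diagonal=1)
    # Additive form: 0 where attention is allowed, a large negative value
    # where it is blocked, so softmax drives blocked weights to ~0.
    return future * torch.finfo(dtype).min

scores = torch.randn(2, 4, 5, 5)       # (batch, heads, query, key)
masked = scores + causal_mask(5)       # broadcasts over batch and heads
probs = torch.softmax(masked, dim=-1)  # future positions get ~0 probability
```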
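The variable-sequence-length (VSL) variant additionally keeps sequences packed into the same row from attending to one another. A sketch under the assumption that packing is described by per-token segment ids; the name `vsl_causal_mask` and the segment-id encoding are assumptions, not the module's interface:

```python
import torch

def vsl_causal_mask(segment_ids: torch.Tensor) -> torch.Tensor:
    # segment_ids: (batch, seq) integers, one id per packed sequence.
    _, seq = segment_ids.shape
    same_seq = segment_ids[:, :, None] == segment_ids[:, None, :]  # (B, S, S)
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    allowed = same_seq & causal
    # Additive mask: 0 where allowed, large negative where blocked.
    return (~allowed).float() * torch.finfo(torch.float32).min

ids = torch.tensor([[0, 0, 0, 1, 1]])  # two sequences packed into one row
print(vsl_causal_mask(ids)[0])         # blocks future *and* cross-sequence keys
```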
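For the broadcastable attention and key_padding masks, the usual recipe reshapes a (batch, seq) mask of 1s and 0s into a 4-D additive mask that broadcasts against per-head attention scores. A sketch, with `extend_attention_mask` as an illustrative name rather than this module's signature:

```python
import torch

def extend_attention_mask(attention_mask: torch.Tensor,
                          dtype=torch.float32) -> torch.Tensor:
    # attention_mask: (batch, seq), 1 = attend, 0 = ignore (e.g. padding).
    # Insert head and query dims so the mask broadcasts against
    # attention scores of shape (batch, heads, query, key).
    extended = attention_mask[:, None, None, :].to(dtype)
    # Flip 1/0 to 0/large-negative so the mask can be added to the scores.
    return (1.0 - extended) * torch.finfo(dtype).min

pad_mask = torch.tensor([[1, 1, 1, 0, 0]])
print(extend_attention_mask(pad_mask).shape)  # torch.Size([1, 1, 1, 5])
```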
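The summary does not say which sparsity pattern the sparse mask uses. As one concrete possibility, a causal sliding-window (banded) pattern made broadcastable could look like the following; `sliding_window_mask` is a hypothetical name and the pattern itself is an assumption:

```python
import torch

def sliding_window_mask(seq_len: int, window: int,
                        dtype=torch.float32) -> torch.Tensor:
    # Each query may attend only to itself and `window - 1` previous keys.
    q = torch.arange(seq_len)[:, None]
    k = torch.arange(seq_len)[None, :]
    allowed = (k <= q) & (q - k < window)
    mask = (~allowed).to(dtype) * torch.finfo(dtype).min
    return mask[None, None]  # (1, 1, seq, seq) broadcasts over batch and heads

print(sliding_window_mask(5, window=2)[0, 0])
```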
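Finally, the label-smoothing workaround. A standard way to implement label smoothing is to blend the hard-label cross entropy with a uniform-distribution term, which is algebraically the same as smoothing the one-hot targets. A self-contained sketch; the function name and the exact blending form are assumptions about what such a workaround looks like, not the module's code:

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits: torch.Tensor, labels: torch.Tensor,
                           smoothing: float = 0.1) -> torch.Tensor:
    # Per-token cross entropy against the hard labels.
    nll = F.cross_entropy(logits, labels, reduction="none")
    # Cross entropy against the uniform distribution over the vocabulary.
    uniform = -F.log_softmax(logits, dim=-1).mean(dim=-1)
    # Blending the two is equivalent to smoothing the one-hot targets.
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()

logits = torch.randn(8, 100)            # (tokens, vocab)
labels = torch.randint(0, 100, (8,))
print(smoothed_cross_entropy(logits, labels))
```

Note that PyTorch 1.10 and later also accept `F.cross_entropy(..., label_smoothing=0.1)` directly, which is usually preferable when available.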