common.pytorch.model_utils package#

Submodules#

common.pytorch.model_utils.BertPretrainModelLoss module#

class common.pytorch.model_utils.BertPretrainModelLoss.BertPretrainModelLoss[source]#

Bases: torch.nn.Module

__init__(disable_nsp=False, mlm_loss_weight=1.0, label_smoothing=0.0)[source]#
forward(mlm_logits, vocab_size, mlm_labels, nsp_logits, nsp_labels, mlm_weights, mlm_loss_scale=None)[source]#
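
A minimal usage sketch of the combined MLM/NSP loss; the tensor shapes and values below are illustrative assumptions, not taken from the implementation:

>>> import torch
>>> from common.pytorch.model_utils.BertPretrainModelLoss import BertPretrainModelLoss
>>> loss_fn = BertPretrainModelLoss(disable_nsp=False, mlm_loss_weight=1.0)
>>> batch, seq, vocab = 8, 128, 30522                    # assumed sizes
>>> mlm_logits = torch.randn(batch, seq, vocab)
>>> mlm_labels = torch.randint(0, vocab, (batch, seq))
>>> mlm_weights = torch.ones(batch, seq)                  # 1.0 at masked positions, 0.0 elsewhere
>>> nsp_logits = torch.randn(batch, 2)
>>> nsp_labels = torch.randint(0, 2, (batch,))
>>> loss = loss_fn(mlm_logits, vocab, mlm_labels, nsp_logits, nsp_labels, mlm_weights)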

common.pytorch.model_utils.GPTLMHeadModelLoss module#

class common.pytorch.model_utils.GPTLMHeadModelLoss.GPTLMHeadModelLoss[source]#

Bases: torch.nn.Module

__init__(vocab_size, loss_scaling, loss_weight)[source]#
forward(lm_logits, labels, attention_mask)[source]#
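
A minimal usage sketch; the shapes and the loss_scaling/loss_weight values are placeholder assumptions for illustration:

>>> import torch
>>> from common.pytorch.model_utils.GPTLMHeadModelLoss import GPTLMHeadModelLoss
>>> batch, seq, vocab = 4, 256, 50257                     # assumed sizes
>>> loss_fn = GPTLMHeadModelLoss(vocab_size=vocab, loss_scaling=1.0, loss_weight=1.0)
>>> lm_logits = torch.randn(batch, seq, vocab)
>>> labels = torch.randint(0, vocab, (batch, seq))
>>> attention_mask = torch.ones(batch, seq)               # 1.0 for real tokens, 0.0 for padding
>>> loss = loss_fn(lm_logits, labels, attention_mask)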

common.pytorch.model_utils.RotaryPositionEmbeddingHelper module#

class common.pytorch.model_utils.RotaryPositionEmbeddingHelper.RotaryPositionEmbeddingHelper[source]#

Bases: object

__init__(max_position_embeddings, rotary_dim)[source]#
create_fixed_pos_emb(x, offset)[source]#
rotate_tensor(x, real_seq_length, offset=0)[source]#
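
A usage sketch for applying the rotary embedding to query/key projections; the [batch, heads, seq, head_dim] layout used here is an assumption for illustration:

>>> import torch
>>> from common.pytorch.model_utils.RotaryPositionEmbeddingHelper import RotaryPositionEmbeddingHelper
>>> rope = RotaryPositionEmbeddingHelper(max_position_embeddings=2048, rotary_dim=64)
>>> q = torch.randn(2, 12, 128, 64)                       # assumed [batch, heads, seq, head_dim]
>>> q_rot = rope.rotate_tensor(q, real_seq_length=128)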

common.pytorch.model_utils.T5ForConditionalGenerationLoss module#

class common.pytorch.model_utils.T5ForConditionalGenerationLoss.T5ForConditionalGenerationLoss[source]#

Bases: torch.nn.Module

__init__(lm_loss_weight, mlm_loss_scaling, label_smoothing=0.0)[source]#
forward(lm_logits, labels, decoder_attention_mask, loss_weight=None)[source]#
Per-token loss is averaged across the batch by:

  1. Summing across all tokens in the batch

  2. Dividing by the batch size

  3. Multiplying by the provided loss weight (expected to be roughly equal to batch_size / num_tokens_in_batch)

The user can either specify this loss weight once and use the same weight for every batch (by setting self.global_loss_weight and not passing loss_weight to the forward function), or use a different weight for every batch (by passing loss_weight to the forward function).
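
An illustrative plain-PyTorch sketch of the averaging scheme described above (not the class implementation itself):

>>> import torch
>>> per_token_loss = torch.tensor([[0.5, 0.3, 0.0],
...                                [0.2, 0.4, 0.6]])      # toy values; 0.0 marks padding
>>> batch_size, num_tokens_in_batch = 2, 5
>>> loss = per_token_loss.sum()                           # 1. sum across all tokens in the batch
>>> loss = loss / batch_size                              # 2. divide by the batch size
>>> loss = loss * (batch_size / num_tokens_in_batch)      # 3. multiply by the loss weight
>>> # Net effect: sum / num_tokens_in_batch, i.e. the mean loss per non-padding token.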

common.pytorch.model_utils.T5ForConditionalGenerationLoss.smooth_loss(prediction_scores, loss, label_smoothing, classes)[source]#

common.pytorch.model_utils.activations module#

common.pytorch.model_utils.activations.geglu(x1, x2)[source]#
common.pytorch.model_utils.activations.gelu_fast(x)[source]#
common.pytorch.model_utils.activations.gelu_new(x)[source]#

Implementation of the GELU activation function currently used in the Google BERT repo (identical to OpenAI GPT). See also the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
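
For reference, a sketch of the tanh approximation this function implements (the standard formula from the GELU paper, not the module's exact code):

>>> import math, torch
>>> def gelu_new_reference(x):
...     # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
...     return 0.5 * x * (1.0 + torch.tanh(
...         math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))
>>> y = gelu_new_reference(torch.randn(4, 16))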

common.pytorch.model_utils.activations.get_activation(activation)[source]#
common.pytorch.model_utils.activations.glu_bivariate_base_fn(x1, x2, activation_fn)[source]#
common.pytorch.model_utils.activations.is_glu_activation(activation)[source]#
common.pytorch.model_utils.activations.liglu(x1, x2)[source]#
common.pytorch.model_utils.activations.linear_act(x)[source]#
common.pytorch.model_utils.activations.quick_gelu(x)[source]#
common.pytorch.model_utils.activations.reglu(x1, x2)[source]#
common.pytorch.model_utils.activations.squared_gelu(x)[source]#
common.pytorch.model_utils.activations.swiglu(x1, x2)[source]#
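
A usage sketch for looking up activations by name; the string names passed here mirror the function names above and are assumptions about the accepted spellings:

>>> import torch
>>> from common.pytorch.model_utils.activations import get_activation, is_glu_activation
>>> act = get_activation("gelu_new")
>>> y = act(torch.randn(4, 16))
>>> if is_glu_activation("swiglu"):                       # GLU-style activations take two inputs
...     out = get_activation("swiglu")(torch.randn(4, 16), torch.randn(4, 16))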

common.pytorch.model_utils.convert_checkpoint module#

class common.pytorch.model_utils.convert_checkpoint.CheckpointConverterCLI[source]#

Bases: object

__init__()[source]#
common.pytorch.model_utils.convert_checkpoint.convert_checkpoint(model, src_fmt, tgt_fmt, checkpoint, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
common.pytorch.model_utils.convert_checkpoint.convert_checkpoint_from_file(model, src_fmt, tgt_fmt, checkpoint_file, config_file, outputdir=None, export_h5_checkpoint=False, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
common.pytorch.model_utils.convert_checkpoint.convert_config(model, src_fmt, tgt_fmt, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
common.pytorch.model_utils.convert_checkpoint.convert_config_from_file(model, src_fmt, tgt_fmt, config_file, outputdir=None, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
common.pytorch.model_utils.convert_checkpoint.diff_checkpoints(checkpoint_left, checkpoint_right, tensor_comparison_op='equal')[source]#

Compare state dictionaries of two checkpoints (left and right). Returns True if the dicts are the same. Tensors can be compared via the “equal” or “allclose” operators. All other types are compared for strict equality.
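
A usage sketch, assuming the checkpoints are passed as in-memory dicts of tensors (the exact nesting the function expects is not shown here):

>>> import torch
>>> from common.pytorch.model_utils.convert_checkpoint import diff_checkpoints
>>> left = {"w": torch.ones(3, 3), "step": 100}
>>> right = {"w": torch.ones(3, 3) + 1e-8, "step": 100}
>>> diff_checkpoints(left, right, tensor_comparison_op="equal")     # bit-exact tensor comparison
>>> diff_checkpoints(left, right, tensor_comparison_op="allclose")  # tolerance-based tensor comparison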

common.pytorch.model_utils.convert_checkpoint.diff_checkpoints_from_file(file_left, file_right, tensor_comparison_op='equal')[source]#

Compare two checkpoints (left and right). Returns True if the dicts are the same.

common.pytorch.model_utils.create_initializer module#

common.pytorch.model_utils.create_initializer.create_initializer(spec)[source]#

Creates the specified initializer.

Parameters
  • spec (dict/str) – either a string naming the initializer, or a dict that includes the name plus any other relevant parameters.

  • seed (int) – random seed for the initializer or None to run unseeded.

Returns

initializer that can be passed to layers
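
A usage sketch; the dict keys shown here ("name", "mean", "std") and the assumption that the returned initializer is a callable applied to a weight tensor are illustrative, not confirmed by the signature above:

>>> import torch
>>> from common.pytorch.model_utils.create_initializer import create_initializer
>>> init_fn = create_initializer("xavier_uniform")
>>> init_fn = create_initializer({"name": "truncated_normal", "mean": 0.0, "std": 0.02})
>>> layer = torch.nn.Linear(128, 128)
>>> init_fn(layer.weight.data)                            # initialize the layer's weights in place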

common.pytorch.model_utils.weight_initializers module#

common.pytorch.model_utils.weight_initializers.lecun_normal_(tensor)[source]#

Adapted from TensorFlow’s LecunNormal initializer: https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunNormal

Parameters

tensor (torch.Tensor) – an n-dimensional torch.Tensor

Examples

>>> w = torch.empty(3, 3)
>>> lecun_normal_(w)
common.pytorch.model_utils.weight_initializers.lecun_uniform_(tensor)[source]#

Adapted from TensorFlow’s LecunUniform initializer: https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunUniform

Parameters

tensor (torch.Tensor) – an n-dimensional torch.Tensor

Examples

>>> w = torch.empty(3, 3)
>>> lecun_uniform_(w)
common.pytorch.model_utils.weight_initializers.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)[source]#

Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).

Parameters
  • tensor (torch.Tensor) – an n-dimensional torch.Tensor

  • mean (float) – the mean of the normal distribution. Defaults to 0.0

  • std (float) – the standard deviation of the normal distribution. Defaults to 1.0

  • a (float) – the minimum cutoff value. Defaults to -2.0

  • b (float) – the maximum cutoff value. Defaults to 2.0

Examples

>>> w = torch.empty(3, 3)
>>> trunc_normal_(w)
common.pytorch.model_utils.weight_initializers.variance_scaling_(tensor, scale=1.0, mode='fan_in', distribution='truncated_normal')[source]#

Adapted from TensorFlow’s VarianceScaling initializer: https://www.tensorflow.org/api_docs/python/tf/keras/initializers/VarianceScaling

Fills the input Tensor with values according to the given scale, mode, and distribution.

Parameters
  • tensor (torch.Tensor) – an n-dimensional torch.Tensor

  • scale (float) – scaling factor (positive float)

  • mode (str) – mode of weight initialization. Defaults to fan_in

  • distribution (str) – distribution to initialize tensors with. Defaults to truncated_normal

Examples

>>> w = torch.empty(3, 3)
>>> variance_scaling_(w)

Module contents#