common.pytorch.model_utils package#
Subpackages#
- common.pytorch.model_utils.checkpoint_converters package
- Submodules
- common.pytorch.model_utils.checkpoint_converters.base_converter module
- common.pytorch.model_utils.checkpoint_converters.bert module
- common.pytorch.model_utils.checkpoint_converters.bert_finetune module
- common.pytorch.model_utils.checkpoint_converters.bloom_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.gpt2_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.gpt_neox_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.gptj_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.llama module
- common.pytorch.model_utils.checkpoint_converters.opt_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.salesforce_codegen_hf_cs module
- common.pytorch.model_utils.checkpoint_converters.t5 module
- Module contents
Submodules#
common.pytorch.model_utils.BertPretrainModelLoss module#
common.pytorch.model_utils.GPTLMHeadModelLoss module#
common.pytorch.model_utils.RotaryPositionEmbeddingHelper module#
common.pytorch.model_utils.T5ForConditionalGenerationLoss module#
- class common.pytorch.model_utils.T5ForConditionalGenerationLoss.T5ForConditionalGenerationLoss[source]#
Bases: torch.nn.Module
- forward(lm_logits, labels, decoder_attention_mask, loss_weight=None)[source]#
Per-token loss is averaged across the batch by:
- Summing across all tokens in the batch
- Dividing by the batch size
- Multiplying by the provided loss weight (expected to be roughly equal to batch_size / num_tokens_in_batch)
The user has the option to specify this loss weight once and use the same weight for every batch (by setting self.global_loss_weight and not passing in loss_weight to the forward function) or use a different weight for every batch (by passing loss_weight to the forward function).
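The weighting scheme described above can be summarized in a short sketch. This is not the module's actual implementation, only an illustration of the listed steps; the helper name weighted_lm_loss and the use of cross-entropy over flattened logits are assumptions.

import torch
import torch.nn as nn

def weighted_lm_loss(lm_logits, labels, decoder_attention_mask, loss_weight):
    # Per-token cross entropy over the vocabulary, masked by the decoder mask
    per_token = nn.functional.cross_entropy(
        lm_logits.view(-1, lm_logits.size(-1)),
        labels.view(-1).long(),
        reduction="none",
    )
    per_token = per_token * decoder_attention_mask.view(-1).to(per_token.dtype)
    # Sum over all tokens, divide by batch size, then apply the loss weight
    # (roughly batch_size / num_tokens_in_batch), approximating a per-token mean
    return per_token.sum() / labels.shape[0] * loss_weight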
common.pytorch.model_utils.activations module#
- common.pytorch.model_utils.activations.gelu_new(x)[source]#
Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also see the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
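For reference, the tanh-based approximation used by those codebases looks roughly like the sketch below; the name gelu_new_sketch is illustrative, and minor numerical details may differ from the module's gelu_new.

import math
import torch

def gelu_new_sketch(x: torch.Tensor) -> torch.Tensor:
    # tanh approximation of GELU, as popularized by the BERT/GPT codebases
    return 0.5 * x * (
        1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0)))
    )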
common.pytorch.model_utils.convert_checkpoint module#
- common.pytorch.model_utils.convert_checkpoint.convert_checkpoint(model, src_fmt, tgt_fmt, checkpoint, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
- common.pytorch.model_utils.convert_checkpoint.convert_checkpoint_from_file(model, src_fmt, tgt_fmt, checkpoint_file, config_file, outputdir=None, export_h5_checkpoint=False, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
- common.pytorch.model_utils.convert_checkpoint.convert_config(model, src_fmt, tgt_fmt, config, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
- common.pytorch.model_utils.convert_checkpoint.convert_config_from_file(model, src_fmt, tgt_fmt, config_file, outputdir=None, drop_unmatched_keys=False, no_progress_bar=True, debug=False)[source]#
- common.pytorch.model_utils.convert_checkpoint.diff_checkpoints(checkpoint_left, checkpoint_right, tensor_comparison_op='equal')[source]#
Compare state dictionaries of two checkpoints (left and right). Returns True if the dicts are the same. Tensors can be compared via the “equal” or “allclose” operators. All other types are compared for strict equality.
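A typical call sequence might look like the following sketch. The model name, format strings, and file paths are placeholders; the supported values depend on your release, and diff_checkpoints is assumed here to accept the loaded checkpoint dicts directly.

from common.pytorch.model_utils.convert_checkpoint import (
    convert_checkpoint_from_file,
    diff_checkpoints,
)
import torch

# Placeholder model/format names and paths; substitute values supported
# by your installation.
convert_checkpoint_from_file(
    model="gpt2",
    src_fmt="hf",
    tgt_fmt="cs",
    checkpoint_file="pytorch_model.bin",
    config_file="config.json",
    outputdir="converted/",
)

# Compare two checkpoint state dicts, using allclose for tensor comparison.
same = diff_checkpoints(
    torch.load("ckpt_a.pt"),
    torch.load("ckpt_b.pt"),
    tensor_comparison_op="allclose",
)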
common.pytorch.model_utils.create_initializer module#
- common.pytorch.model_utils.create_initializer.create_initializer(spec)[source]#
Creates the specified initializer.
- Parameters
spec (dict/str) – either a string giving the name of the initializer, or a dict that includes the name along with any other relevant parameters.
seed (int) – random seed for the initializer or None to run unseeded.
- Returns
initializer that can be passed to layers
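A hypothetical usage sketch follows; the initializer names and dict keys shown are assumptions, as is the convention that the returned callable is applied to a weight tensor. Consult the supported spec values for your release.

>>> import torch
>>> init_fn = create_initializer("xavier_uniform")
>>> init_fn = create_initializer({"name": "truncated_normal", "std": 0.02})
>>> layer = torch.nn.Linear(128, 128)
>>> init_fn(layer.weight)  # assuming the returned callable takes a tensor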
common.pytorch.model_utils.weight_initializers module#
- common.pytorch.model_utils.weight_initializers.lecun_normal_(tensor)[source]#
Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunNormal
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
Examples
>>> w = torch.empty(3, 3)
>>> lecun_normal_(w)
- common.pytorch.model_utils.weight_initializers.lecun_uniform_(tensor)[source]#
Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/LecunUniform
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
Examples
>>> w = torch.empty(3, 3)
>>> lecun_uniform_(w)
- common.pytorch.model_utils.weight_initializers.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0)[source]#
Fills the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\) with values outside \([a, b]\) redrawn until they are within the bounds. The method used for generating the random values works best when \(a \leq \text{mean} \leq b\).
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
mean (float) – the mean of the normal distribution. Defaults to 0.0
std (float) – the standard deviation of the normal distribution. Defaults to 1.0
a (float) – the minimum cutoff value. Defaults to -2.0
b (float) – the maximum cutoff value. Defaults to 2.0
Examples
>>> w = torch.empty(3, 3)
>>> trunc_normal_(w)
- common.pytorch.model_utils.weight_initializers.variance_scaling_(tensor, scale=1.0, mode='fan_in', distribution='truncated_normal')[source]#
Adapted from TensorFlow’s initializations https://www.tensorflow.org/api_docs/python/tf/keras/initializers/VarianceScaling
Fills the input Tensor with values according to the given scale, mode, and distribution.
- Parameters
tensor (torch.Tensor) – an n-dimensional torch.Tensor
scale (float) – scaling factor (positive float)
mode (str) – mode of weight initialization. Defaults to fan_in
distribution (str) – distribution to initialize tensors with. Defaults to truncated_normal
Examples
>>> w = torch.empty(3, 3)
>>> variance_scaling_(w)
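The semantics follow TensorFlow's VarianceScaling initializer. The sketch below illustrates that scheme, assuming PyTorch's (out, in, *kernel) weight layout and the usual fan definitions; the package's variance_scaling_ may differ in how it handles modes, distributions, or the truncation correction.

import math
import torch

def _fan_in_and_fan_out(tensor):
    # PyTorch weight layout: (out_features, in_features, *kernel_dims)
    if tensor.dim() < 2:
        return tensor.numel(), tensor.numel()
    receptive = 1
    for s in tensor.shape[2:]:
        receptive *= s
    return tensor.shape[1] * receptive, tensor.shape[0] * receptive

def variance_scaling_sketch(tensor, scale=1.0, mode="fan_in",
                            distribution="truncated_normal"):
    fan_in, fan_out = _fan_in_and_fan_out(tensor)
    fan = {"fan_in": fan_in, "fan_out": fan_out,
           "fan_avg": (fan_in + fan_out) / 2.0}[mode]
    variance = scale / max(1.0, fan)
    if distribution == "truncated_normal":
        # 0.8796... rescales the std of a normal truncated at +/- 2 stddevs
        std = math.sqrt(variance) / 0.87962566103423978
        return torch.nn.init.trunc_normal_(tensor, std=std, a=-2 * std, b=2 * std)
    if distribution == "normal":
        return torch.nn.init.normal_(tensor, std=math.sqrt(variance))
    bound = math.sqrt(3.0 * variance)  # "uniform"
    return torch.nn.init.uniform_(tensor, -bound, bound)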