common.tf.layers package#
Submodules#
common.tf.layers.AbstractRecomputeWrapper module#
- class common.tf.layers.AbstractRecomputeWrapper.AbstractRecomputeWrapper#
Bases:
abc.ABC
Utility functions for the decorator tf.custom_gradient, when used in training.
An abstract class to handle many small requirements when using the decorator tf.custom_gradient. This class is used to recompute the activations during the backward propagation part of a training step. This code acts as a backbone for recompute wrappers and reversible layers.
The following utility functions are designed to make it easy to implement the recomputation:
_set_recomputed_tensor and _check_get_recomputed_tensor. These functions attach the recomputed tensors to the corresponding forward pass tensors. They are useful for passing the recomputed tensors between, for example, reversible layers, so that no tensors need to be saved.
_block_recompute_and_gradients. This function takes a forward block of the computation, recomputes the block, and then calculates and returns the gradients associated with the block.
Scope handling functions
tf.custom_gradient. This structure names the scopes of the gradients. However, this naming is based on the IdentityN ops it attaches to the portion of the graph for which the user would like to add a custom gradient. This is not always convenient. Moreover, tf.custom_gradient does not track the appropriate control flow contexts for the variables used in that portion of the graph. The scope handling functions in this class are helpful here.
_get_clean_grad_scope. This function cleans the named scope to produce clean graphs.
_update_variables_for_context. This function finds the correct variable tensors for the control flow contexts (for example, to use recomputation inside a while loop).
The basic structure for a recompute layer is as follows:
Define a custom gradient function using tf.custom_gradient inside the __call__ function of a recompute layer.
Inside the __call__ function, call the forward propagation of the layer and define the recompute+gradient function. We recommend that you use the _block_recompute_and_gradients function.
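The recompute idea behind this structure can be illustrated with a minimal, framework-free sketch. It does not use TensorFlow, tf.custom_gradient, or any helper of this class; the names block_forward and block_recompute_and_gradient are hypothetical and chosen only for illustration. The forward pass keeps only its input, and the backward pass recomputes the intermediate activation before applying the chain rule.

```python
import math


def block_forward(x):
    """Forward block (hypothetical): the intermediate activation is not stored."""
    hidden = math.tanh(x)  # intermediate activation, discarded after use
    return hidden * hidden


def block_recompute_and_gradient(x, grad_out):
    """Recompute the intermediate from the saved input, then backpropagate."""
    hidden = math.tanh(x)                      # recomputed during the backward pass
    d_hidden = grad_out * 2.0 * hidden         # gradient through hidden * hidden
    return d_hidden * (1.0 - hidden * hidden)  # gradient through tanh


x = 0.5
y = block_forward(x)
grad_x = block_recompute_and_gradient(x, grad_out=1.0)
```

In a real recompute layer the same two pieces would be wired together by the custom gradient function defined inside __call__.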
- CtrlFlowWarnedOnce = False#
- abstract call(*args, **kwargs)#
The call function for the layers that use recomputation during the backward phase.
This function is wrapped by the __call__ function of this abstract recompute wrapper, and it must be overridden by a child class to implement the forward computation of the layer.
- static is_in_while_loop(graph=None)#
Returns True if the specified graph (or the current graph, if none is specified) corresponds to a while loop in the forward, backward or cond graph.
- Returns
True if the specified graph (or the current graph, if none is specified) corresponds to a while loop in the forward, backward or cond graph.
- Return type
bool
common.tf.layers.ActivationLayer module#
- class common.tf.layers.ActivationLayer.ActivationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras activation layer.
Also supports activation="GeLU", activation="relu6" and activation="hard_swish", which are currently missing in keras.layers.ActivationLayer v2.2.
- Parameters
activation (Union[str, Callable]) – The function to be applied. This can be a callable, the string name of a TensorFlow built-in activation, or one of "gelu", "lrelu" (lrelu denotes LeakyReLU), "relu6" or "hard_swish".
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with the summary_layer.
- __init__(activation, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the activation layer.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as the input.
- Return type
Tensor
- static gelu(x)#
- static hard_swish(x)#
- static relu6(x)#
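The three static methods above are listed without descriptions. As a rough guide, the sketch below gives the commonly used scalar definitions of these activations in plain Python. It is an assumption that the layer follows these definitions; in particular, the source does not say whether gelu uses the exact erf form shown here or a tanh approximation.

```python
import math


def relu6(x):
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x):
    """Piecewise-linear approximation of swish: x * relu6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```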
common.tf.layers.AddLayer module#
- class common.tf.layers.AddLayer.AddLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer. Adds a list of inputs.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the AddLayer to sum up a list of inputs.
- Parameters
inputs – List of input tensors (at least 2).
- Returns
A tensor containing the sum of inputs.
- Return type
Tensor
common.tf.layers.AttentionLayer module#
- class common.tf.layers.AttentionLayer.AttentionLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Multi-head attention layer. Based on MLCommons model.
- Parameters
hidden_size (int) – Number of units in each projection output.
num_heads (int) – Number of attention heads.
use_projection_bias (bool) – Whether to use bias in the key, query, and value projections.
use_ffn_bias (bool) – Whether to use bias in the output projection.
initializer (str) – Projection kernel initializer. Defaults to glorot_uniform.
query_layer_initializer (initializer) – Query kernel initializer. Defaults to None, in which case initializer will be used.
key_layer_initializer (initializer) – Key kernel initializer. Defaults to None, in which case initializer will be used.
value_layer_initializer (initializer) – Value kernel initializer. Defaults to None, in which case initializer will be used.
relative_attention_bias_weight_initializer (initializer) – Relative attention bias weight initializer. Defaults to None, in which case initializer will be used.
output_layer_initializer (str or initializer) – If not None, use this initializer for the output transform layer. Defaults to None.
kernel_regularizer (Optional[Callable]) – Projection kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Projection bias regularizer. Defaults to None.
attention_type (str) – The attention variant to execute. Currently accepts dot_product and scaled_dot_product. Defaults to scaled_dot_product.
dropout_rate (float) – Dropout rate for key-query weights. Defaults to 0.0.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
use_relative_attention_bias (bool) – Whether to use relative position bias when calculating attention.
relative_attention_bias (Tensor) – Tensor with relative attention weights. Shape: [num_relative_attention_buckets, num_heads]. Defaults to None.
num_relative_attention_buckets (int) – Used to calculate relative position bias when use_relative_attention_bias is set to True.
bidirectional_relative_attention (bool) – Whether attention is bidirectional.
softmax_dtype_fp32 (bool) – If True, casts the query-key logits to FP32 before the softmax, which is then computed in FP32.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(hidden_size, num_heads, output_projection_size=None, use_projection_bias=False, use_ffn_bias=False, initializer='glorot_uniform', query_layer_initializer=None, key_layer_initializer=None, value_layer_initializer=None, relative_attention_bias_weight_initializer=None, output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, attention_type='scaled_dot_product', dropout_rate=0.0, dropout_seed=None, use_relative_attention_bias=False, relative_attention_bias=None, num_relative_attention_buckets=32, bidirectional_relative_attention=False, softmax_dtype_fp32=True, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(q, v, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing position bias to apply in attention.
cache_position_bias (bool) – Specifies if position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to shape of past_kv (see above).
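For orientation, the core of the default scaled_dot_product attention type can be sketched in plain Python for a single head. This is only an illustration of the general technique under simplifying assumptions: it omits the layer's learned projections, multiple heads, masking, dropout, position bias and key-value caching, and the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    largest = max(row)
    exps = [math.exp(value - largest) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """q: [q_len][d]; k and v: [kv_len][d] nested lists. Returns [q_len][d]."""
    depth = len(q[0])
    outputs = []
    for q_row in q:
        # Query-key logits, scaled by sqrt(depth).
        logits = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(depth)
            for k_row in k
        ]
        weights = softmax(logits)
        # Weighted sum of the value rows.
        outputs.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v))
             for j in range(len(v[0]))]
        )
    return outputs


kv = [[1.0, 2.0], [3.0, 4.0]]
# A zero query attends uniformly, so the output is the mean of the value rows.
uniform = scaled_dot_product_attention([[0.0, 0.0]], kv, kv)
```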
- class common.tf.layers.AttentionLayer.SelfAttentionLayer#
Bases:
common.tf.layers.AttentionLayer.AttentionLayer
Multiheaded self-attention layer.
- call(x, mask=None, past_kv=None, cache_present_kv=False, training=True, position_bias=None, cache_position_bias=False)#
Applies the attention mechanism to queries q and values v. Keys will be set to be same as v.
- Parameters
q (Tensor) – Queries, shape [batch_size, seq_length, hidden_size].
v (Tensor) – Values, shape [batch_size, seq_length, hidden_size].
mask (Tensor) – Attention mask. Can be 2D of shape [batch_size, seq_length], or 3D of shape [batch, query_length, seq_length].
past_kv (Tensor) – Past keys and values. Has shape [2, batch_size, num_heads, seq_length, hidden_size / num_heads]. The tensors in [0,:,:,:,:] and [1,:,:,:,:] contain the past keys and values, respectively. Defaults to None.
cache_present_kv (bool) – Specifies if the present keys and values must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
training (bool) – Training the model if True. Needed to call the dropout (after softmax) in the appropriate mode.
position_bias (Tensor) – Tensor containing position bias to apply in attention.
cache_position_bias (bool) – Specifies if position bias must be cached and returned. Needed to speed up the computations when the decoder is called within an autoregressive loop. Defaults to False.
- Returns
When cache_present_kv is True and cache_position_bias is True, returns a tuple, where the 0th entry contains the attention output, the 1st entry contains a tensor of keys and values computed at the current application of the attention layer, and the 2nd entry contains a tensor of position bias computed at the current application of the attention layer.
If cache_present_kv is False, no entry for present keys and values is provided.
If cache_position_bias is False, no entry for position bias is provided.
If both cache_present_kv and cache_position_bias are set to False, returns a tensor of shape equal to shape of past_kv (see above).
common.tf.layers.BaseLayer module#
- class common.tf.layers.BaseLayer.BaseLayer#
Bases:
tensorflow.keras.layers.Layer
Base layer for the reference models.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call()#
common.tf.layers.Conv2DLayer module#
- class common.tf.layers.Conv2DLayer.Conv2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
common.tf.layers.Conv2DTransposeLayer module#
- class common.tf.layers.Conv2DTransposeLayer.Conv2DTransposeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D transposed convolution layer.
- __init__(filters, kernel_size, strides=(1, 1), padding='valid', output_padding=None, data_format=None, dilation_rate=(1, 1), activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the 2D transposed convolution layer.
- Parameters
inputs – A 4D tensor with shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with shape (samples, filters, new_rows, new_cols) if data_format='channels_first', or a 4D tensor with shape (samples, new_rows, new_cols, filters) if data_format='channels_last'. Note that the rows and cols values might have changed due to padding.
- Return type
Tensor
common.tf.layers.CrossEntropyFromLogitsLayer module#
- class common.tf.layers.CrossEntropyFromLogitsLayer.CrossEntropyFromLogitsLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Cross entropy loss, given logits. Compares logits against labels.
- Parameters
boundary_casting (bool) – See the documentation for BaseLayer.
tf_summary (bool) – See the documentation for BaseLayer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, logits)#
Calculates the cross entropy over logits.
- Parameters
labels (Tensor) – Label indices.
logits (Tensor) – Logits (non-normalized).
- Returns
A tensor of the same shape as labels and of the same type as logits with the softmax cross entropy loss.
- Return type
Tensor
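The computation described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the layer's implementation: labels are taken to be integer class indices (as "Label indices" suggests), the softmax is taken over the last axis, and the function name is hypothetical.

```python
import math


def softmax_cross_entropy(labels, logits):
    """labels: list of class indices; logits: list of per-example logit rows.

    Returns one loss value per example, so the result has the shape of labels.
    """
    losses = []
    for label, row in zip(labels, logits):
        largest = max(row)
        # log(sum(exp(row))) computed stably by factoring out the max logit.
        log_sum_exp = largest + math.log(sum(math.exp(v - largest) for v in row))
        losses.append(log_sum_exp - row[label])
    return losses


# Uniform logits over two classes give a loss of log(2) for either label.
losses = softmax_cross_entropy([0, 1], [[0.0, 0.0], [0.0, 0.0]])
```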
common.tf.layers.DenseLayer module#
- class common.tf.layers.DenseLayer.DenseLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras densely-connected layer. Provides support for "gelu" activation.
- Parameters
units (int) – Number of units in the layer output.
activation (Optional[Union[str, Callable]]) – If not None, an activation function to be applied after the dense layer. The activation function can be a callable, the string name of a TensorFlow built-in activation, or "gelu".
use_bias (bool) – Whether to use bias.
kernel_initializer (str) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer – Bias initializer. Defaults to "zeros".
kernel_regularizer (Optional[Callable]) – Kernel regularizer. Defaults to None.
bias_regularizer (Optional[Callable]) – Bias regularizer. Defaults to None.
activity_regularizer (Optional[Callable]) – Activity (output activation) regularizer. Defaults to None.
kernel_constraint (Optional[Callable]) – Kernel constraint. Defaults to None.
bias_constraint (Optional[Callable]) – Bias constraint. Defaults to None.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the densely-connected layer.
- Parameters
inputs (Tensor) – An N-D tensor with shape (batch_size, ..., input_dim).
- Returns
An N-D tensor with shape (batch_size, ..., units).
- Return type
Tensor
common.tf.layers.DropoutLayer module#
- class common.tf.layers.DropoutLayer.DropoutLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras dropout layer.
- __init__(rate, noise_shape=None, seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, training=True, **kwargs)#
Performs the dropout.
- Parameters
inputs (Tensor) – Arbitrary tensor.
training (bool) – Training mode if set to True.
- Returns
A tensor of same shape as input.
- Return type
Tensor
common.tf.layers.EmbeddingLayer module#
- class common.tf.layers.EmbeddingLayer.EmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Embedding layer. Built on top of the Keras Embedding layer.
- __init__(input_dim, output_dim, embeddings_initializer='uniform', bias_initializer='zeros', embeddings_regularizer=None, activity_regularizer=None, embeddings_constraint=None, mask_zero=False, input_length=None, use_bias=False, weight_name='embedding_weights', boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, pad_id=-1, scale=1)#
Get token embeddings of inputs.
- Parameters
inputs (Tensor) – A tensor with shape [batch_size, length].
pad_id – Integer specifying which input ID corresponds to padding. It does not need to be a legal vocabulary entry. Any inputs elements equal to this value will not be looked up, but will instead directly output zeros. On the Wafer Scale Engine, this indicates the presence of variable sequence length.
scale – Scaling of the embedding (in MLPerf, hidden_size**0.5 is used).
- Returns
A tensor of embeddings with shape [batch_size, length, hidden_size]. Padded positions are filled with zeros.
- Return type
embeddings (Tensor)
- embedding_table()#
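The pad_id and scale behavior of call can be pictured with a small plain-Python sketch. The function embed and the nested-list table are hypothetical stand-ins for the layer and its weights; the sketch only mirrors the documented behavior (padded positions produce zeros, other positions are looked up and scaled).

```python
def embed(inputs, table, pad_id=-1, scale=1):
    """inputs: [batch_size][length] token IDs; table: [vocab_size][hidden_size].

    Returns embeddings of shape [batch_size][length][hidden_size].
    """
    hidden_size = len(table[0])
    batch = []
    for sequence in inputs:
        rows = []
        for token_id in sequence:
            if token_id == pad_id:
                # Padding is never looked up; it maps directly to zeros.
                rows.append([0.0] * hidden_size)
            else:
                rows.append([scale * weight for weight in table[token_id]])
        batch.append(rows)
    return batch


table = [[1.0, 2.0], [3.0, 4.0]]
embeddings = embed([[0, -1, 1]], table, pad_id=-1, scale=2)
```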
common.tf.layers.FeedForwardNetwork module#
- class common.tf.layers.FeedForwardNetwork.FeedForwardNetwork#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
A feed forward network that consists of a stack of fully connected layers.
- Parameters
layers_units (int) – List of units for each layer.
layers_activation (str) – List of activation types (str) for each layer.
layers_dropout_rates (float) – List of dropout rates (float) for each layer.
use_bias (bool) – If True, use bias throughout all layers.
kernel_initializer (string) – Kernel initializer. Defaults to "glorot_uniform".
bias_initializer (callable) – Bias initializer. Defaults to "zeros".
output_layer_initializer – If not None, initialize the last projection layer with this initializer. Defaults to None.
kernel_regularizer (callable) – Kernel regularizer.
bias_regularizer – Bias regularizer.
dropout_seed (int) – Seed with which to initialize the dropout layer. Defaults to None.
- __init__(layers_units, layers_activation=None, layers_dropout_rates=None, use_bias=False, kernel_initializer='glorot_uniform', bias_initializer='zeros', output_layer_initializer=None, kernel_regularizer=None, bias_regularizer=None, dropout_seed=None, boundary_casting=False, tf_summary=False, **kwargs)#
Initialize the FFN object instance.
- call(inputs, training=True, **kwargs)#
common.tf.layers.FeedForwardNetworkV2 module#
- class common.tf.layers.FeedForwardNetworkV2.FeedForwardNetworkV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implement a feed forward network as used in the T5 model.
Set up the FFN components.
- Parameters
d_ff (int) – The hidden dimension of the feed forward network, i.e. the output dimension of the first layer.
d_model (int) – The output dimension of the feed forward network.
activation (string) – The name of the activation to apply after the first dense layer.
dropout_rate (float) – Dropout rate applied after the first dense layer.
use_bias (bool) – Whether or not to use bias in the dense layers of the feed forward network.
input_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the first dense layer.
output_layer_initializer (initializer) – A string or initializer to use to initialize the weights of the second dense layer.
dropout_seed (int) – The seed to make the dropout layer deterministic.
**kwargs – Keyword arguments to be passed into BaseLayer.
- __init__(d_ff, d_model, activation='relu', dropout_rate=0.0, use_bias=False, input_layer_initializer='glorot_uniform', output_layer_initializer='glorot_uniform', dropout_seed=None, **kwargs)#
- call(inputs, training=True, **kwargs)#
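The two-layer structure described by the parameters (a first dense layer with output dimension d_ff, an activation, then a second dense layer with output dimension d_model) can be sketched in plain Python. This is a hedged illustration only: the helper names are hypothetical, the default relu activation is assumed, bias is omitted (use_bias=False), and dropout is left out, which corresponds to the default dropout_rate of 0.0.

```python
def dense(x, weights):
    """x: [in_dim]; weights: [in_dim][out_dim]. Returns [out_dim]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x)))
        for j in range(len(weights[0]))
    ]


def relu(x):
    return [max(value, 0.0) for value in x]


def feed_forward(x, w_in, w_out):
    """First dense layer (to d_ff), activation, second dense layer (to d_model)."""
    hidden = relu(dense(x, w_in))
    return dense(hidden, w_out)


w_in = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]     # 2 inputs -> d_ff = 3
w_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # d_ff = 3 -> d_model = 2
output = feed_forward([1.0, -1.0], w_in, w_out)
```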
common.tf.layers.Input module#
- common.tf.layers.Input.SetupInputTensor(features, tf_summary=False)#
Adds tensor summary to the model’s input features and their gradient, if tf_summary is set to True.
- Parameters
features – The input features.
common.tf.layers.LayerNormalizationLayer module#
- class common.tf.layers.LayerNormalizationLayer.LayerNormalizationLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer normalization. Reference: Layer Normalization.
- __init__(axis=-1, epsilon=1e-08, center=True, scale=True, beta_initializer='zeros', gamma_initializer='ones', beta_regularizer=None, gamma_regularizer=None, beta_constraint=None, gamma_constraint=None, trainable=True, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Apply the layer normalization.
- Parameters
inputs (Tensor) – Arbitrary tensor.
- Returns
A normalized tensor of the same shape as input.
NOTE: While **kwargs are passed, the training arg is never used.
- Return type
Tensor
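As a reminder of what layer normalization computes along the normalized axis, here is a minimal plain-Python sketch for a single vector. It follows the standard formulation (subtract the mean, divide by the square root of the variance plus epsilon, then scale by gamma and shift by beta) and uses the epsilon default from the signature above; the function name is hypothetical and the sketch is not the wrapper's implementation.

```python
import math


def layer_norm(x, gamma=None, beta=None, epsilon=1e-8):
    """Normalize a list of floats to zero mean and unit variance, then scale and shift."""
    size = len(x)
    gamma = [1.0] * size if gamma is None else gamma  # 'ones' initializer
    beta = [0.0] * size if beta is None else beta     # 'zeros' initializer
    mean = sum(x) / size
    variance = sum((value - mean) ** 2 for value in x) / size
    return [
        g * (value - mean) / math.sqrt(variance + epsilon) + b
        for value, g, b in zip(x, gamma, beta)
    ]


normalized = layer_norm([1.0, 2.0, 3.0])
```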
common.tf.layers.MaxPool2DLayer module#
- class common.tf.layers.MaxPool2DLayer.MaxPool2DLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras 2D max pooling layer.
- __init__(pool_size=(2, 2), strides=None, padding='valid', data_format=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Applies the 2D max pooling layer.
- Parameters
inputs (Tensor) – A 4D tensor with the shape (samples, channels, rows, cols) if data_format='channels_first', or a 4D tensor with the shape (samples, rows, cols, channels) if data_format='channels_last'.
- Returns
A 4D tensor with the shape (batch_size, channels, pooled_rows, pooled_cols) if data_format='channels_first', or a 4D tensor with the shape (batch_size, pooled_rows, pooled_cols, channels) if data_format='channels_last'.
- Return type
Tensor
common.tf.layers.PoolerLayer module#
- class common.tf.layers.PoolerLayer.PoolerLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer.
Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension.
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
None: No pooling (output=input).
- __init__(pooler_type='mean', axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None, **kwargs)#
Apply pooler of a given type.
Takes in a padding mask with 1s for tokens and 0s for padding.
common.tf.layers.PoolerLayerV2 module#
- class common.tf.layers.PoolerLayerV2.PoolerLayerV2#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
The pooler layer. Usually used for pooling or summarizing the sequence data.
This layer is added as a workaround to the existing pooler layer for additional masking support. The plan is to use this layer for kernel matching and integ bring up. After we have full support for this layer, we should deprecate the old PoolerLayer.
- Parameters
pooler_type (str) – Type of pooling. Currently supports the following pooler types:
"mean": Mean reduction.
"max": Max reduction.
"first": First slice in the axis dimension.
"last": Last slice in the axis dimension (not yet supported).
"sum": Takes the sum over the axis dimension. Defaults to the entire Tensor.
axis (int) – The dimensions to reduce. If None (the default), reduces all dimensions.
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(pooler_type, axis=None, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, padding_mask=None)#
Apply pooling with optional masking.
- Parameters
inputs (Tensor) – Input tensor.
padding_mask (Tensor) – The padding mask tensor. Assumed to be 1-based, i.e., has 1 in the non-padded positions and 0 elsewhere. If the input tensor is of the shape [d0, d1, ..., d_{k-1}, d_{axis}, d_{k+1}, ... d_n], then the padding_mask must have the shape [d0, d1, ..., d_{k-1}, axis] or [d0, d1, ..., d_{k-1}, axis, 1, ..., 1]. If None (the default), a padding mask of all 1’s is used.
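How a padding mask interacts with the "mean" pooler type can be sketched in plain Python for a single sequence pooled over its sequence axis. The sketch assumes that masked positions are excluded from both the sum and the count, which is the usual meaning of masked mean pooling; the source does not spell this out, and the function name is hypothetical.

```python
def masked_mean_pool(inputs, padding_mask=None):
    """inputs: [seq_len][hidden]; padding_mask: [seq_len] with 1 for tokens, 0 for padding.

    Returns the mean over non-padded positions, shape [hidden].
    Assumes at least one non-padded position.
    """
    if padding_mask is None:
        padding_mask = [1] * len(inputs)  # default: nothing is padding
    count = sum(padding_mask)
    hidden = len(inputs[0])
    return [
        sum(mask * row[j] for mask, row in zip(padding_mask, inputs)) / count
        for j in range(hidden)
    ]


sequence = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
pooled = masked_mean_pool(sequence, padding_mask=[1, 1, 0])
```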
common.tf.layers.PositionEmbeddingLayer module#
- class common.tf.layers.PositionEmbeddingLayer.PositionEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implementation of the position embedding layer.
Adds positional information to the token embedding provided as input. Supports 'fixed' and 'learned' positional embeddings.
- Parameters
max_position_embeddings (int) – Maximum sequence length to train using the model. If None, set to the input sequence length.
embedding_type (str) – Options are 'learned' or 'fixed'.
Learned: Trainable weights for embeddings.
Fixed: Fixed weights for embeddings.
embeddings_initializer (callable) – Embeddings initializer.
embeddings_regularizer (callable) – Embeddings regularizer.
boundary_casting (bool) – See the documentation for BaseLayer.
tf_summary – See the documentation for BaseLayer.
**kwargs – Additional keyword arguments for BaseLayer.
- __init__(max_position_embeddings=None, embedding_type='fixed', embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, position_ids=None)#
Add position embeddings to the inputs.
- Parameters
inputs (Tensor) – Input of the size [batch_size, seq_len, embedding_size].
position_ids (Tensor) – Position IDs of the inputs. A 1D tensor of size seq_len. If None (default), assumes that the position IDs correspond to [0, 1, ..., seq_len-1].
- setup_fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0)#
Adds several sinusoids of different frequencies to a Tensor.
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.
This allows the attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to the attention.
The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).
Specifically, this function uses a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, this function generates the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All these sinusoids are concatenated in the channels dimension.
- Parameters
min_timescale (float) – First timescale of the geometric sequence.
max_timescale (float) – Last timescale of the geometric sequence.
- Returns
A tensor of the shape [length, channels]. Based on _get_timing_signal_1d.
- Return type
Tensor
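The construction described above can be written out in plain Python. This sketch follows the text directly (a geometric sequence of channels / 2 timescales from min_timescale to max_timescale, with all sines followed by all cosines along the channels dimension). It assumes an even number of channels, the function name is hypothetical, and it is not guaranteed to match the layer's implementation element for element.

```python
import math


def fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0):
    """Returns a nested list of shape [length][channels] of sinusoidal signals."""
    num_timescales = channels // 2
    # Log-spacing between consecutive timescales of the geometric sequence.
    log_increment = math.log(max_timescale / min_timescale) / max(num_timescales - 1, 1)
    timescales = [
        min_timescale * math.exp(index * log_increment)
        for index in range(num_timescales)
    ]
    signal = []
    for timestep in range(length):
        scaled = [timestep / timescale for timescale in timescales]
        # Sines first, then cosines, concatenated along the channels dimension.
        signal.append([math.sin(s) for s in scaled] + [math.cos(s) for s in scaled])
    return signal


signal = fixed_position_embedding(length=4, channels=6)
```

At timestep 0 every sine is 0 and every cosine is 1, which gives a quick sanity check on the layout.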
common.tf.layers.PrePostProcessWrapper module#
common.tf.layers.ReshapeLayer module#
- class common.tf.layers.ReshapeLayer.ReshapeLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras layer that reshapes the input.
- __init__(target_shape, boundary_casting=False, tf_summary=False, **kwargs)#
- call(input, **kwargs)#
Apply the reshape layer to an input.
- Parameters
inputs (Tensor) – A tensor.
- Returns
The tensor after reshape.
- Return type
Tensor
common.tf.layers.SegmentEmbeddingLayer module#
- class common.tf.layers.SegmentEmbeddingLayer.SegmentEmbeddingLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Segment embedding layer. Adds segment information to the token embedding provided as input: for example, which sentence a token belongs to when an input sequence contains multiple sentences (two in the case of the BERT model).
- Parameters
num_segments (int) – Number of encoded segments.
embeddings_regularizer (callable) – Embeddings regularizer.
- __init__(num_segments=2, embeddings_initializer='uniform', embeddings_regularizer=None, boundary_casting=False, tf_summary=False, **kwargs)#
- build(input_shape)#
- call(inputs, segment_ids)#
Add segment embedding to inputs.
- Parameters
inputs – Tensor of input embeddings.
segment_ids – Segment IDs.
common.tf.layers.SoftmaxLayer module#
- class common.tf.layers.SoftmaxLayer.SoftmaxLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Wrapper around the Keras softmax layer.
- __init__(axis=-1, boundary_casting=False, tf_summary=False, **kwargs)#
- call(inputs, **kwargs)#
Performs the softmax.
- Parameters
inputs – Arbitrary tensor.
- Returns
A tensor of the same shape as input.
- Return type
Tensor
common.tf.layers.SquaredErrorLayer module#
- class common.tf.layers.SquaredErrorLayer.SquaredErrorLayer#
Bases:
modelzoo.common.tf.layers.BaseLayer.BaseLayer
Squared error between prediction and labels.
- Parameters
boundary_casting (bool) – If True, outputs the values in half precision and casts the input values up to full precision.
tf_summary (bool) – If True, saves the activations with summary_layer.
- __init__(boundary_casting=False, tf_summary=False, **kwargs)#
- call(labels, pred)#
Calculates the squared error between prediction and labels.
- Parameters
labels (Tensor) – Labels.
pred (Tensor) – Predictions (same shape as labels).
- Returns
Loss tensor of the same shape and type as pred.
- Return type
Tensor