tf.layers.PositionEmbeddingLayer module
- class tf.layers.PositionEmbeddingLayer.PositionEmbeddingLayer(*args: Any, **kwargs: Any)
Bases: modelzoo.common.tf.layers.BaseLayer.BaseLayer
Implementation of the position embedding layer.
Adds positional information to the token embedding provided as input. Supports 'fixed' and 'learned' positional embeddings.
- Parameters
max_position_embeddings (int) – Maximum sequence length to train using the model. If None, set to the input sequence length.
embedding_type (str) – Options are 'learned' or 'fixed'. Learned: Trainable weights for embeddings. Fixed: Fixed weights for embeddings.
embeddings_initializer (callable) – Embeddings initializer.
embeddings_regularizer (callable) – Embeddings regularizer.
boundary_casting (bool) – See the documentation for BaseLayer.
tf_summary – See the documentation for BaseLayer.
**kwargs – Additional keyword arguments for BaseLayer.
- build(input_shape)
- call(inputs, position_ids=None)
Add position embeddings to the inputs.
- Parameters
inputs (Tensor) – Input of the size [batch_size, seq_len, embedding_size].
position_ids (Tensor) – Position IDs of the inputs. A 1D tensor of size seq_len. If None (default), the position IDs are assumed to be [0, 1, ..., seq_len-1].
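The behavior described for call can be illustrated with a minimal NumPy sketch. This is not the layer's actual implementation: the helper name add_position_embeddings and the position_table argument (a [max_position_embeddings, embedding_size] array standing in for the layer's embedding weights) are assumptions made for illustration only.

```python
import numpy as np


def add_position_embeddings(inputs, position_table, position_ids=None):
    """Hypothetical helper: add one row of position_table per sequence position.

    inputs:         array of shape [batch_size, seq_len, embedding_size]
    position_table: array of shape [max_position_embeddings, embedding_size]
                    (assumed stand-in for the layer's embedding weights)
    position_ids:   optional 1D integer array of size seq_len
    """
    _, seq_len, _ = inputs.shape
    if position_ids is None:
        # Default described in the docs: positions 0, 1, ..., seq_len - 1.
        position_ids = np.arange(seq_len)
    # Look up one embedding row per position: [seq_len, embedding_size].
    position_embeddings = position_table[position_ids]
    # Broadcast the same position embeddings across the batch dimension.
    return inputs + position_embeddings[np.newaxis, :, :]
```

Passing an explicit position_ids array, for example np.array([2, 1, 0]), selects the rows of the table in that order instead of the default ascending order.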
- setup_fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=10000.0)
Adds several sinusoids of different frequencies to a Tensor.
Each channel of the input Tensor is incremented by a sinusoid of a different frequency and phase.
This allows the attention to learn to use absolute and relative positions. Timing signals should be added to some precursors of both the query and the memory inputs to the attention.
The use of relative position is possible because sin(x+y) and cos(x+y) can be expressed in terms of y, sin(x) and cos(x).
Specifically, this function uses a geometric sequence of timescales starting with min_timescale and ending with max_timescale. The number of different timescales is equal to channels / 2. For each timescale, this function generates the two sinusoidal signals sin(timestep/timescale) and cos(timestep/timescale). All these sinusoids are concatenated in the channels dimension.
- Parameters
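min_timescale (float) – First (smallest) timescale of the geometric sequence. Defaults to 1.0.
max_timescale (float) – Last (largest) timescale of the geometric sequence. Defaults to 10000.0.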
- Returns
A tensor of the shape [length, channels]. Based on _get_timing_signal_1d.
- Return type
Tensor
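The construction described for setup_fixed_position_embedding can be sketched in NumPy as follows. This is an illustrative reading of the description, not the library's code: the function name fixed_position_embedding is hypothetical, and the zero-padding applied when channels is odd is an assumption, since the documentation does not say how that case is handled. The actual implementation may also differ in numerical details.

```python
import numpy as np


def fixed_position_embedding(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    """Hypothetical sketch: sinusoidal position signal of shape [length, channels]."""
    positions = np.arange(length, dtype=np.float64)
    num_timescales = channels // 2
    # Geometric sequence of timescales from min_timescale to max_timescale.
    steps = np.arange(num_timescales, dtype=np.float64) / max(num_timescales - 1, 1)
    timescales = min_timescale * (max_timescale / min_timescale) ** steps
    # timestep / timescale for every (position, timescale) pair.
    scaled_time = positions[:, np.newaxis] / timescales[np.newaxis, :]
    # Concatenate the sine and cosine signals along the channels dimension.
    signal = np.concatenate([np.sin(scaled_time), np.cos(scaled_time)], axis=1)
    # Assumption: pad one zero channel when channels is odd.
    return np.pad(signal, [(0, 0), (0, channels % 2)])
```

With this layout the first channels / 2 columns hold the sine signals and the next channels / 2 columns hold the cosine signals, so position 0 maps to zeros followed by ones.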