modelzoo.common.pytorch.layers.TransformerEncoderLayer
import path: modelzoo.common.pytorch.layers.TransformerEncoderLayer
TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation="gelu", layer_norm_eps=1e-05, batch_first=True, norm_first=False, device=None, attention_dropout_rate=None, attention_type="scaled_dot_product", use_projection_bias_in_attention=False, use_ffn_bias_in_attention=False, use_ffn_bias=False, attention_initializer="xavier_uniform", ffn_initializer="xavier_uniform"):
d_model: the number of expected features in the input (required).
nhead: the number of heads in the multihead attention models (required).
dim_feedforward: the dimension of the feedforward network model (default=2048).
dropout: the dropout value (default=0.1).
activation: the activation function of the intermediate layer; can be a string ("relu" or "gelu") or a unary callable. Default: "gelu".
layer_norm_eps: the eps value in layer normalization components (default=1e-5).
batch_first: If True, the input and output tensors are provided as (batch, seq, feature). Default: False (seq, batch, feature). Only batch_first=True is currently supported.
norm_first: If True, layer norm is done prior to the attention and feedforward operations, respectively; otherwise it is done after. Default: False (after).
attention_dropout_rate: Attention dropout rate. If None, defaults to dropout.
attention_type: Should be one of ["scaled_dot_product", "dot_product"].
use_projection_bias_in_attention: Add bias to the Q, K, V projections in the attention layer. Defaults to False.
use_ffn_bias_in_attention: Add bias in the concluding FFN in the attention layer. Defaults to False.
use_ffn_bias: Add bias in all dense layers of the encoder's ffn sublayer.
attention_initializer: Attention layer initializer. Defaults to "xavier_uniform".
ffn_initializer: FFN layer initializer. Defaults to "xavier_uniform".
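A minimal construction sketch, assuming the class is importable from the path above and follows the signature as documented; the specific values d_model=512 and nhead=8 are illustrative, not prescribed:

from modelzoo.common.pytorch.layers import TransformerEncoderLayer

# Build an encoder layer using mostly default settings from the signature above.
encoder_layer = TransformerEncoderLayer(
    d_model=512,                            # feature size of the input
    nhead=8,                                # number of attention heads
    dim_feedforward=2048,                   # hidden size of the FFN sublayer
    dropout=0.1,
    activation="gelu",
    batch_first=True,                       # only batch_first=True is supported
    norm_first=False,                       # post-norm: layer norm after attention/FFN
    attention_type="scaled_dot_product",
)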
forward(src, mask=None, src_key_padding_mask=None):
src (Tensor): the sequence to the encoder layer (required). Shape: [batch_size, src_seq_length, embed_dim].
mask (Tensor): the mask for the src sequence (optional). Shape: [src_seq_length, src_seq_length].
src_key_padding_mask (Tensor): the mask for the src keys per batch (optional). Shape: [batch_size, src_seq_length].
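A hedged sketch of calling forward with the shapes listed above. It reuses encoder_layer from the construction sketch and assumes the masks follow the usual PyTorch convention (boolean masks where True marks positions that must not be attended to); check the layer's implementation if it expects additive float masks instead.

import torch

batch_size, src_seq_length, embed_dim = 4, 128, 512

src = torch.randn(batch_size, src_seq_length, embed_dim)   # (batch, seq, feature) since batch_first=True

# Attention mask over source positions, shape [src_seq_length, src_seq_length].
# Here: a causal mask that blocks attention to future positions (assumed boolean convention).
mask = torch.triu(
    torch.ones(src_seq_length, src_seq_length, dtype=torch.bool), diagonal=1
)

# Per-batch key padding mask, shape [batch_size, src_seq_length]; True marks padded tokens.
src_key_padding_mask = torch.zeros(batch_size, src_seq_length, dtype=torch.bool)
src_key_padding_mask[:, 100:] = True  # e.g. the last 28 positions of every sample are padding

out = encoder_layer(src, mask=mask, src_key_padding_mask=src_key_padding_mask)
# out has the same shape as src: [batch_size, src_seq_length, embed_dim]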