Cerebras PyTorch Layer API

The Cerebras PyTorch Layer API reimplements a subset of PyTorch APIs on top of our high-performance kernels and adds extra functionality beyond the native PyTorch versions. The extra functionality is optional and opt-in: if you do not use it, each Layer API class is equivalent to its native PyTorch counterpart.

modelzoo.common.pytorch.layers.MultiheadAttention is the replacement for torch.nn.MultiheadAttention
modelzoo.common.pytorch.layers.TransformerDecoderLayer is the replacement for torch.nn.TransformerDecoderLayer
modelzoo.common.pytorch.layers.TransformerDecoder is the replacement for torch.nn.TransformerDecoder
modelzoo.common.pytorch.layers.TransformerEncoderLayer is the replacement for torch.nn.TransformerEncoderLayer
modelzoo.common.pytorch.layers.TransformerEncoder is the replacement for torch.nn.TransformerEncoder
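Because the replacements keep the class names of their native counterparts, switching between the two can come down to changing the module a class is imported from. The following is a minimal sketch of that idea, not part of the Cerebras API: `layer_path` and `load_layer` are hypothetical helpers, and only the module path and class names come from the list above.

```python
import importlib

# Module path and class names are taken from the list above.
CEREBRAS_LAYERS = "modelzoo.common.pytorch.layers"
NATIVE_LAYERS = "torch.nn"
REPLACED = {
    "MultiheadAttention",
    "TransformerDecoderLayer",
    "TransformerDecoder",
    "TransformerEncoderLayer",
    "TransformerEncoder",
}


def layer_path(class_name, use_cerebras=True):
    """Return the dotted import path for a layer class (hypothetical helper)."""
    if use_cerebras and class_name in REPLACED:
        return f"{CEREBRAS_LAYERS}.{class_name}"
    # Classes without a listed replacement stay on the native namespace.
    return f"{NATIVE_LAYERS}.{class_name}"


def load_layer(class_name, use_cerebras=True):
    """Import and return the class; the chosen package must be installed."""
    module_name, _, attr = layer_path(class_name, use_cerebras).rpartition(".")
    return getattr(importlib.import_module(module_name), attr)
```

With the Model Zoo on the Python path, `load_layer("TransformerEncoder")` would return the Cerebras class, and `load_layer("TransformerEncoder", use_cerebras=False)` the native one. The constructor arguments are expected to be (near) identical, but check them against the Model Zoo source before relying on that.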

Note

Cerebras has moved away from Hugging Face model implementations in favor of our own PyTorch Layer API. One of the many benefits of this API is that it is designed to be (near) drop-in compatible with the transformer layers included in PyTorch. However, it is not possible (at least for T5 and Transformer) to keep the same naming scheme in the migrated model as in the original.

Supported PyTorch Optimizers

Cerebras PyTorch Optimizers implement most of the optimizers in the torch.optim namespace as drop-in replacements with the same semantics. Our implementations take advantage of our hardware capabilities and fall back to GPU or CPU execution depending on the target device.

Supported optimizers:

SGD

SGDM

rmsprop

adadelta

Lamb

radam

adamax

adafactor

adagrad

adam

adamw

asgd

nadam

rprop

Supported PyTorch Ops

If your model implementation requires additional PyTorch Ops beyond the layer APIs above, Cerebras also supports the following PyTorch operations.

Attention

The following list of supported PyTorch ops is very preliminary. We cannot guarantee that mixing and matching them in your models will work. Support is only provided for the way they are used in the Cerebras Model Zoo.

nn

torch.nn.BCEWithLogitsLoss
torch.nn.CrossEntropyLoss
Note: Known limitation: ignore_index can only be -100
torch.nn.Dropout
torch.nn.Embedding
Note: Known limitation: num_embeddings < 65536
torch.nn.functional.dropout
torch.nn.functional.gelu
Note: Known limitation: May have precision issues when approximate != 'tanh'
torch.nn.functional.pad
torch.nn.functional.softmax
torch.nn.LayerNorm
torch.nn.Linear
torch.nn.MSELoss
torch.nn.NLLLoss
Note: Known limitation: ignore_index can only be -100
torch.nn.ReLU
torch.nn.Softmax
torch.nn.Tanh
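
The known limitations noted in this list can be checked before a model is compiled. The sketch below is illustrative only: `check_known_limitations` is a hypothetical helper, not a Cerebras utility. It assumes that the gelu note refers to the `approximate` argument of torch.nn.functional.gelu, and that omitted arguments take PyTorch's defaults (`ignore_index=-100`, `approximate='none'`).

```python
def check_known_limitations(op, **kwargs):
    """Return the known-limitation messages that apply to an op's arguments.

    Hypothetical helper; the rules mirror the notes in the list above.
    """
    problems = []
    if op in ("torch.nn.CrossEntropyLoss", "torch.nn.NLLLoss"):
        # PyTorch's default ignore_index is -100, the only supported value.
        if kwargs.get("ignore_index", -100) != -100:
            problems.append("ignore_index can only be -100")
    elif op == "torch.nn.Embedding":
        num_embeddings = kwargs.get("num_embeddings")
        if num_embeddings is not None and num_embeddings >= 65536:
            problems.append("num_embeddings must be < 65536")
    elif op == "torch.nn.functional.gelu":
        # PyTorch's default is approximate='none' (assumed argument name).
        if kwargs.get("approximate", "none") != "tanh":
            problems.append("may have precision issues unless approximate='tanh'")
    return problems
```

For example, `check_known_limitations("torch.nn.Embedding", num_embeddings=70000)` returns a one-item list naming the embedding-size limit, while an op with no noted limitation, such as `"torch.nn.Linear"`, returns an empty list.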

Software Documentation (Version 1.7.0)