`cerebras_pytorch.experimental` package#

Automatic mixed precision#

The following classes and subclasses are designed to facilitate automatic mixed precision on the Cerebras Wafer Scale Cluster.

`GradScaler`#

class experimental.amp.GradScaler[source]#

Facilitates mixed precision training, dynamic loss scaling (DLS), and DLS combined with global gradient clipping (DLS + GCC).

For more details please see docs for amp.initialize.

Parameters

loss_scale – If loss_scale == “dynamic”, then configure dynamic loss scaling. Otherwise, it is the loss scale value used in static loss scaling.
init_scale – The initial loss scale value if loss_scale == “dynamic”.
steps_per_increase – The number of steps after which to increase the loss scale.
min_loss_scale – The minimum loss scale value that can be chosen by dynamic loss scaling.
max_loss_scale – The maximum loss scale value that can be chosen by dynamic loss scaling.
overflow_tolerance – The maximum allowed fraction of steps whose gradients contain infinite or undefined values. The loss scale is reduced if this tolerance is exceeded.
max_gradient_norm – The maximum gradient norm to use for global gradient clipping. Only applies in the DLS + GCC case. If GCC is not enabled, this parameter has no effect.

__init__(loss_scale: Optional[Union[str, float]] = None, init_scale: Optional[float] = None, steps_per_increase: Optional[int] = None, min_loss_scale: Optional[float] = None, max_loss_scale: Optional[float] = None, overflow_tolerance: float = 0.05, max_gradient_norm: Optional[float] = None)[source]#
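
To make the dynamic loss scaling parameters concrete, the following is a minimal pure-Python sketch of one common policy. It is not the Cerebras implementation: the class name `DynamicLossScale`, the placeholder default values, the factor of 2 used to grow and shrink the scale, and the choice to measure the overflow fraction against a window of `steps_per_increase` steps are all assumptions made for illustration.

```python
class DynamicLossScale:
    """Illustrative dynamic loss scale policy (hypothetical, not the Cerebras code)."""

    def __init__(self, init_scale=2.0**15, steps_per_increase=2000,
                 min_loss_scale=2.0**-14, max_loss_scale=2.0**15,
                 overflow_tolerance=0.05):
        # Default values here are arbitrary placeholders, not library defaults.
        self.scale = init_scale
        self.steps_per_increase = steps_per_increase
        self.min_loss_scale = min_loss_scale
        self.max_loss_scale = max_loss_scale
        self.overflow_tolerance = overflow_tolerance
        self.steps = 0      # steps seen in the current window
        self.overflows = 0  # steps in the window with inf/NaN gradients

    def update(self, grads_finite):
        """Record one step and adjust the scale if needed."""
        self.steps += 1
        if not grads_finite:
            self.overflows += 1
        if self.overflows / self.steps_per_increase > self.overflow_tolerance:
            # Too many overflows: halve the scale (assumed factor), restart the window.
            self.scale = max(self.scale / 2.0, self.min_loss_scale)
            self.steps = self.overflows = 0
        elif self.steps >= self.steps_per_increase:
            # Window finished within tolerance: double the scale (assumed factor).
            self.scale = min(self.scale * 2.0, self.max_loss_scale)
            self.steps = self.overflows = 0
        return self.scale
```

In this sketch the scale never leaves the range set by `min_loss_scale` and `max_loss_scale`, which mirrors the role those two parameters are described as having above.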

clip_gradients_and_return_isfinite(optimizers)[source]#: Clips the gradients of the optimizers’ parameters and returns whether or not the norm is finite

get_scale()[source]#: Return the loss scale

load_state_dict(state_dict)[source]#: Loads the state dictionary into the current params

scale(loss: torch.Tensor)[source]#: Scales the loss in preparation for the backward pass

state_dict(destination=None)[source]#: Returns a dictionary containing the state to be saved to a checkpoint

step(optimizer, *args, **kwargs)[source]#

Step carries out the following two operations:

1. Internally invokes unscale_(optimizer) (unless unscale_ was explicitly called for optimizer earlier in the iteration). As part of the unscale_, gradients are checked for infs/NaNs.
2. Invokes optimizer.step() using the unscaled gradients. Ensure that previous optimizer state or params carry over if we encounter NaNs in the gradients.

*args and **kwargs are forwarded to optimizer.step().

Parameters

optimizer (cerebras_pytorch.optim.Optimizer) – Optimizer that applies the gradients.
args – Any arguments.
kwargs – Any keyword arguments.

Returns

The return value of optimizer.step(*args, **kwargs).
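
The two operations described for step can be sketched in plain Python. This is a toy model under stated assumptions: gradients and parameters are lists of floats, `sgd_step` is a hypothetical stand-in for optimizer.step(), and skipping the update is just one simple way to let the previous parameters carry over when non-finite gradients appear. It does not reproduce the library's internals.

```python
import math

def unscale(grads, scale):
    """Divide each gradient by the loss scale and report whether all are finite."""
    unscaled = [g / scale for g in grads]
    return unscaled, all(math.isfinite(g) for g in unscaled)

def sgd_step(params, grads, lr=0.1):
    """Hypothetical stand-in for optimizer.step(): plain gradient descent."""
    return [p - lr * g for p, g in zip(params, grads)]

def scaled_step(params, scaled_grads, scale):
    """Unscale, then step only if the gradients are finite; otherwise keep params."""
    grads, finite = unscale(scaled_grads, scale)
    if not finite:
        return params  # previous parameters carry over unchanged
    return sgd_step(params, grads)
```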

step_if_finite(optimizer, *args, **kwargs)[source]#

Invokes optimizer.step(*args, **kwargs), but only if this GradScaler detected finite gradients.

Parameters

optimizer (cerebras_pytorch.experimental.optim.Optimizer) – Optimizer that applies the gradients.
args – Any arguments.
kwargs – Any keyword arguments.

Returns

The result of optimizer.step()

unscale_(optimizer)[source]#: Unscales the gradients of the optimizer’s params in place

update(new_scale=None)[source]#: Update the gradient scaler after all optimizers have been stepped

update_scale(optimizers)[source]#: Update the scales of the optimizers

warned_unscaling_non_fp32_grad = False#

`optimizer_step`#

experimental.amp.optimizer_step(loss: torch.Tensor, optimizer: cerebras_pytorch.experimental.optim.Optimizer, grad_scaler: cerebras_pytorch.experimental.amp.GradScaler, max_gradient_norm: Optional[float] = None, max_gradient_value: Optional[float] = None)[source]#

Performs loss scaling, gradient scaling and optimizer step

Parameters

loss – The loss value to scale. loss.backward should be called before this function
optimizer – The optimizer to step
grad_scaler – The gradient scaler to use to scale the parameter gradients
max_gradient_norm – the max gradient norm to use for gradient clipping
max_gradient_value – the max gradient value to use for gradient clipping
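
As a rough illustration of the two clipping options, the sketch below clips a flat list of gradients either by global L2 norm or by absolute value. It is plain Python with hypothetical helper names and is not the Cerebras implementation. How `optimizer_step` behaves when both limits are supplied is not stated in this reference, so the two helpers are kept separate.

```python
import math

def clip_by_norm(grads, max_gradient_norm):
    """Rescale all gradients so their global L2 norm is at most max_gradient_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_gradient_norm or norm == 0.0:
        return list(grads)
    factor = max_gradient_norm / norm
    return [g * factor for g in grads]

def clip_by_value(grads, max_gradient_value):
    """Clamp each gradient into [-max_gradient_value, max_gradient_value]."""
    return [max(-max_gradient_value, min(max_gradient_value, g)) for g in grads]
```

Norm clipping preserves the direction of the gradient vector, while value clipping can change it because each component is clamped independently.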

Creation Ops#

These ops can be used to lazily initialize tensors with a known shape, dtype and value, so that they do not take up memory unnecessarily.

`full`#

`full_like`#

`ones`#

`ones_like`#

`zeros`#

`zeros_like`#

Checkpoint Saving/Loading utilities#

Data Utilities#

`utils.data.DataLoader`#

class experimental.utils.data.DataLoader[source]#

Wrapper around torch.utils.data.DataLoader that facilitates moving data generated by the dataloader to a Cerebras system

Parameters

input_fn – A callable that returns a torch.utils.data.DataLoader instance
*args – Any other positional arguments are passed into the input_fn when each worker instantiates its dataloader.
**kwargs – Any other keyword arguments are passed into the input_fn when each worker instantiates its dataloader.

__init__(input_fn: Callable[[...], torch.utils.data.DataLoader], *args, **kwargs)[source]#
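
Passing a callable rather than a dataloader instance lets each worker construct its own dataloader when it needs one. A minimal sketch of that deferral pattern follows, with a hypothetical `DeferredLoader` class standing in for the wrapper and a plain list of batches standing in for a torch.utils.data.DataLoader. None of these names come from the library.

```python
class DeferredLoader:
    """Hypothetical stand-in: stores input_fn and its arguments, builds on demand."""

    def __init__(self, input_fn, *args, **kwargs):
        self.input_fn = input_fn
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # Each worker would call this to create its own loader instance.
        return self.input_fn(*self.args, **self.kwargs)

def make_batches(num_samples, batch_size=2):
    """Toy input_fn: returns a list of batches instead of a real DataLoader."""
    samples = list(range(num_samples))
    return [samples[i:i + batch_size] for i in range(0, num_samples, batch_size)]

# Extra positional and keyword arguments are stored and forwarded to input_fn.
loader = DeferredLoader(make_batches, 5, batch_size=2)
batches = loader.build()
```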

`utils.data.SyntheticDataset`#

class experimental.utils.data.SyntheticDataset[source]#

A synthetic dataset that generates samples from a SampleSpec.

Constructs a SyntheticDataset instance.

A synthetic dataset can be used to generate samples on the fly with an expected dtype/shape but without needing to create a full-blown dataset. This is especially useful for compile validation.

Parameters

sample_spec – Specification of the samples to generate. This can be a nested structure of one of the following types:
- torch.Tensor: A tensor to be cloned.
- Callable: A callable that takes the sample index and returns a tensor.
Supported data structures for holding the above leaf nodes are list, tuple, dict, OrderedDict, and NamedTuple.
num_samples – Total size of the dataset. If None, the dataset will generate samples indefinitely.
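
The nested sample specification can be illustrated with a small pure-Python sketch. It is an assumption-laden stand-in, not the library code: `generate_sample` and `synthetic_samples` are hypothetical names, and `array.array` objects play the role of tensors so that the example runs without torch.

```python
import copy
from array import array
from collections import namedtuple

def generate_sample(spec, index):
    """Build one sample from a nested spec of clonable values and callables."""
    if callable(spec):
        return spec(index)  # leaf: callable that takes the sample index
    if isinstance(spec, dict):  # also covers OrderedDict
        return type(spec)((k, generate_sample(v, index)) for k, v in spec.items())
    if isinstance(spec, tuple) and hasattr(spec, "_fields"):  # NamedTuple
        return type(spec)(*(generate_sample(v, index) for v in spec))
    if isinstance(spec, (list, tuple)):
        return type(spec)(generate_sample(v, index) for v in spec)
    return copy.deepcopy(spec)  # leaf: value to clone (a tensor in the real API)

def synthetic_samples(spec, num_samples=None):
    """Yield samples; runs indefinitely when num_samples is None."""
    index = 0
    while num_samples is None or index < num_samples:
        yield generate_sample(spec, index)
        index += 1

Pair = namedtuple("Pair", ["first", "second"])
spec = {"x": array("i", [1, 2]), "y": (lambda i: i * 10, [lambda i: i + 1])}
sample = generate_sample(spec, 3)
```

Each generated sample has the same container layout as the specification, with every leaf replaced by either a clone or the result of calling the leaf with the sample index.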

__init__(sample_spec: SampleSpecT, num_samples: Optional[int] = None)[source]#

`utils.data.DataExecutor`#

class experimental.utils.data.DataExecutor#

Defines a single execution run on a Cerebras wafer scale cluster

Parameters

dataloader – The dataloader to use for the run.
num_steps – The number of steps to run. Defaults to 1 if the backend was configured for compile or validate only.
checkpoint_steps – The interval at which to schedule fetching checkpoints from the cluster.
cs_config – Optionally, a CSConfig object can be passed in to configure the Cerebras Wafer Scale Cluster. If none is provided, the default configuration values are used.
writer – The summary writer used to write any summarized scalars or tensors to TensorBoard.
profiler_activities – The list of activities to profile. By default, the client-side rate and global rate are tracked.

__init__(*args: Any, **kwargs: Any) → None#

`utils.CSConfig`#

class experimental.utils.CSConfig#

Contains config details for the Cerebras Wafer Scale Cluster

Parameters

mgmt_address (Optional[str]) – Address to connect to appliance. If not provided, query the cluster management node for it. Default: None.
credentials_path (Optional[str]) – Credentials for connecting to appliance. If not provided, query the cluster management node for it. Default: None.
num_csx (int) – Number of Cerebras Systems to run on. Default: 1.
max_wgt_servers (int) – Number of weight servers to support run. Default: 24.
max_act_per_csx (int) – Number of activation servers per system. Default: 1.
num_workers_per_csx (int) – Number of streaming workers per system. Default: 1.
transfer_processes (int) – Number of processes to transfer data to/from appliance. Default: 5.
job_time_sec (int) – Time limit for the appliance jobs, not including the queue time. Default: None.
mount_dirs (List[str]) – Local storage to mount to appliance (ex. training data). Default: None.
python_paths (List[str]) – A list of paths that worker pods respect as PYTHONPATH in addition to the PYTHONPATH set in the container image. Default: None.
job_labels (List[str]) – A list of equal-sign-separated key-value pairs that get applied as part of job metadata. Default: None.
debug_args (DebugArgs) – Optional debugging arguments object. Default: None.
precision_opt_level (int) – The precision optimization level. Default: 1.

Metrics#

A collection of evaluation metrics that can be used to evaluate the performance of a trained model on the Cerebras Wafer Scale Cluster.

`metrics.Metric`#

class experimental.metrics.Metric[source]#

The abstract base metric class

__init__(name)[source]#

abstract compute() → float[source]#: Compute and return the final metric value

forward(*args, **kwargs)[source]#: Updates the metric value

register_output(name: str)[source]#

Create and register a new property with provided name that handles fetching the tensor value when assigning to the property

Note, this means that only tensors are allowed to be set for these properties

Parameters: name – the name of the property

register_state(name: str, value: torch.Tensor)[source]#: Registers a state variable to the module

registry = {}#

abstract reset()[source]#: Reset the metric state

abstract update(*args, **kwargs)[source]#: Update the metric value

`metrics.AccuracyMetric`#

class experimental.metrics.AccuracyMetric[source]#

Computes the accuracy of the model’s predictions

Parameters: name – Name of the metric

reset()[source]#

update(labels, predictions, weights=None, dtype=None)[source]#
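
The update signature suggests a running, optionally weighted accuracy. A minimal sketch of that idea in plain Python follows. `RunningAccuracy` is a hypothetical class operating on lists rather than tensors, and treating `weights` as per-sample multipliers is an assumption, not something this reference states.

```python
class RunningAccuracy:
    """Hypothetical weighted accuracy accumulator (not the library class)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, labels, predictions, weights=None):
        if weights is None:
            weights = [1.0] * len(labels)  # assume every sample counts equally
        for label, pred, w in zip(labels, predictions, weights):
            self.correct += w * (label == pred)
            self.total += w

    def compute(self):
        # Return 0.0 before any update to avoid dividing by zero.
        return self.correct / self.total if self.total else 0.0
```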

`metrics.PerplexityMetric`#

class experimental.metrics.PerplexityMetric[source]#

Computes the perplexity of the model’s predictions

Parameters: name – Name of the metric

reset()[source]#

update(labels, loss, weights=None, dtype=None)[source]#
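
Perplexity is conventionally the exponential of the average per-token loss. The sketch below accumulates that quantity in plain Python. `RunningPerplexity` is a hypothetical class, and it assumes that `loss` is the summed cross-entropy over the batch and that `weights` marks which tokens count; the reference above does not confirm either detail.

```python
import math

class RunningPerplexity:
    """Hypothetical perplexity accumulator (not the library class)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total_loss = 0.0
        self.total_tokens = 0.0

    def update(self, labels, loss, weights=None):
        # Assumption: loss is the summed cross-entropy over this batch's tokens.
        if weights is None:
            weights = [1.0] * len(labels)
        self.total_loss += loss
        self.total_tokens += sum(weights)

    def compute(self):
        # exp(average loss per counted token)
        return math.exp(self.total_loss / self.total_tokens)
```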

`metrics.compute_all_metrics`#

experimental.metrics.compute_all_metrics()[source]#: Compute the floating point value of all registered metrics
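
A registry-based design like the one implied by `Metric.registry` and `compute_all_metrics` can be sketched as follows. The names `SketchMetric`, `MeanMetric`, and `compute_all` are hypothetical, and returning a dictionary keyed by metric name is an assumption, since the return type is not given above.

```python
from abc import ABC, abstractmethod

class SketchMetric(ABC):
    """Hypothetical base class: each instance registers itself by name."""

    registry = {}  # name -> metric instance, shared by all metrics

    def __init__(self, name):
        self.name = name
        SketchMetric.registry[name] = self
        self.reset()

    @abstractmethod
    def update(self, *args, **kwargs):
        """Update the metric state."""

    @abstractmethod
    def compute(self) -> float:
        """Compute and return the final metric value."""

    @abstractmethod
    def reset(self):
        """Reset the metric state."""

class MeanMetric(SketchMetric):
    """Toy metric: running mean of the values passed to update."""

    def reset(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    def compute(self) -> float:
        return self.total / self.count if self.count else 0.0

def compute_all():
    """Return the float value of every registered metric, keyed by name."""
    return {name: float(m.compute()) for name, m in SketchMetric.registry.items()}
```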

Random Number Generation utilities#

`numpy` utilities#

`from_numpy`#

`to_numpy`#

Tensorboard utilities#

experimental.utils.tensorboard.SummaryWriter(*args, base_step: int = 1, **kwargs)#

Thin wrapper around torch.utils.tensorboard.SummaryWriter

Additional features include the ability to add a tensor summary

Parameters

base_step – The base step to use in summarize_{scalar,tensor} functions
*args – Any other positional arguments are forwarded directly to the base class.
**kwargs – Any other keyword arguments are forwarded directly to the base class.

experimental.utils.tensorboard.SummaryReader(log_dir: str, **kwargs)#: Class for reading summaries saved using the SummaryWriter
