cerebras_pytorch.experimental package#
Automatic mixed precision#
The following classes and subclasses are designed to facilitate automatic mixed precision on the Cerebras Wafer Scale Cluster.
GradScaler#
- class experimental.amp.GradScaler[source]#
Facilitates mixed precision training with dynamic loss scaling (DLS) and DLS with global gradient clipping (DLS + GCC).
For more details, please see the docs for amp.initialize.
- Parameters
loss_scale – If loss_scale == “dynamic”, configures dynamic loss scaling. Otherwise, it is the loss scale value used in static loss scaling.
init_scale – The initial loss scale value if loss_scale == “dynamic”.
steps_per_increase – The number of steps after which to increase the loss scale.
min_loss_scale – The minimum loss scale value that can be chosen by dynamic loss scaling.
max_loss_scale – The maximum loss scale value that can be chosen by dynamic loss scaling.
overflow_tolerance – The maximum fraction of steps with infinite or NaN gradient values that is tolerated. The loss scale is reduced if this tolerance is exceeded.
max_gradient_norm – The maximum gradient norm to use for global gradient clipping. Only applies in the DLS + GCC case; if GCC is not enabled, this parameter has no effect.
- __init__(loss_scale: Optional[Union[str, float]] = None, init_scale: Optional[float] = None, steps_per_increase: Optional[int] = None, min_loss_scale: Optional[float] = None, max_loss_scale: Optional[float] = None, overflow_tolerance: float = 0.05, max_gradient_norm: Optional[float] = None)[source]#
- clip_gradients_and_return_isfinite(optimizers)[source]#
Clip the gradients of the optimizer’s parameters and return whether or not the norm is finite.
- state_dict(destination=None)[source]#
Returns a dictionary containing the state to be saved to a checkpoint.
- step(optimizer, *args, **kwargs)[source]#
Step carries out the following two operations:
1. Internally invokes unscale_(optimizer) (unless unscale_ was explicitly called for optimizer earlier in the iteration). As part of the unscale_, gradients are checked for infs/NaNs.
2. Invokes optimizer.step() using the unscaled gradients, ensuring that previous optimizer state or params carry over if NaNs are encountered in the gradients.
*args and **kwargs are forwarded to optimizer.step(). Returns the return value of optimizer.step(*args, **kwargs).
- Parameters
optimizer (cerebras_pytorch.optim.Optimizer) – Optimizer that applies the gradients.
args – Any arguments.
kwargs – Any keyword arguments.
- step_if_finite(optimizer, *args, **kwargs)[source]#
Invokes optimizer.step(*args, **kwargs), but only if this GradScaler detected finite gradients.
- Parameters
optimizer (cerebras_pytorch.experimental.optim.Optimizer) – Optimizer that applies the gradients.
args – Any arguments.
kwargs – Any keyword arguments.
- Returns
The result of optimizer.step()
- warned_unscaling_non_fp32_grad = False#
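A minimal usage sketch for GradScaler in a scaled-training iteration. The scale() and update() calls are assumed from the torch-style GradScaler interface and are not documented in this excerpt; the model and optimizer setup are likewise illustrative:

import torch
import cerebras_pytorch.experimental as cstorch

model = torch.nn.Linear(8, 2)
optimizer = cstorch.optim.SGD(model.parameters(), lr=0.01)  # assumed torch-like optimizer wrapper

grad_scaler = cstorch.amp.GradScaler(
    loss_scale="dynamic",   # enable dynamic loss scaling (DLS)
    init_scale=2.0**16,
    steps_per_increase=2000,
    overflow_tolerance=0.05,
    max_gradient_norm=1.0,  # enables the DLS + GCC behavior
)

loss = model(torch.randn(4, 8)).sum()
grad_scaler.scale(loss).backward()  # scale() is assumed from the torch GradScaler interface
grad_scaler.step(optimizer)         # unscales, checks for infs/NaNs, then steps
grad_scaler.update()                # update() is assumed from the torch GradScaler interface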
optimizer_step#
- experimental.amp.optimizer_step(loss: torch.Tensor, optimizer: cerebras_pytorch.experimental.optim.Optimizer, grad_scaler: cerebras_pytorch.experimental.amp.GradScaler, max_gradient_norm: Optional[float] = None, max_gradient_value: Optional[float] = None)[source]#
Performs loss scaling, gradient scaling, and the optimizer step.
- Parameters
loss – The loss value to scale. loss.backward should be called before this function.
optimizer – The optimizer to step.
grad_scaler – The gradient scaler to use to scale the parameter gradients.
max_gradient_norm – The maximum gradient norm to use for gradient clipping.
max_gradient_value – The maximum gradient value to use for gradient clipping.
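A minimal sketch of a single training iteration using this helper. The model and optimizer setup are illustrative, and cstorch.optim.SGD is assumed to exist as a torch-like optimizer wrapper:

import torch
import cerebras_pytorch.experimental as cstorch

model = torch.nn.Linear(8, 2)
optimizer = cstorch.optim.SGD(model.parameters(), lr=0.01)  # assumed torch-like optimizer wrapper
grad_scaler = cstorch.amp.GradScaler(loss_scale="dynamic")

loss = model(torch.randn(4, 8)).sum()
loss.backward()  # per the docs above, backward is called before optimizer_step

# Performs loss scaling, gradient scaling, and the optimizer step.
cstorch.amp.optimizer_step(
    loss,
    optimizer,
    grad_scaler,
    max_gradient_norm=1.0,  # optional gradient clipping by norm
)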
Creation Ops#
Can be used to lazily initialize tensors with a known shape, dtype, and value, to avoid having them unnecessarily take up memory.
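For illustration, a short sketch assuming these ops mirror the signatures of their torch counterparts (the exact signatures are not shown in this excerpt):

import torch
import cerebras_pytorch.experimental as cstorch

# Assumed to mirror the signatures of their torch counterparts:
a = cstorch.ones(2, 3, dtype=torch.float16)  # lazily initialized, shape (2, 3)
b = cstorch.zeros_like(a)
c = cstorch.full((2, 3), 42.0)
d = cstorch.full_like(a, 7.0)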
full#
full_like#
ones#
ones_like#
zeros#
zeros_like#
Checkpoint Saving/Loading utilities#
Data Utilities#
utils.data.DataLoader#
- class experimental.utils.data.DataLoader[source]#
Wrapper around torch.utils.data.DataLoader that facilitates moving data generated by the dataloader to a Cerebras system.
- Parameters
input_fn – A callable that returns a torch.utils.data.DataLoader instance
*args, **kwargs – Any other positional or keyword arguments are passed into the input_fn when each worker instantiates its respective dataloader.
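A minimal sketch of wrapping an input_fn. The dataset inside input_fn is illustrative; any callable returning a torch.utils.data.DataLoader works:

import torch
import cerebras_pytorch.experimental as cstorch

def input_fn(batch_size: int):
    # A hypothetical input_fn: it must return a torch.utils.data.DataLoader.
    dataset = torch.utils.data.TensorDataset(torch.randn(128, 8))
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size)

# batch_size=16 is forwarded to input_fn when each worker builds its dataloader.
dataloader = cstorch.utils.data.DataLoader(input_fn, batch_size=16)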
utils.data.SyntheticDataset#
- class experimental.utils.data.SyntheticDataset[source]#
A synthetic dataset that generates samples from a SampleSpec.
Constructs a SyntheticDataset instance.
A synthetic dataset can be used to generate samples on the fly with an expected dtype/shape but without needing to create a full-blown dataset. This is especially useful for compile validation.
- Parameters
sample_spec –
Specification of the samples to generate. This can be a nested structure of one of the following types:
- torch.Tensor: A tensor to be cloned.
- Callable: A callable that takes the sample index and returns a tensor.
Supported data structures for holding the above leaf nodes are list, tuple, dict, OrderedDict, and NamedTuple.
num_samples – Total size of the dataset. If None, the dataset will generate samples indefinitely.
- __init__(sample_spec: Union[torch.Tensor, Callable[[int], torch.Tensor], List[Union[torch.Tensor, Callable[[int], torch.Tensor], List[SampleSpecT], Tuple[SampleSpecT, ...], Dict[str, SampleSpecT], OrderedDict[str, SampleSpecT], NamedTuple]], Tuple[Union[torch.Tensor, Callable[[int], torch.Tensor], List[SampleSpecT], Tuple[SampleSpecT, ...], Dict[str, SampleSpecT], OrderedDict[str, SampleSpecT], NamedTuple], ...], Dict[str, Union[torch.Tensor, Callable[[int], torch.Tensor], List[SampleSpecT], Tuple[SampleSpecT, ...], Dict[str, SampleSpecT], OrderedDict[str, SampleSpecT], NamedTuple]], OrderedDict[str, Union[torch.Tensor, Callable[[int], torch.Tensor], List[SampleSpecT], Tuple[SampleSpecT, ...], Dict[str, SampleSpecT], OrderedDict[str, SampleSpecT], NamedTuple]], NamedTuple], num_samples: Optional[int] = None)[source]#
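A minimal construction sketch. The field names are illustrative, and map-style dataset[index] access is assumed:

import torch
from cerebras_pytorch.experimental.utils.data import SyntheticDataset

# Leaf nodes may be tensors (cloned for every sample) or callables of the sample index.
sample_spec = {
    "input_ids": torch.zeros(128, dtype=torch.int32),
    "labels": lambda index: torch.full((128,), index % 2, dtype=torch.int32),
}

dataset = SyntheticDataset(sample_spec, num_samples=1000)
sample = dataset[0]  # assumes map-style access; yields {"input_ids": ..., "labels": ...}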
utils.data.DataExecutor#
- class experimental.utils.data.DataExecutor#
Defines a single execution run on a Cerebras Wafer Scale Cluster.
- Parameters
dataloader – The dataloader to use for the run.
num_steps – The number of steps to run. Defaults to 1 if the backend was configured for compile or validate only.
checkpoint_steps – The interval at which to schedule fetching checkpoints from the cluster.
cs_config – Optionally, a CSConfig object can be passed in to configure the Cerebras Wafer Scale Cluster. If none is provided, the default configuration values are used.
writer – The summary writer to be used to write any summarized scalars or tensors to TensorBoard.
profiler_activities – The list of activities to profile. By default, the client-side rate and global rate are tracked.
- __init__(*args: Any, **kwargs: Any) → None#
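A minimal sketch of defining and driving a run. Iterating the executor to receive batches is an assumption, and the per-step training function is elided:

import cerebras_pytorch.experimental as cstorch

executor = cstorch.utils.data.DataExecutor(
    dataloader,            # a cstorch.utils.data.DataLoader (see above)
    num_steps=1000,
    checkpoint_steps=100,  # fetch a checkpoint from the cluster every 100 steps
    cs_config=cstorch.utils.CSConfig(num_csx=1),
)

for batch in executor:     # assumes the executor is iterated to drive the run
    ...                    # hand each batch to a traced training step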
utils.CSConfig#
- class experimental.utils.CSConfig#
Contains config details for the Cerebras Wafer Scale Cluster.
- Parameters
mgmt_address (Optional[str]) – Address to connect to the appliance. If not provided, query the cluster management node for it. Default: None.
credentials_path (Optional[str]) – Credentials for connecting to the appliance. If not provided, query the cluster management node for them. Default: None.
num_csx (int) – Number of Cerebras Systems to run on. Default: 1.
max_wgt_servers (int) – Number of weight servers to support the run. Default: 24.
max_act_per_csx (int) – Number of activation servers per system. Default: 1.
num_workers_per_csx (int) – Number of streaming workers per system. Default: 1.
transfer_processes (int) – Number of processes to transfer data to/from the appliance. Default: 5.
job_time_sec (int) – Time limit for the appliance jobs, not including the queue time. Default: None.
mount_dirs (List[str]) – Local storage to mount to the appliance (e.g., training data). Default: None.
python_paths (List[str]) – A list of paths that worker pods respect as PYTHONPATH, in addition to the PYTHONPATH set in the container image. Default: None.
job_labels (List[str]) – A list of equal-sign-separated key-value pairs that get applied as part of the job metadata. Default: None.
debug_args (DebugArgs) – Optional debugging arguments object. Default: None.
precision_opt_level (int) – The precision optimization level. Default: 1.
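A construction sketch with illustrative values (all paths and labels below are hypothetical):

import cerebras_pytorch.experimental as cstorch

cs_config = cstorch.utils.CSConfig(
    num_csx=2,                                # run on two Cerebras systems
    mount_dirs=["/data/train"],               # hypothetical local storage path
    python_paths=["/home/user/modules"],      # hypothetical extra PYTHONPATH entry
    job_labels=["team=research", "run=amp"],  # key=value metadata pairs
    job_time_sec=3600,                        # one-hour job time limit
    precision_opt_level=1,
)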
Metrics#
A collection of evaluation metrics that can be used to evaluate the performance of a trained model on the Cerebras Wafer Scale Cluster.
metrics.Metric#
- class experimental.metrics.Metric[source]#
The abstract base metric class.
- register_output(name: str)[source]#
Create and register a new property with the provided name that handles fetching the tensor value when it is assigned to the property.
Note: this means that only tensors may be assigned to these properties.
- Parameters
name – the name of the property
- registry = {}#
metrics.AccuracyMetric#
metrics.PerplexityMetric#
metrics.compute_all_metrics#
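A usage sketch under stated assumptions: only the Metric base class is documented above, so the constructor and call signatures below, as well as the compute_all_metrics() call, are assumptions modeled on the metric names listed here:

import torch
import cerebras_pytorch.experimental as cstorch

# The constructor name argument and call signature are assumptions.
accuracy = cstorch.metrics.AccuracyMetric("accuracy")
accuracy(labels=torch.tensor([0, 1, 1]), predictions=torch.tensor([0, 1, 0]))

# Assumed to compute and return the final values of every registered metric.
results = cstorch.metrics.compute_all_metrics()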
Random Number Generation utilities#
numpy utilities#
from_numpy#
to_numpy#
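A round-trip sketch, assuming these utilities convert between numpy arrays and torch tensors; the cstorch-level namespacing and exact signatures are assumptions, as they are not shown in this excerpt:

import numpy as np
import cerebras_pytorch.experimental as cstorch

array = np.ones((2, 3), dtype=np.float32)
tensor = cstorch.from_numpy(array)    # numpy.ndarray -> torch.Tensor
roundtrip = cstorch.to_numpy(tensor)  # torch.Tensor -> numpy.ndarray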
Tensorboard utilities#
- experimental.utils.tensorboard.SummaryWriter(*args, base_step: int = 1, **kwargs)#
Thin wrapper around torch.utils.tensorboard.SummaryWriter.
Additional features include the ability to add a tensor summary.
- Parameters
base_step – The base step to use in summarize_{scalar,tensor} functions
*args, **kwargs – Any other positional and keyword arguments are forwarded directly to the base class.
- experimental.utils.tensorboard.SummaryReader(log_dir: str, **kwargs)#
Class for reading summaries saved using the SummaryWriter.
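A short sketch pairing the two classes. The add_scalar call is inherited from the torch base class, and the SummaryReader query API beyond construction is not shown in this excerpt:

from cerebras_pytorch.experimental.utils.tensorboard import (
    SummaryReader,
    SummaryWriter,
)

writer = SummaryWriter(log_dir="./logs", base_step=100)
writer.add_scalar("loss", 0.25, 1)  # inherited from torch.utils.tensorboard.SummaryWriter

reader = SummaryReader("./logs")    # reads back the summaries written above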