TensorFlow: Getting Started

This quick-start guide describes the first-time user setup and the workflow for running TensorFlow jobs on a Cerebras Wafer-Scale Cluster. A Cerebras Wafer-Scale Cluster is composed of one or more CS-2 systems, MemoryX, SwarmX, and management and input worker nodes. The cluster supports two execution modes to accommodate ML models of different sizes:

Layer pipelined: In this mode, all the layers of the network are loaded together onto the Cerebras WSE. This mode is selected for neural network models that fit entirely on the WSE, approximately up to 1B parameters.

Weight streaming: In this mode, one layer of the neural network model is loaded at a time. This layer-by-layer mode is used to run extremely large models (>1B parameters).
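As a rough illustration of how the two modes divide up by model size, the sketch below suggests a mode from a parameter count. The function name, the constant, and the hard 1-billion cutoff are assumptions made for illustration: the guide gives the threshold only as approximate, and this helper is not part of any Cerebras package.

```python
# Illustrative only: not a Cerebras API. The cutoff mirrors the approximate
# 1B-parameter guideline stated above and is treated here as a hard limit.
APPROX_PIPELINED_LIMIT = 1_000_000_000


def suggest_execution_mode(num_parameters: int) -> str:
    """Return the execution mode this guide suggests for a model size."""
    if num_parameters < 0:
        raise ValueError("num_parameters must be non-negative")
    if num_parameters <= APPROX_PIPELINED_LIMIT:
        return "pipelined"
    return "weight_streaming"


if __name__ == "__main__":
    # Two hypothetical model sizes, one on each side of the cutoff.
    for n in (110_000_000, 20_000_000_000):
        print(f"{n:,} parameters -> {suggest_execution_mode(n)}")
```

In practice the deciding factor is whether the whole model fits on the WSE, so a model near the cutoff may need to be checked against the actual hardware rather than a fixed number.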

Perform the following steps to run your TensorFlow jobs on the Wafer-Scale Cluster:

Ensure that the admin setup is complete. See Admin setup checklist.

Follow the first-time user setup procedure for TensorFlow below. This includes creating and configuring your virtual environment. This step should be done only once.

Wafer-Scale Clusters currently adopt different workflows to launch jobs for Pipelined execution and Weight Streaming execution.

To run large models of >1 billion parameters in TensorFlow with Weight Streaming execution, follow the steps provided in Running Large Models (Weight Streaming Execution).

To run small to medium models of <1 billion parameters in TensorFlow with Pipelined execution, follow the steps provided in Running Small to Medium Models (Pipelined Execution).

Note

If you are on the Original Cerebras Installation and have not upgraded to the Wafer-Scale Cluster, you can still use the Slurm-based workflow to launch jobs for small to medium models with Pipelined execution. Large models with Weight Streaming execution are not supported on the Original Cerebras Installation. To get started on the Original Cerebras Installation, see TensorFlow: Getting Started.

If you are ready to start developing or adapting your own TensorFlow code for the CS system, skip to Workflow for TensorFlow on CS for an in-depth development guide using TensorFlow for Cerebras.

First-time user setup for TensorFlow

The first time you use the Wafer-Scale Cluster for your TensorFlow runs, you must set up a virtual environment as shown below.

Note

Make sure that you have obtained the TLS certificate from your system administrator. It is needed for communication between the user node and the Cerebras Wafer-Scale Cluster. Your administrator will have shared the path to this file during setup.

Set up the Python virtual environment using Python 3.7. Create the environment named venv_cerebras_tf using the following command:
python3.7 -m venv venv_cerebras_tf
Cerebras provides three main packages for setting up virtual environments: the cerebras_appliance package, the cerebras_tensorflow package for TensorFlow users, and the cerebras_pytorch package for PyTorch users. A TensorFlow environment needs two of the three: cerebras_appliance and cerebras_tensorflow. Enter the following commands on the user node to install them. Run the commands in the order shown so that the appliance wheel is installed first.
source venv_cerebras_tf/bin/activate
pip install <path_to_wheels>/cerebras_appliance-<Cerebras release version>_<date>_<build>_<commit>-py3-none-any.whl --find-links=<path_to_wheels>
pip install <path_to_wheels>/cerebras_tensorflow-<Cerebras release version>_<date>_<build>_<commit>-py3-none-any.whl --find-links=<path_to_wheels>
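Because the wheel file names embed a release version, date, build, and commit, it can be convenient to locate them by prefix instead of typing them out. The sketch below builds the two pip commands in the required order without running them. The helper name build_install_commands is hypothetical, and the sketch assumes that the wheel directory holds exactly one wheel per package.

```python
from pathlib import Path
from typing import List

# Order matters: the appliance wheel must be installed first.
# That each prefix matches exactly one wheel file is an assumption.
WHEEL_PREFIXES = ["cerebras_appliance", "cerebras_tensorflow"]


def build_install_commands(wheel_dir: str) -> List[List[str]]:
    """Build one `pip install` argument list per wheel, appliance first."""
    directory = Path(wheel_dir)
    commands = []
    for prefix in WHEEL_PREFIXES:
        matches = sorted(directory.glob(f"{prefix}-*.whl"))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one {prefix} wheel in {directory}, "
                f"found {len(matches)}"
            )
        # --find-links points pip at the same directory for dependencies.
        commands.append(
            ["pip", "install", str(matches[0]), f"--find-links={directory}"]
        )
    return commands
```

Each returned list can be printed for review or passed to subprocess.run(command, check=True) from inside the activated virtual environment.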

Note

With the --find-links option, pip can find the correct cerebras-appliance version, provided that you place all the wheels in the same directory. If you are using only Pipelined execution mode, you should not need the cerebras_tensorflow package and can install the cerebras_appliance package alone.

Note

The cerebras_appliance wheel contains two scripts: csrun_cpu and csrun_wse. These scripts are required for Pipelined execution and serve the same function as the scripts previously available for Slurm. The csrun_cpu script is for non-WSE jobs, while the csrun_wse script launches jobs on the Wafer-Scale Engine. For more information, see csrun_cpu and csrun_wse.
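A quick way to confirm that the two scripts are reachable after installation is to look them up on the PATH. The sketch below does that with the standard library only. The helper name missing_scripts is hypothetical, and the sketch assumes that installing the wheel places both scripts on the PATH of the activated virtual environment.

```python
import shutil
from typing import Iterable, List

# Script names come from the note above; that they land on PATH once the
# virtual environment is activated is an assumption of this sketch.
EXPECTED_SCRIPTS = ("csrun_cpu", "csrun_wse")


def missing_scripts(names: Iterable[str] = EXPECTED_SCRIPTS) -> List[str]:
    """Return the script names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling missing_scripts() with no arguments from inside the activated environment returns an empty list when both scripts are found, and otherwise lists the ones that are missing.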

Running TensorFlow jobs

After you have completed first-time user setup:

To run large models of >1 billion parameters in Weight Streaming execution, follow the steps provided in Running Large Models (Weight Streaming Execution).

To run small to medium models of <1 billion parameters in Pipelined execution, follow the steps provided in Running Small to Medium Models (Pipelined Execution).

Software Documentation (Version 1.7.0)
