.. _cs-tf-pl-k8s:


Pipeline K8s Workflow 
=====================

The Cerebras-recommended workflow uses Kubernetes (K8s) as the orchestrating software to manage resources and coordinate communication between the CS system and the other components within the Cerebras Wafer-Scale Cluster. This guide helps you get started with running pipelined execution on the cluster with the K8s workflow. 

Prerequisites
-------------

The Sysadmin setup for running pipelined execution on the cluster is the same as for weight streaming. For more information, refer to the Admin Setup and First-time User Setup sections in :ref:`cs-tf-ws-appliance-mode`. One additional requirement is that the Sysadmin populates a ``.yaml`` file with the default distribution of resources to be used. Confirm with your Sysadmin that this step is completed.

.. Note::
  
  You can now run both pipeline and weight streaming without needing to go through a system reboot.
  
On the user side, you must set up a Python virtual environment to use this flow. There are three sets of Python libraries: the base Cerebras appliance Python wheel and, on top of it, a wheel for TensorFlow and a wheel for PyTorch. To run pipeline with K8s, you only need to install the Cerebras appliance Python wheel. The wheel contains two scripts: ``csrun_cpu`` and ``csrun_wse``. These scripts serve the same function as the scripts previously available for the Slurm workflow, if you have been using that so far. ``csrun_cpu`` is for jobs that do not use the Wafer-Scale Engine and run offline on CPUs, while ``csrun_wse`` is for jobs that use the Wafer-Scale Engine (CS-2 system).

Clone the reference samples
---------------------------

  1. Log in to your Wafer-Scale Cluster.
  
  2. Activate the virtual environment. This exposes the commands used below.
  
      .. code-block:: bash

            source venv_appliance/bin/activate
            
  3. Clone the reference samples repository to your preferred location in your home directory. 
  
       .. code-block:: bash

            git clone https://github.com/Cerebras/modelzoo
            
Compile on CPU
--------------
 
Cerebras recommends that you first compile your model successfully on a CPU node from the cluster before running it on the CS system. 

  - You can run in ``validate_only`` mode, which performs a fast, lightweight verification. In this mode, the compilation only runs through the first few stages, up to kernel library matching.

  - After a successful ``validate_only`` run, you can run full compilation with ``compile_only`` mode. 

This section of the quick-start guide shows how to execute these steps on a CPU node. 

.. Tip::
  
  The ``validate_only`` step is very fast, enabling you to rapidly iterate on your model code. Without access to the CS system, this step tells you whether you are using any TensorFlow layer or functionality that is unsupported by either XLA or CGC.


Follow these steps to compile on a CPU. The steps use the FC-MNIST example from the `Cerebras Model Zoo git repository <https://github.com/Cerebras/modelzoo>`_.

  1. Navigate to the model directory.
  
       .. code-block:: bash

            cd modelzoo/fc_mnist/tf/ 
            
  2. Run the compilation in ``validate_only`` mode. 
    
       .. code-block:: bash

            csrun_cpu python run.py --mode train --validate_only 
            ... 
            XLA Extraction Complete 
            =============== Starting Cerebras Compilation =============== 
            Cerebras compilation completed: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02s,  1.23s/stages] 
            =============== Cerebras Compilation Completed ===============        
  
    .. Note::
  
      The ``validate_only`` mode checks the kernel compatibility of your model. When your model passes this mode, run the full compilation with ``compile_only`` to generate the CS system executable. 
      
  3. Run the full compilation process in ``compile_only`` mode. This step runs the full compilation through all stages of the Cerebras software stack to generate a CS system executable. 
      
       .. code-block:: bash

            csrun_cpu python run.py --mode train --compile_only --cs_ip <specify your CS_IP> 
            ... 
            XLA Extraction Complete 
            =============== Starting Cerebras Compilation =============== 
            Cerebras compilation completed: |                    | 17/? [00:18s,  1.09s/stages] 
            =============== Cerebras Compilation Completed =============== 
            
When the above compilation is successful, the model is guaranteed to run on the CS system. You can also use ``compile_only`` mode to pre-compile many different model configurations offline, so that you can make fuller use of your allotted CS system cluster time.
  
    .. Note::
  
      The compiler detects whether a binary already exists for a particular model config and skips compiling on the fly during training if it detects one. 
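
The offline pre-compilation described above can be scripted. The following is a minimal sketch, not part of the Cerebras tooling. It assumes that each model configuration lives in its own params file (the file names below are hypothetical) and that ``run.py`` accepts ``--params`` together with ``--compile_only``. By default it only prints the commands; pass ``dry_run=False`` to launch them on a CPU node.

.. code-block:: python

    import shlex
    import subprocess


    def compile_only_cmd(params_file, cs_ip):
        """Build the csrun_cpu command line for one compile_only run."""
        return [
            "csrun_cpu", "python", "run.py",
            "--mode", "train",
            "--compile_only",
            "--cs_ip", cs_ip,
            f"--params={params_file}",
        ]


    def precompile(params_files, cs_ip, dry_run=True):
        """Compile each configuration in turn; return the commands that failed."""
        failed = []
        for params_file in params_files:
            cmd = compile_only_cmd(params_file, cs_ip)
            print(shlex.join(cmd))
            if dry_run:
                continue
            # Assumption: a non-zero exit status means the compile failed.
            if subprocess.run(cmd).returncode != 0:
                failed.append(cmd)
        return failed


    # Hypothetical params files; replace with your own configurations.
    configs = ["params_small.yaml", "params_large.yaml"]
    failed = precompile(configs, cs_ip="<cs-ip>", dry_run=True)

Running the compiles sequentially keeps the logs of each configuration separate, which makes it easier to see which one failed.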
      
Train and evaluate on CPU 
-------------------------

Before running on the CS system, you can also run training and eval on CPU without any code changes. Whether this is practical depends on the size of the model and the parameters used.

To run training and eval on CPU, follow these steps:

  1. Navigate to the model directory.
        
       .. code-block:: bash

            cd modelzoo/fc_mnist/tf/ 
            
  2. Train and evaluate the model on the CPU.
          
       .. code-block:: bash

            # train on CPU
            csrun_cpu python run.py --mode train \
                --params=params.yaml

            # run eval on CPU
            csrun_cpu python run.py --mode eval \
                --eval_steps 1000
  
  
Run the model on the CS system 
------------------------------

The following ``csrun_wse`` command compiles the code if no existing compile artifacts are found, and then runs the compiled executable on the CS system.

     .. code-block:: bash
     
        csrun_wse --admin-defaults="/path/to/admin-defaults.yaml" --mount-dirs="/data/ml,/lab/ml" python run.py --cs_ip=<cs-ip> --mode=train --params=params.yaml 

The command above mounts the directories ``/data/ml`` and ``/lab/ml`` to the container (in addition to the default mount directories) and then trains the FC-MNIST model on the CS System available at the provided IP address ``<cs-ip>``. 

For the full list of options, run ``csrun_wse --help``.

To run an eval job on the CS system, enter the following command: 

     .. code-block:: bash
     
        csrun_wse --mount-dirs="/data/ml,/lab/ml" python run.py --mode=eval --eval_steps=1000 --cs_ip=<cs-ip>

This command initiates an eval job for 1000 steps on the CS system at the given ``<cs-ip>`` IP address.

Output files and artifacts
--------------------------

The output files and artifacts include a model directory (``model_dir``), which contains all the results and artifacts of the latest run, including:

  - Compile directory (``cs_<checksum>``) 

  - ``performance.json`` file 

  - Checkpoints 

  - Tensorboard event files 

  - ``yaml`` files 
  
Compile dir – The directory containing the ``cs_<checksum>`` 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
The ``cs_<checksum>`` directory (also known as the cached compile directory) contains the ``.elf`` file, which is used to program the system.

The compilation output indicates whether the compile passed or failed. If it failed, the logs show the stage at which compilation failed.
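
To check which cached compiles already exist for a model directory, a sketch along the following lines may help. It assumes that the ``cs_<checksum>`` directories sit directly under ``<model_dir>`` and searches them recursively for ``.elf`` files, because the exact layout inside a compile directory is not specified here. ``cached_compiles`` is a hypothetical helper name.

.. code-block:: python

    from pathlib import Path


    def cached_compiles(model_dir):
        """Map each cs_<checksum> directory name to the .elf files found inside it."""
        result = {}
        for compile_dir in sorted(Path(model_dir).glob("cs_*")):
            if compile_dir.is_dir():
                # Recursive search: the layout inside the directory is an assumption.
                result[compile_dir.name] = sorted(
                    p.name for p in compile_dir.rglob("*.elf")
                )
        return result

A compile directory that maps to an empty list probably belongs to a compile that did not finish, so its logs are the place to look next.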

``performance.json`` file and its parameters 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
The performance directory contains the ``performance.json`` file, located at ``<model_dir>/performance/performance.json``. This file contains the following information:

  - ``compile_time`` - The amount of time that it took to compile the model to generate the Cerebras executable. 

  - ``est_samples_per_sec`` - The estimated performance in terms of samples per second based on the Cerebras compile. Note that this number is theoretical and actual performance may vary. 

  - ``programming_time`` - The time taken to prepare the system and load it with the compiled model.

  - ``samples_per_sec`` - The actual performance of your run; that is, the number of samples processed on the WSE per second.

  - ``suspected_input_bottleneck`` - This is a beta feature. It indicates whether you are input-starved and need more input workers to feed the Cerebras system. 

  - ``total_samples`` - The total gross samples that were iterated during the execution. 

  - ``total_time`` - The total time it took to complete the total samples. 
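
As an illustration of how these fields can be used, the sketch below loads ``performance.json`` and compares the measured throughput with the compiler's estimate. It assumes that the file is a flat JSON object keyed by the field names listed above. ``summarize_performance`` is a hypothetical helper, not a Cerebras API.

.. code-block:: python

    import json
    from pathlib import Path


    def summarize_performance(model_dir):
        """Read <model_dir>/performance/performance.json and return a summary dict."""
        path = Path(model_dir) / "performance" / "performance.json"
        with path.open() as f:
            perf = json.load(f)

        summary = {
            "samples_per_sec": perf.get("samples_per_sec"),
            "est_samples_per_sec": perf.get("est_samples_per_sec"),
            "suspected_input_bottleneck": perf.get("suspected_input_bottleneck"),
        }
        actual = summary["samples_per_sec"]
        estimated = summary["est_samples_per_sec"]
        # Fraction of the estimated throughput actually achieved, if both are present.
        if actual is not None and estimated:
            summary["fraction_of_estimate"] = actual / estimated
        else:
            summary["fraction_of_estimate"] = None
        return summary

A low ``fraction_of_estimate`` combined with a true ``suspected_input_bottleneck`` suggests looking at the input pipeline first.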
  
Checkpoints
~~~~~~~~~~~

Checkpoints are stored in ``<model_dir>``; for example, ``<model_dir>/model-ckpt-0.index``, ``<model_dir>/model-ckpt-0.meta``, and ``<model_dir>/model-ckpt-1.data-00000-of-00001``. They are saved with the frequency specified in the ``runconfig`` file. 
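
To locate the most recent checkpoint, for example before running eval, a short sketch such as the following can scan ``<model_dir>``. It assumes that checkpoint files follow the ``model-ckpt-<step>.index`` naming shown above. ``latest_checkpoint_step`` is a hypothetical helper, not a Cerebras API.

.. code-block:: python

    import re
    from pathlib import Path

    # Matches names such as model-ckpt-0.index and captures the step number.
    CKPT_INDEX = re.compile(r"^model-ckpt-(\d+)\.index$")


    def latest_checkpoint_step(model_dir):
        """Return the highest step with a model-ckpt-<step>.index file, or None."""
        steps = []
        for entry in Path(model_dir).iterdir():
            match = CKPT_INDEX.match(entry.name)
            if match:
                steps.append(int(match.group(1)))
        return max(steps) if steps else None

Comparing the steps as integers avoids the lexical ordering problem in which ``model-ckpt-9`` would sort after ``model-ckpt-100``.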

Tensorboard event files 
~~~~~~~~~~~~~~~~~~~~~~~
 
TensorBoard event files are also stored in ``<model_dir>``.

``yaml`` files content after the run 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
The ``yaml`` file is stored in the train directory. It records the specifics of the run: model-specific configuration (for example, ``dropout`` and ``activation_fn``), optimizer type and optimizer parameters, input data configuration (such as ``batch_size`` and shuffle), and run configuration (such as ``max_steps``, ``checkpoint_steps``, and ``num_epochs``).