Optimizer and Scheduler#
On this page, you will learn how to configure the
Trainer with an Optimizer
and with one or more Scheduler classes.
By the end, you should have a cursory understanding of how to use the
Optimizer class and Scheduler
class in conjunction with the Trainer class.
Prerequisites#
Basic Usage#
An Optimizer implements an optimization algorithm
to control how model parameters are updated. Various hyperparameters such as lr,
momentum, and weight_decay can be passed to the Optimizer
to give further control. A Scheduler is
used in conjunction with an Optimizer to adjust the
value of these hyperparameters over the course of a run. Currently, schedulers for
lr and weight_decay are supported.
The Trainer takes in an optimizer argument.
An optimizer updates the model weights during training and is required
for any run that includes training. The optimizer argument can be passed as an
Optimizer class. For details on all available
optimizers, see the CSTorch optimizer class.
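To make the role of lr, momentum, and weight_decay concrete, the following is a minimal pure-Python sketch of one SGD-with-momentum update. It follows the common convention of adding the weight decay term to the gradient before accumulating the momentum buffer. The function name sgd_momentum_step is hypothetical, and this is an illustration of the general algorithm rather than the cstorch.optim.SGD implementation, which may differ in its details.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9, weight_decay=0.0):
    """Apply one SGD-with-momentum update to a list of scalar parameters.

    Illustrative only: a sketch of the textbook update rule, not the
    cstorch implementation.
    """
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p       # L2 penalty folded into the gradient
        v = momentum * v + g           # accumulate the momentum buffer
        new_params.append(p - lr * v)  # step against the accumulated direction
        new_velocity.append(v)
    return new_params, new_velocity


# Two scalar "weights" with made-up gradients, purely for illustration.
params, velocity = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, -0.5]
for _ in range(2):
    params, velocity = sgd_momentum_step(params, grads, velocity)
print(params, velocity)
```

With a constant gradient, the momentum buffer grows from step to step, so the second update moves the weights further than the first. A scheduler that lowers lr over time counteracts this by shrinking the step size as training progresses.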
The Trainer also accepts a schedulers argument.
Schedulers are used to adjust hyperparameters during training. Typically this
adjustment is a decay following some algorithm. The CSTorch API supports
schedulers that adjust either learning rate or weight decay. For a full list of
available schedulers, see the CSTorch scheduler class.
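As an illustration of what "a decay following some algorithm" can look like, the sketch below computes a learning rate that decays linearly for 500 steps and then follows a half-cosine curve for another 500 steps, using the same values as the SGD example on this page. The helper names (linear_lr, cosine_decay_lr, sequential_lr) are hypothetical, and the formulas are the standard textbook ones; the exact formulas and boundary handling used by the cstorch schedulers are an assumption here and may differ.

```python
import math


def linear_lr(step, initial, end, total_iters):
    """Linearly interpolate from `initial` to `end` over `total_iters` steps."""
    frac = min(step, total_iters) / total_iters
    return initial + (end - initial) * frac


def cosine_decay_lr(step, initial, end, total_iters):
    """Follow half a cosine wave from `initial` down to `end`."""
    frac = min(step, total_iters) / total_iters
    return end + 0.5 * (initial - end) * (1 + math.cos(math.pi * frac))


def sequential_lr(step):
    """Chain the two schedules: linear for steps 0-499, cosine afterwards."""
    if step < 500:
        return linear_lr(step, 0.01, 0.001, 500)
    # The second schedule counts steps from its own start.
    return cosine_decay_lr(step - 500, 0.001, 0.0001, 500)


for step in (0, 250, 500, 750, 1000):
    print(step, sequential_lr(step))
```

Because the first schedule ends at the value where the second one begins (0.001), the combined curve is continuous at step 500.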
In the example below, you create an SGD optimizer with a single SequentialLR Scheduler that applies a LinearLR Scheduler for the first 500 steps, then a CosineDecayLR Scheduler for the next 500 steps.
trainer:
  init:
    optimizer:
      # Corresponds to cstorch.optim.SGD
      SGD:
        lr: 0.01
        momentum: 0.9
    schedulers:
    - SequentialLR:
        schedulers:
        - LinearLR:
            initial_learning_rate: 0.01
            end_learning_rate: 0.001
            total_iters: 500
        - CosineDecayLR:
            initial_learning_rate: 0.001
            end_learning_rate: 0.0001
            total_iters: 500
    ...
  ...

import cerebras.pytorch as cstorch
from cerebras.modelzoo import Trainer

trainer = Trainer(
    ...,
    optimizer=lambda model: cstorch.optim.SGD(
        model.parameters(),
        lr=0.01,
        momentum=0.9,
    ),
    schedulers=[
        lambda optimizer: cstorch.optim.lr_scheduler.SequentialLR(
            optimizer,
            schedulers=[
                cstorch.optim.lr_scheduler.LinearLR(
                    optimizer,
                    initial_learning_rate=0.01,
                    end_learning_rate=0.001,
                    total_iters=500,
                ),
                cstorch.optim.lr_scheduler.CosineDecayLR(
                    optimizer,
                    initial_learning_rate=0.001,
                    end_learning_rate=0.0001,
                    total_iters=500,
                ),
            ]
        ),
        ...
    ],
    ...,
)
...
Note
Note that in Python, optimizer is passed as a callable, assumed to be a
function that takes in a torch.nn.Module and returns an Optimizer.
It can also be passed as an Optimizer instance,
provided the model is already defined.
Similarly, schedulers is passed as a list of callables, where each element
is assumed to be a function that takes in an Optimizer
and returns a Scheduler. Each element can
also be passed as a Scheduler instance,
provided the Optimizer is already defined.
Using callables lets you pass in objects without having to construct their inputs beforehand.
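The deferred-construction pattern described in this note can be sketched with plain Python stand-ins. Every class here (ToyModel, ToyOptimizer, ToyScheduler, ToyTrainer) is hypothetical and exists only for illustration; this is not how the real Trainer is implemented, only a minimal sketch of how an object could accept either a ready-made instance or a factory callable.

```python
class ToyModel:
    """Stand-in for a torch.nn.Module."""

    def parameters(self):
        return [0.0, 0.0]


class ToyOptimizer:
    """Stand-in optimizer: needs the model's parameters at construction time."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr


class ToyScheduler:
    """Stand-in scheduler: needs an optimizer at construction time."""

    def __init__(self, optimizer, total_iters):
        self.optimizer = optimizer
        self.total_iters = total_iters


class ToyTrainer:
    """Resolves factories once the objects they depend on exist."""

    def __init__(self, model, optimizer, schedulers=()):
        self.model = model
        # A callable is treated as a factory: model -> optimizer.
        self.optimizer = optimizer(model) if callable(optimizer) else optimizer
        # Each callable is treated as a factory: optimizer -> scheduler.
        self.schedulers = [
            s(self.optimizer) if callable(s) else s for s in schedulers
        ]


# The lambdas describe *how* to build each object; nothing is built until
# the trainer has the model (and then the optimizer) in hand.
trainer = ToyTrainer(
    model=ToyModel(),
    optimizer=lambda model: ToyOptimizer(model.parameters(), lr=0.01),
    schedulers=[lambda optimizer: ToyScheduler(optimizer, total_iters=500)],
)
print(type(trainer.optimizer).__name__, len(trainer.schedulers))
```

The caller never has to create the model or the optimizer up front, and every scheduler ends up bound to the same optimizer instance the trainer holds.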
Conclusion#
That concludes this overview of using the Optimizer and the Scheduler
in conjunction with the Trainer. By this point, you should
have a cursory understanding of how to construct and configure an
Optimizer and Scheduler inside a Trainer instance.
What’s next?#
To learn more about how to configure checkpointing behaviour using the
Trainer,
see Model Zoo Trainer - Checkpoint.
Further Reading#
To learn about how you can configure a Trainer
instance using a YAML configuration file, you can check out:
Trainer YAML Overview
To learn more about how you can use the Trainer
in some core workflows, you can check out:
To learn more about how you can extend the capabilities of the
Trainer class, you can check out: