Model is too large to fit in memory
Observed Error
Model is too large to fit in memory. This can happen because of a large batch size, large input tensor dimensions, or other network parameters. Please refer to the Troubleshooting section in the documentation for potential workarounds
Causes and Possible Solutions
The memory requirements of your model are too large to fit on the device. Potential workarounds include:
* On transformer models, compile again with the batch size set to 1 using one CS-2 system to determine if the specified maximum sequence length is feasible.
* You can try a smaller batch size per device or enable batch tiling (only on transformer models) by setting the micro_batch_size parameter in the train_input or eval_input section of your model’s yaml file (see Optimizing performance with automatic microbatching).
* If you ran with batch tiling with a specific micro_batch_size value, you can try compiling with a decreased micro_batch_size. The Using “explore” to Search for a Near-Optimal Microbatch Size flow can recommend performant micro batch sizes that will fit in memory.
* On CNN models where batch tiling isn’t supported, try manually decreasing the batch size and/or the image/volume size.
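The micro_batch_size workaround can be sketched in Python as a plain dictionary that mirrors the layout of a model yaml file. This is a minimal illustration, not the product's API: only the train_input, batch_size, and micro_batch_size names come from the text above, while the numeric values and the with_smaller_micro_batch helper are hypothetical. Halving is a simple assumed heuristic and is not the "explore" flow.

```python
import copy

# Hypothetical params dict mirroring the layout of a model yaml file.
# Only train_input, batch_size and micro_batch_size come from the
# documentation; the values are made up for illustration.
params = {
    "train_input": {
        "batch_size": 256,       # global batch size
        "micro_batch_size": 64,  # enables batch tiling on transformer models
    },
}


def with_smaller_micro_batch(params, section="train_input", factor=2):
    """Return a copy of params with micro_batch_size divided by factor.

    Hypothetical helper: dividing by a fixed factor is an assumed
    heuristic for choosing the next value to try.
    """
    updated = copy.deepcopy(params)
    current = updated[section]["micro_batch_size"]
    if current <= 1:
        raise ValueError("micro_batch_size is already 1 and cannot be decreased")
    updated[section]["micro_batch_size"] = max(1, current // factor)
    return updated


# Next configuration to try after a "too large to fit in memory" error.
retry_params = with_smaller_micro_batch(params)
print(retry_params["train_input"]["micro_batch_size"])  # prints 32
```

The helper returns a copy, so the original configuration stays available for comparison between compile attempts.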
Note
For more information on working with batch tiling and selecting performant micro_batch_size values, visit Optimizing performance with automatic microbatching.
Note
The batch_size parameter set in the yaml configuration is the global batch size. The batch size per CS-2 system is therefore the global batch size divided by the number of CS-2 systems used.
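As a worked example of the note above, the per-system batch size can be computed as follows. The function name and the sample numbers are hypothetical, and rejecting a global batch size that does not divide evenly is an assumption made here for clarity, not documented behavior.

```python
def per_system_batch_size(global_batch_size, num_systems):
    """Batch size seen by each CS-2 system: global batch size / number of systems."""
    if num_systems < 1:
        raise ValueError("num_systems must be at least 1")
    # Assumption for this sketch: the global batch size splits evenly.
    if global_batch_size % num_systems != 0:
        raise ValueError("global batch size does not divide evenly across systems")
    return global_batch_size // num_systems


# A global batch_size of 256 on 4 CS-2 systems gives 64 samples per system.
print(per_system_batch_size(256, 4))  # prints 64
```

When the error appears on a multi-system run, this per-system value is the one to compare against what fits on a single device.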