cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor
- class cerebras.modelzoo.data.common.SyntheticDataProcessor.SyntheticDataProcessor
Bases: object
Creates a synthetic dataset.
Constructs a SyntheticDataset from the user-provided nested structure of input tensor specifications. Calling the create_dataloader() method returns a torch.utils.data.DataLoader built from that SyntheticDataset and the regular torch.utils.data.DataLoader inputs specified in params.yaml.
- Parameters
params – Dictionary containing dataset inputs and specifications. Within this dictionary, the user provides the additional ‘synthetic_inputs’ field that corresponds to a nested tree structure of input tensor specifications used to construct the SyntheticDataset.
params.yaml (In) –
- data_processor: “SyntheticDataProcessor”. Must be set to this value to use this class.
- batch_size: int.
- shuffle_seed: Optional[int] = None. If it is not None, torch.manual_seed(seed=shuffle_seed) is called when creating the dataloader.
- num_examples: Optional[int] = None. If it is not None, it specifies the number of examples/samples in the SyntheticDataset. Otherwise, the SyntheticDataset generates samples indefinitely.
… synthetic_inputs:
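As a rough illustration of the fields listed above, the params dictionary might be assembled as follows. The values are arbitrary placeholders chosen for this sketch, and the synthetic_inputs entry is left out because its schema is not spelled out in this section.

```python
# Illustrative values only; the keys are the fields documented above.
params = {
    "data_processor": "SyntheticDataProcessor",  # required to select this class
    "batch_size": 32,      # int
    "shuffle_seed": 1234,  # Optional[int]; None means torch.manual_seed is not called
    "num_examples": 1000,  # Optional[int]; None means samples are generated indefinitely
    # "synthetic_inputs": nested tensor specifications (schema not shown in this section)
}
```

Setting num_examples to None would request an unbounded dataset, so a training loop consuming it would need its own stopping condition.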
Methods
create_dataloader() – Returns a torch.utils.data.DataLoader that corresponds to the created SyntheticDataset.
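The batching behaviour described on this page can be approximated with a small standard-library stand-in. This is only a sketch of the documented semantics (finite versus indefinite generation, optional seeding), not the library's implementation: synthetic_batches is a hypothetical helper, and random.Random stands in for the torch.manual_seed call.

```python
import itertools
import random


def synthetic_batches(batch_size, num_examples=None, shuffle_seed=None):
    """Yield lists of up to `batch_size` synthetic samples (hypothetical stand-in)."""
    # Stand-in for torch.manual_seed(shuffle_seed): seed only when a seed is given.
    rng = random.Random(shuffle_seed) if shuffle_seed is not None else random.Random()
    # None means "generate indefinitely"; otherwise stop after num_examples samples.
    indices = itertools.count() if num_examples is None else range(num_examples)
    samples = (rng.random() for _ in indices)
    while True:
        # The last batch may be smaller than batch_size when num_examples
        # is not a multiple of it.
        batch = list(itertools.islice(samples, batch_size))
        if not batch:
            return
        yield batch


# Finite case: 10 samples in batches of 4.
finite = list(synthetic_batches(batch_size=4, num_examples=10, shuffle_seed=0))

# Indefinite case: the caller decides when to stop.
first_three = list(itertools.islice(synthetic_batches(batch_size=2), 3))
```

With a fixed seed the finite run is reproducible, which is the practical reason for exposing shuffle_seed in the configuration.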