cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig
- class cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig(*args, **kwargs)
Bases: cerebras.modelzoo.config.base_config.BaseConfig
Methods
- check_for_deprecated_fields
- copy
- get_orig_class
- get_orig_class_args
- model_copy
- model_post_init
- post_init
Attributes
- batch_size: Batch size.
- data_dir: Path to dataset HDF5 files.
- drop_last: If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
- features_list: List of features to include in the batch.
- model_config
- num_workers: How many subprocesses to use for data loading.
- shuffle: Flag to enable data shuffling.
- shuffle_seed: Shuffle seed.
- use_vsl: Flag to enable variable sequence length training.
- data_dir = Ellipsis
Path to dataset HDF5 files.
- batch_size = Ellipsis
Batch size.
- shuffle = False
Flag to enable data shuffling.
- shuffle_seed = None
Shuffle seed.
- num_workers = 0
How many subprocesses to use for data loading.
- drop_last = True
If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
- use_vsl = False
Flag to enable variable sequence length training. This requires the dataset to provide two extra features: the attention_span of keys and the position_ids of tokens.
- features_list = ['input_ids', 'attention_mask', 'labels']
List of features to include in the batch.
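The effect of drop_last on the number of batches can be illustrated with plain arithmetic. The sketch below is not taken from the library: `num_batches` is a hypothetical helper, and it ignores any sharding of the data across workers or systems, which may change the counts seen in practice.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = True) -> int:
    """Hypothetical helper: count batches for a dataset of num_samples examples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # The trailing incomplete batch is discarded.
        return num_samples // batch_size
    # The trailing incomplete batch is kept as a smaller batch.
    return math.ceil(num_samples / batch_size)


print(num_batches(1000, 32, drop_last=True))   # 31
print(num_batches(1000, 32, drop_last=False))  # 32
```

With 1000 samples and a batch size of 32, the last 8 samples form an incomplete batch, which is dropped under the default drop_last = True.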
- __call__(**kwargs)
Construct the original class with the current config. Here, the original class is the class that this config class is associated with.
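The pattern of a config object that builds its associated class when called can be sketched without the library. Everything below is a stand-in written for illustration: `ToyDataset` and `ToyDatasetConfig` are hypothetical names, the fields only mirror the attributes and defaults listed above, and the real `__call__` and `get_orig_class` may pass arguments differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ToyDataset:
    """Hypothetical stand-in for the dataset class tied to the config."""

    def __init__(self, config: "ToyDatasetConfig"):
        self.config = config


@dataclass
class ToyDatasetConfig:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    data_dir: str    # no default, must be supplied
    batch_size: int  # no default, must be supplied
    shuffle: bool = False
    shuffle_seed: Optional[int] = None
    num_workers: int = 0
    drop_last: bool = True
    use_vsl: bool = False
    features_list: List[str] = field(
        default_factory=lambda: ["input_ids", "attention_mask", "labels"]
    )

    def get_orig_class(self):
        # Assumption: the config knows which class it configures.
        return ToyDataset

    def __call__(self, **kwargs):
        # Build the associated class from this config instance.
        return self.get_orig_class()(self, **kwargs)


config = ToyDatasetConfig(data_dir="./path/to/hdf5", batch_size=8)
dataset = config()  # calling the config constructs the associated class
print(type(dataset).__name__, dataset.config.batch_size)
```

Keeping construction behind `__call__` lets callers hold a validated config and defer creating the dataset until it is needed.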