cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig
- class cerebras.modelzoo.data.common.HDF5IterableDataset.HDF5IterableDatasetConfig(*args, **kwargs)
Bases: cerebras.modelzoo.config.base_config.BaseConfig
Methods
check_for_deprecated_fields
copy
get_orig_class
get_orig_class_args
model_copy
model_post_init
post_init
Attributes
batch_size: Batch size.
data_dir: Path to the dataset HDF5 files.
drop_last: If True and the dataset size is not divisible by the batch size, the last incomplete batch is dropped.
features_list: List of features to include in each batch.
model_config
num_workers: Number of subprocesses to use for data loading.
shuffle: Flag to enable data shuffling.
shuffle_seed: Shuffle seed.
use_vsl: Flag to enable variable sequence length training.
- data_dir = Ellipsis (required; no default)
Path to the dataset HDF5 files.
- batch_size = Ellipsis (required; no default)
Batch size.
- shuffle = False
Flag to enable data shuffling.
- shuffle_seed = None
Shuffle seed.
- num_workers = 0
Number of subprocesses to use for data loading.
- drop_last = True
If True and the dataset size is not divisible by the batch size, the last incomplete batch will be dropped.
- use_vsl = False
Flag to enable variable sequence length (VSL) training. VSL requires the dataset to contain two extra features: attention_span (the attention span of keys) and position_ids (the position IDs of tokens).
- features_list = ['input_ids', 'attention_mask', 'labels']
List of features to include in each batch.
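The batching semantics of shuffle, shuffle_seed, batch_size and drop_last can be illustrated on a plain Python list. This is a minimal sketch, not the library's implementation: make_batches is a hypothetical helper, and the real dataset reads samples from HDF5 files and may split the work across num_workers subprocesses.

```python
import random
from typing import List, Optional


def make_batches(
    samples: List[int],
    batch_size: int,
    shuffle: bool = False,
    shuffle_seed: Optional[int] = None,
    drop_last: bool = True,
) -> List[List[int]]:
    """Hypothetical helper: group samples into batches per the documented fields."""
    order = list(samples)
    if shuffle:
        # A fixed seed makes the shuffled order reproducible across runs;
        # a seed of None draws fresh randomness each time.
        random.Random(shuffle_seed).shuffle(order)
    # Chunk the (possibly shuffled) samples into consecutive batches.
    batches = [order[i : i + batch_size] for i in range(0, len(order), batch_size)]
    if drop_last and batches and len(batches[-1]) < batch_size:
        # Discard the trailing incomplete batch.
        batches.pop()
    return batches


samples = list(range(10))
print(len(make_batches(samples, batch_size=4)))                   # 2
print(len(make_batches(samples, batch_size=4, drop_last=False)))  # 3
```

With 10 samples and batch_size=4, drop_last=True yields 2 batches (the last 2 samples are discarded), while drop_last=False yields 3, the last one holding only 2 samples.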
- __call__(**kwargs)
Construct an instance of the original class from the current config.
The original class is the class that this config class is associated with.
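The config-then-call pattern can be pictured with a small stand-in. This sketch assumes nothing about the real BaseConfig internals: ToyHDF5DatasetConfig and ToyHDF5Dataset are hypothetical classes that mirror the documented field names and defaults. Required fields (shown as Ellipsis above) are modeled as dataclass fields without defaults, and forwarding extra keyword arguments to the constructor is an assumption of the sketch.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


class ToyHDF5Dataset:
    """Hypothetical stand-in for the class a config is associated with."""

    def __init__(self, config: "ToyHDF5DatasetConfig", **kwargs: Any) -> None:
        # Keep a plain-dict snapshot of the config plus any extra arguments.
        self.params: Dict[str, Any] = asdict(config)
        self.extra: Dict[str, Any] = kwargs


@dataclass
class ToyHDF5DatasetConfig:
    # Required fields: no default, analogous to "= Ellipsis" in the reference above.
    data_dir: str
    batch_size: int
    # Optional fields with the documented defaults.
    shuffle: bool = False
    shuffle_seed: Optional[int] = None
    num_workers: int = 0
    drop_last: bool = True
    use_vsl: bool = False
    features_list: List[str] = field(
        default_factory=lambda: ["input_ids", "attention_mask", "labels"]
    )

    def __call__(self, **kwargs: Any) -> ToyHDF5Dataset:
        # Build the associated ("original") class from this config;
        # forwarding extra keyword arguments is an assumption of this sketch.
        return ToyHDF5Dataset(self, **kwargs)


config = ToyHDF5DatasetConfig(data_dir="/path/to/hdf5_dir", batch_size=8)
dataset = config()
print(dataset.params["features_list"])  # ['input_ids', 'attention_mask', 'labels']
```

Using default_factory for features_list gives each config instance its own list, so mutating one config's features does not leak into another.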