cerebras.modelzoo.data.nlp.gpt.GptHDF5DataProcessor.GptHDF5DataProcessor

class cerebras.modelzoo.data.nlp.gpt.GptHDF5DataProcessor.GptHDF5DataProcessor(config)

Bases: cerebras.modelzoo.data.common.HDF5IterableDataProcessor.HDF5IterableDataProcessor

An HDF5 data processor for GPT pre-training. It loads data from HDF5 files.

Parameters

config (cerebras.modelzoo.data.nlp.gpt.config.GptHDF5DataProcessorConfig) – The configuration object for the GPT HDF5 data processor.

Methods

collate_fn

create_dataloader

Classmethod to create the dataloader object.
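This page does not describe what collate_fn does. As a hedged illustration only: GPT pre-training samples stored in HDF5 are commonly laid out as a per-sample stack of input ids, an attention mask and labels. The sketch below assumes that three-row layout (an assumption, not something this page confirms) and shows how a collate step could split a list of such samples into per-feature batches. collate_gpt_samples and FEATURE_NAMES are hypothetical names, and a real implementation would likely return framework tensors rather than Python lists.

```python
from typing import Dict, List, Sequence

# ASSUMPTION (hypothetical layout, not confirmed by this page):
# row 0 = input_ids, row 1 = attention_mask, row 2 = labels.
FEATURE_NAMES = ("input_ids", "attention_mask", "labels")


def collate_gpt_samples(
    batch: Sequence[Sequence[Sequence[int]]],
) -> Dict[str, List[List[int]]]:
    """Split a batch of stacked samples into one list of rows per feature."""
    out: Dict[str, List[List[int]]] = {name: [] for name in FEATURE_NAMES}
    for sample in batch:
        # Each sample must provide exactly one row per assumed feature.
        if len(sample) != len(FEATURE_NAMES):
            raise ValueError(
                f"expected {len(FEATURE_NAMES)} rows per sample, got {len(sample)}"
            )
        for name, row in zip(FEATURE_NAMES, sample):
            out[name].append(list(row))
    return out
```

For example, with `batch = [[[5, 6, 7], [1, 1, 1], [6, 7, 8]], [[9, 10, 0], [1, 1, 0], [10, 0, 0]]]`, `collate_gpt_samples(batch)["labels"]` returns `[[6, 7, 8], [10, 0, 0]]`. Keying the output by feature name keeps the model's forward pass independent of the on-disk row order.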

create_dataloader()#

Classmethod to create the dataloader object.
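The parameters and return type of create_dataloader are not documented on this page. For intuition about how a stream of samples and a collate function fit together inside a dataloader, here is a framework-free sketch of the batching step that a loader such as torch.utils.data.DataLoader performs: group samples into lists of batch_size, pass each list to collate_fn, and optionally drop the final partial batch. iterate_batches is a hypothetical helper, not part of the Model Zoo API, and it deliberately ignores shuffling, worker processes and file sharding.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")  # type of a single sample
B = TypeVar("B")  # type of a collated batch


def iterate_batches(
    samples: Iterable[T],
    batch_size: int,
    collate_fn: Callable[[List[T]], B],
    drop_last: bool = False,
) -> Iterator[B]:
    """Group a stream of samples into fixed-size batches and collate each one.

    Hypothetical helper; drop_last mirrors the option of the same name on
    torch.utils.data.DataLoader (which also defaults to False).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[T] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == batch_size:
            yield collate_fn(buffer)
            buffer = []
    # A trailing partial batch is emitted only when drop_last is False.
    if buffer and not drop_last:
        yield collate_fn(buffer)
```

With `collate_fn=list`, `list(iterate_batches(range(7), 3, list))` yields `[[0, 1, 2], [3, 4, 5], [6]]`, and passing `drop_last=True` yields only the two full batches. Dropping the remainder is useful when every batch must have the same static shape.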