cerebras.modelzoo.data_preparation.raw_dataset_processor.RawDatasetProcessor.MultimodalRawDatasetProcessor
- class cerebras.modelzoo.data_preparation.raw_dataset_processor.RawDatasetProcessor.MultimodalRawDatasetProcessor(*args, **kwargs)
Bases:
cerebras.modelzoo.data_preparation.raw_dataset_processor.RawDatasetProcessor.RawDatasetProcessor
Dataset processor for multimodal data (e.g., image data).
Methods
- collate_fn(batch): Collates a list of dictionaries into a batch.
- create_dataloader(): Classmethod to create the dataloader object.
- get_next_item(): Returns the next item in the iteration.
- preprocess_img
- collate_fn(batch)
Collates a list of dictionaries into a batch
- Parameters
batch (List[Dict[str, np.ndarray]]) – A list of dictionaries, where each dictionary contains string keys and NumPy array values.
- Returns
The collated batch.
- Return type
Any
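As a rough illustration of what collating a list of dictionaries can look like, the sketch below stacks same-keyed arrays along a new leading batch axis. The function name `collate_sketch` and the feature keys are hypothetical, and the sketch assumes every sample has the same keys and array shapes; the actual `collate_fn` may behave differently (for example, it may return framework tensors rather than NumPy arrays).

```python
from typing import Dict, List

import numpy as np


def collate_sketch(batch: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Stack same-keyed arrays from each sample along a new leading batch axis.

    Hypothetical helper for illustration only; not the library's implementation.
    """
    if not batch:
        raise ValueError("batch must contain at least one sample")
    # Assumption: all samples share the keys of the first sample.
    keys = batch[0].keys()
    return {key: np.stack([sample[key] for sample in batch], axis=0) for key in keys}


# Two samples with illustrative feature names.
samples = [
    {"input_ids": np.array([1, 2, 3]), "attention_mask": np.array([1, 1, 0])},
    {"input_ids": np.array([4, 5, 6]), "attention_mask": np.array([1, 1, 1])},
]
collated = collate_sketch(samples)
```

Here each value in `collated` gains a leading dimension equal to the number of samples.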
- create_dataloader()
Classmethod to create the dataloader object.
- Returns
A DataLoader object for the dataset.
- Return type
DataLoader
- get_next_item()
Returns the next item in the iteration.
This method iterates over the data stream from the reader, tokenizes each record, and yields dictionaries that map feature names to NumPy arrays.
- Returns
An iterator yielding dictionaries with string keys and NumPy array values.
- Return type
Iterator[Dict[str, np.ndarray]]
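The read, tokenize, and yield pattern described above can be sketched as a plain Python generator. Everything here is an assumption made for illustration: `toy_tokenize`, `get_next_item_sketch`, the `"text"` record field, and the `"input_ids"` feature name are hypothetical and do not reflect the reader or tokenizer that the class actually uses.

```python
from typing import Dict, Iterable, Iterator

import numpy as np


def toy_tokenize(text: str) -> np.ndarray:
    """Hypothetical stand-in tokenizer: maps each whitespace-separated word to its length."""
    return np.array([len(word) for word in text.split()], dtype=np.int32)


def get_next_item_sketch(
    reader: Iterable[Dict[str, str]],
) -> Iterator[Dict[str, np.ndarray]]:
    """Yield one feature dictionary per record read from the stream."""
    for record in reader:
        # Assumption: each record carries its raw text under the "text" key.
        yield {"input_ids": toy_tokenize(record["text"])}


# A list stands in for the data stream produced by a reader.
stream = [{"text": "a cat sat"}, {"text": "hello world"}]
items = list(get_next_item_sketch(stream))
```

Because the function is a generator, records are processed lazily, one at a time, as the caller iterates.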