cerebras.modelzoo.data_preparation.nlp.chunk_data_processing
This module implements a generic data preprocessor called ChunkDataPreprocessor.

Script to generate an HDF5 dataset for GPT models.

This module contains helper functions and classes to read data from different formats, process it, and save it in HDF5 format.

FIMTokenGenerator Module

LMDataTokenGenerator Module

This module provides the VSLLMDataTokenGenerator class, which extends LMDataTokenGenerator to process tokenized text data for variable-length sequence language modeling (VSLLM).

MLMTokenGenerator Module

SummarizationTokenGenerator Module

This module provides the VSLSummarizationTokenGenerator class, which extends SummarizationTokenGenerator to process tokenized text data for variable-length sequence summarization (VSLS).
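The variable-length sequence generators listed above are described only by name, so the following is a minimal sketch of the general idea. It assumes that "variable-length sequence" processing means packing several shorter tokenized documents into one fixed-length sample. The functions `pack_sequences` and `to_sample`, the `pad_id` parameter, and the output field names are hypothetical illustrations and are not the API of VSLLMDataTokenGenerator or ChunkDataPreprocessor.

```python
from typing import Dict, List


def pack_sequences(sequences: List[List[int]], max_len: int) -> List[List[List[int]]]:
    """Greedily group token sequences so each group holds at most max_len tokens.

    Hypothetical helper: sequences are taken in order, and a new group is
    started whenever the next sequence would overflow the current one.
    """
    groups: List[List[List[int]]] = []
    current: List[List[int]] = []
    used = 0
    for seq in sequences:
        if not seq:
            continue  # skip empty documents
        seq = seq[:max_len]  # truncate anything longer than one sample
        if current and used + len(seq) > max_len:
            groups.append(current)
            current, used = [], 0
        current.append(seq)
        used += len(seq)
    if current:
        groups.append(current)
    return groups


def to_sample(group: List[List[int]], max_len: int, pad_id: int = 0) -> Dict[str, List[int]]:
    """Flatten one group into a padded, fixed-length sample.

    Position ids restart at 0 for every packed sequence, and the loss mask
    is 0 on padding so padded positions do not contribute to the loss.
    """
    input_ids: List[int] = []
    position_ids: List[int] = []
    for seq in group:
        input_ids.extend(seq)
        position_ids.extend(range(len(seq)))
    n_real = len(input_ids)
    pad = max_len - n_real
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "position_ids": position_ids + [0] * pad,
        "loss_mask": [1] * n_real + [0] * pad,
    }
```

For example, with `max_len=5`, the documents `[1, 2, 3]` and `[4, 5]` share one sample, and the position ids restart at the boundary between them. The samples produced this way could then be written to an HDF5 file, which is the storage format the modules above target.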