cerebras.modelzoo.data_preparation.data_preprocessing.fim_token_generator
FIMTokenGenerator Module
This module provides the FIMTokenGenerator class, an extension of the PretrainingTokenGenerator class tailored for fill-in-the-middle (FIM) tasks.
- Usage:

      from your_module_name import FIMTokenGenerator

      # Initialize the token generator with the required parameters
      tokenizer = FIMTokenGenerator(params, tokenizer_impl, eos_id, pad_id)

      # Tokenize and encode text data
      tokenized_data, stats = tokenizer.encode("Your sample text to process.")

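For readers unfamiliar with fill-in-the-middle objectives, the following is a minimal sketch of the general idea, not the FIMTokenGenerator implementation. It assumes the common prefix-suffix-middle (PSM) layout, in which a token sequence is cut into three spans and rearranged behind sentinel tokens. The function name `fim_permute` and the sentinel IDs are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Sequence


def fim_permute(
    tokens: Sequence[int],
    prefix_id: int,
    middle_id: int,
    suffix_id: int,
    rng: random.Random,
) -> List[int]:
    """Rearrange a token sequence into prefix-suffix-middle (PSM) order.

    Hypothetical helper for illustration only; the sentinel IDs are
    assumptions and do not come from the Cerebras tokenizer configuration.
    """
    # Pick two cut points (possibly equal) that split the sequence in three.
    lo, hi = sorted(rng.randint(0, len(tokens)) for _ in range(2))
    prefix, middle, suffix = tokens[:lo], tokens[lo:hi], tokens[hi:]
    # PSM layout: <PRE> prefix <SUF> suffix <MID> middle
    return [prefix_id, *prefix, suffix_id, *suffix, middle_id, *middle]


# Example with made-up sentinel IDs that do not collide with the data tokens.
sample = list(range(10, 20))
permuted = fim_permute(sample, prefix_id=1, middle_id=2, suffix_id=3,
                       rng=random.Random(0))
```

The output is always three tokens longer than the input, and the original order can be recovered by reading the prefix, middle, and suffix spans back in that order.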
Classes
| FIMTokenGenerator | Initialize the FIMTokenGenerator class. |