cerebras.modelzoo.data.nlp.transformer.TransformerDynamicDataProcessor.TransformerDynamicDataProcessor
- class cerebras.modelzoo.data.nlp.transformer.TransformerDynamicDataProcessor.TransformerDynamicDataProcessor(*args, **kwargs)
Bases: cerebras.modelzoo.data.nlp.t5.T5DynamicDataProcessor.T5DynamicDataProcessor
Reads text files containing the input text tokens.
- Parameters
config (cerebras.modelzoo.data.nlp.transformer.TransformerDynamicDataProcessor.TransformerDynamicDataProcessorConfig) – The configuration object for the processor.
Methods
create_dataloader – Classmethod to create the dataloader object.
element_length_fn – Takes a single sample and returns the sequence length of that sample to be used for VTS bucketing.
get_meta_data – Read data from meta files.
get_single_item – Iterating over the data to construct input features.
load_buffer – Generator to read the data in chunks of size of data_buffer.
- get_meta_data(data_dir)
Reads data from meta files.
- Parameters
data_dir (str) – Path to the input directory.
- Returns
Processed metadata.
- load_buffer()
Generator that reads the data in chunks the size of data_buffer. Data is read from both the source and target input datasets to prepare features for the side-by-side translation task.
- Returns
Yields the data stored in the data_buffer.
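The buffered, paired reading that load_buffer describes can be illustrated with a minimal sketch. This is not the library's implementation: the function name load_buffer_sketch, its parameters, and the use of in-memory lists in place of input files are all assumptions made for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def load_buffer_sketch(
    src_lines: Iterable[str],
    tgt_lines: Iterable[str],
    data_buffer_size: int,
) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield aligned (source, target) pairs chunk by chunk."""
    # zip keeps the source and target datasets aligned line by line.
    pairs = zip(src_lines, tgt_lines)
    while True:
        # Hold at most data_buffer_size aligned pairs in memory at once.
        data_buffer: List[Tuple[str, str]] = list(islice(pairs, data_buffer_size))
        if not data_buffer:
            break  # both inputs are exhausted
        # Hand out the buffered pairs one at a time.
        yield from data_buffer


# Stand-in data; a real processor would read these lines from files.
src = ["a b", "c d", "e f"]
tgt = ["x y", "z w", "u v"]
result = list(load_buffer_sketch(src, tgt, data_buffer_size=2))
```

Bounding the buffer keeps memory use independent of dataset size, while zipping the two inputs guarantees that each source sentence stays paired with its translation.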
- get_single_item()
Iterates over the data to construct input features.
- Returns
A dict with training features:
- np.array[int32] input_ids: NumPy array with encoder input token indices. Shape: (src_max_sequence_length).
- np.array[int32] decoder_input_ids: NumPy array with decoder input token indices. Shape: (tgt_max_sequence_length).
- np.array[int32] attention_mask: NumPy array with attention mask for encoder. Shape: (src_max_sequence_length).
- np.array[int32] decoder_attention_mask: NumPy array with attention mask for decoder. Shape: (tgt_max_sequence_length).
- np.array[int32] labels: NumPy array with labels for teacher forcing mode. Shape: (tgt_max_sequence_length).
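To make the shape and dtype of each returned feature concrete, here is a minimal NumPy sketch that builds a dict with the same keys. The helper name build_features_sketch, the pad_id value of 0, and the shift-by-one convention used for teacher forcing are assumptions for illustration; the processor's actual tokenization, special tokens, and padding may differ.

```python
import numpy as np


def build_features_sketch(
    src_ids, tgt_ids, src_max_sequence_length, tgt_max_sequence_length, pad_id=0
):
    """Hypothetical helper: pad token ids and build masks with fixed shapes."""

    def pad(ids, max_len):
        ids = list(ids)[:max_len]  # truncate to the maximum length
        n_pad = max_len - len(ids)
        padded = np.array(ids + [pad_id] * n_pad, dtype=np.int32)
        mask = np.array([1] * len(ids) + [0] * n_pad, dtype=np.int32)
        return padded, mask

    input_ids, attention_mask = pad(src_ids, src_max_sequence_length)
    # Assumed teacher-forcing convention: the decoder sees the target without
    # its last token, and the labels are the target shifted left by one.
    decoder_input_ids, decoder_attention_mask = pad(
        tgt_ids[:-1], tgt_max_sequence_length
    )
    labels, _ = pad(tgt_ids[1:], tgt_max_sequence_length)
    return {
        "input_ids": input_ids,
        "decoder_input_ids": decoder_input_ids,
        "attention_mask": attention_mask,
        "decoder_attention_mask": decoder_attention_mask,
        "labels": labels,
    }


features = build_features_sketch([5, 6, 7], [1, 8, 9, 2], 5, 4)
```

Each encoder-side array has shape (src_max_sequence_length,) and each decoder-side array has shape (tgt_max_sequence_length,), matching the shapes listed above.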
- element_length_fn(features)
Takes a single sample and returns the sequence length of that sample to be used for VTS bucketing.
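One plausible way to derive a per-sample length for bucketing is to count the non-padded positions in the attention masks. The sketch below assumes that approach; the function name element_length_sketch and the choice to take the longer of the encoder and decoder lengths are illustrative assumptions, not the documented behavior of element_length_fn.

```python
import numpy as np


def element_length_sketch(features):
    """Hypothetical helper: length of a sample as its longest unpadded side."""
    # A mask value of 1 marks a real token and 0 marks padding.
    src_len = int(np.sum(features["attention_mask"]))
    tgt_len = int(np.sum(features["decoder_attention_mask"]))
    return max(src_len, tgt_len)


sample = {
    "attention_mask": np.array([1, 1, 1, 0, 0], dtype=np.int32),
    "decoder_attention_mask": np.array([1, 1, 0, 0], dtype=np.int32),
}
length = element_length_sketch(sample)
```

Grouping samples of similar length into the same bucket reduces the amount of padding processed per batch.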
- create_dataloader()
Class method that creates the dataloader object.