modelzoo.transformers.pytorch.t5.input.utils.concatenate_documents
- modelzoo.transformers.pytorch.t5.input.utils.concatenate_documents(dataset, num_to_concatenate=128, pad_id=0)
 Concatenate unrelated documents together to reduce the need for padding.
- Parameters
 dataset (iterable) – The input dataset.
 num_to_concatenate (int) – How many documents to concatenate together.
 pad_id (int) – The vocab ID reserved for padding values. Must not occur anywhere in the dataset.
- Yields
 New samples made by concatenating samples from dataset.
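
The behavior described above can be illustrated with a minimal sketch. It is not the modelzoo implementation: the name `concatenate_documents_sketch` is hypothetical, and the sketch assumes that each sample is a flat list of token IDs, that padding is removed before joining, and that a final group with fewer than `num_to_concatenate` samples is still yielded. The real function may handle any of these differently.

```python
from itertools import islice


def concatenate_documents_sketch(dataset, num_to_concatenate=128, pad_id=0):
    """Hypothetical stand-in for illustration, not the modelzoo code."""
    iterator = iter(dataset)
    while True:
        # Take the next group of up to num_to_concatenate samples.
        group = list(islice(iterator, num_to_concatenate))
        if not group:
            return
        merged = []
        for sample in group:
            # Drop padding tokens (assumption). This is only safe because
            # pad_id must never occur as a real token in the dataset.
            merged.extend(tok for tok in sample if tok != pad_id)
        yield merged


# Three short, padded samples joined two at a time.
docs = [[5, 6, 0, 0], [7, 0, 0, 0], [8, 9, 10, 0]]
result = list(concatenate_documents_sketch(docs, num_to_concatenate=2))
print(result)
```

The sketch also shows why `pad_id` must not occur in the data: any real token equal to `pad_id` would be indistinguishable from padding and silently dropped.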