deduplicate_dataset
generate_connected_components
generate_duplicate_pairs
Script that generates duplicate pairs.
to_hash
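To illustrate what duplicate-pair generation means in a deduplication pipeline, here is a minimal sketch that compares documents by the Jaccard similarity of their word shingles and reports the pairs above a threshold. It is not the code in generate_duplicate_pairs: the helper names (shingles, jaccard, duplicate_pairs), the shingle size, and the threshold are all assumptions made for this example, and a production pipeline would likely avoid the all-pairs comparison used here.

```python
from __future__ import annotations

from itertools import combinations


def shingles(text: str, n: int = 3) -> set[str]:
    """Return the set of n-word shingles in `text` (hypothetical helper)."""
    words = text.lower().split()
    if len(words) < n:
        # Short documents become a single shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def duplicate_pairs(
    docs: dict[str, str], threshold: float = 0.5
) -> list[tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair scoring at or above `threshold`.

    Brute-force O(n^2) comparison, for illustration only.
    """
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            pairs.append((left, right, score))
    return pairs
```

Each returned pair can be read as an edge between two document IDs. Grouping those edges into connected components and keeping one document per component is one common way to finish deduplication.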