cerebras.modelzoo.data_preparation.nlp.pubmed.TextSharding.Sharding
- class cerebras.modelzoo.data_preparation.nlp.pubmed.TextSharding.Sharding(input_files, output_name_prefix, n_training_shards, n_test_shards, fraction_test_set)
Bases: object

Methods
- distribute_articles_over_shards
- get_sentences_per_shard
- init_output_files
- load_articles
- segment_articles_into_sentences
- write_shards_to_disk
- write_single_shard
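
The constructor parameters suggest that articles are split into a training set and a test set, each spread over a fixed number of shards. The sketch below illustrates that general idea only. It does not import or reproduce the Sharding class, and the helper name `split_into_shards`, the round-robin assignment, and the sample data are all assumptions made for illustration, not the library's actual behavior.

```python
from typing import Dict, List


def split_into_shards(
    articles: Dict[str, List[str]],
    n_training_shards: int,
    n_test_shards: int,
    fraction_test_set: float,
) -> Dict[str, List[List[str]]]:
    """Hypothetical helper: assign whole articles to training/test shards.

    `articles` maps an article id to its list of sentences. This is an
    assumed data layout, not the one used by the Sharding class.
    """
    if n_training_shards < 1 or n_test_shards < 1:
        raise ValueError("shard counts must be positive")
    if not 0.0 <= fraction_test_set <= 1.0:
        raise ValueError("fraction_test_set must be in [0, 1]")

    # Sort ids so the sketch is deterministic; a real pipeline may shuffle.
    article_ids = sorted(articles)
    n_test = int(len(article_ids) * fraction_test_set)
    test_ids, train_ids = article_ids[:n_test], article_ids[n_test:]

    def round_robin(ids: List[str], n_shards: int) -> List[List[str]]:
        # Deal articles out to shards in turn, keeping each article whole.
        shards: List[List[str]] = [[] for _ in range(n_shards)]
        for i, article_id in enumerate(ids):
            shards[i % n_shards].extend(articles[article_id])
        return shards

    return {
        "training": round_robin(train_ids, n_training_shards),
        "test": round_robin(test_ids, n_test_shards),
    }


# Made-up sample data: four articles, already segmented into sentences.
articles = {
    "a1": ["s1.", "s2."],
    "a2": ["s3."],
    "a3": ["s4.", "s5."],
    "a4": ["s6."],
}
shards = split_into_shards(
    articles, n_training_shards=2, n_test_shards=1, fraction_test_set=0.25
)
```

Keeping each article within a single shard, as this sketch does, avoids splitting a document's sentences across the training and test sets. Whether the Sharding class makes the same choice should be checked against its source.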