cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.DatasetStats
class cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils.DatasetStats(num_sequences: int, num_tokens: int, detokenized_bytes: int, detokenized_chars: int, non_pad_tokens: int, loss_valid_tokens: int)
Bases: object

Methods
Attributes
num_sequences
num_tokens
detokenized_bytes
detokenized_chars
non_pad_tokens
loss_valid_tokens
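The signature and attribute list suggest that DatasetStats is a plain record of integer counters. The sketch below assumes it can be treated that way. It uses a local stand-in dataclass with the same six fields rather than the library's own code, so it runs without Model Zoo installed. The helper merge_stats and all the counter values are hypothetical and purely illustrative. The example shows how per-shard records could be summed into a single total and how a derived ratio could be read from the attributes.

```python
from dataclasses import dataclass, fields


# Hypothetical stand-in mirroring the documented constructor signature.
# With Model Zoo installed you would import the real class instead:
# from cerebras.modelzoo.data_preparation.nlp.hdf5_preprocessing.utils import DatasetStats
@dataclass
class DatasetStats:
    num_sequences: int
    num_tokens: int
    detokenized_bytes: int
    detokenized_chars: int
    non_pad_tokens: int
    loss_valid_tokens: int


def merge_stats(*parts: DatasetStats) -> DatasetStats:
    """Hypothetical helper: sum every counter field across several records,
    e.g. one record per preprocessed shard."""
    totals = {
        f.name: sum(getattr(p, f.name) for p in parts)
        for f in fields(DatasetStats)
    }
    return DatasetStats(**totals)


# Illustrative values only, not real preprocessing output.
shard_a = DatasetStats(num_sequences=2, num_tokens=16, detokenized_bytes=40,
                       detokenized_chars=38, non_pad_tokens=12, loss_valid_tokens=10)
shard_b = DatasetStats(num_sequences=3, num_tokens=24, detokenized_bytes=60,
                       detokenized_chars=60, non_pad_tokens=20, loss_valid_tokens=17)

total = merge_stats(shard_a, shard_b)
print(total)
# DatasetStats(num_sequences=5, num_tokens=40, detokenized_bytes=100,
#              detokenized_chars=98, non_pad_tokens=32, loss_valid_tokens=27)

# Fraction of token slots that are not padding.
print(total.non_pad_tokens / total.num_tokens)  # 0.8
```

Summing field by field via dataclasses.fields keeps the helper in step with the field list. This assumes that every attribute is an additive count, which matches the int-typed constructor parameters shown above.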