cerebras.modelzoo.common.utils.input.utils.SamplesSaver
- class cerebras.modelzoo.common.utils.input.utils.SamplesSaver
- Bases: object
- Manages data samples chunking and saving for numpy arrays.
- Parameters
- data_dir – Path to mounted dir where the samples are dumped 
- max_file_size – Maximum file size (in bytes) for the .npy samples file(s) 
- filename_prefix – (Optional) filename prefix for the .npy file(s) 
 
- Methods
- Adds the np array to the internally maintained list of data samples and dumps these to file if the total size exceeds the max_file_size threshold.
- Cleans up by deleting all dumped data.
- Dumps any remaining data samples not yet written to file.
- Attributes
- dataset_size: Returns the total number of data samples.
- samples_files: Returns the list of .npy file(s).
- __init__(data_dir: str, max_file_size: int, filename_prefix: Optional[str] = None)
- Parameters
- data_dir – Path to mounted dir where the samples are dumped 
- max_file_size – Maximum file size (in bytes) for the .npy samples file(s) 
- filename_prefix – (Optional) filename prefix for the .npy file(s) 
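
Because max_file_size is given in bytes, it can help to derive it from the size of one sample. The sketch below is only an illustration: the sample shape, dtype, and target of 1000 samples per file are hypothetical, and it assumes the threshold is compared against in-memory array size (this page does not say whether the .npy header is counted).

```python
import numpy as np

# Hypothetical sample: 128 int32 values (e.g. token ids).
sample = np.zeros(128, dtype=np.int32)

# ndarray.nbytes is the in-memory size of the array data: 128 * 4 bytes.
bytes_per_sample = sample.nbytes

# Hypothetical target: roughly 1000 samples per .npy file.
samples_per_file = 1000
max_file_size = bytes_per_sample * samples_per_file

print(bytes_per_sample, max_file_size)
```

The resulting value would then be passed as the max_file_size argument of __init__.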
 
 
 - property dataset_size: int
- Returns the total number of data samples. 
 - property samples_files: List[str]
- Returns the list of .npy file(s).
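
The chunking behavior described on this page can be sketched with a small stand-in class. This is not the Cerebras implementation: the class name ChunkedNpySaver, the method names add, flush_remaining, and cleanup, the default prefix, and the file naming scheme are all hypothetical, and the sketch assumes every sample has the same shape so that samples can be stacked into one array per file.

```python
import os
import tempfile
from typing import List, Optional

import numpy as np


class ChunkedNpySaver:
    """Illustrative stand-in only; names and file layout are assumptions."""

    def __init__(self, data_dir: str, max_file_size: int,
                 filename_prefix: Optional[str] = None):
        self.data_dir = data_dir
        self.max_file_size = max_file_size
        self.prefix = filename_prefix or "samples"  # assumed default prefix
        self._buffer: List[np.ndarray] = []
        self._buffer_bytes = 0
        self._files: List[str] = []
        self._num_samples = 0
        os.makedirs(data_dir, exist_ok=True)

    @property
    def dataset_size(self) -> int:
        # Total number of samples added so far.
        return self._num_samples

    @property
    def samples_files(self) -> List[str]:
        # Paths of the .npy files written so far.
        return list(self._files)

    def add(self, sample: np.ndarray) -> None:
        # Buffer the sample; dump once the buffered bytes exceed the limit.
        self._buffer.append(sample)
        self._buffer_bytes += sample.nbytes
        self._num_samples += 1
        if self._buffer_bytes > self.max_file_size:
            self.flush_remaining()

    def flush_remaining(self) -> None:
        # Write any buffered samples to the next numbered .npy file.
        if not self._buffer:
            return
        name = f"{self.prefix}_{len(self._files)}.npy"
        path = os.path.join(self.data_dir, name)
        np.save(path, np.stack(self._buffer))
        self._files.append(path)
        self._buffer = []
        self._buffer_bytes = 0

    def cleanup(self) -> None:
        # Delete every file that was dumped (the sample count is left as-is).
        for path in self._files:
            if os.path.exists(path):
                os.remove(path)
        self._files = []


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        saver = ChunkedNpySaver(tmp, max_file_size=100, filename_prefix="demo")
        for i in range(5):
            saver.add(np.full(4, i, dtype=np.int64))  # 32 bytes per sample
        saver.flush_remaining()
        shapes = [np.load(f).shape for f in saver.samples_files]
        n_files = len(saver.samples_files)
        saver.cleanup()
        return saver.dataset_size, n_files, shapes


print(demo())
```

With a 100-byte limit and 32-byte samples, the fourth sample pushes the buffer past the threshold and triggers a dump, so the final flush writes the one remaining sample to a second file. Whether the real class dumps before or after exceeding the threshold, and how it names files, should be checked against the linked source.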