cerebras.modelzoo.common.utils.input.utils.SamplesSaver
- class cerebras.modelzoo.common.utils.input.utils.SamplesSaver(data_dir, max_file_size, filename_prefix=None, dtype=None)
Bases: object
Manages data samples chunking and saving for numpy arrays.
Constructs a SamplesSaver instance.
- Parameters
data_dir (str) – Path to the mounted directory where the samples are dumped
max_file_size (int) – Maximum file size (in bytes) for the .npy samples file(s)
filename_prefix (Optional[str]) – Filename prefix for the .npy file(s)
dtype (Optional[numpy.dtype]) – numpy dtype for the array. Defaults to np.int32 if unspecified
Methods
Adds the np array to internally maintained list of data samples and dumps these to file if the total size exceeds max_file_size threshold.
Cleans up by deleting all dumped data.
Dumps any remaining data samples not yet written to file.
Attributes
dataset_size – Returns the total number of data samples.
samples_files – Returns the list of .npy file(s).
- property dataset_size: int
Returns the total number of data samples.
- property samples_files: List[Tuple[str, int]]
Returns the list of .npy file(s).
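The buffering rule described under Methods (collect samples, dump once the buffered size exceeds max_file_size, then dump any remainder at the end) can be illustrated with a small stand-in. This is a minimal sketch, not the SamplesSaver implementation: the class name ChunkingSketch, the method names add and flush_remaining, and the file naming are all hypothetical. It tracks sizes only and writes nothing to disk. It also assumes each tuple in samples_files pairs a file name with the number of samples in that file, which the reference above does not state.

```python
from typing import List, Tuple


class ChunkingSketch:
    """Hypothetical stand-in mimicking the size-threshold buffering rule.

    It records which "file" each sample would land in; it does not write
    .npy files and is not the SamplesSaver code.
    """

    def __init__(self, max_file_size: int, itemsize: int = 4) -> None:
        self.max_file_size = max_file_size  # threshold in bytes
        self.itemsize = itemsize            # bytes per element (4 for a 32-bit int)
        self._buffered = 0                  # samples buffered but not yet dumped
        self._buffered_bytes = 0
        # Assumed meaning of each tuple: (file name, number of samples in it).
        self.samples_files: List[Tuple[str, int]] = []

    def add(self, num_elements: int) -> None:
        """Buffer one sample; dump once the buffer exceeds the threshold."""
        self._buffered += 1
        self._buffered_bytes += num_elements * self.itemsize
        if self._buffered_bytes > self.max_file_size:
            self._dump()

    def flush_remaining(self) -> None:
        """Dump whatever is still buffered, if anything."""
        if self._buffered:
            self._dump()

    def _dump(self) -> None:
        name = f"samples_{len(self.samples_files)}.npy"  # hypothetical naming
        self.samples_files.append((name, self._buffered))
        self._buffered = 0
        self._buffered_bytes = 0

    @property
    def dataset_size(self) -> int:
        """Total samples seen so far, dumped or still buffered."""
        return sum(count for _, count in self.samples_files) + self._buffered


# Twelve samples of 64 elements (256 bytes each) against a 1024-byte limit.
sketch = ChunkingSketch(max_file_size=1024)
for _ in range(12):
    sketch.add(64)
sketch.flush_remaining()
print(sketch.samples_files)
print(sketch.dataset_size)
```

Because the dump in this sketch triggers only after the threshold is exceeded, a file can end up slightly larger than max_file_size. Whether the real class behaves the same way should be checked against its source.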