cerebras.modelzoo.common.utils.input.utils.SamplesSaver

class cerebras.modelzoo.common.utils.input.utils.SamplesSaver(data_dir, max_file_size, filename_prefix=None, dtype=None)

Bases: object

Manages chunking and saving of data samples stored as NumPy arrays.

Constructs a SamplesSaver instance.

Parameters
  • data_dir (str) – Path to the mounted directory where the samples are dumped.

  • max_file_size (int) – Maximum file size (in bytes) for the .npy samples file(s).

  • filename_prefix (Optional[str]) – (Optional) Filename prefix for the .npy file(s).

  • dtype (Optional[numpy.dtype]) – (Optional) NumPy dtype for the array. Defaults to np.int32 if unspecified.

Methods

add_sample

Adds a NumPy array to the internally maintained list of data samples, and dumps the accumulated samples to file once their total size exceeds the max_file_size threshold.

delete_data_dumps

Cleans up by deleting all dumped data.

flush

Dumps any remaining data samples not yet written to file.

Attributes

dataset_size

Returns the total number of data samples.

samples_files

Returns the list of .npy file(s) as (str, int) tuples.

property dataset_size: int

Returns the total number of data samples.

property samples_files: List[Tuple[str, int]]

Returns the list of .npy file(s) as (str, int) tuples.

add_sample(data_sample)

Adds a NumPy array to the internally maintained list of data samples, and dumps the accumulated samples to file once their total size exceeds the max_file_size threshold.

Parameters

data_sample (numpy.array) – The data sample to add, as a NumPy array.
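The buffer-then-dump behavior of add_sample can be illustrated with a minimal, self-contained sketch. This is not the Cerebras implementation: the class name ChunkedNpySaver, the samples_<n>.npy file naming, stacking equal-shape samples into one array per file, and reading the int in each samples_files tuple as a per-file sample count are all assumptions made for illustration.

```python
import os
import tempfile
from typing import List, Optional, Tuple

import numpy as np


class ChunkedNpySaver:
    """Hypothetical sketch of size-bounded .npy chunking (not the library code)."""

    def __init__(self, data_dir: str, max_file_size: int,
                 filename_prefix: Optional[str] = None,
                 dtype: Optional[np.dtype] = None):
        self.data_dir = data_dir
        self.max_file_size = max_file_size
        self.filename_prefix = filename_prefix or ""
        # Mirrors the documented default dtype of np.int32.
        self.dtype = dtype if dtype is not None else np.int32
        self._buffer: List[np.ndarray] = []
        self._buffer_bytes = 0
        # Assumption: each entry is (file path, number of samples in that file).
        self._files: List[Tuple[str, int]] = []
        self._num_samples = 0

    def add_sample(self, data_sample: np.ndarray) -> None:
        sample = np.asarray(data_sample, dtype=self.dtype)
        self._buffer.append(sample)
        self._buffer_bytes += sample.nbytes
        self._num_samples += 1
        # Dump once the buffered payload exceeds the threshold.
        if self._buffer_bytes > self.max_file_size:
            self._dump()

    def flush(self) -> None:
        # Write whatever is still buffered below the threshold.
        if self._buffer:
            self._dump()

    def _dump(self) -> None:
        # Hypothetical naming scheme: <prefix>samples_<chunk index>.npy
        path = os.path.join(
            self.data_dir, f"{self.filename_prefix}samples_{len(self._files)}.npy")
        np.save(path, np.stack(self._buffer))  # assumes equal-shape samples
        self._files.append((path, len(self._buffer)))
        self._buffer = []
        self._buffer_bytes = 0

    @property
    def dataset_size(self) -> int:
        return self._num_samples

    @property
    def samples_files(self) -> List[Tuple[str, int]]:
        return list(self._files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        saver = ChunkedNpySaver(out_dir, max_file_size=4096)
        for i in range(12):
            saver.add_sample(np.full(256, i))  # 256 int32 values = 1024 bytes
        saver.flush()  # writes the samples still below the threshold
        print(saver.dataset_size, saver.samples_files)
```

Because this sketch dumps only after the byte total exceeds max_file_size, a file it writes can end up slightly larger than the limit (plus the .npy header); a stricter variant would dump before appending a sample that would cross it.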

flush()

Dumps any remaining data samples not yet written to file.
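To see why flush() matters, the hypothetical helper below (plan_chunks is not part of the library) replays the dump rule from the add_sample description on a list of per-sample byte sizes, assuming a chunk is dumped as soon as its buffered total exceeds max_file_size. It returns the sample count of each automatically dumped chunk and the number of samples still buffered, which are only written by flush().

```python
from typing import List, Tuple


def plan_chunks(sample_nbytes: List[int], max_file_size: int) -> Tuple[List[int], int]:
    """Return (samples per auto-dumped chunk, samples left for flush()).

    Assumes a dump is triggered as soon as the buffered byte total
    exceeds max_file_size, as the add_sample description states.
    """
    chunks: List[int] = []
    count, total = 0, 0
    for nbytes in sample_nbytes:
        count += 1
        total += nbytes
        if total > max_file_size:
            chunks.append(count)  # this chunk is dumped automatically
            count, total = 0, 0
    return chunks, count  # `count` samples remain buffered until flush()
```

For example, 12 samples of 1,024 bytes each (256 int32 values) with max_file_size=4096 give ([5, 5], 2): two files of five samples are dumped automatically, and the last two samples reach disk only when flush() is called.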

delete_data_dumps()

Cleans up by deleting all dumped data.
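A cleanup step like delete_data_dumps can be sketched as removing every file recorded for the dataset. The helper below is hypothetical: it assumes each entry follows the List[Tuple[str, int]] shape of samples_files with the file path as the first element, and it skips files that are already missing so that repeated cleanup calls are harmless.

```python
import os
import tempfile
from typing import Iterable, Tuple

import numpy as np


def delete_dumps(samples_files: Iterable[Tuple[str, int]]) -> int:
    """Delete each recorded .npy file and return how many were removed.

    Hypothetical helper; assumes the first tuple element is a file path.
    """
    removed = 0
    for path, _ in samples_files:
        try:
            os.remove(path)
            removed += 1
        except FileNotFoundError:
            pass  # already gone; keeps cleanup idempotent
    return removed


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    files = []
    for idx in range(2):
        path = os.path.join(out_dir, f"samples_{idx}.npy")  # hypothetical naming
        np.save(path, np.zeros((3, 4), dtype=np.int32))
        files.append((path, 3))
    print(delete_dumps(files), os.listdir(out_dir))
    os.rmdir(out_dir)
```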