cerebras.modelzoo.common.utils.input.utils.SamplesSaver
- class cerebras.modelzoo.common.utils.input.utils.SamplesSaver
Bases: object
Manages data samples chunking and saving for numpy arrays.
- Parameters
data_dir – Path to the mounted directory where the samples are dumped
max_file_size – Maximum file size (in bytes) for the .npy samples file(s)
filename_prefix – (Optional) filename prefix for the .npy file(s)
Methods
Adds the NumPy array to an internally maintained list of data samples and dumps them to file if the total size exceeds the max_file_size threshold.
Cleans up by deleting all dumped data.
Dumps any remaining data samples not yet written to file.
Attributes
dataset_size – Returns the total number of data samples.
samples_files – Returns the list of .npy file(s).
- __init__(data_dir: str, max_file_size: int, filename_prefix: Optional[str] = None)
- Parameters
data_dir – Path to the mounted directory where the samples are dumped
max_file_size – Maximum file size (in bytes) for the .npy samples file(s)
filename_prefix – (Optional) filename prefix for the .npy file(s)
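Because max_file_size is given in bytes, it can help to estimate how many samples fit in one file before choosing a value. The sketch below is only a rough estimate: it assumes the size of a sample is measured by its raw array data (ndarray.nbytes), which this page does not confirm, and the sample shape, dtype, and threshold are illustrative values.

```python
import numpy as np

# Hypothetical sample shape and dtype, chosen only for illustration.
sample = np.zeros((128,), dtype=np.int32)

# Illustrative threshold in bytes (the unit max_file_size expects).
max_file_size = 4096

# ndarray.nbytes counts the raw array data only; the .npy header is excluded.
# Assumption: per-sample size is measured this way.
bytes_per_sample = sample.nbytes
samples_per_file = max_file_size // bytes_per_sample

print(bytes_per_sample)   # 512
print(samples_per_file)   # 8
```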
- property dataset_size: int
Returns the total number of data samples.
- property samples_files: List[str]
Returns the list of .npy file(s).
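The behavior described on this page (buffer samples, dump once a size threshold is exceeded, flush the remainder, report counts and files, clean up) can be sketched with a small stand-in class. This is not the Cerebras implementation: ChunkedNpySaver and its method names add, flush, and cleanup are hypothetical, as are the file naming scheme and the use of ndarray.nbytes to measure size. Only the constructor parameters and the two property names are taken from this page.

```python
import os
import tempfile
from typing import List, Optional

import numpy as np


class ChunkedNpySaver:
    """Hypothetical stand-in mimicking the documented behavior."""

    def __init__(self, data_dir: str, max_file_size: int,
                 filename_prefix: Optional[str] = None):
        self._data_dir = data_dir
        self._max_file_size = max_file_size
        self._prefix = filename_prefix or "samples"  # assumed default
        self._buffer: List[np.ndarray] = []
        self._buffer_bytes = 0
        self._files: List[str] = []
        self._count = 0

    @property
    def dataset_size(self) -> int:
        return self._count

    @property
    def samples_files(self) -> List[str]:
        return list(self._files)

    def add(self, sample: np.ndarray) -> None:
        # Buffer the sample; dump once buffered bytes exceed the threshold.
        self._buffer.append(sample)
        self._buffer_bytes += sample.nbytes
        self._count += 1
        if self._buffer_bytes > self._max_file_size:
            self.flush()

    def flush(self) -> None:
        # Write any samples still buffered to a new .npy file.
        if not self._buffer:
            return
        name = f"{self._prefix}_{len(self._files)}.npy"
        path = os.path.join(self._data_dir, name)
        np.save(path, np.stack(self._buffer))
        self._files.append(path)
        self._buffer, self._buffer_bytes = [], 0

    def cleanup(self) -> None:
        # Delete every file this saver wrote.
        for path in self._files:
            os.remove(path)
        self._files = []


with tempfile.TemporaryDirectory() as tmp:
    saver = ChunkedNpySaver(tmp, max_file_size=1024, filename_prefix="demo")
    for i in range(10):
        saver.add(np.full((64,), i, dtype=np.int32))  # 256 bytes each
    saver.flush()  # write leftovers, if any
    n_samples = saver.dataset_size
    n_files = len(saver.samples_files)
    first_shape = np.load(saver.samples_files[0]).shape
    saver.cleanup()
    files_left = os.listdir(tmp)
```

In this sketch a dump happens after every fifth sample (5 × 256 = 1280 bytes exceeds 1024), so ten samples produce two files of five samples each, and the final flush has nothing left to write.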