cerebras.modelzoo.data.common.input_utils.get_data_for_task
- cerebras.modelzoo.data.common.input_utils.get_data_for_task(task_id, meta_data_values_cum_sum, num_examples_per_task, meta_data_values, meta_data_filenames)
- Distributes files (or portions of files) across tasks, given the number of examples per task, so that each distributed task has access to exactly the same number of examples.
- Parameters
- task_id (int) – Integer id for a task. 
- meta_data_values_cum_sum (int) – Cumulative sum of the file sizes (in lines) from the meta data file.
- num_examples_per_task (int) – Number of examples assigned to each Slurm task. Equal to batch_size * num_batch_per_task.
- meta_data_values (list[int]) – List of file sizes (in lines) from the meta data file.
- meta_data_filenames (list[str]) – List of file names from the meta data file.
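The arguments are derived from the same metadata: the cumulative sum is a running total of the per-file line counts, and num_examples_per_task is batch_size * num_batch_per_task. The following minimal sketch builds them from (filename, line count) pairs. The helper build_task_inputs and the pair-based input are hypothetical, and the sketch assumes meta_data_values_cum_sum holds one running total per file (a sequence rather than a single int).

```python
from itertools import accumulate
from typing import List, Tuple


def build_task_inputs(
    entries: List[Tuple[str, int]], batch_size: int, num_batch_per_task: int
):
    """Hypothetical helper: derive the metadata arguments from
    (filename, num_lines) pairs, e.g. parsed from a metadata file."""
    meta_data_filenames = [name for name, _ in entries]
    meta_data_values = [num_lines for _, num_lines in entries]
    # Assumption: one running total per file; entry i = lines in files 0..i.
    meta_data_values_cum_sum = list(accumulate(meta_data_values))
    # As documented for num_examples_per_task.
    num_examples_per_task = batch_size * num_batch_per_task
    return (
        meta_data_values_cum_sum,
        num_examples_per_task,
        meta_data_values,
        meta_data_filenames,
    )
```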
 
- Returns
- A list of tuples of length 3, representing the files that should be considered for this task_id. Each tuple contains:
  - index 0: file path.
  - index 1: number of examples to be considered for this task_id.
  - index 2: start index in the file from which these examples should be considered.
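One way to produce this return value is a contiguous split: treat the files as one concatenated stream in metadata order and give task k the global example range [k * num_examples_per_task, (k + 1) * num_examples_per_task). The sketch below implements that scheme under stated assumptions; it is not the library's code, the name get_data_for_task_sketch is hypothetical, and it treats meta_data_values_cum_sum as a per-file running total.

```python
from typing import List, Tuple


def get_data_for_task_sketch(
    task_id: int,
    meta_data_values_cum_sum: List[int],
    num_examples_per_task: int,
    meta_data_values: List[int],
    meta_data_filenames: List[str],
) -> List[Tuple[str, int, int]]:
    """Hypothetical contiguous split: task `task_id` owns the global range
    [task_id * n, (task_id + 1) * n) over the files concatenated in order."""
    start = task_id * num_examples_per_task
    end = start + num_examples_per_task
    result = []
    for filename, size, cum_end in zip(
        meta_data_filenames, meta_data_values, meta_data_values_cum_sum
    ):
        file_start = cum_end - size  # global index of this file's first line
        # Overlap of the file's range [file_start, cum_end) with [start, end).
        lo = max(start, file_start)
        hi = min(end, cum_end)
        if lo < hi:
            # (file path, number of examples, start index within the file)
            result.append((filename, hi - lo, lo - file_start))
    return result
```

For example, with three files of 5, 3 and 4 lines (cumulative sum [5, 8, 12]) and num_examples_per_task=4, task 1 receives [("a.txt", 1, 4), ("b.txt", 3, 0)]: the last line of a.txt plus all of b.txt, four examples in total. A contiguous split keeps each task's reads sequential within a file, at the cost of one task sometimes spanning a file boundary.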