Cerebras Model Zoo Extensions#

This module contains integrations of external tools to the Trainer.

Downstream Validation Callbacks#

The set of callbacks that integrate eval harnesses, i.e. external frameworks for running downstream validation with the Trainer.

BigCodeEvalHarness

class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeEvalHarness(bigcode_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback

ValidationCallback class to run BigCode’s Evaluation Harness.

Parameters
  • bigcode_args (Union[cerebras.modelzoo.trainer.extensions.bigcode.bigcode_eval_harness.BigCodeCLIArgs, Dict[str, Any]]) – BigCodeCLIArgs dataclass or dict capturing BCEH’s CLI args

  • keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.

  • every_n_vals (int) –

    Run the BigCode eval harness script every N validations. For example, if eval_frequency is set to 200 and N=2, then the BigCode eval harness runs every 400 training steps.

    The BigCode eval harness script also always runs after the final training iteration.

  • flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the BigCode eval harness run.

  • name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.

  • batch_size (Optional[int]) – Batch size used by BigCodeEvalHarness to preprocess input data samples from the specified eval harness tasks.

  • data_dir (Optional[str]) – Path to data directory

  • max_sequence_length (Optional[int]) – Maximum sequence length

  • tokenizer_file_path (Optional[str]) – Path to tokenizer file

  • eos_id (Optional[int]) – End of sentence token id

  • dataloader_args – Any additional dataloader args, e.g. num_workers.

run(trainer)[source]#

Run BigCode Eval Harness.

Parameters

trainer – the Trainer object
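
The snippet below is a minimal sketch of constructing this callback; the task name, file paths, and the num_workers dataloader argument are illustrative assumptions, not values prescribed by this API. The resulting callback is then registered with the Trainer like any other callback.

    from cerebras.modelzoo.trainer.extensions.bigcode import (
        BigCodeCLIArgs,
        BigCodeEvalHarness,
    )

    # Hypothetical task selection and paths, for illustration only.
    bigcode_eh = BigCodeEvalHarness(
        bigcode_args=BigCodeCLIArgs(tasks="humaneval"),
        every_n_vals=2,                          # run on every 2nd validation
        keep_data_dir=False,                     # delete dumped samples after the run
        batch_size=4,
        data_dir="./bigcode_data",               # hypothetical data directory
        max_sequence_length=2048,
        tokenizer_file_path="./tokenizer.json",  # hypothetical tokenizer file
        eos_id=0,
        num_workers=2,                           # forwarded via **dataloader_args
    )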

EleutherEvalHarness

class cerebras.modelzoo.trainer.extensions.eleuther.EleutherEvalHarness(eeh_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback

Callback class to run EleutherAI’s Evaluation Harness.

Parameters
  • eeh_args (Union[cerebras.modelzoo.trainer.extensions.eleuther.eval_harness_utils.EleutherCLIArgs, Dict[str, Any]]) – EleutherCLIArgs dataclass or dict capturing EEH’s CLI args

  • keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.

  • every_n_vals (int) –

    Run the EEH script every N validations. For example, if eval_frequency is set to 200 and N=2, then EEH runs every 400 training steps.

    The EEH script also always runs after the final training iteration.

  • flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the EEH run.

  • name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.

  • batch_size (Optional[int]) – Batch size used by EleutherEvalHarness to preprocess input data samples from the specified eval harness tasks.

  • data_dir (Optional[str]) – Path to data directory

  • max_sequence_length (Optional[int]) – Maximum sequence length

  • tokenizer_file_path (Optional[str]) – Path to tokenizer file

  • eos_id (Optional[int]) – End of sentence token id

  • dataloader_args – Any additional dataloader args, e.g. num_workers.

property has_generative_task#

Returns True if the task dictionary contains a generative task.

property has_non_generative_task#

Returns True if the task dictionary contains a non-generative task.

run(trainer)[source]#

Run the EleutherAI Evaluation Harness.

Parameters

trainer – the Trainer object
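
As a minimal sketch (the task name, few-shot setting, and paths below are illustrative assumptions), the callback can be constructed like this and its task-type properties inspected before the run:

    from cerebras.modelzoo.trainer.extensions.eleuther import (
        EleutherCLIArgs,
        EleutherEvalHarness,
    )

    # Hypothetical task selection and paths, for illustration only.
    eeh = EleutherEvalHarness(
        eeh_args=EleutherCLIArgs(tasks="hellaswag", num_fewshot=0),
        every_n_vals=1,                          # run on every validation
        batch_size=4,
        data_dir="./eeh_data",                   # hypothetical data directory
        max_sequence_length=2048,
        tokenizer_file_path="./tokenizer.json",  # hypothetical tokenizer file
        eos_id=0,
    )

    if eeh.has_generative_task:
        # Generative tasks additionally need generation settings such as
        # max_length_generation in EleutherCLIArgs.
        pass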

Eval Harness Utils#

Utility classes capturing the command-line interface arguments for the supported eval harness frameworks.

BigCodeCLIArgs

class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeCLIArgs(prefix='', do_sample=True, temperature=None, top_k=None, top_p=None, n_samples=1, seed=0, tasks=None, instruction_tokens=None, max_length_generation=512, limit=None, limit_start=0, save_every_k_tasks=-1, postprocess=True, allow_code_execution=False, generation_only=True, load_generations_path=None, load_data_path=None, metric_output_path='evaluation_results.json', save_generations=True, load_generations_intermediate_paths=None, save_generations_path='generations.json', save_references=True, save_references_path='references.json', prompt='prompt', check_references=False)[source]#

Captures BigCode EH’s CLI arguments with defaults.

Fields:

prefix: Prefix to add to the prompt. For example, InCoder needs prefix='<| file ext=.py |>\n'

do_sample: Sample from the language model’s output distribution.

temperature: Sampling temperature used for generation.

top_k: Top-k parameter used for generation.

top_p: Top-p parameter used for nucleus sampling.

n_samples: Number of completions to generate for each sample.

seed: Random seed used for evaluation.

tasks: List of code evaluation tasks to evaluate.

instruction_tokens: A series of instruction tokens used for instruction-tuning benchmarks, separated by commas, e.g. <user_message>,<end_user_message>,<assistant_message>

max_length_generation: Maximum length of generated sequence (prompt+generation).

limit: Number of samples to solve and evaluate from the benchmark.

limit_start: Optional offset to start from when limiting the number of samples.

save_every_k_tasks: Optional saving after every k tasks.

postprocess: Postprocess model outputs before execution, always on except during generation tests.

allow_code_execution: Allow code evaluation to execute external/untrusted Python code on your machine.

generation_only: Do code generation but no evaluation.

load_generations_path: Path of file with previously generated solutions; if provided, generation is skipped and only evaluation is done.

load_data_path: Path of additional data to load for the tasks.

metric_output_path: Path to save the results.

save_generations: Whether to save code generations.

load_generations_intermediate_paths: List of paths for saving the intermediate code generations.

save_generations_path: Path for saving the code generations.

save_references: Whether to save reference solutions/tests.

save_references_path: Path for saving the reference solutions/tests.

prompt: Prompt type to use for generation in HumanEvalPack tasks.

check_references: Don’t run generation but benchmark ground truth (useful for debugging).
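
The sketch below shows the two equivalent ways of supplying these arguments to BigCodeEvalHarness, as the dataclass or as a plain dict; the task name and sampling values are illustrative assumptions.

    from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeCLIArgs

    # Dataclass form (hypothetical task and sampling values).
    args = BigCodeCLIArgs(
        tasks="humaneval",
        n_samples=1,
        max_length_generation=512,
        temperature=0.2,
        do_sample=True,
    )

    # Equivalent dict form accepted by BigCodeEvalHarness(bigcode_args=...).
    args_dict = dict(
        tasks="humaneval",
        n_samples=1,
        max_length_generation=512,
        temperature=0.2,
        do_sample=True,
    )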

EleutherCLIArgs

class cerebras.modelzoo.trainer.extensions.eleuther.EleutherCLIArgs(tasks, num_fewshot=None, output_path=None, limit=None, use_cache=None, cache_requests=None, check_integrity=False, write_out=False, log_samples=False, show_config=False, include_path=None, predict_only=False, seed='0,1234,1234', trust_remote_code=False, verbosity='INFO', max_length_generation=None, temperature=None, top_k=None, top_p=None)[source]#

Captures EEH’s CLI arguments with defaults.

Fields:
tasks: List of tasks to evaluate. To get the full list of tasks, use the command lm-eval --tasks list.

num_fewshot: Number of examples in few-shot context.

output_path: The path to the output file where the result metrics will be saved. If the path is a directory and log_samples is true, the results will be saved in the directory. Otherwise the parent directory will be used.

limit: Limit the number of examples per task. If <1, limit is a percentage of the total number of examples.

use_cache: A path to a sqlite db file for caching model responses. None if not caching.

cache_requests: Speed up evaluation by caching the building of dataset requests. None if not caching.

check_integrity: Whether to run the relevant part of the test suite for the tasks.

write_out: Prints the prompt for the first few documents.

log_samples: If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis. Use with --output_path.

show_config: If True, shows the full config of all tasks at the end of the evaluation.

include_path: Additional path to include if there are external tasks to include.

predict_only: Use with --log_samples. Only model outputs will be saved and metrics will not be evaluated.

seed: Set seed for python’s random, numpy and torch. Accepts a comma-separated list of 3 values for python’s random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. Each value is either an integer or None to leave that seed unset. Default is 0,1234,1234 (for backward compatibility). E.g. --seed 0,None,8 sets random.seed(0) and torch.manual_seed(8); numpy’s seed is not set since the second value is None. E.g. --seed 42 sets all three seeds to 42.

trust_remote_code: Sets trust_remote_code to True to execute code to create HF Datasets from the Hub.

verbosity: EEH logging level.

max_length_generation: Maximum length of generated sequence (prompt+generation).

temperature: Sampling temperature used for generation.

top_k: Top-k parameter used for generation.

top_p: Top-p parameter used for nucleus sampling.
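
A minimal sketch of constructing these arguments; the task names, output path, and limit are illustrative assumptions. Note the seed format described above.

    from cerebras.modelzoo.trainer.extensions.eleuther import EleutherCLIArgs

    eeh_args = EleutherCLIArgs(
        tasks="hellaswag,winogrande",  # hypothetical tasks; see `lm-eval --tasks list`
        num_fewshot=0,
        output_path="./eeh_results",   # hypothetical output location
        limit=0.1,                     # <1 means a fraction of examples per task
        seed="0,1234,1234",            # python, numpy, and torch seeds
        max_length_generation=256,     # needed only for generative tasks
    )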

Other Extensions#

Other extensions implemented as callbacks that can be used to enhance the Trainer.

HFCacheDir#

class cerebras.modelzoo.trainer.extensions.HFCacheDir(cache_dir)[source]#

Bases: cerebras.modelzoo.trainer.callbacks.callback.Callback

A callback that sets up the HuggingFace cache directory.

Parameters

cache_dir (str) – The cache directory to use for HuggingFace utilities.
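
A minimal sketch of constructing this callback; the cache path is an illustrative assumption.

    from cerebras.modelzoo.trainer.extensions import HFCacheDir

    # Point HuggingFace utilities at a shared cache location (hypothetical path).
    hf_cache = HFCacheDir(cache_dir="/data/hf_cache")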

WandbLogger#

class cerebras.modelzoo.trainer.extensions.WandbLogger(project=None, group=None, run_id=None, run_name=None, job_type=None, tags=None, resume='auto')[source]#

Bases: cerebras.modelzoo.trainer.loggers.logger.Logger

Logger class for logging metrics to Weights and Biases.

Parameters
  • project (Optional[str]) – The name of the project to which the run belongs.

  • group (Optional[str]) – The name of the group to which the run belongs.

  • run_id (Optional[str]) – The unique identifier for the run.

  • run_name (Optional[str]) – The name of the run.

  • job_type (Optional[str]) – The type of job.

  • tags (Optional[List[str]]) – List of tags to be associated with the run.

  • resume (str) – Resume mode for the run. It can be one of the following:

    - “never”: Do not resume the run.
    - “allow”: Allow the run to resume if a previous run exists.
    - “auto”: Automatically resume the run if a previous run exists.
    - “must”: Resume the run if a previous run exists.

check_presence_of_wandb_dir(rundir)[source]#

Check if the wandb directory is present in the run directory.

Parameters

rundir – The directory where the run is being stored.
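
A minimal sketch of constructing the logger; the project, group, run name, and tags are illustrative assumptions.

    from cerebras.modelzoo.trainer.extensions import WandbLogger

    wandb_logger = WandbLogger(
        project="my-project",        # hypothetical project name
        group="pretraining",         # hypothetical group name
        run_name="run-001",          # hypothetical run name
        tags=["example", "sketch"],  # hypothetical tags
        resume="auto",               # auto-resume if a previous run exists
    )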