Cerebras Model Zoo Extensions#
This module contains integrations of external tools with the Trainer.
Downstream Validation Callbacks#
The set of callbacks that implement eval harnesses, i.e. external frameworks for running downstream validation with the Trainer.
BigCodeEvalHarness
- class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeEvalHarness(bigcode_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback
ValidationCallback class to run BigCode’s Evaluation Harness.
- Parameters
bigcode_args (Union[cerebras.modelzoo.trainer.extensions.bigcode.bigcode_eval_harness.BigCodeCLIArgs, Dict[str, Any]]) – BigCodeCLIArgs dataclass or dict capturing BCEH’s CLI args
keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.
every_n_vals (int) –
Run the BigCode eval harness script every N validations. For example, if eval_frequency is set to 200 and N=2,
then the BigCode eval harness runs every 400 training steps.
The BigCode eval harness script also always runs after the final training iteration.
flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the BigCode eval harness run.
name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.
batch_size (Optional[int]) – Batch size used by BigCodeEvalHarness to preprocess input data samples from the specified eval harness tasks.
data_dir (Optional[str]) – Path to data directory
max_sequence_length (Optional[int]) – Maximum sequence length
tokenizer_file_path (Optional[str]) – Path to tokenizer file
eos_id (Optional[int]) – End of sentence token id
dataloader_args – Any additional dataloader args, e.g. num_workers.
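A minimal sketch of constructing this callback from the signature above. The task name, file paths, dataloader arguments, and the way the callback is registered with the Trainer are illustrative assumptions, not verbatim Model Zoo configuration.

```python
from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeEvalHarness

# Hypothetical values; substitute your own tasks, paths, and dataloader args.
bigcode_eh = BigCodeEvalHarness(
    bigcode_args={"tasks": "humaneval", "temperature": 0.2, "n_samples": 1},
    every_n_vals=2,            # run on every 2nd validation
    keep_data_dir=False,       # delete dumped data samples after the run
    batch_size=4,
    data_dir="./bigcode_eh_data",
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",
    eos_id=0,
    num_workers=2,             # forwarded through **dataloader_args
)
# The callback would then be attached to the Trainer; the exact wiring
# depends on how your Trainer configuration accepts callbacks.
```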
EleutherEvalHarness
- class cerebras.modelzoo.trainer.extensions.eleuther.EleutherEvalHarness(eeh_args, keep_data_dir=False, every_n_vals=1, flags=None, name_scope=None, batch_size=None, data_dir=None, max_sequence_length=None, tokenizer_file_path=None, eos_id=None, **dataloader_args)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.ValidationCallback
Callback class to run EleutherAI’s Evaluation Harness.
- Parameters
eeh_args (Union[cerebras.modelzoo.trainer.extensions.eleuther.eval_harness_utils.EleutherCLIArgs, Dict[str, Any]]) – EleutherCLIArgs dataclass or dict capturing EEH’s CLI args
keep_data_dir (bool) – Specifies whether dumped data samples should be kept for reuse. Defaults to False, i.e. data samples are deleted after the run.
every_n_vals (int) –
Run the EEH script every N validations. For example, if eval_frequency is set to 200 and N=2,
then EEH runs every 400 training steps.
The EEH script also always runs after the final training iteration.
flags (Optional[dict]) – An optional dictionary of scoped global flags to set during the EEH run.
name_scope (Optional[str]) – An optional string that gets added to the trainer’s name scope.
batch_size (Optional[int]) – Batch size used by EleutherEvalHarness to preprocess input data samples from the specified eval harness tasks.
data_dir (Optional[str]) – Path to data directory
max_sequence_length (Optional[int]) – Maximum sequence length
tokenizer_file_path (Optional[str]) – Path to tokenizer file
eos_id (Optional[int]) – End of sentence token id
dataloader_args – Any additional dataloader args, e.g. num_workers.
- property has_generative_task#
Returns True if the task dictionary contains a generative task.
- property has_non_generative_task#
Returns True if the task dictionary contains a non-generative task.
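A minimal sketch of constructing this callback, mirroring the BigCode example above. The comma-separated task string, paths, and the use of the properties are illustrative assumptions.

```python
from cerebras.modelzoo.trainer.extensions.eleuther import (
    EleutherCLIArgs,
    EleutherEvalHarness,
)

# Hypothetical task selection and paths.
eeh = EleutherEvalHarness(
    eeh_args=EleutherCLIArgs(tasks="winogrande,hellaswag", num_fewshot=0),
    every_n_vals=1,            # run on every validation
    batch_size=8,
    data_dir="./eeh_data",
    max_sequence_length=2048,
    tokenizer_file_path="./tokenizer.json",
    eos_id=0,
)

# The properties above can be used to branch on the selected tasks.
if eeh.has_generative_task:
    print("At least one selected task requires generative inference.")
```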
Eval Harness Utils#
Utility classes capturing the command-line interface arguments for the supported eval harness frameworks.
BigCodeCLIArgs
- class cerebras.modelzoo.trainer.extensions.bigcode.BigCodeCLIArgs(prefix='', do_sample=True, temperature=None, top_k=None, top_p=None, n_samples=1, seed=0, tasks=None, instruction_tokens=None, max_length_generation=512, limit=None, limit_start=0, save_every_k_tasks=-1, postprocess=True, allow_code_execution=False, generation_only=True, load_generations_path=None, load_data_path=None, metric_output_path='evaluation_results.json', save_generations=True, load_generations_intermediate_paths=None, save_generations_path='generations.json', save_references=True, save_references_path='references.json', prompt='prompt', check_references=False)[source]#
Captures BigCode EH’s CLI arguments with defaults.
- Fields:
- prefix: Prefix to add to the prompt. For example, InCoder needs prefix='<| file ext=.py |>\n'
- do_sample: Sample from the language model's output distribution.
- temperature: Sampling temperature used for generation.
- top_k: Top-k parameter used for generation.
- top_p: Top-p parameter used for nucleus sampling.
- n_samples: Number of completions to generate for each sample.
- seed: Random seed used for evaluation.
- tasks: List of code eval tasks to evaluate.
- instruction_tokens: A comma-separated series of instruction tokens used for instruction-tuning benchmarks, e.g. <user_message>,<end_user_message>,<assistant_message>
- max_length_generation: Maximum length of generated sequence (prompt + generation).
- limit: Number of samples to solve and evaluate from the benchmark.
- limit_start: Optional offset to start from when limiting the number of samples.
- save_every_k_tasks: Optional saving after every k tasks.
- postprocess: Postprocess model outputs before execution; always on except during generation tests.
- allow_code_execution: Allow code evaluation to execute external/untrusted Python code on your machine.
- generation_only: Do code generation but no evaluation.
- load_generations_path: Path of file with previously generated solutions. If provided, generation is skipped and only evaluation is done.
- load_data_path: Path of additional data to load for the tasks.
- metric_output_path: Path to save the results.
- save_generations: Whether to save code generations.
- load_generations_intermediate_paths: List of paths for saving the intermediate code generations.
- save_generations_path: Path for saving the code generations.
- save_references: Whether to save reference solutions/tests.
- save_references_path: Path for saving the reference solutions/tests.
- prompt: Prompt type to use for generation in HumanEvalPack tasks.
- check_references: Don't run generation but benchmark ground truth (useful for debugging).
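A short sketch of overriding a few of these defaults as a dataclass; the task name, sampling settings, and output path are placeholders.

```python
from cerebras.modelzoo.trainer.extensions.bigcode import BigCodeCLIArgs

# Override a handful of the defaults listed above; all values are illustrative.
args = BigCodeCLIArgs(
    tasks="mbpp",
    temperature=0.8,
    top_p=0.95,
    n_samples=5,
    max_length_generation=1024,
    save_generations_path="generations.json",
)
```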
EleutherCLIArgs
- class cerebras.modelzoo.trainer.extensions.eleuther.EleutherCLIArgs(tasks, num_fewshot=None, output_path=None, limit=None, use_cache=None, cache_requests=None, check_integrity=False, write_out=False, log_samples=False, show_config=False, include_path=None, predict_only=False, seed='0,1234,1234', trust_remote_code=False, verbosity='INFO', max_length_generation=None, temperature=None, top_k=None, top_p=None)[source]#
Captures EEH’s CLI arguments with defaults.
- Fields:
- tasks: List of tasks to evaluate. To get the full list of tasks, use the command lm-eval --tasks list
- num_fewshot: Number of examples in few-shot context.
- output_path: The path to the output file where the result metrics will be saved. If the path is a directory and log_samples is true, the results will be saved in the directory. Else the parent directory will be used.
- limit: Limit the number of examples per task. If <1, limit is a percentage of the total number of examples.
- use_cache: A path to a sqlite db file for caching model responses. None if not caching.
- cache_requests: Speed up evaluation by caching the building of dataset requests. None if not caching.
- check_integrity: Whether to run the relevant part of the test suite for the tasks.
- write_out: Prints the prompt for the first few documents.
- log_samples: If True, write out all model outputs and documents for per-sample measurement and post-hoc analysis. Use with --output_path.
- show_config: If True, shows the full config of all tasks at the end of the evaluation.
- include_path: Additional path to include if there are external tasks to include.
- predict_only: Use with --log_samples. Only model outputs will be saved and metrics will not be evaluated.
- seed: Set seed for Python's random, numpy, and torch. Accepts a comma-separated list of 3 values for Python's random, numpy, and torch seeds, respectively, or a single integer to set the same seed for all three. The values are either an integer or None to not set the seed. Default is 0,1234,1234 (for backward compatibility). E.g. --seed 0,None,8 sets random.seed(0) and torch.manual_seed(8); here numpy's seed is not set since the second value is None. E.g. --seed 42 sets all three seeds to 42.
- trust_remote_code: Sets trust_remote_code to True to execute code to create HF Datasets from the Hub.
- verbosity: EEH logging level.
- max_length_generation: Maximum length of generated sequence (prompt + generation).
- temperature: Sampling temperature used for generation.
- top_k: Top-k parameter used for generation.
- top_p: Top-p parameter used for nucleus sampling.
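A short sketch of the seed semantics described above, expressed through the dataclass; the task name, limit, and the string form of the seed argument are assumptions for illustration.

```python
from cerebras.modelzoo.trainer.extensions.eleuther import EleutherCLIArgs

# Roughly equivalent to `--seed 0,None,8`: Python's random seeded with 0,
# numpy left unseeded, torch seeded with 8. Task name and limit are placeholders.
args = EleutherCLIArgs(
    tasks="hellaswag",
    num_fewshot=0,
    limit=0.1,           # evaluate 10% of the examples per task (limit < 1)
    seed="0,None,8",
)
```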
Other Extensions#
Other extensions implemented as callbacks that can be used to enhance the Trainer.
HFCacheDir
- class cerebras.modelzoo.trainer.extensions.HFCacheDir(cache_dir)[source]#
Bases:
cerebras.modelzoo.trainer.callbacks.callback.Callback
A callback that sets up the HuggingFace cache directory.
- Parameters
cache_dir (str) – The cache directory to use for HuggingFace utilities.
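A minimal sketch of constructing this callback; the directory path is a placeholder.

```python
from cerebras.modelzoo.trainer.extensions import HFCacheDir

# Point HuggingFace utilities at a shared cache location (path is a placeholder).
hf_cache = HFCacheDir(cache_dir="/path/to/shared/hf_cache")
```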
WandbLogger
- class cerebras.modelzoo.trainer.extensions.WandbLogger(project=None, group=None, run_id=None, run_name=None, job_type=None, tags=None, resume='auto')[source]#
Bases:
cerebras.modelzoo.trainer.loggers.logger.Logger
Logger class for logging metrics to Weights and Biases.
- Parameters
project (Optional[str]) – The name of the project to which the run belongs.
group (Optional[str]) – The name of the group to which the run belongs.
run_id (Optional[str]) – The unique identifier for the run.
run_name (Optional[str]) – The name of the run.
job_type (Optional[str]) – The type of job.
tags (Optional[List[str]]) – List of tags to be associated with the run.
resume (str) – Resume mode for the run. It can be one of the following:
- "never": Do not resume the run.
- "allow": Allow the run to resume if a previous run exists.
- "auto": Automatically resume the run if a previous run exists.
- "must": Resume the run if a previous run exists.
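A minimal sketch of constructing the logger; the project, group, and tag names are illustrative, and how loggers are registered with the Trainer depends on your configuration.

```python
from cerebras.modelzoo.trainer.extensions import WandbLogger

wandb_logger = WandbLogger(
    project="my-llm-pretraining",   # placeholder project name
    group="ablation-a",
    run_name="baseline",
    tags=["pretraining", "baseline"],
    resume="auto",                  # auto-resume if a previous run exists
)
```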