cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids
- cerebras.modelzoo.data.nlp.gpt.InferenceDataProcessor.get_token_ids(text: str, tokenizer: Union[tokenizers.Tokenizer, transformers.PreTrainedTokenizerBase]) → List[int]
Get encoded token ids from a string using the specified tokenizer.
- Parameters
text (str) – The input string.
tokenizer (Union[tokenizers.Tokenizer, transformers.PreTrainedTokenizerBase]) – Tokenizer object from the Hugging Face tokenizers or transformers library.
- Returns
List of token ids.
- Return type
List[int]
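- Example
The sketch below illustrates the kind of dispatch the signature implies. It is not the library's implementation. `encode_to_ids` is a hypothetical helper, and the two stub classes are stand-ins that only mimic the return shapes of the real tokenizers: `tokenizers.Tokenizer.encode` returns an `Encoding` object with an `ids` attribute, while `transformers` tokenizers return a plain list of ints from `encode`. Whether the real method adds special tokens is not stated in this reference, so the sketch does not address it.

```python
from typing import Any, List


def encode_to_ids(text: str, tokenizer: Any) -> List[int]:
    """Hypothetical helper: return token ids for either tokenizer flavor."""
    encoded = tokenizer.encode(text)
    # tokenizers.Tokenizer.encode returns an Encoding exposing `.ids`;
    # transformers tokenizers return a list of ints directly.
    ids = getattr(encoded, "ids", encoded)
    return list(ids)


class StubEncoding:
    """Stand-in for tokenizers.Encoding (assumption: only `.ids` is needed)."""

    def __init__(self, ids: List[int]) -> None:
        self.ids = ids


class StubTokenizer:
    """Stand-in mimicking tokenizers.Tokenizer: encode() returns an Encoding."""

    def encode(self, text: str) -> StubEncoding:
        # Toy scheme for illustration: one id per word, equal to its length.
        return StubEncoding([len(word) for word in text.split()])


class StubPreTrainedTokenizer:
    """Stand-in mimicking a transformers tokenizer: encode() returns a list."""

    def encode(self, text: str) -> List[int]:
        return [len(word) for word in text.split()]


ids_from_tokenizers = encode_to_ids("hello wide world", StubTokenizer())
ids_from_transformers = encode_to_ids("hello wide world", StubPreTrainedTokenizer())
```

Both calls yield the same flat list of integers, which is the shape the documented return type `List[int]` describes. With the real libraries installed, the stubs would be replaced by an actual `tokenizers.Tokenizer` or `transformers` tokenizer object.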