Error with WordLevel tokenizer #9

@markjr-cisco

Description

I tried examples/example.py with a tokenizer built from a dict[str, int] vocabulary:

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
tokenizer = Tokenizer(WordLevel(str_to_int_dict))  # the WordLevel model is wrapped by Tokenizer, not the reverse
tokenizer.eos_token_id = '\n'
<remaining example.py code>

Stack trace:

Traceback (most recent call last):
  File "<redacted>", line 177, in <module>
    '',
  File "/usr/local/lib/python3.10/dist-packages/parserllm/parserllm.py", line 43, in complete_cf
    terminal_regexes = extract_terminal_regex(parser, tokenizer.decode(tokenizer.eos_token_id))
  File "/usr/local/lib/python3.10/dist-packages/parserllm/parserllm.py", line 14, in extract_terminal_regex
    regex_map['$END'] = regex.compile(stop_token)
  File "<redacted>/.local/lib/python3.10/site-packages/regex/regex.py", line 353, in compile
    return _compile(pattern, flags, ignore_unused, kwargs, cache_pattern)
  File "/<redacted>/.local/lib/python3.10/site-packages/regex/regex.py", line 519, in _compile
    raise TypeError("first argument must be a string or compiled pattern")
TypeError: first argument must be a string or compiled pattern
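
The trace suggests that `tokenizer.decode(tokenizer.eos_token_id)` returned something other than a string, which `regex.compile` then rejected. Below is a minimal sketch of a guard that would surface this earlier with a clearer message. It assumes the stop token should be matched literally, and it uses the standard-library `re` module in place of `regex` to stay self-contained. `compile_stop_token` is a hypothetical helper, not part of parserllm.

```python
import re


def compile_stop_token(stop_token):
    """Hypothetical helper: turn a decoded EOS token into a compiled pattern.

    Raises a descriptive TypeError when the decoded value is not a str,
    instead of letting the regex compiler fail further down the stack.
    """
    if not isinstance(stop_token, str):
        raise TypeError(
            "decoded EOS token must be a str, got "
            f"{type(stop_token).__name__}"
        )
    # Escape the token so characters such as '.' or '\n' match literally.
    return re.compile(re.escape(stop_token))


# A string stop token compiles and matches itself.
newline_pattern = compile_stop_token("\n")
```

With a guard like this, a non-string value coming out of `decode` is reported at the point where the stop token is built, which makes it easier to tell whether the tokenizer or the `eos_token_id` assignment is at fault.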
