Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grammars bracket repetition symbol not working #1547

Closed
Viagounet opened this issue Jun 22, 2024 · 7 comments · Fixed by #1637
Closed

Grammars bracket repetition symbol not working #1547

Viagounet opened this issue Jun 22, 2024 · 7 comments · Fixed by #1637

Comments

@Viagounet
Copy link

Viagounet commented Jun 22, 2024

Hello, I tried checking for similar issues about this problem but couldn't find one.
I've had an issue with not being able to use the repetition brackets symbol when working with grammars.

I'm using Ubuntu 20.04, Python 3.12 and llama_cpp_python==0.2.79.

The following works fine:

from llama_cpp import LlamaGrammar

grammar_string = r"""root ::= "repeating" [a-z]+"""
my_grammar = LlamaGrammar.from_string(grammar_string, verbose=True)

But this doesn't:

from llama_cpp import LlamaGrammar

grammar_string = r"""root ::= "repeating" [a-z]{1,}"""
my_grammar = LlamaGrammar.from_string(grammar_string, verbose=True)

It returns this error:

parse: error parsing grammar: expecting newline or end at {1,}
Traceback (most recent call last):
    my_grammar = LlamaGrammar.from_string(grammar_string, verbose=True)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/anaconda3/envs/DocLLM/lib/python3.12/site-packages/llama_cpp/llama_grammar.py", line 71, in from_string
    raise ValueError(
ValueError: from_string: error parsing grammar file: parsed_grammar.rules is empty

The llama.cpp GBNF Guide seems to say it should be possible to use this pattern?

Repetition and Optional Symbols

* after a symbol or sequence means that it can be repeated zero or more times (equivalent to {0,}).
+ denotes that the symbol or sequence should appear one or more times (equivalent to {1,}).
? makes the preceding symbol or sequence optional (equivalent to {0,1}).
{m} repeats the precedent symbol or sequence exactly m times
{m,} repeats the precedent symbol or sequence at least m times
{m,n} repeats the precedent symbol or sequence at between m and n times (included)
{0,n} repeats the precedent symbol or sequence at most n times (included)

Not sure if my understanding of GBNF is lacking or if it's a real bug.
Thank you!

@yamikumo-DSD
Copy link

Same here.
Even an example that llama.cpp gives doesn't work with it, so I don't think its your fault.
llama.cpp's json GBNF example

@C0deMunk33
Copy link

C0deMunk33 commented Jun 29, 2024

running into this as well on a slightly more complicated gbnf on a line value ::= object_type_1 | object_type_2

"parse: error parsing grammar: expecting newline or end at object_type_1 | object_type_2"

@yamikumo-DSD
Copy link

This is a snippet of test GBNF that llama_cpp is offering in llama_grammar.py.

string ::=
  "\"" (
    [^"\\\x7F\x00-\x1F] |
    "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
  )* "\"" ws

In this code, repetition is done by actually repeating (writing down) token as many as we want, which differs from what original JSON GBNF sample does.
So, my current understanding is that disabled {m} is intended feature and not a bug of llama-cpp-python's GBNF unlike the original GBNF.
I'm still not sure the reason why llama-cpp-python choose this behavior tho.

@Viagounet
Copy link
Author

Viagounet commented Jun 30, 2024

RIght, thanks for your answers. I ended up writing a function to automatically convert the bracket syntax into a set of repeating tokens. Not very elegant, but works well enough.

@HanClinto
Copy link

Bracket support for grammars was added about 3 weeks ago in ggml-org/llama.cpp#6640 -- is this Python library referencing a version that includes this newest change?

@ExtReMLapin
Copy link
Contributor

For some reasons it was reimplemented in python (god knows why) that's why it wasn't auto ported here

@ExtReMLapin
Copy link
Contributor

ExtReMLapin commented Jul 29, 2024

WIP #1637

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants