Commit 396b18d

authored Jun 11, 2024
json: document schema conversion in GBNF readme, align manual grammar examples & converters (#7841)
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme
1 parent 864a99e commit 396b18d

7 files changed, +67 -28 lines changed

‎common/json-schema-to-grammar.cpp

+1 -1
@@ -57,7 +57,7 @@ std::unordered_map<std::string, BuiltinRule> PRIMITIVE_RULES = {
  {"object", {"\"{\" space ( string \":\" space value (\",\" space string \":\" space value)* )? \"}\" space", {"string", "value"}}},
  {"array", {"\"[\" space ( value (\",\" space value)* )? \"]\" space", {"value"}}},
  {"uuid", {"\"\\\"\" [0-9a-fA-F]{8} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{4} \"-\" [0-9a-fA-F]{12} \"\\\"\" space", {}}},
- {"char", {"[^\"\\\\] | \"\\\\\" ([\"\\\\/bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
+ {"char", {"[^\"\\\\\\x7F\\x00-\\x1F] | [\\\\] ([\"\\\\bfnrt] | \"u\" [0-9a-fA-F]{4})", {}}},
  {"string", {"\"\\\"\" char* \"\\\"\" space", {"char"}}},
  {"null", {"\"null\" space", {}}},
  };

‎examples/json_schema_to_grammar.py

+1 -1
@@ -43,7 +43,7 @@ def __init__(self, content: str, deps: list = None):
  'object' : BuiltinRule('"{" space ( string ":" space value ("," space string ":" space value)* )? "}" space', ['string', 'value']),
  'array' : BuiltinRule('"[" space ( value ("," space value)* )? "]" space', ['value']),
  'uuid' : BuiltinRule(r'"\"" [0-9a-fA-F]{8} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{12} "\"" space', []),
- 'char' : BuiltinRule(r'[^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})', []),
+ 'char' : BuiltinRule(r'[^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})', []),
  'string' : BuiltinRule(r'"\"" char* "\"" space', ['char']),
  'null' : BuiltinRule('"null" space', []),
  }
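
To make the effect of the `char` change concrete, here is a minimal sketch that approximates the old and new rules with Python's `re` module. The regex translations and the helper name `matches_string_body` are assumptions made for illustration; GBNF and `re` are different engines, so this only models which characters the rule admits inside a JSON string.

```python
import re

# Hand-translated approximations of the GBNF `char` rule (assumption: not part of the commit).
OLD_CHAR = r'[^"\\]|\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})'
NEW_CHAR = r'[^"\\\x7F\x00-\x1F]|\\(?:["\\bfnrt]|u[0-9a-fA-F]{4})'


def matches_string_body(char_rule: str, body: str) -> bool:
    """Return True if `body` is a sequence of zero or more `char` matches."""
    return re.fullmatch(f'(?:{char_rule})*', body) is not None


if __name__ == "__main__":
    samples = ["hello", "a\nb", "\x7f", r"\n", r"\/", r"\u00e9"]
    for s in samples:
        print(repr(s), matches_string_body(OLD_CHAR, s), matches_string_body(NEW_CHAR, s))
```

Under this approximation, the new rule rejects raw control characters (0x00 to 0x1F and 0x7F) that the old rule let through, and it no longer accepts the `\/` escape.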

‎examples/server/public/json-schema-to-grammar.mjs

+1 -1
@@ -41,7 +41,7 @@ const PRIMITIVE_RULES = {
  object : new BuiltinRule('"{" space ( string ":" space value ("," space string ":" space value)* )? "}" space', ['string', 'value']),
  array : new BuiltinRule('"[" space ( value ("," space value)* )? "]" space', ['value']),
  uuid : new BuiltinRule('"\\"" [0-9a-fA-F]{8} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{4} "-" [0-9a-fA-F]{12} "\\"" space', []),
- char : new BuiltinRule(`[^"\\\\] | "\\\\" (["\\\\/bfnrt] | "u" [0-9a-fA-F]{4})`, []),
+ char : new BuiltinRule(`[^"\\\\\\x7F\\x00-\\x1F] | [\\\\] (["\\\\bfnrt] | "u" [0-9a-fA-F]{4})`, []),
  string : new BuiltinRule(`"\\"" char* "\\"" space`, ['char']),
  null : new BuiltinRule('"null" space', []),
  };

‎grammars/README.md

+39
@@ -94,6 +94,8 @@ This guide provides a brief overview. Check out the GBNF files in this directory
  ./main -m <model> --grammar-file grammars/some-grammar.gbnf -p 'Some prompt'
  ```

+ `llama.cpp` can also convert JSON schemas to grammars either ahead of time or at each request; see below.
+
  ## Troubleshooting

  Grammars currently have performance gotchas (see https://github.com/ggerganov/llama.cpp/issues/4218).
@@ -103,3 +105,40 @@ Grammars currently have performance gotchas (see https://github.com/ggerganov/ll
  A common pattern is to allow repetitions of a pattern `x` up to N times.

  While semantically correct, the syntax `x? x? x?.... x?` (with N repetitions) may result in extremely slow sampling. Instead, you can write `x{0,N}` (or `(x (x (x ... (x)?...)?)?)?` w/ N-deep nesting in earlier llama.cpp versions).
+
+ ## Using GBNF grammars
+
+ You can use GBNF grammars:
+
+ - In the [server](../examples/server)'s completion endpoints, passed as the `grammar` body field
+ - In the [main](../examples/main) CLI, passed as the `--grammar` & `--grammar-file` flags
+ - With the [gbnf-validator](../examples/gbnf-validator) tool, to test them against strings.
+
+ ## JSON Schemas → GBNF
+
+ `llama.cpp` supports converting a subset of https://json-schema.org/ to GBNF grammars:
+
+ - In the [server](../examples/server):
+   - For any completion endpoints, passed as the `json_schema` body field
+   - For the `/chat/completions` endpoint, passed inside the `response_format` body field (e.g. `{"type": "json_object", "schema": {"items": {}}}`)
+ - In the [main](../examples/main) CLI, passed as the `--json` / `-j` flag
+ - To convert to a grammar ahead of time:
+   - in CLI, with [json_schema_to_grammar.py](../examples/json_schema_to_grammar.py)
+   - in JavaScript with [json-schema-to-grammar.mjs](../examples/server/public/json-schema-to-grammar.mjs) (this is used by the [server](../examples/server)'s Web UI)
+
+ Take a look at [tests](../tests/test-json-schema-to-grammar.cpp) to see which features are likely supported (you'll also find usage examples in https://github.com/ggerganov/llama.cpp/pull/5978, https://github.com/ggerganov/llama.cpp/pull/6659 & https://github.com/ggerganov/llama.cpp/pull/6555).
+
+ Here is also a non-exhaustive list of **unsupported** features:
+
+ - `additionalProperties`: to be fixed in https://github.com/ggerganov/llama.cpp/pull/7840
+ - `minimum`, `exclusiveMinimum`, `maximum`, `exclusiveMaximum`
+   - `integer` constraints to be implemented in https://github.com/ggerganov/llama.cpp/pull/7797
+ - Remote `$ref`s in the C++ version (Python & JavaScript versions fetch https refs)
+ - Mixing `properties` w/ `anyOf` / `oneOf` in the same type (https://github.com/ggerganov/llama.cpp/issues/7703)
+ - `string` formats `uri`, `email`
+ - [`contains`](https://json-schema.org/draft/2020-12/json-schema-core#name-contains) / `minContains`
+ - `uniqueItems`
+ - `$anchor` (cf. [dereferencing](https://json-schema.org/draft/2020-12/json-schema-core#name-dereferencing))
+ - [`not`](https://json-schema.org/draft/2020-12/json-schema-core#name-not)
+ - [Conditionals](https://json-schema.org/draft/2020-12/json-schema-core#name-keywords-for-applying-subsche) `if` / `then` / `else` / `dependentSchemas`
+ - [`patternProperties`](https://json-schema.org/draft/2020-12/json-schema-core#name-patternproperties)
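
The unsupported list above can be turned into a rough pre-flight check before handing a schema to a converter. The sketch below is not part of the commit: `UNSUPPORTED_KEYWORDS` and `find_unsupported` are hypothetical names, and the set only covers the entries that are plain keywords (it leaves out `additionalProperties`, remote `$ref`s, string formats, and the `properties` + `anyOf` mix). It is a heuristic, so values nested under `enum`, `const`, or `default` may produce false positives.

```python
# Keyword entries copied from the README's "unsupported" list (assumption: treated as a flat set).
UNSUPPORTED_KEYWORDS = {
    "minimum", "exclusiveMinimum", "maximum", "exclusiveMaximum",
    "contains", "minContains", "uniqueItems", "$anchor", "not",
    "if", "then", "else", "dependentSchemas", "patternProperties",
}

# Under these keys, the dict keys are user-chosen names rather than schema keywords.
NAME_MAPS = ("properties", "$defs", "definitions")


def find_unsupported(schema, path="#"):
    """Return JSON-pointer-like paths of unsupported keywords found in `schema`."""
    hits = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}/{key}"
            if key in UNSUPPORTED_KEYWORDS:
                hits.append(child)
            if key in NAME_MAPS and isinstance(value, dict):
                # Skip the names themselves; only inspect each sub-schema.
                for name, sub in value.items():
                    hits.extend(find_unsupported(sub, f"{child}/{name}"))
            else:
                hits.extend(find_unsupported(value, child))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            hits.extend(find_unsupported(item, f"{path}/{index}"))
    return hits


if __name__ == "__main__":
    example = {
        "type": "object",
        "properties": {
            "age": {"type": "integer", "minimum": 0},
            "not": {"type": "string"},  # a property *named* "not" is fine
        },
        "patternProperties": {"^x": {}},
    }
    print(find_unsupported(example))
```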

‎grammars/json.gbnf

+3 -3
@@ -16,10 +16,10 @@ array ::=
  string ::=
  "\"" (
  [^"\\\x7F\x00-\x1F] |
- "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
+ "\\" (["\\bfnrt] | "u" [0-9a-fA-F]{4}) # escapes
  )* "\"" ws

- number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
+ number ::= ("-"? ([0-9] | [1-9] [0-9]{0,15})) ("." [0-9]+)? ([eE] [-+]? [0-9] [1-9]{0,15})? ws

  # Optional space: by convention, applied in this grammar after literal chars when allowed
- ws ::= ([ \t\n] ws)?
+ ws ::= [ \t\n]{0,20}
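
As a rough illustration of what the bounded repetitions buy, the sketch below models only the integral part of `number` and the `ws` rule as Python regexes. The translations and the names `OLD_INT`, `NEW_INT`, `OLD_WS`, `NEW_WS`, and `accepts` are assumptions for this example; the recursive `ws ::= ([ \t\n] ws)?` is taken to be equivalent to an unbounded `[ \t\n]*`.

```python
import re

# Integral part of `number`, before and after the change (hand-translated approximation).
OLD_INT = re.compile(r'-?(?:[0-9]|[1-9][0-9]*)')
NEW_INT = re.compile(r'-?(?:[0-9]|[1-9][0-9]{0,15})')

# `ws`, before (unbounded recursion) and after (at most 20 whitespace characters).
OLD_WS = re.compile(r'[ \t\n]*')
NEW_WS = re.compile(r'[ \t\n]{0,20}')


def accepts(rule: re.Pattern, text: str) -> bool:
    """Return True if the whole of `text` matches `rule`."""
    return rule.fullmatch(text) is not None


if __name__ == "__main__":
    for digits in (16, 17):
        literal = "9" * digits
        print(digits, accepts(OLD_INT, literal), accepts(NEW_INT, literal))
    for spaces in (20, 21):
        print(spaces, accepts(OLD_WS, " " * spaces), accepts(NEW_WS, " " * spaces))
```

With the bounds in place, a sampler constrained by the grammar can no longer keep emitting digits or whitespace indefinitely, which is the "runaway" the commit message refers to.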

‎grammars/json_arr.gbnf

+3 -3
@@ -25,10 +25,10 @@ array ::=
  string ::=
  "\"" (
  [^"\\\x7F\x00-\x1F] |
- "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
+ "\\" (["\\bfnrt] | "u" [0-9a-fA-F]{4}) # escapes
  )* "\"" ws

- number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
+ number ::= ("-"? ([0-9] | [1-9] [0-9]{0,15})) ("." [0-9]+)? ([eE] [-+]? [1-9] [0-9]{0,15})? ws

  # Optional space: by convention, applied in this grammar after literal chars when allowed
- ws ::= ([ \t\n] ws)?
+ ws ::= [ \t\n]{0,20}
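
Both grammar files now use the `x{m,n}` repetition syntax that the README's Troubleshooting note recommends over `x? x? x?...`. As a small sketch, the three spellings of "up to N repetitions" mentioned there can be generated as GBNF text; the function names below are hypothetical and only build strings, they do not call into llama.cpp.

```python
def optional_chain(x: str, n: int) -> str:
    """Build `x? x? ... x?` (n times), the form the README warns may sample slowly."""
    return " ".join(f"{x}?" for _ in range(n))


def bounded(x: str, n: int) -> str:
    """Build `x{0,n}`, the bounded-repetition form used in this commit."""
    return f"{x}{{0,{n}}}"


def nested(x: str, n: int) -> str:
    """Build `(x (x (x)?)?)?` nested n deep, the workaround for earlier versions."""
    out = ""
    for _ in range(n):
        out = f"({x} {out})?" if out else f"({x})?"
    return out


if __name__ == "__main__":
    print(optional_chain("x", 3))
    print(bounded("x", 3))
    print(nested("x", 3))
```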

‎tests/test-json-schema-to-grammar.cpp

+19 -19
@@ -105,7 +105,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  R"""(
  array ::= "[" space ( value ("," space value)* )? "]" space
  boolean ::= ("true" | "false") space
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  null ::= "null" space
@@ -152,7 +152,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "type": "string"
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "\"" char* "\"" space
  space ::= " "?
  )"""
@@ -166,7 +166,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "minLength": 1
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "\"" char+ "\"" space
  space ::= " "?
  )"""
@@ -180,7 +180,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "minLength": 3
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "\"" char{3,} "\"" space
  space ::= " "?
  )"""
@@ -194,7 +194,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "maxLength": 3
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "\"" char{0,3} "\"" space
  space ::= " "?
  )"""
@@ -209,7 +209,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "maxLength": 4
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "\"" char{1,4} "\"" space
  space ::= " "?
  )"""
@@ -283,7 +283,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "prefixItems": [{ "type": "string" }]
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "[" space string "]" space
  space ::= " "?
  string ::= "\"" char* "\"" space
@@ -297,7 +297,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  "prefixItems": [{ "type": "string" }, { "type": "number" }]
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
@@ -466,7 +466,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  a-kv ::= "\"a\"" space ":" space string
  b-kv ::= "\"b\"" space ":" space string
  c-kv ::= "\"c\"" space ":" space string
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "{" space b-kv "," space c-kv "," space a-kv "}" space
  space ::= " "?
  string ::= "\"" char* "\"" space
@@ -486,7 +486,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  })""",
  R"""(
  a-kv ::= "\"a\"" space ":" space string
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "{" space (a-kv )? "}" space
  space ::= " "?
  string ::= "\"" char* "\"" space
@@ -510,7 +510,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  b-kv ::= "\"b\"" space ":" space string
  b-rest ::= ( "," space c-kv )?
  c-kv ::= "\"c\"" space ":" space string
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  root ::= "{" space (a-kv a-rest | b-kv b-rest | c-kv )? "}" space
  space ::= " "?
  string ::= "\"" char* "\"" space
@@ -534,7 +534,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  a-kv ::= "\"a\"" space ":" space string
  b-kv ::= "\"b\"" space ":" space string
  c-kv ::= "\"c\"" space ":" space string
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  d-kv ::= "\"d\"" space ":" space string
  d-rest ::= ( "," space c-kv )?
  root ::= "{" space b-kv "," space a-kv ( "," space ( d-kv d-rest | c-kv ) )? "}" space
@@ -554,7 +554,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  additional-kv ::= string ":" space additional-value
  additional-kvs ::= additional-kv ( "," space additional-kv )*
  additional-value ::= "[" space (number ("," space number)*)? "]" space
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
@@ -574,7 +574,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  R"""(
  array ::= "[" space ( value ("," space value)* )? "]" space
  boolean ::= ("true" | "false") space
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  null ::= "null" space
@@ -596,7 +596,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  R"""(
  array ::= "[" space ( value ("," space value)* )? "]" space
  boolean ::= ("true" | "false") space
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  null ::= "null" space
@@ -637,7 +637,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  a-kv ::= "\"a\"" space ":" space number
  additional-kv ::= string ":" space string
  additional-kvs ::= additional-kv ( "," space additional-kv )*
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
@@ -662,7 +662,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  a-rest ::= additional-kvs
  additional-kv ::= string ":" space number
  additional-kvs ::= additional-kv ( "," space additional-kv )*
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
@@ -690,7 +690,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  additional-kvs ::= additional-kv ( "," space additional-kv )*
  b-kv ::= "\"b\"" space ":" space number
  b-rest ::= additional-kvs
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  decimal-part ::= [0-9]{1,16}
  integral-part ::= [0] | [1-9] [0-9]{0,15}
  number ::= ("-"? integral-part) ("." decimal-part)? ([eE] [-+]? integral-part)? space
@@ -721,7 +721,7 @@ static void test_all(const std::string & lang, std::function<void(const TestCase
  }
  })""",
  R"""(
- char ::= [^"\\] | "\\" (["\\/bfnrt] | "u" [0-9a-fA-F]{4})
+ char ::= [^"\\\x7F\x00-\x1F] | [\\] (["\\bfnrt] | "u" [0-9a-fA-F]{4})
  foo ::= "{" space foo-a-kv "}" space
  foo-a-kv ::= "\"a\"" space ":" space string
  root ::= foo
