
Commit 1ef7fab

main: add --json-schema / -j

1 parent ab9a324

File tree: 4 files changed, +20 −3 lines

Makefile (+1 −1)

@@ -733,7 +733,7 @@ clean:
 # Helper function that replaces .c, .cpp, and .cu file endings with .o:
 GET_OBJ_FILE = $(patsubst %.c,%.o,$(patsubst %.cpp,%.o,$(patsubst %.cu,%.o,$(1))))

-main: examples/main/main.cpp ggml.o llama.o $(COMMON_DEPS) console.o grammar-parser.o $(OBJS)
+main: examples/main/main.cpp ggml.o llama.o $(COMMON_DEPS) console.o grammar-parser.o json-schema-to-grammar.o $(OBJS)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)
 	@echo

common/common.cpp (+15 −0)

@@ -1,4 +1,6 @@
 #include "common.h"
+#include "json.hpp"
+#include "json-schema-to-grammar.h"
 #include "llama.h"

 #include <algorithm>

@@ -68,6 +70,8 @@
 #define LLAMA_CURL_MAX_HEADER_LENGTH 256
 #endif // LLAMA_USE_CURL

+using json = nlohmann::ordered_json;
+
 int32_t get_num_physical_cores() {
 #ifdef __linux__
     // enumerate the set of thread siblings, num entries is num cores

@@ -1148,6 +1152,14 @@ bool gpt_params_find_arg(int argc, char ** argv, const std::string & arg, gpt_pa
         );
         return true;
     }
+    if (arg == "-j" || arg == "--json-schema") {
+        if (++i >= argc) {
+            invalid_param = true;
+            return true;
+        }
+        sparams.grammar = json_schema_to_grammar(json::parse(argv[i]));
+        return true;
+    }
     if (arg == "--override-kv") {
         if (++i >= argc) {
             invalid_param = true;
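The new branch does two things: it parses the raw SCHEMA argument into an nlohmann::ordered_json value, then converts it to a GBNF grammar string stored in sparams.grammar, so downstream sampling treats it exactly like a grammar supplied via --grammar. Below is a minimal standalone sketch of that same two-step flow; the driver program and its error handling are illustrative, not part of this commit, and it assumes the repo's common/ headers are on the include path with json-schema-to-grammar linked in (as the Makefile change above arranges for main).

```cpp
// Illustrative driver (not part of this commit): convert a JSON schema
// given on the command line into a GBNF grammar, mirroring the two steps
// of the new -j / --json-schema code path.
#include <cstdio>
#include <exception>
#include <string>

#include "json.hpp"                  // bundled nlohmann/json header
#include "json-schema-to-grammar.h"  // declares json_schema_to_grammar()

using json = nlohmann::ordered_json;

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s SCHEMA\n", argv[0]);
        return 1;
    }
    try {
        // Step 1: parse the schema text (json::parse throws on bad input).
        const json schema = json::parse(argv[1]);
        // Step 2: translate the schema into a GBNF grammar string.
        const std::string grammar = json_schema_to_grammar(schema);
        printf("%s\n", grammar.c_str());
    } catch (const std::exception & e) {
        fprintf(stderr, "error: %s\n", e.what());
        return 1;
    }
    return 0;
}
```

Running it with `{}` would print a grammar accepting any JSON object, matching the example in the help text below.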
@@ -1353,6 +1365,9 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
     printf("                        or `--logit-bias 15043-1` to decrease likelihood of token ' Hello'\n");
     printf("  --grammar GRAMMAR     BNF-like grammar to constrain generations (see samples in grammars/ dir)\n");
     printf("  --grammar-file FNAME  file to read grammar from\n");
+    printf("  -j SCHEMA, --json-schema SCHEMA\n");
+    printf("                        JSON schema to constrain generations (https://json-schema.org/), e.g. `{}` for any JSON object.\n");
+    printf("                        For schemas w/ external $refs, use --grammar + examples/json_schema_to_grammar.py instead\n");
     printf("  --cfg-negative-prompt PROMPT\n");
     printf("                        negative prompt to use for guidance. (default: empty)\n");
     printf("  --cfg-negative-prompt-file FNAME\n");

examples/main/CMakeLists.txt (+1 −1)

@@ -1,5 +1,5 @@
 set(TARGET main)
 add_executable(${TARGET} main.cpp)
 install(TARGETS ${TARGET} RUNTIME)
-target_link_libraries(${TARGET} PRIVATE common llama ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE common llama json-schema-to-grammar ${CMAKE_THREAD_LIBS_INIT})
 target_compile_features(${TARGET} PRIVATE cxx_std_11)

examples/main/README.md (+3 −1)

@@ -304,10 +304,12 @@ These options help improve the performance and memory usage of the LLaMA models.

 - `--prompt-cache FNAME`: Specify a file to cache the model state after the initial prompt. This can significantly speed up the startup time when you're using longer prompts. The file is created during the first run and is reused and updated in subsequent runs. **Note**: Restoring a cached prompt does not imply restoring the exact state of the session at the point it was saved. So even when specifying a specific seed, you are not guaranteed to get the same sequence of tokens as the original generation.

-### Grammars
+### Grammars & JSON schemas

 - `--grammar GRAMMAR`, `--grammar-file FILE`: Specify a grammar (defined inline or in a file) to constrain model output to a specific format. For example, you could force the model to output JSON or to speak only in emojis. See the [GBNF guide](../../grammars/README.md) for details on the syntax.

+- `--json-schema SCHEMA`: Specify a [JSON schema](https://json-schema.org/) to constrain model output to (e.g. `{}` for any JSON object, or `{"items": {"type": "string", "minLength": 10, "maxLength": 100}, "minItems": 10}` for a JSON array of strings with size constraints). If a schema uses external `$ref`s, you should use `--grammar "$( python examples/json_schema_to_grammar.py myschema.json )"` instead.
+
 ### Quantization

 For information about 4-bit quantization, which can significantly improve performance and reduce memory usage, please refer to llama.cpp's primary [README](../../README.md#prepare-and-quantize).
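As a usage illustration (the model path and prompt are placeholders, not taken from this commit), the new flag would be passed along the lines of `./main -m models/7B/ggml-model.gguf -j '{"type": "array", "items": {"type": "number"}}' -p 'Three primes: '`; internally this is equivalent to generating the GBNF grammar for that schema and supplying it via `--grammar`.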
