I've thought about building this for a while, glad it's out there!<p>Not only does this guarantee your output is JSON, it lowers your generation cost and latency by filling in many of the repetitive schema tokens without passing them through the LLM.<p>For the very common case of "extracting multiple structured fields from a piece of unstructured text," I believe there's an even stronger optimization possible that would further decrease cost and latency, and potentially even improve accuracy.<p>Assuming the fields you want to extract are independent (and they often are), you don't <i>need</i> to generate them all in one go autoregressively. E.g. instead of running the following pseudo-prompt:<p><pre><code> "Input: 'It's sunny and cold today'
Output schema: {"sunny": boolean, "temperature": string}"
</code></pre>
You could instead run the following two:<p><pre><code> "Input: 'It's sunny and cold today'
Output schema: {"sunny": boolean}"
"Input: 'It's sunny and cold today'
Output schema: {"temperature": string}"
</code></pre>
We don't do that today because when done naively it's very inefficient -- you'd be tokenizing, passing to the GPU, and computing the KV cache of the shared part of the prompt twice. But a library with the right abstraction could run those two queries in a batch in parallel and reuse the same tokenization and KV cache for both of them. It would actually be <i>more</i> efficient than generating both fields in one go, since when you factor out the shared prefixes both the generated text and its context are shorter!<p>I mentioned above that this could also improve accuracy. Of course it doesn't do that by default (except that by excluding all the irrelevant fields it makes self-attention's job easier). But what it <i>does</i> do is give you an independent prompt for each field you're interested in. And so for particularly tricky fields you're trying to extract, you have the flexibility to, e.g., add several examples to make the generation N-shot.
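A rough sketch of the prefix reuse I mean, assuming a HuggingFace-style causal LM (sequential here for clarity, where a real implementation would batch the per-field continuations; the model name and token budget are just placeholders):<p><pre><code> import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

shared = "Input: 'It's sunny and cold today'\n"
suffixes = ['Output schema: {"sunny": ', 'Output schema: {"temperature": ']

# Tokenize and run the shared prefix once, keeping its KV cache.
prefix_ids = tokenizer(shared, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_out = model(prefix_ids, use_cache=True)

for suffix in suffixes:
    # Copy the cached prefix so each field continues from the same state.
    past = copy.deepcopy(prefix_out.past_key_values)
    ids = tokenizer(suffix, return_tensors="pt").input_ids
    pieces = []
    with torch.no_grad():
        out = model(ids, past_key_values=past, use_cache=True)
        for _ in range(8):  # a few greedy steps per field, for illustration
            next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
            pieces.append(next_id.item())
            out = model(next_id, past_key_values=out.past_key_values, use_cache=True)
    print(suffix + tokenizer.decode(pieces))
</code></pre>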
Oh nice! I built a similar system a few weeks ago: <a href="https://github.com/newhouseb/clownfish">https://github.com/newhouseb/clownfish</a><p>I think the main differentiating factor here is that this is better if you have a simpler JSON schema without enums or oneOf constraints. If you do have these constraints, e.g. let's say you wanted an array of different types that represented items on a menu { kind: pizza, toppings: [pepperoni] } or { kind: ice_cream, flavor: vanilla | strawberry } (sketched as a schema at the end of this comment), then you would need something more sophisticated like clownfish that can ask the LLM to pick specific properties (and an ability to do some backtracking so you can do proper beam search).<p>For completeness, another common approach can be found here: <a href="https://github.com/ShreyaR/guardrails">https://github.com/ShreyaR/guardrails</a> which essentially boils down to "provide the schema in the prompt and ask the LLM to correct things if it fails to get the schema right the first time."
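Sketched as a schema (written as a Python dict purely for illustration, not taken from either library), the menu example is the kind of thing a field-by-field filler can't handle on its own, because it can't know whether to emit "toppings" or "flavor" until a branch has been chosen:<p><pre><code> # Hypothetical oneOf schema for the menu example.
menu_item_schema = {
    "oneOf": [
        {"type": "object", "properties": {
            "kind": {"const": "pizza"},
            "toppings": {"type": "array", "items": {"type": "string"}},
        }},
        {"type": "object", "properties": {
            "kind": {"const": "ice_cream"},
            "flavor": {"enum": ["vanilla", "strawberry"]},
        }},
    ]
}
</code></pre>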
> Bulletproof JSON generation: Jsonformer ensures that the generated JSON is always syntactically correct and conforms to the specified schema.<p>This is an important definition to take note of: "bulletproof" doesn't mean that you'll get good or correct data. It only means that it'll be valid JSON and in a particular schema that you specify (because the LLM isn't building the JSON in the first place, the library is).<p>It's an interesting idea. But it's not clear if they've validated the heuristics they use, to see how well it performs in terms of accuracy against, say, some kind of BeautifulSoup-like attempt to make sense of the JSON-ish that the LLM produces and correct that to be valid JSON, or any other approach to the problem.
Love to see further work on constrained decoding like this and other systems introduced in the comments!<p>See my work and the paper about it. I've got a lot of y'all beat on this (constrained decoding, not the templating and structuring) by about a year:<p><a href="https://github.com/hellisotherpeople/constrained-text-generation-studio">https://github.com/hellisotherpeople/constrained-text-genera...</a>
Seen a lot of things trying to do this by pressure testing the outputs, but all feel like anti-patterns. This is the first that seems like the "right" way to do it. Better to manage how the model is generating vs creating one more potentially faulty "glue" layer.
I found it rather strange that the new Andrew Ng course about prompting, which features an OpenAI employee, says nothing about templated output.<p>To me this is a killer feature of GPT: being able to turn a document into JSON or any other template.<p>This kind of prompt is just amazing for GPT (try it with a blog post, document or any other thing):
"Analyze this document and transform it into the following format:<p><title><p><summary (text conciseness: 5/10)><p><content bullet points (text conciseness 3/10)><p><content_item 1><p><content_item 2><p><content_item N>"<p>Also you can ask the same prompt in a json and GPT will gladly transform a PDF into a JSON.
I know of a similar one called GPTyped; I just posted it on HN <a href="https://news.ycombinator.com/item?id=35793056#35793057" rel="nofollow">https://news.ycombinator.com/item?id=35793056#35793057</a>
How about going one step further and constraining transformer output with a context-free grammar? That way you can generate output that conforms to a language's syntax, such as Python or C.
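A toy sketch of what that could look like in the decoding loop, assuming a HuggingFace-style model; the balanced-parentheses check is a stand-in for a real CFG/Earley prefix test:<p><pre><code> import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def is_valid_prefix(text):
    # Placeholder "grammar": parentheses never close more than they open.
    depth = 0
    for ch in text:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return True

def constrained_step(input_ids, text_so_far):
    # Score all tokens, then take the best one that keeps the output grammatical.
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    for tok in torch.argsort(logits, descending=True)[:200]:
        piece = tokenizer.decode([int(tok)])
        if is_valid_prefix(text_so_far + piece):
            return int(tok), piece
    raise RuntimeError("no grammatical continuation in the top candidates")
</code></pre>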
Has anyone seen a tool like this that uses Node rather than Python? I have this exact problem in a GPT-based web application I am building and have had to resort to some “creative” solutions. At the very least I am glad to see people are tackling this problem.
Nice tool, will check it out. I had to go through a painstaking trial and error process to generate valid and deterministic JSON for my AI presentation tool called Slide Genie (<a href="https://slidegenie.vercel.app/" rel="nofollow">https://slidegenie.vercel.app/</a>). The hard part was making it work when temperature > 0.
Nice, this codifies something similar to what I've been doing in my prompts! Will be using this instead.<p>What I currently have been doing:<p>The JSON template for your response is provided below. The parts to fill out are capitalized. Please do not modify the template.
Please fill in the template with one of the above options for your response.<p><pre><code> <result>
 {
   "rating": "N. RATING",
   "reason": "REASON"
 }
 </result>
</code></pre>
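And roughly how I pull the result back out afterwards (the regex-and-parse step is just a sketch of my own, not part of any library):<p><pre><code> import json
import re

def parse_result(reply):
    # Pull the filled-in JSON back out of the <result> tags in the model's reply.
    match = re.search(r"<result>(.*?)</result>", reply, re.DOTALL)
    if match is None:
        raise ValueError("model did not return a <result> block")
    return json.loads(match.group(1))

reply = '<result>\n{\n  "rating": "3. NEUTRAL",\n  "reason": "Mixed feedback"\n}\n</result>'
print(parse_result(reply))  # {'rating': '3. NEUTRAL', 'reason': 'Mixed feedback'}
</code></pre>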
I actually did this with a silly little app I made that generates fake social media profiles (<a href="https://lookface.app" rel="nofollow">https://lookface.app</a>). I gave it a prompt telling it what to generate and an example JSON. As long as you say it must be in JSON, I haven't had any problems with it generating bad JSON.
Nice job - I've tried to massage the outputs to be structured and sometimes it works, but sometimes it fails badly. Having a more specific set of constraints around it will definitely make it more effective.
I wanted to see the opposite - parsing JSON and YAML generated from LLMs. It doesn't happen much with GPT-4 but lesser models might mess up the format and then you can't simply parse it.
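A rough sketch of the kind of salvage I have in mind; the heuristics are purely illustrative, not an existing library:<p><pre><code> import json

def lenient_json(reply):
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        pass
    # Common failure: extra prose around the object; keep only the outermost braces.
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidate = reply[start:end + 1]
        # Another common failure: trailing commas (naive fix, ignores whitespace).
        candidate = candidate.replace(",}", "}").replace(",]", "]")
        return json.loads(candidate)
    raise ValueError("could not recover JSON from reply")

print(lenient_json('Sure! Here you go: {"sunny": true, "temperature": "cold",}'))
</code></pre>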
Something like this should be integrated with a library like <a href="https://fakerjs.dev/" rel="nofollow">https://fakerjs.dev/</a>
With LLM-based (or more generally AI-based) generation of the fake data, it could be more diverse and generalize to lots more applications, which would help developers.
My bad if faker already has AI-based generation and I'm just unaware of it, but afaik it does not right now.
I like the idea of getting ChatGPT to return something easily parse-able by a program. I've been using an XML derivative for that. <a href="https://github.com/ColinRyan/Chat-Markup-Language">https://github.com/ColinRyan/Chat-Markup-Language</a><p>Never thought to use json schema. I'll check this out!
I might be reading the code wrong, but it looks like it crawls the schema, making a generation per primitive type. While that’s a clever way to ensure valid JSON, I don’t know if I’d go as far as to describe it as efficient.<p>That said, if the model is unable to generate JSON due to its training/fine-tuning, this is indeed a clever solution!
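For anyone skimming, a stripped-down sketch of what "a generation per primitive type" means here (my paraphrase of the idea, not Jsonformer's actual code):<p><pre><code> def generate_value(prompt, json_type):
    # Stand-in for one constrained LLM call per primitive (string/number/boolean).
    return {"string": "example", "number": 0, "boolean": True}[json_type]

def fill(schema, prompt, path="$"):
    # Walk the schema; every leaf costs one model call.
    if schema["type"] == "object":
        return {key: fill(sub, prompt, path + "." + key)
                for key, sub in schema["properties"].items()}
    return generate_value(prompt + "\nValue for " + path + ":", schema["type"])

schema = {"type": "object", "properties": {
    "sunny": {"type": "boolean"}, "temperature": {"type": "string"}}}
print(fill(schema, "It's sunny and cold today"))  # two calls for two leaves
</code></pre>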
<p><pre><code> Efficiency: By generating only the content tokens and filling in the fixed tokens, Jsonformer is more efficient than generating a full JSON string and parsing it.
</code></pre>
I was excited to try this in Replit... and realized it required pytorch. Ouch. Replit was not happy about that!
Is there a way to do something like this but with fine-tuning? For example, I want to train a LoRA to become an email spam classifier. I have training data where the prompt is the email and the response is {Boolean: True/False}.
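For context, the kind of training data I have in mind, purely as an illustration (file name, field names and prompt wording are all made up):<p><pre><code> import json

emails = [("You won a FREE cruise!!!", True), ("Lunch tomorrow?", False)]

# Shape each example so the completion is the exact JSON wanted at inference time.
with open("spam_train.jsonl", "w") as f:
    for body, is_spam in emails:
        f.write(json.dumps({
            "prompt": "Classify this email as spam.\nEmail: " + body + "\nOutput:",
            "completion": json.dumps({"spam": is_spam}),
        }) + "\n")
</code></pre>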
It's not very hard through prompting. You can just ask the LLM to generate based on these parameters. I did this exact same thing and never wrote any code for it.
I hope that this is new to no one generating JSON using an LLM, because it felt like the obvious first thing to do when I implemented that kind of stuff. That being said, it's nice to have it ready to go as a library.