Hey HN, I've been using GPT a lot lately in some side projects around data generation and benchmarking. During prompt tuning I ended up with a pretty complicated request: I wanted the value I was looking for, plus an explanation, a criticism, etc. JSON was the most natural output format for this, but results would often be broken, have the wrong types, or contain missing fields.<p>There's been some positive movement in this space, like with jsonformer (<a href="https://github.com/1rgs/jsonformer">https://github.com/1rgs/jsonformer</a>) the other day, but nothing that was plug and play with GPT.<p>This library consolidates the separate logic I built across 5 different projects. It lets you tell the model how it should return fields, inject variable prompts, handle common formatting errors, and then cast to pydantic when you're done for type hinting and validation in your IDE. If you're able to play around with it, let me know what you think.
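To make the flow concrete, here's a rough sketch pieced together from the example further down this thread and the README (the `GPTJSON[...]` runner, its constructor signature, and the exact shape of what `run()` returns are my assumptions; check the repo for the real API):<p><pre><code>import asyncio
from pydantic import BaseModel
from gpt_json import GPTJSON, GPTMessage, GPTMessageRole

class SentimentSchema(BaseModel):
    sentiment: str

SYSTEM_PROMPT = """
Analyze the sentiment of the given text.

Respond with the following JSON schema:

{json_schema}
"""

async def runner():
    gpt_json = GPTJSON[SentimentSchema]("sk-...")  # assumed constructor
    payload = await gpt_json.run(
        messages=[
            GPTMessage(role=GPTMessageRole.SYSTEM, content=SYSTEM_PROMPT),
            GPTMessage(
                role=GPTMessageRole.USER,
                content="Text: I love this product. It's the best thing ever!",
            ),
        ]
    )
    print(payload)  # parsed into a typed SentimentSchema instance

asyncio.run(runner())</code></pre>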
I like the idea, but I think a library that focuses on producing requests and parsing responses according to a schema is better. Sending requests to the server is orthogonal to that purpose.<p>What we've found useful in practice when dealing with similar problems:<p>- Use json5 instead of json when parsing. It allows trailing commas.<p>- Don't let it respond with a bare true/false. Instead, ask it for a short sentence explaining whether the statement is true or false, then use a small embedding model such as sbert to extract true/false from the sentence. We've found that GPT reasons better this way, and it is much more robust.<p>- For numerical scores, do a similar thing: ask GPT for a description, write a few examples matching your score scale, and for each response use the score of the best-matched example under the small embedding model. If you let GPT give you scores directly without an explanation, 20% of the time it will give you nonsense.
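A minimal sketch of the first two ideas, assuming the `json5` and `sentence-transformers` packages; the anchor sentences and field name are made up:<p><pre><code>import json5  # tolerates trailing commas and comments
from sentence_transformers import SentenceTransformer, util

raw = '{"verdict": "The statement is accurate and well supported.",}'
data = json5.loads(raw)  # plain json.loads would choke on the trailing comma

# Map the free-text verdict onto true/false with a small embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")
anchors = ["Yes, the statement is true.", "No, the statement is false."]
scores = util.cos_sim(model.encode([data["verdict"]]), model.encode(anchors))
is_true = scores.argmax().item() == 0</code></pre>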
Here's 40 lines of python code that I've found to be unreasonably effective at accomplishing something similar:<p><a href="https://github.com/jiggy-ai/pydantic-chatcompletion/blob/master/pydantic_chatcompletion/__init__.py">https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...</a>
I've been interfacing with GPT programmatically for a little while now, leveraging its "soft and fuzzy" interface to produce hard, machine-readable results. JSON was the format that felt best suited for the job.<p>I see a ton of code in this project, and I don't know what most of it does. As far as GPT's troubles with JSON go, I'll add a couple: sometimes it likes to throw comments in there as if it were JS, and sometimes it'll triple-quote the JSON string as if it were Python.<p>My approach to solving these problems was prompt engineering, using the system message part of the API call. Asking it to "return valid json, do not wrap it in text, do not preface it with text, do not include follow-up explanations, make sure it's valid json, do not include comments" seems to work 99% of the time. For the remainder, a try/catch block with some fallback code that "extracts" json (via dumb REs) from whatever text was returned. Hasn't failed yet.<p>It's fascinating to watch the new paradigm arrive, and people using old habits to deal with it. This entire project is kind of pointless; you can just ask GPT to return the right kind of thing.
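The fallback described could look roughly like this (the commenter's actual regexes aren't shown, so this is illustrative):<p><pre><code>import json
import re

def extract_json(text: str) -> dict:
    """Parse directly; on failure, pull the first {...} span out of the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match is None:
            raise ValueError("no JSON object found in response")
        return json.loads(match.group(0))</code></pre>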
I'm having success with a simple YAML schema. One thing that's very helpful for the prompt is to include "description" and "example":<p><pre><code> -
column name: salary_max
format: number
example: 150,000
description: Salary Maximum
-
column name: keywords
format: string
example: engineer, python, docker, remote
description: Relevant Keywords (Comma separated keywords used for filtering and matching jobs to candidates)</code></pre>
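On the consuming side, this is a sketch of what parsing could look like, assuming PyYAML and a reply shaped as rows keyed by the column names above:<p><pre><code>import yaml  # PyYAML assumed

reply = """
- salary_max: 150000
  keywords: engineer, python, docker, remote
"""
for row in yaml.safe_load(reply):
    print(row["salary_max"], row["keywords"].split(", "))</code></pre>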
I built zod-chatgpt <a href="https://github.com/olup/zod-chatgpt">https://github.com/olup/zod-chatgpt</a> a couple of weeks ago to serve the same purpose in TypeScript.
I asked ChatGPT (GPT-4) about the idea of integrating GPT-JSON with CUE language (<a href="https://cuelang.org/" rel="nofollow">https://cuelang.org/</a>) for potential improvements and the answers are quite impressive:<p>"Yes, integrating the CUE language with GPT-JSON can potentially lead to improvements in its capability. CUE, short for Configuration, Unification, and Environment, is an open-source data constraint language designed to simplify the definition, generation, and validation of data in structured formats, like JSON, YAML, and others.<p>Integrating CUE with GPT-JSON can bring several benefits:<p>1) Validation: By using CUE, you can define schemas for the input and output JSON, which helps in validating the data structure and ensuring it adheres to the expected format. This can help reduce errors and improve the quality of the interaction with the GPT model.<p>2) Simplification: CUE can help simplify the process of defining and generating JSON inputs and outputs for GPT-JSON. By providing a more concise and expressive syntax, CUE can make it easier to craft structured inputs and outputs, which can be especially helpful when working with complex data structures.<p>3) Interoperability: Integrating CUE can improve the interoperability between GPT-JSON and other systems or applications. With CUE's ability to work with multiple data formats (such as JSON, YAML, or HCL), it becomes easier to integrate GPT-JSON with various tools and platforms.<p>To integrate CUE with GPT-JSON, you would need to follow these general steps:<p>1) Define CUE schemas for the input and output JSON structures used by GPT-JSON.<p>2) Use the CUE language to validate and generate input JSON for GPT-JSON.<p>3) Process the output JSON generated by GPT-JSON using CUE to ensure it adheres to the defined schema and transform it as needed.<p>Integrating CUE with GPT-JSON can improve the overall robustness and ease of use of the library, making it more accessible and useful for a wider range of applications."
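Concretely, the validation step could be as simple as shelling out to the `cue` CLI (assuming it's installed; `cue vet` checks data files against a schema, and the file names here are made up):<p><pre><code>import json
import subprocess
import tempfile

def validate_with_cue(payload: dict, schema_path: str) -> bool:
    """Return True if `payload` satisfies the CUE schema at `schema_path`."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(payload, f)
        data_path = f.name
    result = subprocess.run(
        ["cue", "vet", schema_path, data_path],
        capture_output=True, text=True,
    )
    return result.returncode == 0

validate_with_cue({"author": "Frank Herbert"}, "book.cue")</code></pre>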
Hey, this is really neat! I've taken a very similar approach in TypeScript. +1 to the sibling comment that recommended parsing with json5 (but don't tell the AI you're doing that, it's a waste of context space and it might get more confused anyway).<p>I've had luck doing chain-of-thought prompting in the JSON payload as you've described, too. Cheers, really validating to see someone taking a similar approach.
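For readers who haven't seen the trick: chain-of-thought inside the payload just means putting a free-text reasoning field before the answer field, so the model reasons before it commits. A made-up illustration:<p><pre><code>schema_hint = """
{
  "reasoning": "string, think step by step about the text here first",
  "sentiment": "one of: positive, negative, neutral"
}
"""</code></pre>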
Have you considered Guardrails, <a href="https://shreyar.github.io/guardrails/" rel="nofollow">https://shreyar.github.io/guardrails/</a>, it’s like Pydantic but for LLMs?
In the last 24 hours I've seen a bunch of projects doing LLM -> JSON. I think we want to be focusing on markdown instead. An intuition I have developed is that an ideal prompt has a very clear narrative structure and very tight "semantic locality" (the instruction is at the end, the most salient data is close to the instruction, etc).<p>JSON is admittedly way easier to work with up front, but markdown seems to be a more scalable choice.<p>Of course, this is all very much an opinion and highly anecdotal at the moment.
Since others are sharing their prompt-only solutions to get JSON, I'll share what I've been using. It has been working reliably:<p>"Do not include any explanations, only provide an RFC 8259-compliant JSON response following this format without deviation.<p><pre><code> {
   "author": "string describing the author full name",
   "year": "number describing the year the book was written",
   "isFiction": "boolean describing if the book is a work of fiction"
   ...
 }
</code></pre>
"
I found that LLMs are pretty good with TOML. The multiline strings are also a real bonus. One interesting quirk: sometimes the LLM mistakes the triple quotes for backticks, so it will output something like this:<p><pre><code> ```
[TOML]
key="""
value
"""
"""</code></pre>
I happened to have a very similar idea recently and created this GPT-logic package for Node. It basically transforms GPT-generated results into JS data types. Check it out if you are interested. <a href="https://github.com/wayneshn/gpt-logic">https://github.com/wayneshn/gpt-logic</a>
Nice project! I took some inspiration from this as well as <a href="https://github.com/jiggy-ai/pydantic-chatcompletion/blob/master/pydantic_chatcompletion/__init__.py">https://github.com/jiggy-ai/pydantic-chatcompletion/blob/mas...</a> to create the following:<p><a href="https://github.com/knowsuchagency/struct-gpt">https://github.com/knowsuchagency/struct-gpt</a><p>I tried to make the API as intuitive as possible and added the ability to provide examples to improve the reliability and quality of the LLM's output.
Langchain also has a built-in Pydantic output parser <a href="https://python.langchain.com/en/latest/modules/prompts/output_parsers/examples/pydantic.html" rel="nofollow">https://python.langchain.com/en/latest/modules/prompts/outpu...</a><p>However, I've found it to be generally unreliable, and it adds a lot of text to each call.<p>I suspect this can be improved by:
- Only writing the parts of the spec that are needed, rather than the full JSON Schema spec (a sketch of this follows below)
- Including relevant examples rather than arbitrary JSON Schema examples
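A minimal sketch of the first suggestion, assuming pydantic v2 and a model with simple field types (the `Book` model and `compact_spec` helper are made up for illustration):<p><pre><code>from pydantic import BaseModel

class Book(BaseModel):
    author: str
    year: int
    is_fiction: bool

def compact_spec(model: type[BaseModel]) -> str:
    """Emit just field names and types instead of the full JSON Schema."""
    fields = ", ".join(
        f'"{name}": {field.annotation.__name__}'
        for name, field in model.model_fields.items()
    )
    return "{" + fields + "}"

print(compact_spec(Book))  # {"author": str, "year": int, "is_fiction": bool}</code></pre>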
I built a toy[0] for Typescript that works similarly to this.<p>It takes the expected return type of a function, translates it into a JSON Schema, queries OpenAI, validates the response, and then magically returns the response as typed output.<p>[0]<a href="https://github.com/jumploops/magic">https://github.com/jumploops/magic</a>
You can use simpler types. Instead of:<p><pre><code> messages=[
GPTMessage(
role=GPTMessageRole.SYSTEM,
content=SYSTEM_PROMPT,
),
GPTMessage(
role=GPTMessageRole.USER,
content="Text: I love this product. It's the best thing ever!",
)
]
</code></pre>
Try:<p><pre><code> messages=(
("system", SYSTEM_PROMPT),
("user", "Text: I love this product. It's the best thing ever!")
)
</code></pre>
Or:<p><pre><code> messages=(
SystemMsg(SYSTEM_PROMPT),
UserMsg("Text: I love this product. It's the best thing ever!")
)
</code></pre>
This is still Python, not Java.
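A rough sketch of how the library could normalize those lighter forms internally (the gpt-json names come from the example above; I'm assuming the role enum's values are the lowercase role strings):<p><pre><code>from gpt_json import GPTMessage, GPTMessageRole

def to_message(msg) -> GPTMessage:
    """Accept either a GPTMessage or a ("role", content) tuple."""
    if isinstance(msg, GPTMessage):
        return msg
    role, content = msg  # e.g. ("system", SYSTEM_PROMPT)
    return GPTMessage(role=GPTMessageRole(role), content=content)</code></pre>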