TechEcho

Show HN: Structured output from LLMs without reprompting

174 points · by sandkoan · almost 2 years ago
Built a tool for transforming unstructured data into structured outputs using language models (with 100% adherence).

If you're facing problems getting GPT to adhere to a schema (JSON, XML, etc.) or regex, need to bulk process some unstructured data, or generate synthetic data, check it out.

We run our own tuned model (you can self-host if you want), so we're able to have incredibly fine-grained control over text generation.

Repository: https://github.com/automorphic-ai/trex

Playground: https://automorphic.ai/playground

12 comments

behnamoh · almost 2 years ago
The more time goes on, the more I realize that the true power of LLMs is not in the unstructured text they can generate, but in structured output. There are two approaches to achieve this:

1. LMQL/guidance/JSONformer/OP's post

2. Fine-tuning the model to understand function calls and their (potentially) JSON schemas.

There was a comment here about OpenAI's approach (fine-tuning a model to understand function calls) which raised a good point: since fine-tuning is often forgetful (previous knowledge learnt by the model gets forgotten a little bit), it's not clear whether OpenAI's approach has made GPT-4 less capable than it was before. Not to mention that you're still dealing with a statistical process (an LLM), not a locked-in algorithm that generates the desired schema 100% of the time.

Which brings me to the other approach: steering the LLM's output __as it is generating tokens__, which is what LMQL does. This results in less token usage (you don't send the function schema as part of your prompt/message to OpenAI) and 100% accuracy, because token probabilities are modified (e.g., 0% chance of any character except ":" after a double quotation mark).
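The steering mechanism described in this comment can be sketched with a toy logit mask. Everything here (the six-token vocabulary, the raw scores) is invented for illustration; the point is only the per-step masking that LMQL/guidance-style tools apply during decoding.

```python
import math

# Toy vocabulary and scores; a real decoder works over a full LLM
# vocabulary, but the masking mechanism is the same.
VOCAB = ['"', ':', '{', '}', 'name', '42']

def mask_logits(logits, allowed):
    """Set disallowed tokens' logits to -inf so softmax gives them 0%."""
    return [score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Suppose we just emitted the closing quote of a JSON key: the grammar
# permits only ':' next, no matter which token the model prefers.
raw_scores = [1.2, 0.3, 0.9, 2.0, 0.1, 0.5]  # hypothetical logits
probs = softmax(mask_logits(raw_scores, {':'}))
print(probs[VOCAB.index(':')])  # ':' now carries all the probability mass
```

Because every disallowed token's probability is exactly zero, any sampling strategy (greedy, temperature, top-p) still stays inside the grammar.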
foundry27 · almost 2 years ago
I think the references to this being a "tool" that you can "self-host if you want" are a little disingenuous, especially after seeing that the linked GitHub project doesn't mention the fact that it's just a thin wrapper client making requests to a remote server until you're halfway through the README. The product might be great, but introducing it to the community in this way doesn't foster much trust in your company from the perspective of a potential customer.

The only reference I can find to this being a self-hosted model is a blurb in the GitHub README saying "If you'd like to self-host this in your own cloud, email us". Sure, I can email my OpenAI/Microsoft rep and self-host GPT-4 in my own cloud for enough money too, but that doesn't change the fact that the primary business model is SaaS. Just be up-front about this fact in community posts, rather than obfuscating it. Your website does a great job with that.
jensneuse · almost 2 years ago
We're using a similar approach with OpenAI. The user can define a schema using zod and call a prompt. We're then using OpenAI functions behind the scenes to parse the answer into the shape the user wants. Add JSON Schema validation on top and we can be sure that the response conforms to our schema. Some more details and examples can be found in this blog post: https://wundergraph.com/blog/beyond_functions_seamlessly_build_ai_enhanced_apis_with_openai
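The pipeline this comment describes (declare a schema, have the model fill it via function calling, validate before trusting) can be sketched in Python, with a small hand-rolled checker standing in for zod and a JSON Schema validator. The model call is stubbed out, and the schema and payload are invented for illustration.

```python
import json

# Roughly what a zod shape like z.object({name: z.string(), stars:
# z.number()}) expresses, written as a JSON-Schema-style dict.
SCHEMA = {
    "type": "object",
    "required": ["name", "stars"],
    "properties": {"name": {"type": "string"}, "stars": {"type": "number"}},
}

TYPES = {"string": str, "number": (int, float)}

def validate(payload: dict, schema: dict) -> list:
    """Return a list of violations; an empty list means it conforms."""
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in payload]
    for key, spec in schema.get("properties", {}).items():
        if key in payload and not isinstance(payload[key], TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
    return errors

# A hypothetical function-call response from the model: well-formed
# JSON, but "stars" came back as a string instead of a number.
raw = '{"name": "trex", "stars": "many"}'
print(validate(json.loads(raw), SCHEMA))
```

The design point the comment makes: function calling gets you JSON-shaped output most of the time, and the validation layer turns "most of the time" into a hard guarantee, since nonconforming responses can be rejected or retried rather than passed downstream.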
sunshadow · almost 2 years ago
What are the benefits over https://github.com/microsoft/guidance/ ?
roseway4 · almost 2 years ago
Looking at the playground, it appears the few-shot examples in the prompt and the CFG are duplicative. What is the relationship between the two?

When you say in another comment that using OpenAI functions to output JSON is a waste of tokens, how are you generating the JSON output? And why do your prompts then include few-shot examples of JSON objects?
Janymos · almost 2 years ago
What's the difference between self-hosting this and manually running https://github.com/r2d4/parserllm ?
minzi · almost 2 years ago
Can someone explain what they use structured output for? I’m just curious what kinds of use cases people have found for it.
easygenes · almost 2 years ago
You mention self-hosting the model. Do you have the model weights up on Hugging Face?
darkteflon · almost 2 years ago
Could you contextualise this against OpenAI’s native functions?
marban · almost 2 years ago
Can easily be done with OAI's new function calling.
dkjaudyeqooe · almost 2 years ago
Don't forget to put a license on your repository.
beefnugs · almost 2 years ago
Oh, it's 100% predictable alright: predictably garbage. The default example treats height as the wrong kind of "number" (whatever that is assumed to be), and if you try to redefine height as, say, "height in total inches", it still gets it wrong.