科技回声 (Tech Echo)
A tech news platform built with Next.js, providing global technology news and discussion.

© 2025 科技回声. All rights reserved.

Show HN: Structured output from LLMs without reprompting

174 points · by sandkoan · almost 2 years ago
Built a tool for transforming unstructured data into structured outputs using language models (with 100% adherence).

If you're facing problems getting GPT to adhere to a schema (JSON, XML, etc.) or regex, need to bulk process some unstructured data, or generate synthetic data, check it out.

We run our own tuned model (you can self-host if you want), so we're able to have incredibly fine-grained control over text generation.

Repository: https://github.com/automorphic-ai/trex

Playground: https://automorphic.ai/playground
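For context on what "100% adherence" can mean mechanically, here is a minimal, hypothetical sketch (not T-Rex's actual API or implementation): constrain each generated character so that only characters able to complete the target regex are ever sampled.

```python
import random
import re

# Hypothetical sketch of regex-constrained generation for the pattern
# \d{3}-\d{3}-\d{4} (not the OP's implementation). At each position only
# characters that can still complete the pattern are permitted, so every
# output matches -- no reprompting or retry loop needed.
def allowed_chars(pos: int) -> str:
    # Positions 3 and 7 must be the literal dash; all others are digits.
    return "-" if pos in (3, 7) else "0123456789"

def generate() -> str:
    # random.choice stands in for the model's (masked) token choice.
    return "".join(random.choice(allowed_chars(i)) for i in range(12))

print(generate())  # always shaped like ddd-ddd-dddd
```

The same idea scales to any format whose valid continuations can be enumerated at each step, which is what makes the adherence guarantee structural rather than statistical.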

12 comments

behnamoh · almost 2 years ago
the more it goes, the more I realize that the true power of LLMs is not in the unstructured text they can generate, but in structured output. but there are two approaches to achieve this:

1. LMQL/guidance/JSONformer/OP's post

2. finetuning the model to understand function calls and their (potentially) JSON schemas.

there was a comment here about OpenAI's approach (finetuning a model to understand function calls) which raised a good point: since finetuning is often forgetful (previous knowledge learnt by the model gets forgotten a little bit), it's not clear whether OpenAI's approach has made GPT-4 less capable than it was before. Not to mention that you're still dealing with a statistical process (the LLM), not a locked-in algorithm that generates the desired schema 100% of the time.

Which brings me to the other approach: steering the LLM's output __as it is generating tokens__, which is what LMQL does. This results in less token usage (you don't send the function schema as part of your prompt/message to OpenAI) and 100% accuracy, because token probabilities are modified (e.g., 0% chance of any character except ":" after a closing double quotation mark).
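The token-steering idea in the comment above can be sketched in a few lines. This is a toy illustration, not LMQL or the OP's tool: a fake model emits random logits, and a hand-written grammar masks every token that the target shape `{"key": <digit>}` does not allow, including the forced ":" after the key's closing quote.

```python
import json
import random

VOCAB = ['{', '}', '"', 'k', 'e', 'y', ':', ' ', '1', '2', '3']

def fake_logits():
    # Stand-in for a real LM forward pass: one random score per token.
    return [random.random() for _ in VOCAB]

def allowed_next(text):
    # Tiny hand-written grammar for the shape {"key": <digit>}.
    skeleton = '{"key": '
    if len(text) < len(skeleton):
        return {skeleton[len(text)]}   # structural tokens are forced
    if len(text) == len(skeleton):
        return {'1', '2', '3'}         # the model actually chooses here
    return {'}'}                       # then the object must close

def constrained_decode():
    text = ''
    while not (text.endswith('}') and len(text) > 1):
        logits = fake_logits()
        allowed = allowed_next(text)
        # Mask: tokens outside the grammar get zero probability, so we
        # pick the highest-scoring token among the allowed ones only.
        tok = max((t for t in VOCAB if t in allowed),
                  key=lambda t: logits[VOCAB.index(t)])
        text += tok
    return text

out = constrained_decode()
print(out)  # prints {"key": N} with N in 1..3
assert json.loads(out)["key"] in (1, 2, 3)
```

Because the mask makes disallowed tokens impossible rather than merely unlikely, the output parses every time, regardless of how confused the underlying model is.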
foundry27 · almost 2 years ago
I think the references to this being a "tool" that you can "self-host if you want" are a little disingenuous, especially after seeing that the linked GitHub project doesn't mention that it's just a thin wrapper client making requests to a remote server until you're halfway through the README. The product might be great, but introducing it to the community this way doesn't foster much trust in your company from the perspective of a potential customer.

The only reference I can find to this being a self-hosted model is a blurb in the GitHub README saying "If you'd like to self-host this in your own cloud, email us". Sure, for enough money I can email my OpenAI/Microsoft rep and self-host GPT-4 in my own cloud too, but that doesn't change the fact that the primary business model is SaaS. Just be up-front about this in community posts rather than obfuscating it. Your website does a great job with that.
jensneuse · almost 2 years ago
We're using a similar approach with OpenAI. The user can define a schema using zod and call a prompt. We then use OpenAI functions behind the scenes to parse the answer into the shape the user wants. Add JSON schema validation on top and we can be sure the response conforms to our schema. Some more details and examples can be found in this blog post: https://wundergraph.com/blog/beyond_functions_seamlessly_build_ai_enhanced_apis_with_openai
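The flow jensneuse describes (declare a schema, let the model fill it via function calling, validate before trusting it) can be sketched in Python. Names here are illustrative, not WunderGraph's actual API, and the model response is mocked so the sketch runs offline.

```python
import json

# Declared shape, playing the role of a zod schema.
SCHEMA = {"name": str, "founded": int}

def validate(obj, schema) -> bool:
    # Minimal stand-in for full JSON Schema validation: right keys,
    # right types, nothing extra.
    return (isinstance(obj, dict)
            and set(obj) == set(schema)
            and all(isinstance(obj[k], t) for k, t in schema.items()))

def mock_function_call_arguments() -> str:
    # In the real flow this would be the `arguments` string of an
    # OpenAI function-calling response; mocked here for illustration.
    return '{"name": "ACME", "founded": 1998}'

parsed = json.loads(mock_function_call_arguments())
assert validate(parsed, SCHEMA)                 # conforming output passes
assert not validate({"name": "ACME"}, SCHEMA)   # missing field fails
```

The key design point is that validation happens after parsing, so a nonconforming model response is rejected instead of silently propagating into the API.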
sunshadow · almost 2 years ago
What are the benefits over https://github.com/microsoft/guidance/ ?
roseway4 · almost 2 years ago
Looking at the playground, it appears the few-shot examples in the prompt and the CFG are duplicative. What is the relationship between the two?

When you say in another comment that using OpenAI functions to output JSON is a waste of tokens, how are you generating the JSON output? And why do your prompts then include few-shot examples of JSON objects?
Janymos · almost 2 years ago
What's the difference between self-hosting this and manually running https://github.com/r2d4/parserllm ?
minzi · almost 2 years ago
Can someone explain what they use structured output for? I’m just curious what kinds of use cases people have found for it.
easygenes · almost 2 years ago
You mention self hosting the model. Do you have the model weights up on HuggingFace?
darkteflon · almost 2 years ago
Could you contextualise this against OpenAI’s native functions?
marban · almost 2 years ago
Can easily be done with OAI's new function calling.
dkjaudyeqooe · almost 2 years ago
Don&#x27;t forget to put a license on your repository.
beefnugs · almost 2 years ago
oh it's 100% predictable alright, predictably garbage: the default example chooses height as the wrong "number" (whatever that's assumed to be), and if you try to change it to define height as, say, "height in total inches", it still gets it wrong