Show HN: Verify LLM Generated Code with a Spreadsheet

83 points by narush, almost 2 years ago
Hey HN! Been a minute. We launched Mito here last year (https://news.ycombinator.com/item?id=32723766).

Mito is a spreadsheet that generates Python code as you edit it. We've spent the past three years trying to lower the startup cost of using Python for data work. In doing so, we've been thrust into the middle of many Python transition processes at larger enterprises, and we've seen up close how non-technical folks interact with generated code.

The Mito AI chatbot lives inside the Mito spreadsheet (https://www.trymito.io/). The obvious benefit is that you can use the chatbot to transform your data and write a repeatable Python script. The less obvious (but equally important) benefit is that by connecting a spreadsheet and a chatbot, Mito helps you understand the impact of your edits and verify LLM-generated code. Every time you use the chatbot, Mito highlights the changed data in the spreadsheet. You can see a quick demo here (https://www.tella.tv/video/clibtwssv00000fl65oky13nu/view).

Three main insights shaped our approach to LLM code generation:

# Consumers of generated code don't know enough Python to verify and correct the code

Mito users span the range of Python experience. For new programmers, generating code using LLMs is an easy step one. Ensuring the generated code is correct is the forgotten step two.

In practice, LLMs often generate incorrect code, or code with unexpected side effects. For example, a user prompts an LLM to calculate a total_revenue column from price and quantity columns. The LLM correctly calculates total_revenue = price * quantity but then mistakenly deletes price and quantity.

New programmers find it almost impossible to verify generated code by reading it alone. They need tooling designed for their skillsets.

# Not everyone knows how to use a chat interface for transformations

We were surprised to learn that many Mito users a) had no experience with ChatGPT, and b) didn't understand the chat interface at all! Mito AI presents users with a few example prompts and an input field. A surprising number of users thought the example prompts were all they could use Mito AI for.

AI chatbots are new. We builders might be using them for natural language interactions, but users are still learning how to use them in new contexts. This stands in stark contrast to spreadsheets, where pretty much every business user has experience. Shout out to 40 years of Excel dominance!

# The more context a prompt has about the user's data + edits, the better the LLM results

For the LLM to generate code that can execute correctly, the prompt should include the names of the dataframes, the column headers, (some) dataframe values, and a few previous edits as examples. Duh.

But there's no reason users should be responsible for writing this prompt. No one loves writing long chats, and in practice Mito AI users expect to be able to write ~12 words. Spreadsheets are well-suited to building the rest of the prompt for you - they have all of your data context, and know your recent edits.

With these three insights, it became very clear to us what role a spreadsheet could play in LLM-based code-gen: a spreadsheet is the prompt builder, and a spreadsheet is the code verifier.

Mito AI builds an effective prompt by supplementing your input with the context of your data and recent edits.

Mito AI then helps you verify the LLM-generated code by highlighting the added, modified, and removed data within the chat interface - and within the spreadsheet. This way, you can ensure your LLM-generated code is correct.

Give it a spin. Let us know what you think of the recon and how we can make it more helpful!

Also, if you like what we're doing, we're hiring - come help us build! (https://www.ycombinator.com/companies/mito/jobs)
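
To illustrate the two roles described above (prompt builder and code verifier), here is a minimal sketch in plain Python. It is not Mito's implementation. Tables are modeled as dicts mapping column names to lists, as a stand-in for pandas dataframes, and `build_prompt`, `diff_tables`, `simulated_llm_edit`, and the prompt layout are all hypothetical names and choices made for this example.

```python
def build_prompt(tables, recent_edits, user_request, sample_rows=2):
    """Wrap a short user request with data context (hypothetical prompt layout)."""
    lines = ["Dataframes available:"]
    for name, table in tables.items():
        lines.append(f"- {name} with columns {list(table)}")
        for column, values in table.items():
            # Include only a few values per column to keep the prompt short.
            lines.append(f"    {column}: {values[:sample_rows]}")
    lines.append("Recent edits:")
    lines.extend(f"- {edit}" for edit in recent_edits)
    lines.append(f"Request: {user_request}")
    return "\n".join(lines)


def diff_tables(before, after):
    """Report which columns were added, removed, or modified by an edit."""
    return {
        "added": [c for c in after if c not in before],
        "removed": [c for c in before if c not in after],
        "modified": [c for c in before if c in after and before[c] != after[c]],
    }


def simulated_llm_edit(table):
    """Stand-in for generated code: right calculation, unwanted side effect."""
    result = {name: list(values) for name, values in table.items()}
    result["total_revenue"] = [
        p * q for p, q in zip(result["price"], result["quantity"])
    ]
    # The mistake from the example above: the source columns get deleted.
    del result["price"]
    del result["quantity"]
    return result


sales = {"price": [2.0, 3.5], "quantity": [3, 2]}
prompt = build_prompt(
    {"sales": sales},
    recent_edits=["renamed column qty to quantity"],
    user_request="add a total_revenue column",
)
report = diff_tables(sales, simulated_llm_edit(sales))
print(prompt)
print(report)
```

The report lists total_revenue as added and price and quantity as removed. The removed columns are the kind of side effect that is hard to spot by reading generated code, but easy to spot when the changed data is highlighted.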

3 comments

pgbovine, almost 2 years ago
Cool work! You and your team may be interested in these two recent CHI papers from Microsoft Research, both on topics very relevant to what you've been doing:

1) “What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models (https://arxiv.org/abs/2304.06597) -- they try to tackle a problem similar to the one you described above

2) On the Design of AI-powered Code Assistants for Notebooks (https://arxiv.org/abs/2301.11178) -- uses Mito as part of their case study
villgax, almost 2 years ago
It should be the other way around: the LLM should check the code against the language spec for compliance.
aarondia, almost 2 years ago
Hey, I'm Aaron, co-founder of Mito. Funnily enough, doing "diff detection" in spreadsheets is like the first thing we made when building Mito. We built Git for Excel to enable better collaboration around Excel models -- turns out Excel power users would rather play in single player mode. So it's funny to be exploring spreadsheet difference detection again a few years later. This time, thinking about it purely in single player mode to understand the impact of LLM generated code on your data.