
Hey HN, I built something to estimate things: https://guesstimate.ai

Unlike ChatGPT, Guesstimate can do calculations. I asked both to calculate the time it'd take an object to fall from the ISS to sea level (about 400 km, assume no air resistance). ChatGPT, using a lot of words, did everything correctly right until the last step [1]. Guesstimate took a similar approach, but it got the right answer: https://guesstimate.ai/e/est114xrhnj1f6h9cm44

I think it's absurd that we're trying to teach LLMs basic arithmetic when there's a literal arithmetic processing unit right next to the GPU. You wouldn't ask a person to multiply two 5-digit numbers just by staring at them; you'd give them a pen and paper. LLMs are great at chain-of-thought (CoT) reasoning, and CPUs are great at memory and math, so why not take the best of both?

That's how Guesstimate works. It generates CoT reasoning in Python, then parses the AST to build a computational graph. This way, you can play with the numbers right in the browser. For example, when estimating the cost of bread in 2050, if you don't like Guesstimate's assumed inflation rate, just change it. Kinda like a spreadsheet UI designed just for your question.

Teaching LLMs to use tools isn't novel [2] [3], but it's a relatively recent idea. I built this in a weekend, so the demo will probably get formulas/units wrong or totally break, but it generally seems to work. I'd like to get feedback on what works, what doesn't, and how to make it better.

[1] https://i.imgur.com/tDoeMqp.png
[2] https://arxiv.org/abs/2211.10435
[3] https://arxiv.org/abs/2303.09014
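To make the "parse the AST to build a computational graph" idea concrete, here's a rough sketch of how it could work for the bread example. This is not Guesstimate's actual code. The generated snippet in `SOURCE`, the helper names `build_graph` and `evaluate`, and every number (starting price, inflation rate, base year 2023) are made up for illustration. It only handles simple `name = expression` lines, and it assumes each name is assigned once, so source order is already a valid evaluation order.

```python
import ast

# Hypothetical example of the Python an LLM might generate for
# "cost of bread in 2050". All numbers are illustrative assumptions.
SOURCE = """
price_today = 3.0
inflation_rate = 0.03
years = 2050 - 2023
price_2050 = price_today * (1 + inflation_rate) ** years
"""


def build_graph(source):
    """Map each assigned name to (expression AST, names it depends on)."""
    graph = {}
    for stmt in ast.parse(source).body:
        # Only simple `name = expression` statements are supported here.
        if (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        ):
            deps = sorted(
                {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
            )
            graph[stmt.targets[0].id] = (stmt.value, deps)
    return graph


def evaluate(graph, overrides=None):
    """Evaluate nodes in source order; overrides replace a node's value."""
    overrides = overrides or {}
    values = {}
    for name, (expr, _deps) in graph.items():
        if name in overrides:
            # The user edited this cell, so skip its formula.
            values[name] = overrides[name]
        else:
            code = compile(ast.Expression(body=expr), "<estimate>", "eval")
            # Empty builtins: the expression can only see earlier nodes.
            values[name] = eval(code, {"__builtins__": {}}, values)
    return values


graph = build_graph(SOURCE)
baseline = evaluate(graph)
edited = evaluate(graph, {"inflation_rate": 0.05})
```

Here `baseline["price_2050"]` uses the generated 3% assumption, while `edited["price_2050"]` recomputes the same formula with 5%, which is the spreadsheet-style "just change it" behavior. The dependency lists in `graph` (for example, `price_2050` depends on `inflation_rate`, `price_today`, and `years`) are what a UI would need in order to know which cells to refresh after an edit.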

Show HN: Guesstimate – Generate a spreadsheet-like interface for any question

No comments yet.
