科技回声

10 comments

devbent 9 months ago

One problem that I run into with LLM code generation on large projects is that at some point the LLM runs into a problem it just cannot fix no matter how it is prompted. This manifests in a number of ways. Sometimes it bounces back and forth between two invalid solutions, while other times it bounces back and forth between fixing one issue and breaking something else in another part of the code.

Another issue with complex projects is that LLMs will not tell you what you don't know. They will happily go about designing crappy code if you ask them for a crappy solution, and they don't have the ability to recommend a better path forward unless explicitly prompted.

That said, I had Claude generate most of a tile-based 2D pixel art rendering engine[1] for me, but again, once things got complicated I had to go and start hand-fixing the code because Claude was no longer able to make improvements.

I've seen these failure modes across multiple problem domains, from CSS (alternating between two broken CSS styles, neither of which came close to fixing the issue) to backend, to rendering code (trying to get character sprites correctly positioned on the tiles).

[1] https://www.generativestorytelling.ai/town/index.html Notice the tons of rendering artifacts. I've realized I'm going to need to rewrite a lot of how rendering happens to resolve them. Claude wrote 80% of the original code, but by the time I'm done fixing everything maybe only 30% or so of Claude's code will remain.

hebejebelus 9 months ago

Reviving my long-dead account to say that I built a perfectly functional small site to help schedule my dungeons and dragons group within about 5 minutes, on my phone, from my bed. If this isn't the future I don't want to go there. Fantastic work.

anonzzzies 9 months ago

Whenever there is a 'big new AI model' launch, I try to build one of my side projects fully with the new AI. So I do not touch anything myself; I only talk English to it. I do read the generated code and instructions so I can correct them in English; no code at all. It worked twice: with ChatGPT 4 and the Sonnet launch. All the others did not manage without significant code or ops help.

It is a very annoying experience even if you know what you are doing. It is still much faster than what you would get done writing code, but it is very frustrating getting the last 10% right: you spend a day on 80%, two days on the next 10%, and a week on the last 10%. If I just jump in and fix the code myself, it is about 1 day for the same project, which is still amazing (and not imaginable before).

People complaining that it sucks and cannot figure things out are often right. However, it is a lot better than what we had before, which was doing all this by hand (causing many people to procrastinate over even starting a side project while having 1000s in mind every day).

These types of services are important, and I like this val.town idea. Well done and keep going.

osigurdson 9 months ago

LLMs are massively useful, just not in the way that people think they should work. No, you are not the manager with AI doing the work. Instead, it is more like someone to bounce ideas off of, a teacher (who is often wrong) and a reviewer. The human, ironically, is the specialist that can get details right, not AI (at least not for now).

deckiedan 9 months ago

I just played with Townie AI for an hour or so... Very cool! Very fun.

There are still some glitches: occasionally the entire app code would get replaced by the function the LLM was trying to update. I could fix it by telling it that's what had happened, and it would then fill everything in again... Waiting for the entire app to be rewritten each time was a bit annoying.

It got the initial concepts of the app running very, very quickly, but then struggled with some CSS stuff, saying it would try a different approach, or apologising for missing things repeatedly... and eventually it told me it would try more radical approaches and wrote inline styles... I wonder if the single-file approach has limitations in that respect.

Very interesting, very fun to play with.

I'm kind of concerned about security with LLM-written apps: you can ask it to do things and it says yes, without really thinking about whether it's a good idea or not.

But cool!

And anything that helps the internet be full of small, independent, quirky, creative ideas is for the better.

wonger_ 9 months ago

The author's bit about "IterateGPT" reminds me of this "AI in a loop" post from last year: https://til.simonwillison.net/llms/python-react-pattern

    prompt = """
    You run in a loop of Thought, Action, PAUSE, Observation.
    At the end of the loop you output an Answer
    Use Thought to describe your thoughts about the question you have been asked.
    Use Action to run one of the actions available to you - then return PAUSE.
    Observation will be the result of running those actions.
    ...
    """

Seems like a really powerful technique to have LLMs act on their own feedback.
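
For anyone who hasn't seen the pattern, here is a minimal sketch of the driver loop that a prompt like this implies. It is not the code from the linked post, and everything in it is an assumption made for illustration: the `run_react_loop` and `fake_llm` names, the `Action: name: argument` line format, and the `word_count` action are all hypothetical. The model is replaced by a stand-in function so the sketch runs without any API.

```python
import re
from typing import Callable, Dict

# Assumed reply format for this sketch: a line like "Action: word_count: some text".
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)


def run_react_loop(
    llm: Callable[[str], str],
    question: str,
    actions: Dict[str, Callable[[str], str]],
    max_turns: int = 5,
) -> str:
    """Call the model, run any action it requests, and feed the result back."""
    transcript = f"Question: {question}"
    reply = ""
    for _ in range(max_turns):
        reply = llm(transcript)
        transcript += "\n" + reply
        match = ACTION_RE.search(reply)
        if match is None:
            return reply  # no action requested, so treat the reply as final
        name, argument = match.groups()
        if name in actions:
            observation = actions[name](argument)
        else:
            observation = f"unknown action {name!r}"
        # The observation is what lets the model act on its own feedback.
        transcript += f"\nObservation: {observation}"
    return reply  # turn budget exhausted; return the last reply as-is


def word_count(text: str) -> str:
    """Hypothetical action: count whitespace-separated words."""
    return str(len(text.split()))


def fake_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real model, so the sketch runs offline."""
    if "Observation:" not in transcript:
        return (
            "Thought: I should count the words.\n"
            "Action: word_count: act on your own feedback\n"
            "PAUSE"
        )
    observation = transcript.rsplit("Observation: ", 1)[1]
    return f"Answer: {observation} words"


result = run_react_loop(
    fake_llm, "How many words are in the phrase?", {"word_count": word_count}
)
print(result)  # prints "Answer: 5 words"
```

With a real model, `llm` would wrap a chat API call that sends the system prompt plus the transcript. The `max_turns` cap is there so a model that keeps requesting actions cannot loop forever.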

janpaul123 9 months ago

Post author here! Happy to answer any questions.

thelastparadise 9 months ago

Couldn't this essentially be used as a training data generator? E.g. have humans + LLMs generate a bunch of prompts that go into this system, and it spits out a bunch of fully-fledged applications, which can be used to train an even bigger model.

syspec 9 months ago

Does fullstack here mean using javascript on the backend? Is it able to generate code for other backend languages?

01HNNWZ0MV43FF 9 months ago

And they can't be self-hosted?

How we built Townie – an app that generates fullstack apps
