
My LLM codegen workflow

522 points · by lolptdr · 3 months ago

39 comments

briga · 3 months ago
Absolutely, LLMs are great for greenfield projects. They can get you to a prototype for a new idea faster than any tool yet invented. Where they start to break down, I find, is when you ask them to make changes or refactors to existing code and mature projects. They usually lack context, so they don't hesitate to introduce lots of extra complexity, add frameworks you don't need, and in general just make the situation worse. Or, if they do get you to a solution, it will have taken so long that you might as well have done the heavy lifting yourself. LLMs are still no substitute for actually understanding your code.
jrexilius · 3 months ago
The first part of this, where you told it to ask YOU questions, rather than laboriously building prompts and context yourself was the magic ticket for me. And I doubt I would have stumbled on that sorta inverse logic on my own. Really great write up!
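The inversion jrexilius describes can be captured in a small reusable template. This is only a sketch of the idea — the wording below is illustrative, not the article's exact prompt:

```python
# Sketch of an "interview me first" planning prompt. The wording is
# illustrative -- it is not the article's exact prompt text.
PLANNING_PROMPT = """\
I want to build: {idea}

Before writing any code or plan, ask me questions one at a time to refine
the spec: scope, data model, edge cases, constraints. Only when you have
no further questions, produce a step-by-step implementation plan.
"""

def build_planning_prompt(idea: str) -> str:
    """Fill the template with a one-line project idea."""
    return PLANNING_PROMPT.format(idea=idea)
```

The point is that the model builds the context by interrogating you, instead of you hand-assembling it up front.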
bcoates · 3 months ago
That lonely/downtime section at the end is a giant red flag for me.

It looks like the sort of nonproductive yak-shaving you do when you're stuck or avoiding an unpleasant task -- coasting, fooling around incrementally with your LLM because your project's fucked and you psychologically need some sense of progress.

The opposite of this is burnout -- one of the things they don't tell you about successful projects with good tools is that they induce *much more* burnout than doomed projects. There's a sort of Amdahl's Law in effect, where all the tooling just gives you more time to focus on the actual fundamentals of the product/project/problem you're trying to address, which is stressful and mentally taxing even when it works.

Fucking around with LLM coding tools, on the other hand, is very fun, and, like constantly clean-rebuilding your whole (doomed) project, it gives you both some downtime and a sense of forward momentum -- look how much the computer is chugging!

The reality test of whether the tool is really helping is to sit down with a concrete goal and a (near) hard deadline. Every time I've tried to use an LLM under these conditions it just fails catastrophically -- I don't just get stuck, I realize that basically every implicit decision embedded in the LLM output has an unacceptably high likelihood of being wrong, and the debug cycles ahead of me exceed the time to throw it all away and do it without the LLM by, like, an order of magnitude.

I'm not an LLM-coding hater and I've been doing AI stuff that's worked for decades, but the current offerings I've tried aren't even close to productive compared to searching for code that already exists on the web.
rotcev · 3 months ago
This is the first article I've come across that truly utilizes LLMs in a workflow the right way. I appreciate the time and effort the author put into breaking this down.

I believe most people who struggle to be productive with language models simply haven't put in the necessary practice to communicate effectively with AI. The issue isn't the intelligence of the models -- it's that humans are still learning how to use this tool properly. It's clear that the author has spent time mastering the art of communicating with LLMs. Many of the conclusions in this post feel obvious once you've developed an understanding of how these models "think" and how to work within their constraints.

I'm a huge fan of the workflow described here, and I'll definitely be looking into Aider and Repomix. I've had a lot of success using a similar approach with Cursor in Composer Agent mode, where Claude 3.5 Sonnet acts as my "code implementer." I strategize with larger reasoning models (o1-pro, o3-mini-high, etc.) and delegate execution to Claude, which excels at making inline code edits. While it's not perfect, the time savings far outweigh the effort required to review an "AI pull request."

Maximizing efficiency in this kind of workflow requires a few key things:

- High typing speed -- minimizing time spent writing prompts means maximizing time generating useful code.

- A strong intuition for "what's right" vs. "what's wrong" -- this will probably become less relevant as models improve, but for now good judgment is crucial.

- Familiarity with each model's strengths and weaknesses -- this only comes with hands-on experience.

Right now, LLMs don't work flawlessly out of the box for everyone, and I think that's where a lot of the complaints come from -- the "AI haterade" crowd expects perfection without adaptation.

For what it's worth, I've built large-scale production applications using these techniques while writing very little of the code by hand myself.

Most of my experience with these workflows has been in the web dev domain, where there's an abundance of training data. That said, I've also worked in lower-level programming and language design, so I can understand why some people might not find the models up to par in every scenario, particularly in niche domains.
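The planner/implementer split described above can be sketched as a tiny orchestration function. The two callables are stand-ins for real API calls (a reasoning model and a code model); the wiring to an actual provider is deliberately left out, so everything here is an illustrative assumption:

```python
from typing import Callable

def plan_then_implement(
    spec: str,
    planner: Callable[[str], str],      # reasoning model: spec -> numbered plan
    implementer: Callable[[str], str],  # code model: one plan step -> code
) -> list[str]:
    """Ask the planner for numbered steps, then hand each step to the implementer."""
    plan = planner(f"Break this spec into numbered steps:\n{spec}")
    steps = [line for line in plan.splitlines() if line.strip()]
    return [implementer(f"Write the code for this step:\n{step}") for step in steps]
```

Keeping the strategist and the implementer behind separate callables also makes it trivial to swap either model independently.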
rd · 3 months ago
Has anyone who evolved from a baseline of just using Cursor chat and freestyling to a proper workflow like this got any anecdata to share on noticeable improvements?

Does the time invested in planning benefit you? Have you noticed fewer hallucinations? Have you saved time overall?

I'd be curious to hear, because my current workflow is basically:

1. Have an idea

2. create-next-app + ShadCN + TailwindUI boilerplate

3. Cursor Composer in agent mode with Superwispr voice transcription

I'm gonna try the author's workflow regardless, but would love to hear others' opinions.
rollinDyno · 3 months ago
Something I quickly learned while retooling this past week is that it's preferable not to add opinionated frameworks to the project, as they increase the size of the context the model has to be aware of. That context is also not likely to be available in the training data.

For example, rather than using Plasmo for its browser extension boilerplate and packaging utilities, I've chosen to ask the LLM to set up all of that for me, so it won't have any blind spots when tasked with debugging.
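The boilerplate the LLM is asked to generate here is genuinely small. For a Chrome Manifest V3 extension, the framework-free starting point is roughly the following (field values are illustrative, not from the comment):

```json
{
  "manifest_version": 3,
  "name": "my-extension",
  "version": "0.1.0",
  "action": { "default_popup": "popup.html" },
  "permissions": ["storage"],
  "content_scripts": [
    { "matches": ["<all_urls>"], "js": ["content.js"] }
  ]
}
```

Because every line is plain platform API rather than framework convention, the model can debug it from its training data alone.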
bambax · 3 months ago
This is all fine for a solo dev, but how does this work with a team / squad working on the same code base?

Having 7 different instances of an LLM analyzing the same code base and making suggestions would not just be economically wasteful; it would also be impractical or even dangerous.

Outside of RAG, which is a different thing, are there products that somehow "centralize" the context for a team, where all questions refer to the same codebase?
tarkin2 · 3 months ago
Most new programmers forget the specification and execution-plan part of programming.

I ended up finishing my side projects when I kept these in mind, rather than focusing on elegant code for elegant code's sake.

It seems the key to using LLMs successfully is to make them create a specification and execution plan, by making them ask *you* questions.

If this skill -- specification and execution planning -- is passed on to LLMs, along with coding, then are we essentially souped-up tester-analysts?
fullstackwife · 3 months ago
This looks similar to my experience, except for this part:

> if it doesn't work, Q&A with aider to fix

I fix errors myself, because LLMs are capable of producing large chunks of really stupid/wrong code that needs to be reverted, and that's why it makes sense to see the code at least once.

I also used to find myself trying to use an LLM for the sake of using an LLM to write code (a waste of time).
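The review-then-revert loop this describes is ordinary git hygiene: commit before the LLM touches anything, then a bad chunk is one command away from gone. A minimal self-contained demonstration (the file and messages are made up for the example):

```shell
# Toy repo showing the loop: commit known-good code, simulate a bad LLM
# edit, inspect the diff at least once, then throw the edit away.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo "good code" > app.py
git add app.py
git -c user.email=me@example.com -c user.name=me commit -qm "known good"
echo "large chunk of wrong code" > app.py   # simulated bad LLM edit
git diff --stat                             # see the change at least once
git checkout -- app.py                      # revert the chunk
```

After the `checkout`, `app.py` is back to the committed "good code" state.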
codeisawesome · 3 months ago
It would be great if there were more details on the costs of doing this work -- especially when loading lots of tokens of context via Repomix and then generating code with that context (context-loaded inference API calls are more expensive, correct?). A dedicated post discussing this and related considerations would be even better. Are there cost estimations in tools like aider (vs. just refreshing the LLM platform's billing dashboard)?
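The back-of-envelope math is simple enough to sketch. The per-million-token prices below are illustrative placeholders, not any provider's actual rates:

```python
# Back-of-envelope cost of one context-heavy request. The prices are
# illustrative placeholders (USD per million tokens) -- check your
# provider's current rate card.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}  # assumed rates

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at the assumed rates."""
    return (input_tokens * PRICE_PER_MTOK["input"]
            + output_tokens * PRICE_PER_MTOK["output"]) / 1_000_000

# A 200k-token Repomix dump in, 4k tokens of code out:
print(f"${call_cost(200_000, 4_000):.2f}")  # $0.66 at these assumed rates
```

The asymmetry is the point: with a big pasted context, input tokens dominate the bill even though output is priced several times higher per token.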
keyle · 3 months ago
I have been using LLMs for a long time, but these prompt ideas are fantastic; they really opened up a world for me.

A lot of the benefit of LLMs is bringing up ideas or questions I'm not thinking of right now, and this really does that. Typically that would only happen as I dug through a topic, not beforehand. So that's a net benefit.

I also tried it and it worked a charm: the LLM respected the context and the step-by-step approach, poking holes in my ideas. Amazing work.

I still like writing code and solving puzzles in my mind, so I won't be doing the "execution" part. Beyond that, I mostly use LLMs as autocomplete and for *I'm stuck here* or *obscure bug solving*. Otherwise I don't get any satisfaction from programming, having learnt nothing.
ggulati · 3 months ago
Nice, I coincidentally wrote a blog post today exploring workflows as well: https://ggulati.wordpress.com/2025/02/17/cursorai-for-frontend-dev-first-impressions/

Your workflow is much more polished; I will definitely try it out for my next project.
Isamu · 3 months ago
I’m curious, is adding “do not hallucinate” to prompts effective in preventing hallucinations? The author does this.
cipehr · 3 months ago
Am I the only one who doesn't see the hype with Claude? I recently tried it, hit the usage limit, read around and found tons of blogs and posts from devs saying Claude is the best code-assistant LLM... so I purchased Claude Pro... and I hate it. I have been asking it surface-level questions about Apache Spark (configuring the number of task retries, errors, error handling, etc.) and it hallucinated so much, so frequently. It reminds me of ChatGPT 3...

What am I doing wrong, or what am I missing? My experience has been so underwhelming that I just don't understand why people use Claude over something else.

Sorry, I know there are many models out there, and Claude is probably better than 99% of them. Can someone help me understand its value over o1/o3? I honestly feel like I like 4o better.

/frustration-rant
hooverd · 3 months ago
I think LLM codegen still requires a mental model of the problem domain. I wonder how many upcoming devs will simply never develop one. Calculators are tools for engineers *and* way too many people can't even do basic receipt math.
junto · 3 months ago
Something I've started to do recently is mob programming with LLMs.

I act as the director, the creativity-and-ideas person. I have one LLM that implements, and a second LLM that critiques and suggests improvements and alternatives.
jacooper · 3 months ago
I find making LLMs think and plan the project a bit worrying. I understand this helps with procrastination, but when these systems eventually get better and more integrated, the most likely outcome for software devs is a move away from pure coding toward more of a solution-architect role (i.e., planning stuff) -- not to mention the negative impact of giving up critical thinking to LLMs.

https://news.ycombinator.com/item?id=43057907

Other than that, a great article! Very insightful.
avandekleut · 3 months ago
This is pretty much the flow I landed on as well. Dump existing relevant files into context, explain what we are trying to achieve, and ask it to analyze various approaches and considerations and to ask clarifying questions. Once we align on direction, I ask for a plan of all files to be created/modified, in dependency order, with descriptions of the required changes. Once we align on the plan, I say let's proceed one file at a time; that way I can ensure each file builds on the previous one, and I can adjust as needed.
randomcatuser · 3 months ago
> I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience. There is so much opportunity to fix this and make it amazing.

This, I think, is the grand vision -- what could it look like?

In my mind, programming should look like a map: you can go anywhere, and there will be things happening, and multiple people.

If anyone wants to work on this (or has comments), hit me up!
blah2244 · 3 months ago
This is a great article -- I really appreciate the author giving specific examples. I had never heard of mise (https://mise.jdx.dev/) before either, and the integration with saved prompts is a nifty idea -- excited to try it out!
krupan · 3 months ago
If I have to go to this much effort, what is AI buying us here? Why don't we just put in the effort to learn to write code ourselves? Instead of troubleshooting AI problems and coming up with clever workarounds for those problems, troubleshoot your code and solve those problems directly!
runoisenze · 3 months ago
Great write-up! Roughly how many Claude tokens are you using per month with this workflow? What are your monthly API costs?

Also, what do you mean by "I really want someone to solve this problem in a way that makes coding with an LLM a multiplayer game. Not a solo hacker experience."?
bionhoward · 3 months ago
I don't mind LLMs, but what irks me is the customer noncompete: you have these systems that can do almost anything, and the legal terms explicitly say you're not allowed to use the thing for anything that competes with the thing. But if the thing can do almost anything, then you really can't use it for anything. Making a game with Grok? No, that competes with the xAI game studio. Making an agents framework with ChatGPT? No, that competes with Swarm. Making legal AI with Claude? No, that competes with Claude. It seems like the only American companies making AI we can actually use for work are Hugging Face and Meta.
dfltr · 3 months ago
> Legacy modern code

As opposed to Vintage Pioneer code?
mrklol · 3 months ago
Regarding the first step: you probably also need some kind of context so that the LLM has enough information to iterate with you on the new feature idea.

So either you put the whole codebase into the context (which will mostly lead to problems, as tokens are limited), or you keep some kind of summary of your current features, etc.

Or you do some kind of "black box" iteration, which I feel won't be that useful for new features, as the model should know about the current features.

What's the way to go here?
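One cheap middle ground for the context question above is a short, hand-maintained summary file that gets prepended to every feature prompt. A minimal sketch — the filename and prompt framing are arbitrary conventions, not anything from the thread:

```python
# Prepend a hand-maintained project summary to each feature prompt so the
# model has feature context without the whole codebase. The filename is an
# arbitrary convention chosen for this example.
from pathlib import Path

def feature_prompt(idea: str, summary_file: str = "PROJECT_SUMMARY.md") -> str:
    path = Path(summary_file)
    summary = path.read_text() if path.exists() else "(no summary yet)"
    return (f"Current project summary:\n{summary}\n\n"
            f"New feature to discuss:\n{idea}")
```

The summary costs a few hundred tokens per call instead of the full repo, at the price of keeping it up to date.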
maelito · 3 months ago
Given a 3,648,318-token repository (number from Repomix), I'm not sure what it would cost to use a leading LLM to analyze it and ask for improvements.

Isn't the input token limit far lower than that?

This part is unclear to me in the "non-greenfield" part of the article.

Iterating with aider on very limited scopes is easy; I've used it often. But what about understanding a whole repository and acting on it? Following imports to understand a TypeScript codebase as a whole?
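The arithmetic behind this concern is worth making explicit. Taking the token count quoted above and an illustrative large context window (the window size is an assumption, not a specific model's spec):

```python
import math

REPO_TOKENS = 3_648_318    # the Repomix count quoted in the comment
CONTEXT_WINDOW = 200_000   # illustrative large-model context window

chunks = math.ceil(REPO_TOKENS / CONTEXT_WINDOW)
share = CONTEXT_WINDOW / REPO_TOKENS
print(f"{chunks} chunks needed; one prompt holds about {share:.1%} of the repo")
```

So a single prompt sees only around one-twentieth of a repo that size, which is why whole-repo analysis needs chunking, retrieval, or a repo map rather than raw pasting.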
thedeep_mind · 3 months ago
This is effing great... thanks for sharing your experience.

I was just wondering how to feed my edits back to in-browser tools like Claude or ChatGPT; the idea of Repomix is great, will try it!

Although I have been flying a bit with Copilot in VS Code, so right now I essentially have two AIs: one for larger changes (in the browser) and one for minor code fixes (in VS Code).
ChrisRob · 3 months ago
In our company we are only allowed to use GitHub Copilot with GPT or Claude, but not Claude directly. I'm really struggling to get good results from it, so I'll try to adapt your workflow to that setup. To the community: do you have any additional guidance for that setup?
psadri · 3 months ago
One more tool he could build is one that wraps the entire process, so less copy/pasting is needed.
pyreal · 3 months ago
I'm curious to see his mise tasks. He lists a few of them near the end, but I'm not sure what his "LLM CLI" is. Is that an actual tool, or is he using it as a placeholder for "insert your LLM CLI tool here"?
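For readers unfamiliar with mise tasks: they are defined in a TOML file and run shell commands. A hypothetical sketch in the spirit of the post — the task names and the `llm` command here are placeholders, not the author's actual configuration:

```toml
# Hypothetical .mise.toml tasks; "llm" stands in for whatever LLM CLI you
# use, and "repomix" for the repo-packing step. Names are illustrative.
[tasks."llm:generate_bundle"]
run = "repomix --output repo_bundle.txt"

[tasks."llm:review"]
run = "cat repo_bundle.txt prompts/review.md | llm > review_output.md"
```

With something like this, `mise run llm:generate_bundle` replaces the manual copy/paste step.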
fallinditch · 3 months ago
Great post and discussion.

Also, don't forget that your favorite AI tools can be a great help with the other factors that go into making software: research, subject expertise, marketing, business planning, etc.
insin · 3 months ago
I liked the bit where he asked it not to hallucinate
jdenning · 3 months ago
A question for folks with good workflows: are you using tools like DSPy to generate prompts? Any other tools/tips for managing prompts?
mark_mcnally_je · 3 months ago
I'm a bit confused here: what prompt do you use to start Aider, and how do you just let Aider run wild so you can play Cookie Clicker?
bill_lau19 · 3 months ago
I learned these things from this blog: 1. Use multiple turns with LLM tools to finish a job. 2. Work step by step.
sprobertson · 3 months ago
Strange capitalization of atm as ATM in the HN title, but great tips in there
snowwrestler · 3 months ago
Spelling nit:

"Over my skis" ~ in over my head.

"Over my skies" ~ very far overhead. In orbit, maybe?
zackify · 3 months ago
Cline over everything for me
oars · 3 months ago
Good for greenfield projects.