A messy experiment that changed how I think about AI code analysis

463 points by namanyayg, 4 months ago

44 comments

People are being very uncharitable in the comments for some reason.

This is a short and sweet article about a very cool real-world result in a very new area of tooling possibilities, with some honest and reasonable thoughts.

Maybe the "Senior vs Junior Developer" narrative is a little stretched, but the substance of the article is great.

Can't help but wonder if people are getting mad because they feel threatened.

afro88, 4 months ago

Another cherry-picked example of an LLM doing something amazing, written about with a heavy dose of anthropomorphism.

It's easy to get LLMs to do seemingly amazing things. It's incredibly hard to build something where it does this amazing thing consistently and accurately for all reasonable inputs.

> Analyzing authentication system files:
> - Core token validation logic
> - Session management
> - Related middleware

This hard-coded string is doing some very heavy lifting. This isn't anything special until this string is also generated accurately and consistently for any reasonable PR.

OP, if you are reading, the first thing you should do is get a variety of codebases with a variety of real-world PRs and set up some evals. This isn't special until evals show it producing consistent results.
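
A minimal sketch of the kind of eval harness suggested above. It assumes a hypothetical `analyze_pr` callable that wraps the LLM call and returns the set of areas it claims a PR touches. The cases, labels, and stub analyzer are invented for illustration and do not come from the article.

```python
from typing import Callable, Iterable


def run_evals(
    analyze_pr: Callable[[str], set[str]],
    cases: Iterable[tuple[str, set[str]]],
    runs_per_case: int = 5,
) -> dict[str, float]:
    """Score a (hypothetical) PR analyzer for accuracy and run-to-run consistency."""
    total = correct = consistent_cases = n_cases = 0
    for diff, expected in cases:
        # Repeat each case: LLM output can vary between identical calls.
        outputs = [frozenset(analyze_pr(diff)) for _ in range(runs_per_case)]
        correct += sum(1 for out in outputs if out == expected)
        total += runs_per_case
        # A case is "consistent" only if every run produced the same answer.
        consistent_cases += len(set(outputs)) == 1
        n_cases += 1
    return {
        "accuracy": correct / total,
        "consistency": consistent_cases / n_cases,
    }


# Stand-in analyzer so the sketch runs without a model (assumption, not a real API).
def fake_analyzer(diff: str) -> set[str]:
    return {"auth"} if "token" in diff else {"other"}


cases = [
    ("refresh token expiry fix", {"auth"}),
    ("rename css class", {"other"}),
    ("add retry to token fetch", {"auth", "retry"}),
]
report = run_evals(fake_analyzer, cases)
print(report)
```

With a real model behind `analyze_pr`, the labeled cases would need to come from real PRs across several codebases, which is the expensive part.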

jerpint, 4 months ago

I think there will be lessons learned here as well for better agentic systems writing code more generally; instead of “committing” to code as of the first token generated, first generate the overall structure of the code base, with abstractions, and only then start writing code.

I usually instruct Claude/ChatGPT/etc. not to generate any code until I tell them to, as they are eager to do so and often box themselves into a corner early on.

voidhorse, 4 months ago

To me, this post really just highlights how important the human element will remain. Without achieving the same level of contextual understanding of the code base, I have no clue as to whether or not the AI warning makes any sense.

At a superficial level, I have no idea what "shared patterns" means or why it logically follows that sharing them would cause a race condition. It also starts out talking about authentication changes, but then cites a PR that modified "retry logic". Without that shared context, it's not clear to me that an auth change has anything to do with retry logic unless the retry is related to retries on authentication failures.

advael, 4 months ago

I read to, like, the first line under the first bold heading and immediately this person seemed like an alien. I'll go back and read the rest because it's silly to be put off a whole article by this kind of thing, but what in the actual fuck?

I was probably not alive the last time anyone would have learned that you should read existing code in some kind of linear order, let alone programming. Is that seriously what the author did as a junior, or is it a weirdly stilted way to make an analogy to sequential information being passed into an LLM... which also seems to misunderstand the mechanism of attention, if I'm honest.

I swear like 90% of people who write about "junior developers" have a mental model of them that just makes zero sense, that they've constructed out of a need to dunk on a made-up guy to make their point.

Arch-TK, 4 months ago

I am struggling to teach AI to stop dreaming up APIs which don't exist and failing to solve relatively simple but not often written about problems.

It's good when it works and it's crap when it doesn't; for me it mostly doesn't work. I think AI working is a good indicator of when you're writing code which has been written by lots of other people before.

JoeAltmaier, 4 months ago

Pretty impressive. But for the part about nitpicking on style and uniformity (at the end) the results seem useful.

Btw I thought, from the title, this would be about an AI taught to dismiss anyone's work but their own, blithely hold forth on code they had no experience with, and misinterpret goals and results to fit their preconceived notions. You know, to read code like a Senior Developer.

ianbutler, 4 months ago

Code context and understanding is very important for improving the quality of LLM-generated code; it’s why the core of our coding agent product Bismuth (which I won’t link but if you’re so inclined check my profile) is built around a custom code search engine that we’ve also built.

We segment the project into logical areas based on what the user is asking, then find interesting symbol information and use it to search call chain information which we’ve constructed at project import.

This gives the LLM way better starting context, and we then provide it tools to move around the codebase through normal methods you or I would use, like go_to_def.

We’ve analyzed a lot of competitor products and very few have done anything other than a rudimentary project skeleton like Aider, or just directly feeding opened code as context, which breaks down very quickly on large code projects.

We’re very happy with the level of quality we see from our implementation, and it’s something that really feels overlooked sometimes by various products in this space.

Realistically, the only other product I know of approaching this correctly with any degree of search sophistication is Cody from SourceGraph which yeah, makes sense.

charles_f, 4 months ago

I wondered if there was a reason behind the ligature between c and t across the article (e.g. is it easier to read for people with dyslexia).

If, like me, you didn't know, apparently this is mostly stylistic, and comes from a historical practice that predates printing. There are other common ligatures such as CT, st, sp and th.

https://rwt.io/typography-tips/opentype-part-2-leg-ligatures

jalopy, 4 months ago

This looks very interesting, however it seems to me like the critical piece of this technique is missing from the post: the implementations of getFileContext() and shouldStartNewGroup().

Am I the one missing something here?

theginger, 4 months ago

What is with the font joining the characters c and t on this site? (In headings)

OzzyB, 4 months ago

So it turns out that AI is just like another function, inputs and outputs, and the better you design your input (prompt) the better the output (intelligence), got it.

crazygringo, 4 months ago

I'm fascinated by stories like these, because I think they show that LLMs have only shown a small amount of their potential so far.

In a way, we've solved the raw "intelligence" part -- the next token prediction. (At least in certain domains like text.)

But now we have to figure out how to structure that raw intelligence into actual useful thinking patterns. How to take a problem, analyze it, figure out ways of breaking it down, try those ways until you run into roadblocks, then start figuring out some solution ideas, thinking about them more to see if they stand up to scrutiny, etc.

I think there's going to be a lot of really interesting work around that in the next few years. A kind of "engineering of practical thinking". This blog post is a great example of one first step.

dartos, 4 months ago

I think the content is interesting, but anthropomorphizing AI always rubs me the wrong way and ends up sounding like marketing.

Are you trying to market a product?

e12e, 4 months ago

> "Related PR: #1234 (merged last week) modified the same retry logic. Consider adding backoff."

Is this an example of confabulation (hallucination)? It's difficult to tell from the post.

rurban, 4 months ago

I had a similar experience today with Claude. I asked it to come up with a tournament schedule for the upcoming city table-tennis tournament I want to organize, to maybe limit the number of attendees or get more tables.

The AI came up with a totally unusable Python class to print the schedule, where I just wanted to know the results and tradeoffs. Then I reformulated the question for a senior, and it came back with the high-level answer I expected and needed.

Ask junior questions, and you get junior code. Ask senior questions and you get the senior answers.

disambiguation, 4 months ago

OP, you only took this halfway. We already know LLMs can say smart-sounding things while also being wrong and irrelevant. You need to manually validate how many out of every 100 LLM outputs are both correct and significant, and how much it missed! Otherwise you might fall into the trap of dealing with too much noise for only a little bit of signal. The next step from there is comparing it with the human-level signal-to-noise ratio.
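
The validation described above amounts to measuring precision (how much of what the tool flagged was real) and recall (how much of what was real it caught). A minimal sketch, assuming the findings have been hand-labeled; the issue names are hypothetical placeholders, not results from the article.

```python
def precision_recall(flagged: set[str], real_issues: set[str]) -> dict[str, float]:
    """Compare what the reviewer flagged against manually verified issues."""
    true_positives = flagged & real_issues
    # Precision: share of flagged items that were real (the signal-to-noise side).
    precision = len(true_positives) / len(flagged) if flagged else 0.0
    # Recall: share of real issues that were flagged (the "how much did it miss" side).
    recall = len(true_positives) / len(real_issues) if real_issues else 0.0
    return {"precision": precision, "recall": recall}


# Hypothetical hand-labeled data for one batch of reviews.
llm_flagged = {"race-in-refresh", "style-nit-1", "style-nit-2", "missing-backoff"}
verified_real = {"race-in-refresh", "missing-backoff", "leaked-session"}

scores = precision_recall(llm_flagged, verified_real)
print(scores)
```

Running the same function over a human reviewer's findings for the same PRs would give the comparison baseline the comment asks for.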

stevenhuang, 4 months ago

Related article on how LLMs are force-fed information line by line:

https://amistrongeryet.substack.com/p/unhobbling-llms-with-knowledge-in-the-world

> Our entire world – the way we present information in scientific papers, the way we organize the workplace, website layouts, software design – is optimized to support human cognition. There will be some movement in the direction of making the world more accessible to AI. But the big leverage will be in making AI more able to interact with the world as it exists.

> We need to interpret LLM accomplishments to date in light of the fact that they have been laboring under a handicap. This helps explain the famously “jagged” nature of AI capabilities: it’s not surprising that LLMs struggle with tasks, such as ARC puzzles, that don’t fit well with a linear thought process. In any case, we will probably find ways of removing this handicap

qianli_cs, 4 months ago

I thought it was about LLM training but it’s actually prompt engineering.

dimtion, 4 months ago

Without knowing exactly how createNewGroup and addFileToGroup are implemented it is hard to tell, but it looks like the code snippet has a bug where the last group created is never pushed to the groups variable.

I'm surprised this "senior developer AI reviewer" did not catch this bug...
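
The class of bug described above can be illustrated with a minimal sketch. The article's real implementation is not shown in this thread, so the grouping rule and the `FileGroup` type here are assumptions made up for illustration; only the final flush step is the point.

```python
from dataclasses import dataclass, field


@dataclass
class FileGroup:
    files: list[str] = field(default_factory=list)


def should_start_new_group(path: str, group: FileGroup) -> bool:
    """Placeholder rule (assumption): split when the top-level directory changes."""
    if not group.files:
        return False
    return path.split("/")[0] != group.files[-1].split("/")[0]


def group_files(paths: list[str]) -> list[FileGroup]:
    groups: list[FileGroup] = []
    current = FileGroup()
    for path in paths:
        if should_start_new_group(path, current):
            groups.append(current)
            current = FileGroup()
        current.files.append(path)
    # Without this final flush, the last open group is silently dropped.
    if current.files:
        groups.append(current)
    return groups


groups = group_files(["auth/token.ts", "auth/session.ts", "notify/socket.ts"])
print([g.files for g in groups])
```

If the flush after the loop is removed, the `notify` group never reaches the result, which matches the behavior the comment suspects.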

cloudking, 4 months ago

Sounds like OP hasn't tried the AI IDEs mentioned in the article.

For example, Cursor Agent mode does this out of the box. It literally looks for context before applying features, changes, fixes etc. It will even build, test and deploy your code - fixing any issues it finds along the way.

ptx, 4 months ago

Well, did you check if the AI's claims were correct?

Does PR 1234 actually exist? Did it actually modify the retry logic? Does the token refresh logic actually share patterns with the notification service? Was the notification service added last month? Does it use websockets?

SunlitCat, 4 months ago

Oh my. That title alone inspired me to ask ChatGPT to read a simple hello world cpp program like a drunken sailor.

The end result was quite hilarious, I have to say.

Its final verdict was:

End result? It’s a program yellin’, "HELLO WORLD!" Like me at the pub after 3 rum shots. Cheers, matey! hiccup

:D

shahzaibmushtaq, 4 months ago

Fresh bootcamp grads aren't going to understand what the author is talking about, and many senior developers (even mid-seniors) are looking for the prompts the author wrote to teach AI how to become a senior developer.

risyachka, 4 months ago

> The AI went from:
> "This file contains authentication logic using JWT tokens"

So what was the initial prompt? "What's in this file?"

And then you added context and it became context-aware. A bit of an overstatement to call this a "Holy Shit moment".

Also, who is "we"? What is "our AI"? And what is "our benchmark script"?

And how big is your codebase? 50k files? 20 files?

This post has very, very little value without a ton of details. Looks like nowadays everything labeled "AI" gets to the front page.

atemerev, 4 months ago

This is what Aider does out of the box.

redleggedfrog, 4 months ago

That's funny, those are considered Senior Dev attributes. I would think you'd better be doing that basic kind of stuff from the minute you're writing code for production and future maintenance. Otherwise you're making a mess someone else is going to have to clean up.

danjl, 4 months ago

> Identifying tech debt before it happens

Tech debt is a management problem, not a coding problem. A statement like this undermines my confidence in the story being told, because it indicates the lack of experience of the author.

patrickhogan1, 4 months ago

This is great. More context is better. The only question is: after you have given the AI your code, why would you have to tell it basic things like "this is session middleware"?

zbyforgotp, 4 months ago

Personally, I would not hardcode the discovery process in code but just give the LLM tools to browse the code and find what it needs itself.
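
A rough sketch of what such browsing tools might look like, assuming they are plain functions that get exposed to the model through whatever tool-calling mechanism is in use. The tool names, the `TOOLS` registry, and the demo repository are hypothetical; no particular agent framework is implied.

```python
import re
import tempfile
from pathlib import Path


def list_files(root: Path) -> list[str]:
    """Return every file under root as a sorted, root-relative POSIX path."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def read_file(root: Path, rel_path: str, max_chars: int = 4000) -> str:
    """Return the start of one file, refusing paths that escape root."""
    target = (root / rel_path).resolve()
    if not target.is_relative_to(root.resolve()):  # requires Python 3.9+
        raise ValueError(f"path escapes repository root: {rel_path}")
    return target.read_text(encoding="utf-8", errors="replace")[:max_chars]


def search(root: Path, pattern: str) -> list[tuple[str, int]]:
    """Return (path, line number) for every line matching the regex."""
    regex = re.compile(pattern)
    hits = []
    for rel in list_files(root):
        text = read_file(root, rel, max_chars=10**9)
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((rel, number))
    return hits


# Hypothetical registry: the model picks a tool by name and supplies arguments.
TOOLS = {"list_files": list_files, "read_file": read_file, "search": search}

# Tiny throwaway repository so the sketch can be run end to end.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "auth").mkdir()
    (repo / "auth" / "token.py").write_text("def refresh_token():\n    pass\n", encoding="utf-8")
    files = TOOLS["list_files"](repo)
    hits = TOOLS["search"](repo, r"def refresh_token")

print(files, hits)
```

The tradeoff against hardcoded discovery is that the model decides what to look at, which costs extra round trips but avoids baking one grouping heuristic into the pipeline.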

yapyap, 4 months ago

haha man, some of yall really talk about AI like it’s some baby with all the knowledge in the world, waiting to be taught common sense

Workaccount2, 4 months ago

Just like training data, the more context and the higher quality the context you give the model, the better the outputs become.

revskill, 4 months ago

The seniors master more than 2 or 3 languages.

kmoser, 4 months ago

I wonder how the results would compare to simply prompting it to "analyze this as if you were a senior engineer"?

deadbabe, 4 months ago

This strikes me as basically doing the understanding for the LLM and then having it summarize it.

quantadev, 4 months ago

In my Coding Agent, I ended up realizing my prompts need to be able to mention very specific areas in the code. No really good syntax exists for doing that, so I invented something I call "Named Blocks".

My coding agent allows you to put any number of named blocks in your code and then mention those in your prompts by name, and the AI understands what code you mean. Here's an example.

In my code:

<pre><code>-- block_begin SQL_Scripts
...some sql scripts...
-- block_end
</code></pre>

Example prompt:

<pre><code>Do you see any bugs in block(SQL_Scripts)?</code></pre>
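
One way such named blocks could be resolved before a prompt is sent is sketched below. This is a guess at the mechanics, not the commenter's implementation: it assumes SQL-style `--` comment markers exactly as in the example above, and both helper functions are hypothetical.

```python
import re

# Matches "-- block_begin NAME" ... "-- block_end" on their own lines.
BLOCK_RE = re.compile(
    r"^[ \t]*--[ \t]*block_begin[ \t]+(?P<name>\w+)[ \t]*\n"
    r"(?P<body>.*?)"
    r"^[ \t]*--[ \t]*block_end[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_named_blocks(source: str) -> dict[str, str]:
    """Map each block name to the code between its begin and end markers."""
    return {m.group("name"): m.group("body") for m in BLOCK_RE.finditer(source)}


def expand_prompt(prompt: str, blocks: dict[str, str]) -> str:
    """Replace each block(Name) mention in the prompt with that block's code."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in blocks:
            raise KeyError(f"unknown block: {name}")
        return f"block {name}:\n{blocks[name]}"

    return re.sub(r"block\((\w+)\)", substitute, prompt)


source = """SELECT 1;
-- block_begin SQL_Scripts
UPDATE users SET active = 0;
-- block_end
SELECT 2;
"""

blocks = extract_named_blocks(source)
expanded = expand_prompt("Do you see any bugs in block(SQL_Scripts)?", blocks)
print(expanded)
```

Other languages would need their own comment prefix in the marker pattern, and raising on an unknown name catches mistyped block references before they reach the model.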

_0ffh, 4 months ago

Very sceptical of "Context First: We front-load system understanding before diving into code". The LLM sees the whole input at once; it's a transformer, not a recurrent model. Order shouldn't matter in that sense.

Edit: I see some people are disagreeing. I wish they explained how they imagine that would work.

whinvik, 4 months ago

Sounds interesting. Do you have documentation on how you built the whole system?

mbrumlow, 4 months ago

> Context First: We front-load system understanding before diving into code
> Pattern Matching: Group similar files to spot repeated approaches
> Impact Analysis: Consider changes in relation to the whole system

Wait. You fixed your AI by doing traditional programming !?!?!

guerrilla, 4 months ago

Today I learned I have "senior dev level awareness". This seems pretty basic to me, but impressive that the LLM was able to do it. On the other hand, this borderline reads like those people with their "AI" girlfriends.

riazrizvi, 4 months ago

Nice article. The comments are weird as fuck.

highcountess, 4 months ago

Dev palms just got that much more sweaty.

Jimmc414, 4 months ago

@namanyayg Thanks for posting this, OP. I created a prompt series based on this and so far I like the results. Here is the repo if you are interested.

https://github.com/jimmc414/better_code_analysis_prompts_for_AI

I used this tool to flatten the example repo and PRs into text:

https://github.com/jimmc414/1filellm

scinadier, 4 months ago

A bit of a disappointing read. The author never elaborates on the details of the particular day in which they taught AI to read code like a Senior Developer.

What did they have for lunch? We'll never know.
