Did Semgrep Just Get a Lot More Interesting?


190 points by ghuntley · 3 months ago

27 comments

gorgoiler · 3 months ago
How are people collaborating on code when using AI tools to generate patches?

We hold code review dear as a tool to make sure more than one set of eyeballs has been over a change before it goes into production, and more than one person has the context behind the code to be able to fix it in future.

As model-generated code becomes the norm, I'm seeing code from junior engineers that they haven't read and possibly don't understand. For example, one Python script calling another using exec instead of importing it as a module, or writing code that is already available as a very common part of the standard library.

In such cases, are we asking people to mark their code as auto-generated? Should we review their prompts instead of the code? Should we require the prompt-to-code step to be deterministic? Should we see their entire prompt context and not just the prompt they used to build the finished patch?

I feel like a lot of the value of code review is to bring junior engineers up to higher levels. To that extent each review feels like an end-of-week school test, and I'm getting handed plagiarised AI slop to mark instead of something that maps properly to what the student does or does not know.

Pair programming is another great teaching tool. Soon, it might be the only one left.
scottlamb · 3 months ago
> But I just checked and, unsurprisingly, 4o seems to do reasonably well at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?

I don't know about Semgrep syntax, but the chat it generated is bad in at least a couple of other ways. E.g. their "how to fix" instruction is wrong:

    if let Some(Load::Local(load)) = self.load.read().get(...) {
        // do a bunch of stuff with `load`
    } else {
        drop(self.load.read()); // Explicitly drop before taking write lock
        let mut w = self.load.write();
        self.init_for(&w);
    }

That actually acquires and then drops a second read lock. It doesn't solve the problem that the first read lock is still active and thus the write lock will deadlock.

Speaking of which, acquiring two read locks from the same thread can also deadlock, as shown in the "Potential deadlock example" at <https://doc.rust-lang.org/std/sync/struct.RwLock.html>. It can happen in the code above (one line before the other deadlock). It can also slip through their rule because they're incorrectly looking for just a write lock in the else block.

I've been playing with AI code generation tools like everyone else, and they are okay as autocomplete, but I don't see them as trustworthy. For a while I thought I just wasn't prompting well enough, but when other people show me their AI output, I can see it's wrong, so maybe I'm just looking more closely?
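An actual fix for the deadlock described above would scope the first read guard so it is dropped before the write lock is taken. A minimal sketch (the `Cache` struct, `get_or_init` name, and `String` payload are invented for illustration; only the scoping pattern is the point):

```rust
use std::sync::RwLock;

// Stand-in for the struct in the quoted chat; names are hypothetical.
struct Cache {
    load: RwLock<Option<String>>,
}

impl Cache {
    fn get_or_init(&self) -> String {
        // Scope the read guard so it is dropped before we take the
        // write lock; holding it across `write()` is what deadlocks.
        {
            let r = self.load.read().unwrap();
            if let Some(v) = r.as_ref() {
                return v.clone();
            }
        } // read guard dropped here
        let mut w = self.load.write().unwrap();
        // Another thread may have initialized between the two locks,
        // so only fill in the value if it is still missing.
        w.get_or_insert_with(|| "initialized".to_string()).clone()
    }
}
```

This is the classic double-checked pattern: a racing writer between the two lock acquisitions is harmless because `get_or_insert_with` leaves an existing value alone.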
mcqueenjordan · 3 months ago
> But I just checked and, unsurprisingly, 4o seems to do reasonably well at generating Semgrep rules? Like: I have no idea if this rule is actually any good. But it looks like a Semgrep rule?

This is the thing with LLMs. When you're not an expert, the output always looks incredible.

It's similar to the fluency paradox: if you're not a native speaker of a language, anyone you hear speak it at a higher level than yourself appears fluent to you, even if, for example, they're actually just a beginner.

The problem with LLMs is that they're very good at appearing to speak "a language" at a higher level than you, even if they totally aren't.
simonw · 3 months ago
DSLs like Semgrep are one of my top use-cases for LLMs generally.

It used to be that tools like Semgrep and jq and Tree-sitter and zsh all required you to learn quite a bit of syntax before you could start using them productively.

Thanks to LLMs you can focus on learning what they can do for you without also having to learn the fiddly syntax.
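For a sense of how little syntax is involved, here is a minimal Semgrep rule sketch (the id, message, and pattern are invented for illustration, not taken from the article): the fiddly part an LLM has to get right is mostly the one `pattern` line.

```yaml
rules:
  - id: python-exec-of-file
    languages: [python]
    severity: WARNING
    message: "Prefer importing a module over exec-ing its source."
    # $F is a metavariable matching any expression passed to open().
    pattern: exec(open($F).read())
```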
eitland · 3 months ago
I am reminded of this IMO timeless classic:

https://news.ycombinator.com/item?id=5397797

A short snippet (the whole thing is very funny and, interestingly, was written in 2013, long before the modern AI craze):

"By now I had started moving on to doing my own consulting work, but I never disabled the hill-climbing algorithm. I'd closed and forgotten about the Amazon account, had no idea what the password to the free VPS was anymore, and simply appreciated the free money.

But there was a time bomb. That hill-climbing algorithm would fudge variables left and right. To avoid local maxima, it would sometimes try something very different.

One day it decided to stop paying me.

Its reviews did not suffer. Its balance increased. So it said: great change, let's keep it. It now has over $28,000 of my money, is not answering my mail, and we have been locked in an equity battle over the past 18 months.

The worst part is that I still have to clean up all its answers to protect our reputation. Who's running who anyway?"
miki123211 · 3 months ago
I think an even more interesting use case for Semgrep, and also LSP or something like LSP, is querying for exactly what an AI needs to know to fix a specific problem.

Unlike humans, LLMs have no memory, so they can't just learn where things are in the code by remembering the work they did in the past. In a way, they need to re-learn the relevant parts of your codebase from scratch on every change, always keeping context window limitations in mind.

Humans learn by scrolling and clicking around and remembering what's important to them; LLMs can't do that. We try to give them autogenerated codebase maps and tools that can inject specific files into the context window, but that doesn't seem to be nearly enough. Semantic queries look like a much better idea.

I thought you couldn't really teach an LLM how to use something like that effectively, as that's not how humans work and there's no data to train on, but the recent breakthroughs with RL made me change my mind.
kubb · 3 months ago
OK, hear me out. The future isn't o4 or whatever. The future is when everyone, every language, every tool, every single library and codebase can train their own custom model tailored to their needs, acting as smart documentation: you tell it what you want to do and it tells you how to do it.

People have been trying with fine-tuning, RAG, and using the context window. That's not enough. The model needs to be trained on countless question-answer examples for this particular area of knowledge, starting from a base model aware of comp-sci concepts and language (just English is fine). This implies that such examples have to be created by humans: each such community will need its own "Stack Overflow".

Smaller, specialized models are the future of productivity. But of course that can't be monetized, right? Well, the technology just needs to get cheaper so that people can afford to train such models themselves. That's the next major breakthrough. Could be, anyway.
neom · 3 months ago
Love the illustrator. And love linking out and supporting her.
spamfilter247 · 3 months ago
I've built a solution that takes you most of the way there, using Semgrep's SARIF output and prompted LLMs to help prioritize triage.

We've used this for the past year at Microsoft to help prioritize the "most likely interesting" 5% of a large set of results for human triage. It works quite well.

https://github.com/247arjun/ai-secure-code-review
ksec · 3 months ago
LOL, but I can't help thinking about the comment from tptacek [1]:

> "We wrote all sorts of stuff this week and this is what gets to the front page. :P"

And how they write content specifically for HN [2].

[1] https://news.ycombinator.com/item?id=43053985

[2] https://fly.io/blog/a-blog-if-kept/
mmsc · 3 months ago
I've been trying to do something similar to create CodeQL queries recently, and found that ChatGPT is completely unable to create even simple queries. I assume it's because the training data covers an old version of the query language, or is just completely missing; either way, even feeding back the rules and the errors they produce when run has been a complete failure for me.
antirez · 3 months ago
Take a large-context frontier model. Upload 200k tokens of code with each query. Ask it about whatever code pattern you want it to highlight for you. Works better than any other system, but costs tokens on API services.
zamalek · 3 months ago
So the idea is that LLM1 looks at the output of LLM0 and builds a new set of constraints, and then LLM0 has to try again, rinse and repeat? (LLM0 could be the same as LLM1, and I think it is in the article?)
waynenilsen · 3 months ago
That's Devin / Replit agent.

Not there yet, but it is inevitable.
j45 · 3 months ago
I think the author is missing one part about Cursor, Aider, etc.

Out of the box they are decent.

Watching even the basic optimizations developers on YouTube apply prior to starting a project puts the experience and consistency at a far higher level.

Maybe this casual surface-level testing, if I'm not misreading, is why so many tech people are missing what tools like Cursor, Aider, etc. are doing.
xg15 · 3 months ago
> *What interests me is this: it seems obvious that we're going to do more and more "closed-loop" LLM agent code generation stuff. By "closed loop", I mean that the thingy that generates code is going to get to run the code and watch what happens when it's interacted with.*

Well, at least we have a credible pathway into the Terminator or Matrix universes now...
jasonjmcghee · 3 months ago
I'm quite surprised the "autofix" functionality wasn't mentioned.

https://semgrep.dev/docs/writing-rules/autofix

Seems like the natural thing to do for cases that support it.
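For context, an autofix rule is an ordinary rule plus a `fix:` key whose template reuses the matched metavariables. A sketch based on a well-known example from the Semgrep docs (the id and message wording here are paraphrased):

```yaml
rules:
  - id: eqeq-is-none
    languages: [python]
    severity: WARNING
    message: "Use `is None` rather than `== None`."
    pattern: $X == None
    # Semgrep substitutes the matched $X into the replacement text.
    fix: $X is None
```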
awinter-py · 3 months ago
The 'closed loop' concept in here is important.

The point that a unit of code is a thing that is maintained, rather than a thing that is generated once, is where codegen has always lost me (both AI codegen and Ruby-on-Rails boilerplate generators).

Iterative improvement, including factoring useful things out to standard libraries, is where it's at.
hamilyon2 · 3 months ago
I just tried my latest task with it, and o1 readily hallucinated non-existent Semgrep functions.
0x696C6961 · 3 months ago
I wrote a tool for rewriting Semgrep matches using an LLM: https://github.com/icholy/semgrepx
skirge · 3 months ago
"Generate patterns for language X and framework Y which can lead to vulnerability V, then generate a Semgrep/Joern rule for it": my longest chats with ChatGPT.
bhouston · 3 months ago
I have a closed-loop coding agent working here; you can try it out: https://mycoder.ai
ignoramous · 3 months ago
r2c / Semgrep has truly come a long way since its incubation at Facebook: https://github.com/facebookarchive/pfff

I remember using Soot, kythe.io, and pfff to find the exact CTS (compatibility test suite) tests to run given a code diff between two AOSP builds.
technion · 3 months ago
I have to ask whether this Semgrep rule for relock bugs is public, because the first Google hit for me is this blog.
sho_hn · 3 months ago
> But I'm burying the lead.

It's "lede". There are a few other typos too.

I'm not sure I like the "This one trick they don't want you to know about!" writing style of these (e.g. the Cursor/malpractice hot take, that sort of thing).
timewizard · 3 months ago
> makes me think that more of the future of our field belongs to people who figure out how to use this weird bags of model weights than any of us are comfortable with.

Until you find a way to improve self-guided training, no, this will never happen. New things get invented and need to be implemented before your "bag of weights" has any idea how to approach them, which is, of course, by simply stealing something that already exists.

People who think this way blow my mind. Is it that you don't actually like your day job and dream about having a machine do it for you while, somehow, still earning the salary you currently command?

Laughable.
fizx · 3 months ago
I'd put a $1k long bet that a 3B-param model, cleverly orchestrated, will achieve AGI* in the next ten years. These are the sorts of ideas that would help get us there.

Any takers?

*AGI defined as smarter than a FAANG staff engineer on similar tasks.