AI Meets WinDBG

290 points by thunderbong 9 days ago

14 comments

You should check out the ChatDBG project, which AFAICT goes much further than this work, though in a different direction, and which, among other things, lets the LLM drive the debugging process. It has been out since early 2023, though it has of course evolved since then. We initially did a WinDBG integration but have since focused on lldb/gdb and pdb (the Python debugger), especially for Python notebooks. In particular, for native code, it integrates a language server to let the LLM easily find declarations and references to variables, for example. We spent considerable time developing an API that enables the LLM to make the best use of the debugger's capabilities. (It is also not limited to post-mortem debugging.)

Code is here [1] with some videos; it's been downloaded north of 80K times to date. Our technical paper [2] will be presented at FSE (top software engineering conference) in June. Our evaluation shows that ChatDBG is on its own able to resolve many issues, and that with some slight nudging from humans, it is even more effective.

[1] https://github.com/plasma-umass/ChatDBG (north of 75K downloads to date)
[2] https://arxiv.org/abs/2403.16354

lowleveldesign 9 days ago

I do a lot of Windows troubleshooting and am still thinking about incorporating AI in my work. The posted project looks interesting, and it's impressive how fast it was created. Since it's using MCP, it should be possible to bind it with local models. I wonder how performant and effective it would be. When working in the debugger, you should be careful with what you send to external servers (for example, Copilot). Process memory may contain unencrypted passwords, usernames, domain configuration, IP addresses, etc. Also, I don't think that vibe-debugging will work without knowing what the EAX register is or how to navigate the stack/heap. It will solve some obvious problems, such as most exceptions, but for anything more demanding (bugs in application logic, race conditions, etc.), you will still need to get your hands dirty.

I am actually more interested in improving the debugger interface. For example, an AI assistant could help me create breakpoint commands that nicely print function parameters when you only partly know the function signature and do not have symbols. I used Claude/Gemini for such tasks and they were pretty good at it.

As a side note, I recall Kevin Gosse also implemented a WinDbg extension [1][2] which used the OpenAI API to interpret the debugger command output.

[1] https://x.com/KooKiz/status/1641565024765214720
[2] https://github.com/kevingosse/windbg-extensions
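To make the breakpoint-command idea concrete, here is a minimal sketch of the kind of command an assistant might generate. It assumes a 64-bit Windows target using the Microsoft x64 calling convention (first four integer arguments in rcx, rdx, r8, r9). The helper name and the `mymod!DoWork` symbol are hypothetical, and floating-point or stack-passed arguments are not handled.

```python
# Hypothetical helper: build a WinDbg/CDB breakpoint command that logs the
# first few integer arguments of a function and then resumes execution.
# Assumption: Microsoft x64 calling convention (args in rcx, rdx, r8, r9).

X64_ARG_REGISTERS = ("rcx", "rdx", "r8", "r9")


def build_logging_breakpoint(symbol: str, arg_count: int) -> str:
    """Return a `bp` command that prints `arg_count` register arguments."""
    if not 1 <= arg_count <= len(X64_ARG_REGISTERS):
        raise ValueError("only the first four register arguments are supported")

    registers = X64_ARG_REGISTERS[:arg_count]
    # %p prints each value as a pointer-sized hex number.
    fmt = " ".join(f"arg{i}=%p" for i in range(1, arg_count + 1))
    values = ", ".join(f"@{reg}" for reg in registers)
    # The inner quotes and the newline are escaped because the whole action
    # is itself a quoted string passed to `bp`; `gc` resumes after printing.
    return f'bp {symbol} ".printf \\"{symbol}: {fmt}\\\\n\\", {values}; gc"'


if __name__ == "__main__":
    print(build_logging_breakpoint("mymod!DoWork", 2))
```

For two arguments this produces `bp mymod!DoWork ".printf \"mymod!DoWork: arg1=%p arg2=%p\\n\", @rcx, @rdx; gc"`, which can be pasted into the debugger. Guessing how many arguments to log, and whether one of them is really a string pointer, is the part where a model's help would matter.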

anougaret 9 days ago

this is pretty cool but ultimately it won't be enough to debug real bugs that are nested deep within business logic or happening because of long chains of events across multiple services/layers of the stack

imo what AI needs to debug is either:

- train with RL to use breakpoints + debugger or to do print debugging, but that'll suck because chains of action are super freaking long and also we know how it goes with AI memory currently, it's not great
- a sort of omniscient debugger, always on, that can inform the AI of all that the program/services did (sentry-like observability but on steroids). And then the AI would just search within that and find the root cause

neither of the two approaches is going to be easy to make happen, but imo if we all spend 10+ hours every week debugging, that's worth a shot

that's why I'm currently working on approach 2. I made a time travel debugger/observability engine for JS/Python and I'm currently working on plugging it into AI context as efficiently as possible so it debugs even super long sequences of actions in dev & prod, hopefully one day

it's super WIP and not self-hostable yet but if you want to check it out: https://ariana.dev/
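As a toy illustration of the second approach (record everything, then search), here is a minimal sketch using Python's standard `sys.settrace` hook. It is not how ariana.dev works, which the comment does not describe; the function names and the `checkout` example are hypothetical, and a real recorder would need sampling, storage, and cross-service correlation.

```python
import sys
from typing import Any, Callable, List, Tuple

Event = Tuple[str, str, Any]  # (event kind, function name, payload)


def record_execution(fn: Callable[..., Any], *args: Any) -> Tuple[Any, List[Event]]:
    """Run fn(*args) and return (result, recorded call/return events)."""
    events: List[Event] = []

    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        if event == "call":
            # At call time the frame's locals are exactly the arguments.
            events.append(("call", name, dict(frame.f_locals)))
        elif event == "return":
            events.append(("return", name, arg))
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, events


def find_events(events: List[Event], function_name: str) -> List[Event]:
    """The 'search within that' step: filter the recording by function."""
    return [e for e in events if e[1] == function_name]


# Hypothetical program under observation.
def apply_discount(price: int, percent: int) -> int:
    return price - price * percent // 100


def checkout(prices: List[int]) -> int:
    total = 0
    for price in prices:
        total += apply_discount(price, 10)
    return total


if __name__ == "__main__":
    total, events = record_execution(checkout, [100, 50])
    for event in find_events(events, "apply_discount"):
        print(event)
```

The recording is plain data, so it can be filtered down to the few events relevant to a failure before anything is handed to a model, which is one way around the long-chain and memory problem raised in the first approach.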

danielovichdk 9 days ago

Claiming to use WinDBG for debugging a crash dump, and the only commands I can find in the MCP code are these? I am not trying to be a dick here, but how does this really work under the covers? Is the MCP learning windbg? Is there a model that knows windbg? I am asking because I have no idea.

    results["info"] = session.send_command(".lastevent")
    results["exception"] = session.send_command("!analyze -v")
    results["modules"] = session.send_command("lm")
    results["threads"] = session.send_command("~")

You cannot debug every crash dump with only these 4 commands.
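The four quoted calls share one shape: a label mapped to a debugger command sent through `session.send_command`. A minimal sketch of how that shape generalises to arbitrary follow-up commands is below. Only `send_command` and the four commands come from the quoted snippet; `run_commands`, `FakeSession`, and the follow-up command set are hypothetical, and nothing here claims to mirror the project's actual code.

```python
from typing import Dict, List, Protocol


class DebuggerSession(Protocol):
    """Anything with the send_command method seen in the quoted snippet."""

    def send_command(self, command: str) -> str: ...


# The fixed triage set quoted in the comment above.
DEFAULT_TRIAGE: Dict[str, str] = {
    "info": ".lastevent",
    "exception": "!analyze -v",
    "modules": "lm",
    "threads": "~",
}


def run_commands(session: DebuggerSession, commands: Dict[str, str]) -> Dict[str, str]:
    """Send each command and collect its output under the given label."""
    return {label: session.send_command(cmd) for label, cmd in commands.items()}


class FakeSession:
    """Hypothetical stand-in that echoes commands instead of driving CDB."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send_command(self, command: str) -> str:
        self.sent.append(command)
        return f"<output of {command}>"


if __name__ == "__main__":
    session = FakeSession()
    results = run_commands(session, DEFAULT_TRIAGE)
    # A caller (human or model) could then ask for more, e.g. stack and locals.
    results.update(run_commands(session, {"stack": "kb", "locals": "dv"}))
    print(session.sent)
```

With a helper like this, the fixed set is only a starting point: whoever reads the first results decides which commands to send next, so the knowledge of which commands to use would sit with the caller rather than in the server code.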

JanneVee 9 days ago

> Crash dump analysis has traditionally been one of the most technically demanding and least enjoyable parts of software development.

I for one enjoy crash dump analysis because it is a technically demanding, rare skill. I know I'm an exception, but I enjoy actually learning the stuff so I can deterministically produce the desired result! I even apply it to other parts of the job, like learning the programming language currently in use and actually reading the documentation of libraries/frameworks, instead of copy-pasting solutions from the "shortcut du jour", like Stack Overflow yesterday and LLMs today!

the_duke 9 days ago

I feel like current top models (Gemini Pro 2.5 etc) would already be good developers if they had the feedback cycle and capabilities that real developers have:

* reading the whole source code
* looking up dependency documentation and code, searching related blog posts
* getting compilation/linter warnings and errors
* running tests
* running the application and validating output (e.g., for a webserver, start the server, send requests, get the response)

The tooling is slowly catching up, and you can enable a bunch of this already with MCP servers, but we are nowhere near the optimum yet.

Expect significant improvements in the near future, even if the models don't get better.
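The last item in the list (start the server, send a request, check the response) can be sketched with nothing but the Python standard library. This is a minimal illustration, not a tool from the thread: the handler, the `/health` path, and `start_and_probe` are hypothetical names, and a real agent loop would launch the actual application rather than an in-process toy server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


class HealthHandler(BaseHTTPRequestHandler):
    """Toy application under test: answers every GET with 'ok'."""

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the feedback channel free of access-log noise


def start_and_probe() -> Tuple[int, str]:
    """Start the server, send one request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: pick a free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}/health", timeout=5) as resp:
            return resp.status, resp.read().decode()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = start_and_probe()
    print(status, body)
```

The returned pair is the kind of structured signal a model can act on: a wrong status or body is concrete feedback, in the same way a failing test or a linter error is.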

JanSchu 9 days ago

This is one of the most exciting and practical applications of AI tooling I've seen in a long time. Crash dump analysis has always felt like the kind of task that time forgot: vital, intricate, and utterly user-hostile. Your approach bridges a massive usability gap with the exact right philosophy: augment, don't replace.

A few things that stand out:

The use of MCP to connect CDB with Copilot is genius. Too often, AI tooling is skin-deep: just a chat overlay that guesses at output. You've gone much deeper by wiring actual tool invocations to AI cognition. This feels like the future of all expert tooling.

You nailed the problem framing. It's not about eliminating expertise; it's about letting the expert focus on analysis instead of syntax and byte-counting. Having AI interpret crash dumps is like going from raw SQL to a BI dashboard, with the option to drop down if needed.

Releasing it open-source is a huge move. You just laid the groundwork for a whole new ecosystem. I wouldn't be surprised if this becomes a standard debug layer for large codebases, much like Sentry or Crashlytics became for telemetry.

If Microsoft is smart, they should be building this into VS proper, or at least hiring you to do it.

Curious: have you thought about extending this beyond crash dumps? I could imagine similar integrations for static analysis, exploit triage, or even live kernel debugging with conversational AI support.

Amazing work. Bookmarked, starred, and vibed.

lgiordano_notte 9 days ago

Curious how you're handling multi-step flows or follow-ups. It seems like that's where MCP could really shine, especially compared to brittle CLI scripts. We've seen similar wins with browser agents once structured actions and context are in place.

cadamsdotcom 9 days ago

Author built an MCP server for windbg: https://github.com/svnscha/mcp-windbg

Knows plenty of arcane commands in addition to the common ones, which is really cool & lets it do amazing things for you, the user.

To the author: most of your audience knows what MCP is; may I suggest adding a tl;dr to help people quickly understand what you've done?

codepathfinder 9 days ago

Built this around mid-2023 and found interesting results!

Tepix 9 days ago

Sounds really neat!

How does it compare to using the Ghidra MCP server?

alexvitkov 9 days ago

Watching a guy type at 30 WPM in a chatbox reminds me of those old YouTube tutorials where some dude is typing into a notepad window, and showing you how to make a shortcut to "shutdown -s -t 0" on your school computer and give it the Internet Explorer icon. It's only missing Linkin Park blasting in the background.

If you're debugging from a crash dump, you probably have a large, real-world program that actual people have reviewed, deemed correct and released in the wild.

Current LLMs can't produce a sane program over 500 lines. The idea that they can understand a correct-looking program several orders of magnitude larger, well enough to diagnose and fix a subtle issue that the people who wrote it missed, is absurd.

indigodaddy 9 days ago

My word, that's one of the most beautiful sites I've ever encountered on mobile.

Zebfross 9 days ago

Considering AI is trained on the average human experience, I have a hard time believing it would be able to make any significant difference in this area. The best experience I’ve had debugging at this level was using Microsoft’s time travel debugger which allows stepping forward and back.
