I need to reverse a binary made years ago, and I have zero experience with cpp, so I think it would be a good experiment to get an LLM to help me in any way it can.
Binary Ninja has an AI integration called Sidekick. It has a free trial, but I'm not sure it can be used in the free web version. [1]<p>In my experience, off-the-shelf LLMs (e.g. ChatGPT) do a pretty poor job with assembly; they cannot reason well about the stack or stack frames.<p>I think your job will be the same with or without AI: figuring out the data structures and data types a function operates on, and naming variables.<p>What are you reverse engineering for? For example, getting a fully compilable decompilation is a different goal than finding vulnerabilities or patching a bug.<p>1. <a href="https://sidekick.binary.ninja/" rel="nofollow">https://sidekick.binary.ninja/</a>
These guys are building foundational models for this purpose: <a href="https://reveng.ai/" rel="nofollow">https://reveng.ai/</a>. The results are quite compelling, and they have plugins for your favourite reverse engineering tools.
I made a site to use LLMs to help me with reverse engineering. The output is surprisingly readable, even with C++ classes. Let me know any feedback you might have: <a href="https://decompiler.zeroday.engineering/" rel="nofollow">https://decompiler.zeroday.engineering/</a>
Do you have experience reverse engineering? If not, LLMs are not going to help much. LLMs are useful for aiding the analysis but they don’t do the analysis.
Interesting. Wouldn't this actually be a deterministic problem based on graph analysis? I'd have thought LLMs would be more effective taking the output of some graph recognizer and then identifying what those higher-level constructs map to.
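To make the "graph recognizer" idea above concrete, here is a minimal sketch of one deterministic step: finding loops in a control-flow graph by looking for back edges during a depth-first search. The CFG layout (a dict of block name to successor names) and the block names are assumptions made up for illustration; a real tool would export this from the disassembler, and this simple check is only meaningful for the reducible control flow that compilers typically emit.

```python
def find_back_edges(cfg: dict[str, list[str]], entry: str) -> list[tuple[str, str]]:
    """Return edges (src, dst) where dst is still on the DFS stack, i.e. loop back edges."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in cfg}
    back_edges = []

    def visit(node: str) -> None:
        color[node] = GREY
        for succ in cfg.get(node, []):
            state = color.get(succ, WHITE)
            if state == GREY:
                # Successor is an ancestor on the current path: this edge closes a loop.
                back_edges.append((node, succ))
            elif state == WHITE:
                visit(succ)
        color[node] = BLACK

    visit(entry)
    return back_edges


# Hypothetical CFG of a function containing a single while-loop.
cfg = {
    "entry": ["loop_head"],
    "loop_head": ["loop_body", "exit"],
    "loop_body": ["loop_head"],
    "exit": [],
}
loops = find_back_edges(cfg, "entry")
print(loops)
```

The recovered structure ("loop_body jumps back to loop_head") is the kind of higher-level fact that could then be handed to an LLM for naming and summarizing, rather than asking it to infer the loop from raw assembly.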
The LLM4Decompile project (<a href="https://github.com/albertan017/LLM4Decompile">https://github.com/albertan017/LLM4Decompile</a>) provides some open models for binary to C decompilation and Ghidra pseudocode refinement, along with some training sets.<p>RevEng.ai, linked a few times already, discusses their approach here: <a href="https://blog.reveng.ai/training-an-llm-to-decompile-assembly-code/" rel="nofollow">https://blog.reveng.ai/training-an-llm-to-decompile-assembly...</a>
I like using it for library function comments, variable name recovery, and sometimes types. The comments are usually hit or miss, but I find the variable names to be a bit better than auto-generated ones. I implement most of this in my decompiler plugin: <a href="https://github.com/mahaloz/DAILA">https://github.com/mahaloz/DAILA</a>; check it out if you are interested :).
The Advent of Cyber side quest this year needed some Ghidra and I found Pickman's Model was pretty good at helping me craft a heap exploit from a decompilation.
I've only played a bit with this, but it was impressive.<p><a href="https://ghidra-sre.org/" rel="nofollow">https://ghidra-sre.org/</a>
Inspired by the work out there that reverse engineers game engines, I've always wanted to try my hand at reverse engineering to contribute to the world of game preservation.<p>Is it actually legal to decompile a game engine from executables/DLL files, write new sources by making sense of the output, and rewrite it such that it can be compiled targeting modern APIs?<p>I feel like that must be illegal.
You could use the LLM to help you write utility scripts for whatever disassembler you’re using, e.g. Python for IDA. That might work better than feeding it raw assembly.<p>Game RE communities also have all sorts of neat utilities for decompiling large cpp binaries. Skyrim’s community is pretty active with Ghidra/IDA.<p>Guessing you’re not lucky enough to have a PDB?
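As a rough illustration of the kind of utility script meant above, here is a sketch that splits a disassembly listing into per-function chunks, so each function can be inspected (or handed to an LLM) on its own instead of as one blob of raw assembly. It assumes the text layout of `objdump -d` output rather than a disassembler scripting API, because that can run anywhere; the function name `split_functions` and the sample listing are made up for illustration.

```python
import re

# Function headers in `objdump -d` output look like: 0000000000001139 <main>:
FUNC_HEADER = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def split_functions(listing: str) -> dict[str, list[str]]:
    """Group the instruction lines of a disassembly listing by function name."""
    functions: dict[str, list[str]] = {}
    current = None
    for line in listing.splitlines():
        if line.startswith("Disassembly of section"):
            current = None  # section banner: not part of any function
            continue
        header = FUNC_HEADER.match(line)
        if header:
            current = header.group(2)
            functions[current] = []
        elif current is not None and line.strip():
            functions[current].append(line.strip())
    return functions


# Tiny hand-written sample in the same layout (AT&T syntax, x86-64).
sample = """
Disassembly of section .text:

0000000000001139 <add>:
    1139:\t8d 04 37             \tlea    (%rdi,%rsi,1),%eax
    113c:\tc3                   \tret

000000000000113d <main>:
    113d:\tb8 00 00 00 00       \tmov    $0x0,%eax
    1142:\tc3                   \tret
"""
funcs = split_functions(sample)
for name, body in funcs.items():
    print(name, len(body), "instructions")
```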
Do you know the compiler and what the source possibly looks like? I found LLMs are pretty good at recovering code from binaries, but they need help.<p>If you are able to run the program and collect traces, that will help a ton.
cpp? That's a preprocessor. You mean C++?<p>An LLM won't help you much if you can't understand what it's talking about.<p>The manual way is, given an ELF (Linux executable format) binary somexe:<p>$ strings somexe<p>$ objdump -d somexe<p>$ objdump -s -j .rodata somexe<p>then look at and ponder the results.<p>And/or run Ghidra (a mouse-driven UI) over it, which may help somewhat but not 100%.<p>Keep in mind that objdump and Ghidra show the operands of multi-operand instructions in opposite order for the same code: Ghidra shows <i>mov dest,src</i> (Intel syntax), while objdump on x86 defaults to <i>mov src,dest</i> (AT&amp;T syntax).<p>No idea about (recent) Windows. IDA?
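For readers who want to see what the `strings` step above is doing, here is a minimal Python sketch that pulls runs of printable ASCII out of raw bytes. It is only an approximation of the real tool (no encodings other than ASCII, no section awareness), and the sample bytes are made up for illustration.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7e) that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a file's contents; for a real binary, read it
# with open(path, "rb").read() and pass the result in instead.
sample = b"\x7fELF\x02\x01\x00\x00usage: %s <file>\x00\x01\x02ab\x00/etc/passwd\x00"
found = extract_strings(sample)
print(found)
```

Short runs such as "ELF" and "ab" fall under the four-character minimum and are dropped, which is why a minimum length is useful for filtering out noise.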
Highly recommend it. I reversed an app with o1 Pro Mode and the analysis of the obfuscated C# code matched up accurately with what I eventually discovered by manually reversing.