Huh. This is a very... "interesting" application for an LLM. I'm not the brightest crayon in the box, but if anyone else would like to follow along with my non-expert opinion as I read through the paper, here's my take on it.<p>It's pretty important for compilers / decompilers to be reliable and accurate -- compilers behaving in a deterministic and predictable way is fundamental to a build pipeline.<p>LLMs are inherently unpredictable, so using an LLM for compilation / decompilation -- even one with 99.99% accuracy -- feels like an odd piece to include in my build pipeline.<p>That said, let's look at the paper and see what they did.<p>They essentially started with Code Llama, and then further trained the model on three tasks -- one primary, and two downstream.<p>The first task is compilation: given input code and a set of compiler flags, can we predict the output assembly? Since correctness can't be verified without using a traditional compiler anyway, this feels like it's of limited use on its own. However, training the model on this as a primary task enables a couple of downstream tasks. Namely:<p>The second task (and first downstream task) is compiler flag tuning: predicting the set of flags that produces the smallest assembly. It's a bit disappointing that they only seem to optimize for assembly size (and not execution speed), but it's not without its uses. Because the output of this task (compiler flags) is then passed to a deterministic function (a traditional compiler), the instability of the LLM is mitigated.<p>The third task (second downstream task) is decompilation. This is not the first time that LLMs have been trained to do better decompilation -- however, because of the pretraining on the primary task, they feel this provides some advantages over previous approaches. Sadly, they only compare LLM Compiler to Code Llama and GPT-4 Turbo, and not against any other LLMs fine-tuned for decompilation, so it's difficult to see in context how much better their approach is.<p>Regarding the verifiability of the decompilation approach, the authors note that there are correctness issues. So they employ round-tripping -- recompiling the decompiled code (using the same compiler flags) and checking for an exact match against the original assembly (a minimal sketch of what that check looks like is at the end of this comment). This still puts accuracy at around 45% (if I understand their numbers), so it's not entirely trustworthy yet, but it could still be useful (especially if used alongside a traditional decompiler, with this model's outputs only used when they are verifiably correct).<p>Overall I'm happy to see this model released, as it seems like an interesting use case. I may need to read more, but at first blush I'm not immediately excited by the possibilities it unlocks. Most of all, I would like to see whether these methods could be extended to optimize for performance -- not just assembly size.
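For concreteness, here's roughly what that round-trip check could look like. This is just my own sketch, not code from the paper: the use of clang, the file paths, and the exact-match comparison on assembly text are all assumptions on my part.

```python
# Hypothetical round-trip check: recompile the model's decompiled C with the
# same flags and accept it only if the result matches the original assembly.
# clang, the paths, and the plain-text comparison are my assumptions, not
# details taken from the paper.
import subprocess

def compile_to_asm(source_path: str, flags: list[str]) -> str:
    """Compile a C file to assembly text and return it."""
    result = subprocess.run(
        ["clang", "-S", *flags, "-o", "-", source_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def round_trip_ok(original_asm: str, decompiled_c: str, flags: list[str]) -> bool:
    """Accept the LLM's decompilation only if recompiling it reproduces the
    original assembly exactly."""
    with open("decompiled.c", "w") as f:
        f.write(decompiled_c)
    return compile_to_asm("decompiled.c", flags).strip() == original_asm.strip()
```

Even with a check like this, a mismatch doesn't mean the decompilation is wrong -- only that it isn't provably right -- which is why I'd only use the outputs that pass.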
I continue to be fascinated by what the next qualitative iteration of models will be, marrying the language processing and broad knowledge of LLMs with an ability to reason rigorously.<p>If I understand correctly, this work (or the most obvious productionized version of it) is similar to the work DeepMind released a while back: the LLM is essentially used for "intuition" -- to pick the approach -- and then you hand off to something mechanical/rigorous.<p>I think we're going to see huge growth in that type of system. I still think it's kind of weird and cool that our meat brains with spreading activation can (with some amount of effort/concentration) switch over into math mode and manipulate symbols and inferences rigorously.
Some previous work in the space is at <a href="https://github.com/albertan017/LLM4Decompile">https://github.com/albertan017/LLM4Decompile</a>
As usual, Twitter is impressed by this, but I'm very skeptical: the chance of it breaking your program is pretty high. The thing that makes optimizations so hard to get right is that they have to match the program's behavior without optimizations (unless you invoke UB), which is something LLMs will probably struggle with, since they can't exactly understand the code and its execution tree.
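At a minimum I'd want some behavioral testing before trusting any LLM-suggested rewrite. Here's a hypothetical sketch (the binary names and test inputs are made up); note that passing such a test only builds confidence -- it doesn't prove the rewrite preserves semantics on inputs you didn't try.

```python
# Differential check between an original binary and an LLM-rewritten one:
# run both on the same inputs and compare stdout and exit codes.
# Binary names and inputs are placeholders; this only catches obvious breakage,
# it says nothing about inputs not tested.
import subprocess

def same_behaviour(original_bin: str, rewritten_bin: str, test_inputs: list[str]) -> bool:
    for inp in test_inputs:
        a = subprocess.run([original_bin], input=inp, capture_output=True, text=True)
        b = subprocess.run([rewritten_bin], input=inp, capture_output=True, text=True)
        if (a.stdout, a.returncode) != (b.stdout, b.returncode):
            return False
    return True

# Example: same_behaviour("./prog_orig", "./prog_llm", ["", "1 2 3\n", "hello\n"])
```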
Unlike many other AI-themed papers from Meta, this one omits any mention of the model's output being used at Instagram, Facebook, or elsewhere at Meta. Research is great! But it doesn't seem all that actionable today.
I am curious about CUDA assembly: does this work at the CUDA -> PTX level, or PTX -> SASS? I have done some work on SASS optimization, and it would be a lot easier if an LLM could be applied at the SASS level.
Reading the title, I thought this was a tool for optimizing and disassembling LLMs, not an LLM designed to optimize and disassemble. Seeing it's just a model is a little disappointing in comparison.
my knowledge of compilers doesn't extend beyond a 101 course taken ages ago, but I wonder how the researchers enriched the dataset to improve these capabilities.<p>did they just happen to find a way to format the heuristics of major compilers in a half-code, half-language mix? confusingly enough, this is another use case where a (potential) tool that would let us work our way toward a solution is being replaced by an LLM.
I don’t understand the purpose of this. Feels like a task for function calling and sending it to an actual compiler.<p>Is there an obvious use case I’m missing?