TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

30 Years of Decompilation and the Unsolved Structuring Problem: Part 1

83 pointsby g0xA52A2Aover 1 year ago

4 comments

albertzeyerover 1 year ago
A funny anecdote: Some time ago, I was writing a C-to-Python translator.<p>(Why? Just for fun, <a href="https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCParser">https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCParser</a>, even more just-for-fun goal was this: <a href="https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCPython">https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCPython</a>).<p>It literally would translate the C code in equivalent Python code, using ctypes heavily. It was mostly straight-forward, except for mapping goto (thus related to this control flow structuring problem).<p>Of course, there are some hacks to introduce goto in Python, which in many cases would operate on the Python bytecode, which actually has the JUMP_ABSOLUTE op, but there are also other ways (<a href="https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;6959360&#x2F;goto-in-python" rel="nofollow">https:&#x2F;&#x2F;stackoverflow.com&#x2F;questions&#x2F;6959360&#x2F;goto-in-python</a>).<p>I could also have translated C directly to equivalent Python bytecode and not Python source code, but I really wanted to have Python source code.<p>My ugly solution worked basically like this: Whenever there was some goto in a function, it would translate it as follows:<p>First, we flatten any Python AST into a series of statements, with even more goto&#x27;s added for while&#x2F;for loops, if&#x27;s, etc. (We can avoid doing this for all sub-ASTs where there are no no goto-labels. The goal is to have all goto-labels at the top-level, not inside a sub-AST.)<p>Only conditional goto&#x27;s can stay. All Gotos and Goto-labels are marked somehow as special elements. So, we end up with sth like:<p><pre><code> x() y() &lt;goto-label &quot;a&quot;&gt; z() &lt;goto-stmnt &quot;a&quot;&gt; w() if v(): &lt;goto-stmnt &quot;a&quot;&gt; q() </code></pre> Now, we can implement the goto-handling based on this flattened code:<p>- Add a big endless loop around it. After the final statement, a break would leave the loop.<p>- Before the loop, we add the statement `goto = None`.<p>- The goto-labels will split the code into multiple part, where we add some `if goto is None:` before each part (excluding the goto-labels).<p>- For the goto-labels itself, we add this code:<p><pre><code> if goto == &lt;goto-label&gt;: goto = None </code></pre> - For every goto-statement, we add this code:<p><pre><code> goto = &lt;goto-stmnt&gt;; continue </code></pre> See here: <a href="https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCParser&#x2F;blob&#x2F;master&#x2F;goto.py">https:&#x2F;&#x2F;github.com&#x2F;albertz&#x2F;PyCParser&#x2F;blob&#x2F;master&#x2F;goto.py</a>
评论 #38861242 未加载
评论 #38861960 未加载
HALtheWiseover 1 year ago
I&#x27;ve wondered whether it would be possible to design a language with the explicit intent of being a decompilation target, such that you could guarantee that any program could be decompiled into it and then subsequently re-compiled with the original behavior preserved. Having such a language (perhaps a superset of C?) would make it way easier to perform binary patching.
评论 #38863745 未加载
评论 #38879061 未加载
评论 #38865452 未加载
评论 #38863725 未加载
Philpaxover 1 year ago
This is great! I’m always on the lookout for advancements in decompilation. Looking forward to Part 2 :)
评论 #38864345 未加载
torstenvlover 1 year ago
This is super interesting, and I&#x27;ve definitely wondered how control flow &quot;graph schemas&quot; are applied in decompilation.<p>[EDIT: Removed a comment about an apparent OCR issue, which has been fixed. Thanks for this great write-up!]
评论 #38861304 未加载