Hello! I'm the creator of the Java version of Spice86, and I just saw this thread. Let me address a few topics:<p>Why Java / C#?<p>I initially chose Java because it's a decently fast language I am comfortable working with, it has a lot of tooling available to investigate performance issues and debug easily, and it provides basic, easy access to multiplatform sound and graphics.<p>However, we eventually migrated to C# for several reasons:
- Control Structures: Re-implementing Cryo Dune DOS assembly code in Java proved challenging, especially when mapping jumps to high-level control structures. Java has no goto, whereas in C# just adding a goto and taking care of it later was easier.
- Unsigned Integers: Java's lack of support for unsigned integers was a source of bugs.
- Similarities: C# and Java have similar syntax, so the migration was not too crazy; the tooling is equally good, and performance is comparable.<p>Why Not Just Use Ghidra / IDA?<p>I wish we could, but there are many reasons why this isn't straightforward:
- Ghidra's support for 16-bit x86 real mode isn't great, with some bugs requiring significant investment to fix. For example, this issue: <a href="https://github.com/NationalSecurityAgency/ghidra/issues/981">https://github.com/NationalSecurityAgency/ghidra/issues/981</a>. I guess no one is willing to invest in that because there is no market for it.
- IDA is not free, and Hex-Rays doesn't support decompiling DOS code.<p>Additionally, code from that era often involved hand-crafted assembly mixed with C, compiled by forgotten tools. As a result, tools like Ghidra and IDA struggle with static analysis due to practices like self-modifying code, jumping into the middle of instructions, and editing the call stack to redirect returns.<p>One simple yet concrete example is the switch statement. One way a compiler might implement a switch is to create a table in memory holding the addresses of all the case statements. The assembly code then computes an offset into that table from the switch value and jumps to the address stored there.
When decompiling such code, if the compiler version is known and supported, you can infer the location of the jump table and reconstruct the addresses of all the code in the switch statement.
However, if the code is handwritten assembly, you would need to debug it at execution time to find where the rest of the code is, as it is not statically reachable.
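<p>To illustrate the jump table idea, here is a minimal sketch in Python. It is a toy model, not Spice86 code: the addresses, the handler names, and the `dispatch` function are all made up for the example. It shows why the target of an indirect jump is only known once the selector value exists at run time, and how noting the target while executing recovers it.

```python
from typing import Callable, Dict, List, Set


def case_zero() -> str:
    return "case 0"


def case_one() -> str:
    return "case 1"


def case_two() -> str:
    return "case 2"


# Toy "code memory": made-up addresses mapped to the code living there.
CODE: Dict[int, Callable[[], str]] = {
    0x1000: case_zero,
    0x1010: case_one,
    0x1020: case_two,
}

# The kind of table a compiler might emit for a 3-case switch:
# entry i holds the address of the code for case i.
JUMP_TABLE: List[int] = [0x1000, 0x1010, 0x1020]


def dispatch(selector: int, observed: Set[int]) -> str:
    # Roughly "load the table entry into AX": the target depends on run-time data.
    target = JUMP_TABLE[selector]
    # Note the target as it is taken, so it can be reported after the run.
    observed.add(target)
    # Roughly "JMP AX": a static reader sees only a register here, not an address.
    return CODE[target]()
```

For example, calling `dispatch(1, observed)` with an empty set runs the code at 0x1010 and leaves that address in `observed`. If the table layout is known (a recognized compiler), all three targets can be read statically; if not, only the targets actually taken during a run are discovered.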
Very concretely, if you see something like JMP AX, you need to debug to see where the rest of the code is.<p>How Do We Get Work Done Despite All That?<p>Spice86 provides two main strategies:
- Override assembly with high-level code. That way, you can rewrite your game bit by bit, testing as you go.
- Execution flow recording and code generation. Spice86 records the execution flow and tells you what was executed. We have a Ghidra plugin to import that data and generate code from it. For example, see <a href="https://github.com/OpenRakis/LOGO/blob/main/GeneratedCode_OriginalAsm.cs">https://github.com/OpenRakis/LOGO/blob/main/GeneratedCode_Or...</a>. Interestingly, since it's now C#, you can decompile it again to extract control structures, as shown here <a href="https://github.com/OpenRakis/LOGO/blob/main/GeneratedCode_DecompiledAsm.cs">https://github.com/OpenRakis/LOGO/blob/main/GeneratedCode_De...</a>.<p>The issue with code generation is that Ghidra does not like real mode code, so the plugin is full of workarounds (and broken at the moment), and there are some fundamental things that can't be done. For instance, self-modifying code support is murky with the Ghidra plugin.<p>Future Improvements<p>We're working on eliminating Ghidra by generating code directly in Spice86 from the recorded execution trace.
Our goal is to fully support self-modifying code and generate functional code from an execution trace with minimal human intervention.<p>I thought I would make a quick reply, but it turns out there's a lot to say on the topic :)