Idea:<p>If any assembler/disassembler author or team out there wants to produce an authoritative assembler/disassembler (difficult to do on x86, because there are so many possible combinations of instruction encodings; see <a href="https://github.com/xoreaxeaxeax/sandsifter" rel="nofollow">https://github.com/xoreaxeaxeax/sandsifter</a>: "Typically, several million undocumented instructions on your processor will be found, but these generally fall into a small number of different groups."), then one thing they could do is create a third program which "pits" the output of Assembler A against Assembler B, and Disassembler A against Disassembler B...<p>That is, between any two assemblers (for the same CPU architecture/instruction set), or any two disassemblers (again, for the same instruction set), <i>where are the anomalies</i>?<p>If we think of an assembler as a simple function, y = f(x), where I give it a string of ASCII characters as input (x) and get back a string of 1..n binary bytes as output (y),
and a disassembler as the reverse function, then how difficult would it be to write a program that imported the assembly functionality of two (or more) assemblers and simply started comparing their outputs?<p>Well, there's one slight problem: you'd first have to create a series of strings representing valid assembler instructions...<p>But why not let the disassembler(s) do that?<p>So our future program for "pitting" assembler against assembler, and disassembler against disassembler, looks like this:<p>1) Start with a single byte, 00.<p>2) Pass that byte string to the disassembler. Does it decode to a valid instruction?<p>3) If so, pass the instruction text it returns to an assembler, and capture the resulting bytes.<p>4) Are the resulting bytes the same as the ones we started with? If so, all is OK. If not, write the anomaly to a log!<p>5) Increment the byte by 1 and repeat the steps above in a loop; once we pass 255, move on to 2-byte strings and do the same thing (like an odometer). Keep going until we reach the maximum x86 instruction length, which is 15 bytes. (Note: that's one BIG number! Could we even finish this loop in one lifetime? I don't know... if it took too long, perhaps we could intelligently skip some combinations, as Christopher Domas does in Sandsifter.)<p>But anyway, that would be the algorithm for "pitting" Assembler vs. Assembler (or, more precisely, Assembler vs. Disassembler; you get the general idea!) and comparing the results.<p>Think of it as a 'diff' tool, but for the output of assemblers/disassemblers rather than for files...<p>Why?<p>Well, because x86 is complex, to say the least!<p>And because it's more likely than not that any given assembler/disassembler contains bugs and/or errors, even if they're unintentional!<p>Anyway, if no one else does it, I'll do it in the future (too busy with other things right now!)... 
so "Note to future self" <g>.<p>But the entire program would amount to a few loops...<p>Actually, that raises another good point: any assembler/disassembler worth its salt should expose its libraries through Python (or another easily scriptable language) bindings. Many do; some don't. Just a random related observation...
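<p>The "few loops" might look something like the Python sketch below. It is a minimal sketch under stated assumptions, not a finished tool: disassemble and assemble are hypothetical adapter callables (in practice they might wrap the Python bindings of, say, Capstone for disassembly and Keystone for assembly, but those adapters are not shown), and the two-instruction toy "ISA" exists only to exercise the harness. Two refinements over the plain loop: a candidate is only checked when the disassembler consumed every one of its bytes (otherwise it is a shorter, already-tested instruction followed by junk), and a logged mismatch is a lead rather than a proven bug, since x86 often has several valid encodings for the same instruction (for example, add eax, ebx can be encoded as 01 D8 or 03 C3).

```python
"""Differential round-trip tester: disassemble -> reassemble -> compare.

A sketch only. The adapter signatures below are assumptions for illustration,
not the API of any real assembler or disassembler library.
"""
from itertools import product
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical adapter types (assumed, not a real library's API):
#   disassemble(code) -> (asm_text, bytes_consumed), or None if code is invalid
#   assemble(asm_text) -> encoded bytes, or None if the assembler rejects it
Disassembler = Callable[[bytes], Optional[Tuple[str, int]]]
Assembler = Callable[[str], Optional[bytes]]
Anomaly = Tuple[bytes, str, Optional[bytes]]

X86_MAX_INSN_LEN = 15  # longest legal x86 instruction, in bytes


def odometer(max_len: int) -> Iterator[bytes]:
    """Yield 00..FF, then 00 00..FF FF, and so on up to max_len bytes."""
    for n in range(1, max_len + 1):
        # product() varies the last position fastest, like an odometer
        for digits in product(range(256), repeat=n):
            yield bytes(digits)


def search_space_size(max_len: int) -> int:
    """Count of byte strings of length 1..max_len: 256 + 256**2 + ..."""
    return sum(256 ** n for n in range(1, max_len + 1))


def pit(disassemble: Disassembler, assemble: Assembler,
        max_len: int) -> List[Anomaly]:
    """Steps 1-5: decode each candidate, reassemble it, log any mismatch."""
    anomalies: List[Anomaly] = []
    for code in odometer(max_len):
        decoded = disassemble(code)
        if decoded is None:
            continue  # not a valid instruction
        text, consumed = decoded
        if consumed != len(code):
            # Only a prefix decoded; that shorter string was already tested.
            continue
        reassembled = assemble(text)
        if reassembled != code:
            # Not necessarily a bug: x86 often allows several encodings.
            anomalies.append((code, text, reassembled))
    return anomalies


# --- Toy two-instruction "ISA", purely to exercise the harness ---
def toy_disassemble(code: bytes) -> Optional[Tuple[str, int]]:
    if code[0] == 0x90:
        return "nop", 1
    if code[0] == 0x04 and len(code) >= 2:  # add al, imm8
        return f"add al, {code[1]:#04x}", 2
    return None


def toy_assemble(text: str) -> Optional[bytes]:
    if text == "nop":
        return b"\x90"
    if text.startswith("add al, "):
        return bytes([0x04, int(text[len("add al, "):], 16)])
    return None


def buggy_toy_assemble(text: str) -> Optional[bytes]:
    """Deliberately drops the immediate's top bit, to produce anomalies."""
    encoded = toy_assemble(text)
    if encoded is not None and len(encoded) == 2:
        return bytes([encoded[0], encoded[1] & 0x7F])
    return encoded
```

<p>With these definitions, pit(toy_disassemble, toy_assemble, 2) returns an empty list, while pit(toy_disassemble, buggy_toy_assemble, 2) logs 128 anomalies, one per immediate from 0x80 to 0xFF. As for the lifetime question: search_space_size(X86_MAX_INSN_LEN) is roughly 1.3 × 10^36 candidate strings, so even at a billion round trips per second an exhaustive run would take around 4 × 10^19 years. Skipping whole groups of encodings, the way Sandsifter does, is a necessity rather than an optimization.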