In my 1980s version of Empire, all the global variables were kept contiguously in one source file. To save/restore the game, it just took the address of the first one, the address of the last one, and blitted it to a disk file, and blitted it back.<p>Very fast & easy.<p>Of course, it broke when COMDATs were introduced.<p>I did a similar thing with my text editor. The colors were configurable. The usual way was to have a configuration file, which the editor would read upon startup. But floppy disk systems were unbearably slow. So what I did was take the address of the configuration data in the data segment. I'd work backwards to where those bytes were in the EXE file, and patch the EXE file. This worked great!<p>Until the advent of virus scanners, which broke that. Virus scanners hated self-modifying EXE files.
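A minimal Python sketch of the blit-style save/restore described above, assuming the state lives in one fixed-layout `ctypes.Structure` rather than in adjacent C globals. The `GameState` fields and the `save_state`/`load_state` helpers are hypothetical, not taken from Empire.

```python
import ctypes


class GameState(ctypes.Structure):
    # Hypothetical fields; the comment does not list the real globals.
    _fields_ = [
        ("turn", ctypes.c_uint32),
        ("player_x", ctypes.c_int16),
        ("player_y", ctypes.c_int16),
        ("map_cells", ctypes.c_uint8 * 16),
    ]


def save_state(state, path):
    """Write the raw bytes of the whole structure to disk in one go."""
    with open(path, "wb") as f:
        f.write(bytes(state))


def load_state(path):
    """Read the raw bytes back and reinterpret them as a GameState."""
    with open(path, "rb") as f:
        raw = f.read()
    # The only sanity check available: the file must match the layout size.
    if len(raw) != ctypes.sizeof(GameState):
        raise ValueError("save file size does not match GameState layout")
    return GameState.from_buffer_copy(raw)
```

As with the original trick, the save file is only valid for the exact layout that wrote it, so any change to the fields silently invalidates old saves unless the size happens to differ.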
This reminded me of the old days working in Windows 3.1. My first professional project was to write a SOCKS client that could be loaded up and intercept all calls to Winsock's connect() function. It needed to do this without modifying the other programs and it had to happen at the DLL level and not the VxD layer where our IP stack ran.<p>Turns out there was an undocumented Windows API function along the lines of "AliasCsToDsRegister" or something like that - I've tried to find a reference to it but I can't find it. It allowed me to write into the code segment (the CS was global and read-only, as it was shared among all processes) and replace the first few bytes of the connect function with a jump to my code, which would then put the original bytes back, make the call to the SOCKS server, do some other magic, put my jump hook back in, and then return to the caller. Good times!<p>Kind of surprised I remember this and more so that it actually worked.
I remember doing exactly this to get code injection working on GNU/Linux systems! I made a library injection library in college for some coursework, which involved copying a C function into a code cave on a remote process and getting the remote process to execute it and return. It only works because it's so bare-bones: it doesn't use try/catch, and calls to other functions are possible because the function pointers are passed in through the registers and the compiled code is small enough to fit in a single page.<p>An example function I memcpy and run:<p><a href="https://github.com/skimmilk/liblibinject/blob/master/src/liblibinject/external.cpp#L92" rel="nofollow">https://github.com/skimmilk/liblibinject/blob/master/src/lib...</a>
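For anyone curious what the patch itself looks like, here is a rough Python sketch that just builds the bytes on a `bytearray` standing in for the code segment. It assumes the 32-bit x86 near jump (`E9` followed by a signed 32-bit displacement measured from the end of the instruction); 16-bit code on Windows 3.1 would have used a different jump encoding. `make_jmp`, `install_hook`, and `remove_hook` are names made up for illustration.

```python
import struct

JMP_REL32 = 0xE9   # x86 opcode: near jump with a 32-bit relative displacement
PATCH_LEN = 5      # 1 opcode byte + 4 displacement bytes


def make_jmp(from_addr, to_addr):
    """Encode a jump placed at from_addr that lands on to_addr."""
    # The displacement is relative to the address of the NEXT instruction.
    rel = to_addr - (from_addr + PATCH_LEN)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return bytes([JMP_REL32]) + struct.pack("<i", rel)


def install_hook(code, func_off, base_addr, hook_addr):
    """Overwrite the first bytes of a function; return the bytes replaced."""
    saved = bytes(code[func_off:func_off + PATCH_LEN])
    code[func_off:func_off + PATCH_LEN] = make_jmp(base_addr + func_off, hook_addr)
    return saved


def remove_hook(code, func_off, saved):
    """Put the original bytes back so the real function can run."""
    code[func_off:func_off + len(saved)] = saved
```

The hook routine would call `remove_hook`, invoke the real function, then call `install_hook` again, which is the put-it-back, call, re-hook dance described above. That window is also why this approach is not safe when another thread can enter the function mid-patch.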
> I pointed out to the customer liaison that what the customer is trying to do is very suspicious and looks like a virus. The customer liaison explained that it’s quite the opposite: The customer is a major anti-virus software vendor! The customer has important functionality in their product that that they have built based on this technique of remote code injection, and they cannot afford to give it up at this point.<p>As an aside, whenever I set up a Windows PC for me or a family member, the first thing I do is uninstall any third-party antivirus that may have come with the computer. I have found that anti-virus software likely makes my computer more insecure by having a big attack surface, not to mention slowing it down.
I think the even crazier thing is Windows having a function to allocate memory in another process (<a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualallocex" rel="nofollow">https://docs.microsoft.com/en-us/windows/win32/api/memoryapi...</a>). That seems like a potential source for all kinds of impossible to track bugs.<p><i>> The customer is a major anti-virus software vendor! The customer has important functionality in their product that that they have built based on this technique of remote code injection, and they cannot afford to give it up at this point.</i><p>Oh now it makes sense.
When I took a compiler class back in the early '90s, the project was to write the compiled machine code into an array, then cast the array into a function, and execute it. I and another student were doing it on 68040 NeXT workstations. One other student was doing it on a Mac, one on a VAX, and the rest on PCs (the PC students largely failed!). We were mystified why, when we tried to execute our code, it was as if it wasn't there. Took us a while to realize that the 68040 had separate instruction and data caches, and even more time (and emailing people at NeXT) to determine what the cache flush procedure was.
I wrote a "cd" replacement for cmd[1] a _long_ time ago (I only recently uploaded it to Github).<p>It uses exactly this technique to run a thread in cmd's process to actually change the directory. It's kept working from XP on up to Windows 11 now. I am always amazed it works, I fully expect it to go boom some day, probably with an error along the lines of "Don't do that, please".<p>[1] <a href="https://github.com/seligman/ccd/blob/master/RemoteThread.cpp#L416" rel="nofollow">https://github.com/seligman/ccd/blob/master/RemoteThread.cpp...</a>
I was competing in the Jump Trading programming competition and thought I had a pretty good implementation in AVX asm, but I was still behind one of their engineers, so I asked him after the competition. Turns out he was a Linux kernel committer and wrote a process to spawn multiple threads by copying itself, modifying the parameters, and then setting the offsets directly in the thread table, avoiding all mallocs and thread startup. So basically, his math code was just basic C loops, but his process was complete before my threads even finished allocation.<p>Forgive me if I got it wrong, I am definitely not a Linux kernel committer.
This reminds me of the time I wanted to run binaries compiled for SSE3 on a system that lacked SSE3. I started writing a tool to emulate this [0], and one thing it could do is rewrite the executable pages with replacement instructions if there was something that would fit (using memcpy(3), naturally).<p>This harkens back to the days when you could "download" a math coprocessor for your SX system: a TSR that likely did the same catching and handling of illegal instructions.<p>[0] <a href="https://github.com/rkeene/sse3-emu/blob/master/libsse3.c" rel="nofollow">https://github.com/rkeene/sse3-emu/blob/master/libsse3.c</a>
Memcpying and executing code can also expose details of the underlying CPU and memory-subsystem microarchitecture that need attention from the programmer.<p>For example:<p>- On most RISC-style Arm CPUs with Harvard-style split instruction and data caches, architecture-specific actions are needed to ensure that, after the memcpy, any code still lingering in the data cache is cleaned (pushed out) to the intended destination memory immediately, instead of at the next cache line eviction.<p>- Any stale code that happened to be cached from the destination (either by design or coincidence) needs to be invalidated in the instruction cache.<p>- Depending on the CPU microarchitecture, speculative prefetching into the caches that the programmer is unaware of, triggered by the previous two actions, may also need attention.
I've done stuff like this before; it works very well if you know the limitations, and I'd say that it even gives you a better understanding of how things actually work. Of course, don't bother MS or any other "official" vendor if it doesn't work, because you are on your own in debugging it.
The client certainly should have made sure their code was truly position independent.<p>Also, the client should have embedded their code in the executable file name so they just have to jump to the appropriate offset in argv[0]. This way, future updates just require renaming the file!
At least one additional step is required on some architectures: you must flush the data cache and invalidate the instruction cache at the location of the new code.<p>Dynamically loading code is indistinguishable from self-modifying code, and each architecture has special steps you must take in order for it to work.
> This code is such a bad idea, I’ve intentionally introduced errors so it won’t even compile.<p>No problem, I just fixed the compiler to compile it!
I actually did something like that on Windows x86, and it worked fine. Even I was surprised by that fact :)<p>I used it to copy out a (forgotten) password from a password input field in another program, which you cannot read remotely (for security reasons). Worked fine for that one use-case, and I haven't used this trick anywhere else ever again :)
Of course all of this is very much undefined behavior in standard C and C++. Some programmers really need to learn that they program the "abstract machine" when they write C or C++.
I froze for a moment seeing this article, having worked at a major anti-virus company a long time back and used some low-level Win32 APIs.<p>Fortunately, I followed some of the techniques from the “Programming Applications for Microsoft Windows” book and the Detours project to intercept and execute custom code, mostly by loading a custom DLL in the target remote process and using DllMain() to execute.
Yet, copy-on-write works well in Unix fork/exec() models and helps reduce memory pressure. Presumably, the kernel has a mechanism which presents as logistically simple "copy" but takes care of page/pointer/vm necessity.
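The visible contract is easy to check from user space. A small Unix-only Python sketch, assuming `os.fork` is available; `child_write_is_private` is a made-up helper. It shows only the semantics (a write in the child never reaches the parent), not the page-level copy-on-write machinery the kernel uses to make that cheap.

```python
import os


def child_write_is_private(data):
    """Fork, modify data[0] in the child, and report what each side sees.

    Returns (value_seen_by_child, value_seen_by_parent).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this write lands in the child's own copy of the page.
        try:
            os.close(read_fd)
            data[0] = 99
            os.write(write_fd, bytes([data[0]]))
        finally:
            os._exit(0)  # leave immediately, skipping the parent's cleanup
    # Parent: collect the child's view, then look at our own buffer.
    os.close(write_fd)
    child_value = os.read(read_fd, 1)[0]
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, data[0]
```

Nothing is copied up front when `fork` returns; the kernel shares the pages read-only and duplicates a page only when one side writes to it, which is what keeps fork followed by exec cheap.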
Don't publicly make fun of your customers... unless they are an anti-virus company.<p>What boggles my mind is how they went on to ask MS for help fixing their obviously wrong vulnerability-and-crash-introducing software.