Not used routinely, but it is possible to shadow the BASIC ROM on the Amstrad CPC 464 (1984). That ROM is normally located at C000h, shared with bank-switched RAM storing the display at the same address. You could copy the ROM to the RAM and permanently switch in the RAM. (You had to do another trick to move the display RAM elsewhere too, but it was all possible by writing to the correct gate array registers[1])<p>Which allowed you to do stuff like modifying the error messages in BASIC to say rude words, not that we would have ever done that.<p>[1] <a href="https://www.cpcwiki.eu/imgs/f/f6/S968se02.pdf" rel="nofollow">https://www.cpcwiki.eu/imgs/f/f6/S968se02.pdf</a>
Can someone explain why ROM was slower? I would have guessed it was faster.<p>Is there some technical limitation? Or was it more of a “we can shadow the ROM, so save the extra few cents on faster ROM” thing? Or was it just that no one was pushing for faster ROM as hard as for faster RAM?
Let's suppose your PC system ROM was 64 KiB, from F000:0000 to F000:FFFF (F0000 to FFFFF linear). If it wasn't a 286 or higher, then it couldn't simply use protected<->real-mode virtual memory tricks to copy ROMs and remap memory pages. Instead, there would need to be a motherboard-assisted hardware mechanism that redirects address decoding, for addresses whose top 4 bits are 1111, from the ROM to some stolen RAM. It would probably be easiest to do this at boot (during POST): temporarily map the future shadow RAM somewhere in the normal address space, copy the ROM into it, and then remap that RAM over the ROM. Let's borrow A000:0000, since we're in text mode 0x3 using 4 KiB of the EGA/VGA frame buffer at B800:0000, and assume there's no other adapter configured to use this area.<p><pre><code> ; copy 64 KiB from F000:0000 -> A000:0000
; registers not saved: AX CX SI DI
; registers saved: CS DS ES FLAGS
; 6 bytes of stack used by the three pushes
PUSH DS
PUSH ES
PUSHF
CLD ; technically, this isn't needed
; we're copying all 64 KiB and
; could wrap backwards
MOV AX, 0F000h ; leading 0 so the assembler reads a number, not a symbol
XOR SI, SI
MOV DS, AX
MOV AX, 0A000h
XOR DI, DI
MOV ES, AX
MOV CX, 8000h ; 64 KiB / (sizeof(word) == 2)
REP MOVSW ; DS:[SI] -> ES:[DI]
; 16 bits at a time until CX == 0
POPF
POP ES
POP DS
</code></pre>
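A rough way to check the copy loop and the CLD remark without real hardware is to model it in Python. This is only a sketch under simplifying assumptions: memory is a flat 1 MiB `bytearray`, SI and DI wrap at 16 bits, and the names `linear`, `rep_movsw` and `make_memory` (and the fill pattern standing in for ROM contents) are made up for illustration.

```python
MEM_SIZE = 1 << 20  # 1 MiB real-mode address space


def linear(segment, offset):
    """Real-mode address: segment * 16 + offset (offset kept to 16 bits)."""
    return ((segment << 4) + (offset & 0xFFFF)) % MEM_SIZE


def rep_movsw(mem, ds, es, count, direction_flag=False):
    """Model REP MOVSW starting at SI = DI = 0.

    Copies `count` words from DS:SI to ES:DI. SI and DI step by +2
    (DF clear) or -2 (DF set) and wrap around at 16 bits.
    """
    si = di = 0
    step = -2 if direction_flag else 2
    while count:
        for i in range(2):  # one word is two bytes
            mem[linear(es, di + i)] = mem[linear(ds, si + i)]
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        count -= 1
    return si, di


def make_memory():
    """Fresh memory with a made-up pattern standing in for the ROM."""
    mem = bytearray(MEM_SIZE)
    for off in range(0x10000):
        mem[0xF0000 + off] = (off * 7 + 3) & 0xFF
    return mem


# Forward copy (DF clear, as after CLD).
mem_fwd = make_memory()
end_fwd = rep_movsw(mem_fwd, 0xF000, 0xA000, 0x8000)
forward_ok = mem_fwd[0xA0000:0xB0000] == mem_fwd[0xF0000:0x100000]

# Backward copy (DF set): offsets wrap 0000 -> FFFE -> FFFC -> ...
mem_bwd = make_memory()
end_bwd = rep_movsw(mem_bwd, 0xF000, 0xA000, 0x8000, direction_flag=True)
backward_ok = mem_bwd[0xA0000:0xB0000] == mem_bwd[0xF0000:0x100000]
```

In this model both directions end up touching every word of the segment exactly once, which is the reasoning behind the "CLD isn't needed" comment when the count is exactly 8000h words and SI = DI = 0.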
segment:offset addressing in real mode<p>linear address = segment * 16 + offset<p>The exception is memory areas that the hardware maps by decoding only the segment and not the offset, so that reads beyond offset FFF0 wrap around instead of pointing to adjacent memory. In those areas:<p>AC00:FFFF (BBFFFh linearly) != B000:BFFF<p>There are variations of real mode on 286+ that use linear flat addressing outside of protected mode (it would be 16 MiB for 286 and 386SX, and 4 GiB max for 386+) by switching to protected mode temporarily and messing with the hidden limit fields of the segment registers. This is called unreal mode. The downside is that all of the system-provided (ROM) real-mode interrupt handlers, the OS, and the drivers would need to be rewritten for this different addressing scheme. Hypothetically, you could algorithmically transpile ROM code by parsing instructions and rewriting them. Real-mode interrupt handlers are stored in a table of 256 far pointers at 0000:0000 to 0000:03FF.
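<p>To make the arithmetic concrete, here is a small Python sketch of the plain formula. The helper names `parse` and `linear_address` are made up for illustration, and it models only the formula, not any hardware-specific decoding. By the formula alone, both example addresses resolve to BBFFFh, so any difference between them has to come from how the hardware decodes that region.

```python
def parse(addr):
    """Split a 'SSSS:OOOO' hex string into (segment, offset) integers."""
    seg, off = addr.split(":")
    return int(seg, 16), int(off, 16)


def linear_address(segment, offset, address_bits=20):
    """segment * 16 + offset, truncated to the width of the address bus.

    address_bits=20 models an 8086-style 1 MiB bus, where results at or
    above 100000h wrap back around to 0.
    """
    return ((segment << 4) + offset) & ((1 << address_bits) - 1)


a = linear_address(*parse("AC00:FFFF"))
b = linear_address(*parse("B000:BFFF"))

# The interrupt vector table: 256 far pointers of 4 bytes each
# (16-bit offset followed by 16-bit segment).
ivt_size = 256 * 4
ivt_last = linear_address(0x0000, ivt_size - 1)

# Top of the address space: FFFF:0010 is one byte past 1 MiB.
wrapped = linear_address(0xFFFF, 0x0010)                     # 20-bit bus
unwrapped = linear_address(0xFFFF, 0x0010, address_bits=24)  # 286-style bus

print(f"{a:05X} {b:05X} {ivt_last:04X} {wrapped:05X} {unwrapped:06X}")
```

The last two values show why the bus width matters: the same segment:offset pair lands at address 0 with 20 address lines and at 100000h with 24.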
While discussing RAM, can someone clear up another minor conundrum for me: How does a current x86 chipset map RAM modules to linear addresses? They can differ in size, so the mapping is probably dynamic? But I suppose an adder causes latency? Or is everything just mapped with big holes between modules, with CPU paging fixing up the mess?