"Twice the speed of the GBC" is a bit misleading.<p>Clock rate of the ARM7TDMI is indeed around double the GBC (GBA runs at 16.78Mz, while GBC runs at 8.4MHz), but cycles-per-instruction is far lower on the GBA's ARM7TDMI than the GBC's Z80-like processor.<p>On GBA, most instructions take 1 cycle to execute (when running from fast memory). Not all instructions take one cycle, memory Read/Write instructions, branches, and multiplying takes more than one cycle.<p>On GBC, an instruction basically takes 4 cycles per memory access. This includes the instruction fetch itself, each other byte of the instruction, each memory read/write performed, then 4 additional cycles if the instruction performed 16-bit math. (Also stuff for branches too)<p>But GBA doesn't always run code from fast memory. It gets the worst-case performance when executing code directly from the cartridge. When running 16-bit THUMB code, it takes 5 cycles. When running 32-bit ARM code, it takes 8 cycles. This means that a game needs to copy code into fast memory if it wants to run at a high performance.<p>So with the full penalties that come from directly executing code from the cartridge, and you're comparing the simplest instructions, it does end up being only twice as fast. But when running code from fast memory, it's around 16 times faster.