You have to be a bit careful with the CLFLUSH method. I tried to use it in a widely used program years ago because Intel recommended it, but we found that it just hangs the CPU on some older VIA/Centaur CPUs. Presumably that's fixed these days, but the old CPUs are likely still around.
Looks fun, but impractical. Are there real uses for this kind of thing on modern architectures?

Related question: has anyone tried to create a high-level language for doing this kind of madness?