What is hand-coded assembly language used for these days?<p>To put that another way, in the current marketplace, what kinds of programs are so worthy of optimization that it's economically sensible to have a human spend several days hand-tuning machine language to squeeze out every CPU cycle?
IMVU hand-rolled its SSE skinning loops and parts of the software 3D lighting code, because only 2/3 of our customers have GPUs. We need to run well on five-year-old Dells with Intel graphics. (Direct3D on Intel isn't as good as a dedicated software renderer. We chose RAD's Pixomatic.)<p>In addition, look at how popular netbooks are becoming. The Intel Atom is an <i>in-order</i> CPU. Imagine a hyperthreaded, 1.6 GHz 486...<p>On the iPhone it's even worse. It's got a decent vector unit, but the CPU is very slow. You'll see great wins by doing your 3D math yourself.<p>As machines continue to go multicore, I could imagine somebody shaving a couple of cycles out of the core message-passing routines, though you're almost certainly bus-bound in those situations...<p>Computers are getting smaller and people want more out of them; assembly language is back in style!
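For readers who haven't seen one, here is roughly what a "skinning loop" computes. This is a hedged sketch in plain C, not IMVU's code: it assumes a common linear-blend-skinning layout (3x4 bone matrices, up to four bone influences per vertex, weights summing to 1), and the type and function names are hypothetical. Hand-rolled SSE versions vectorize exactly this kind of multiply-add inner loop.

```c
#include <stddef.h>

/* Hypothetical 3x4 affine bone matrix, row-major: columns 0..2 hold
   rotation/scale, column 3 holds translation. */
typedef struct { float m[3][4]; } BoneMatrix;

/* Hypothetical vertex layout with up to four bone influences. */
typedef struct {
    float pos[3];
    unsigned char bone[4];  /* indices into the bone palette */
    float weight[4];        /* blend weights, assumed to sum to 1 */
} SkinnedVertex;

/* Linear blend skinning: out[v] = sum_k weight[k] * (bones[bone[k]] * pos). */
void skin_vertices(const SkinnedVertex *in, size_t count,
                   const BoneMatrix *bones, float (*out)[3]) {
    for (size_t v = 0; v < count; v++) {
        const SkinnedVertex *sv = &in[v];
        float acc[3] = {0.0f, 0.0f, 0.0f};
        for (int k = 0; k < 4; k++) {
            float w = sv->weight[k];
            if (w == 0.0f) continue;  /* skip unused influence slots */
            const BoneMatrix *b = &bones[sv->bone[k]];
            for (int r = 0; r < 3; r++) {
                /* Transform the point by row r of the bone matrix, then weight it. */
                acc[r] += w * (b->m[r][0] * sv->pos[0] +
                               b->m[r][1] * sv->pos[1] +
                               b->m[r][2] * sv->pos[2] +
                               b->m[r][3]);
            }
        }
        out[v][0] = acc[0];
        out[v][1] = acc[1];
        out[v][2] = acc[2];
    }
}
```

The per-vertex work is a handful of independent multiply-adds over small fixed-size arrays, which is why it maps well onto 4-wide SIMD registers.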
I did some assembly optimization for an internal RTL-level simulator. We had ~1000 machines on a three-year upgrade cycle, i.e., we replaced about 333 machines a year, or about $333k a year. Let's say I cost the company $200k a year. Several days = perhaps $2k, so I'd only need to get a 0.6% speedup for it to be worth it, not even including the cost of powering and maintaining our machines.<p>When I worked on it, our simulator was an order of magnitude faster than commercially available simulators (Synopsys VCS and Cadence NC-Verilog), which cost between $1k and $10k per license per year. I worked for a tiny hardware startup; established hardware companies use a few orders of magnitude more compute power than we did, so the equation is probably at least four orders of magnitude further in favor of doing assembly optimization in a commercial simulator.
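The break-even arithmetic above can be written out as a small C function. This is a first-order sketch: the $1k-per-machine price is inferred from "333 machines / year = $333k / year", and the function name is hypothetical. It assumes a speedup of fraction s lets the same workload run on roughly a fraction s fewer machines.

```c
/* First-order break-even speedup: the optimization pays for itself within a
   year once speedup_fraction * annual_hardware_spend >= engineer_cost. */
double break_even_speedup(double machines, double upgrade_cycle_years,
                          double cost_per_machine, double engineer_cost) {
    double annual_hardware_spend = machines / upgrade_cycle_years * cost_per_machine;
    return engineer_cost / annual_hardware_spend;
}
```

With 1000 machines, a 3-year cycle, $1k per machine, and $2k of engineer time, `break_even_speedup(1000.0, 3.0, 1000.0, 2000.0)` returns about 0.006, i.e. the 0.6% quoted in the comment.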
Anything that's worth spending time to make fast is worth spending time writing SIMD assembly for.<p>You can get 5x, 10x, 20x, or more performance increases just by using the vector instructions the CPU gives you. Until a magic compiler appears that can make proper use of them (read: never), hand-coded assembly will be critical for almost any performance-sensitive application, especially multimedia processing.
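A minimal sketch of the kind of vectorization being described, written with GCC/Clang vector extensions (`__attribute__((vector_size(16)))`) rather than raw assembly so it builds on any target those compilers support. The saxpy kernel (y = a*x + y) and the function names are illustrative choices, not from the comment, and no particular speedup is implied; real gains depend on the CPU, memory bandwidth, and what the compiler already auto-vectorizes.

```c
#include <stddef.h>
#include <string.h>

/* Four packed floats; GCC and Clang lower arithmetic on this type to SIMD
   instructions when the target has them, or to scalar code otherwise. */
typedef float v4f __attribute__((vector_size(16)));

/* Scalar reference: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++) y[i] = a * x[i] + y[i];
}

/* Same computation, four lanes at a time, plus a scalar tail for n % 4. */
void saxpy_v4(size_t n, float a, const float *x, float *y) {
    v4f va = {a, a, a, a};  /* broadcast a into every lane */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4f vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* alignment-safe load */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;              /* one vector multiply-add for 4 elements */
        memcpy(y + i, &vy, sizeof vy);  /* alignment-safe store */
    }
    for (; i < n; i++) y[i] = a * x[i] + y[i];
}
```

Hand-written assembly goes further than this (instruction scheduling, prefetching, register allocation across unrolled iterations), but the lane-parallel structure is the same.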
Signal processing algorithms on the phones made by a certain company I worked at are mostly written in assembly. The cellular protocols, at least those that use time-division (e.g. GSM), have strict real-time constraints, but mostly they use assembly because every microsecond you can shave off those algorithms is a microsecond you can sleep and conserve power.
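As a hedged illustration of the kind of inner loop involved (not any phone maker's code), here is a generic Q15 fixed-point FIR filter in C; the function name and the Q15 format are assumptions. DSP-style cores usually have dedicated multiply-accumulate instructions for exactly this pattern, and every cycle trimmed from a loop that runs for every sample is time the chip can spend asleep.

```c
#include <stddef.h>
#include <stdint.h>

/* One output sample of a Q15 FIR filter: y = sum_k h[k] * x[k].
   Each Q15 * Q15 product is Q30; the sum is kept in a 64-bit accumulator,
   shifted back to Q15, and saturated to the int16_t range. */
int16_t fir_q15(const int16_t *x, const int16_t *h, size_t taps) {
    int64_t acc = 0;
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)x[k] * h[k];  /* multiply-accumulate */
    }
    acc >>= 15;  /* Q30 -> Q15 (arithmetic shift on GCC/Clang) */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```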
Joshua Bloch, Chief Java Architect at Google, says in Coders at Work:<p>"But for the absolute core of the system -- the inner loops of the index servers, for instance -- very small gains in performance are worth an awful lot. When you have that many machines running the same piece of code, if you can make it even a few percent faster, then you’ve done something that has real benefits, financially and environmentally. So there is some code that you want to write in assembly language."
Debugging and reverse engineering games.<p>When publishers/developers don't give a bleep, the fans take up the task of fixing the bugs themselves. I happen to run one such project in my spare time (for C&C: Red Alert 2), and it's amazing how much stuff is broken. It's not as "serious" as other projects mentioned here, but still a reason to know ASM. (And a good way to see bad programming practices in action :) )
Going the other way around: can anyone think of an open source project that <i>could</i> benefit from some assembly optimization but isn't getting it? I'd love an excuse to play with this stuff in a useful fashion.<p>(I love what I do, but my twelve-year-old self would be disgusted that I'm not writing games.)
Medical imaging and oil exploration. A lot of the really fast packages use ARB assembly instead of GLSL to minimize the number of instructions per voxel. It adds up if you are doing 4D imaging in real time, for instance.
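To put "it adds up" in numbers, here is a small C helper. The volume size and frame rate used below are hypothetical, and it simplifies by assuming one shader invocation per voxel (real volume renderers typically sample along rays, so the count can be higher).

```c
#include <stdint.h>

/* Shader instructions executed per second when a per-voxel program runs over
   every voxel of an nx * ny * nz volume, volumes_per_second times a second. */
uint64_t instructions_per_second(uint64_t nx, uint64_t ny, uint64_t nz,
                                 uint64_t volumes_per_second,
                                 uint64_t instructions_per_voxel) {
    return nx * ny * nz * volumes_per_second * instructions_per_voxel;
}
```

For a hypothetical 256x256x256 volume refreshed 20 times a second, `instructions_per_second(256, 256, 256, 20, 1)` gives 335,544,320, so each instruction shaved off the per-voxel program saves roughly a third of a billion shader instructions per second.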
In addition to the optimization reasons, you also end up hand-coding assembly to tickle features during the verification and bringup of new processors and/or processor architectures.<p>Since a lot of the bugs there may depend on a specific sequence of instructions, and a compiler picks its own instructions and their order, doing it in a high-level language doesn't make any sense.
Microcontroller firmware. There are many examples of AVR code in assembly on the web. I learned assembly this way. It really makes sense when you're working on bare hardware with no abstraction layers in the way. Also, it's useful for time-critical applications such as creating video signals or audio processing.
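To see why cycle counting matters for the video case, here is the timing budget as a small C calculation. It assumes an NTSC composite signal (line rate 4.5 MHz / 286, about 15,734 Hz) and a 16 MHz AVR, a common clock on Arduino-class boards; both are assumptions rather than details from the comment, and the function name is hypothetical.

```c
/* CPU cycles available per NTSC scanline (one line every ~63.56 us). */
double cycles_per_scanline(double cpu_hz) {
    const double ntsc_line_hz = 4.5e6 / 286.0;  /* ~15734.27 Hz */
    return cpu_hz / ntsc_line_hz;
}
```

At 16 MHz this comes to roughly 1017 cycles per line. Since most AVR instructions take a single cycle, the whole line, sync pulse included, has to come out of about a thousand instructions whose timing is exactly predictable, which is much easier to guarantee in hand-written assembly than in compiled C.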
GPGPU stuff -- that is, using your graphics processor for random programming tasks. While something like CUDA (<a href="http://www.nvidia.com/object/cuda_home.html" rel="nofollow">http://www.nvidia.com/object/cuda_home.html</a>) reduces the need to write assembly-like code, it also reduces the available speed substantially.<p>For that matter, CUDA (and ATI's Bare-Metal Interface, which is similar) is more assembly-like than C-like in many ways. So even using the higher-level available language is still pretty much like assembly.<p>You tend to only write these things when you're going to be running a <i>lot</i> of elements through, so almost everything you do in these platforms is inner-loop, or you'd be using a different tool. So even small speed-ups tend to matter.
I work in virtual machine development, so a portion of the interface code for hardware virtualization I wrote in straight ASM. This is not (exactly) for speed reasons, though; it's just impossible to touch the hardware at that level in C. :)
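A small hedged example of the "can't do it in C" point: standard C has no way to execute a specific instruction such as CPUID, so even asking the processor whether it supports Intel VT-x needs inline assembly (GCC/Clang extended-asm syntax here). CPUID leaf 1 reports VMX support in bit 5 of ECX; AMD CPUs report their SVM extension through a different leaf, which this sketch does not check. The function name is hypothetical, and actual VMM code (VMXON, VMCS management, and so on) is far more involved and not shown.

```c
/* Returns 1 if CPUID reports Intel VMX, 0 if not, -1 on non-x86 targets. */
int cpu_has_vmx(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 1, ebx, ecx = 0, edx;  /* leaf 1, subleaf 0 */
    __asm__ volatile("cpuid"
                     : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    (void)ebx;
    (void)edx;
    return (int)((ecx >> 5) & 1u);  /* CPUID.1:ECX bit 5 = VMX */
#else
    return -1;  /* the instruction does not exist on this architecture */
#endif
}
```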
Compiler intrinsics, binary patches and hooks (although EasyHook has made assembly a rarity here outside of the occasional shim where odd calling conventions are used), in-process debuggers, low-level bootloaders, hardware initialization/management, various thunking mechanisms.<p>Others have covered the optimization side of things well so I won't repeat it, but there are tiny fragments of assembly all over the place -- they hold your system together.
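As a hedged example of the binary-patching side: the simplest inline hook overwrites the start of a function with a 5-byte x86 `JMP rel32` (opcode E9) to a replacement routine. The sketch below only computes those bytes; the name `encode_jmp_rel32` is hypothetical and this is not EasyHook's API. A real hook also has to copy the overwritten instructions into a trampoline, make the code page writable, and cope with other threads, none of which is shown.

```c
#include <stdint.h>

/* Encode "JMP rel32" placed at address `from`, jumping to `to`.
   The displacement is relative to the end of the 5-byte instruction.
   Returns 0 on success, -1 if the target is out of rel32 range. */
int encode_jmp_rel32(uint64_t from, uint64_t to, uint8_t out[5]) {
    int64_t disp = (int64_t)(to - (from + 5));
    if (disp < INT32_MIN || disp > INT32_MAX) return -1;
    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xE9;              /* JMP rel32 opcode */
    out[1] = (uint8_t)d;        /* little-endian displacement */
    out[2] = (uint8_t)(d >> 8);
    out[3] = (uint8_t)(d >> 16);
    out[4] = (uint8_t)(d >> 24);
    return 0;
}
```

For example, a jump from 0x1000 to 0x2000 encodes as E9 FB 0F 00 00, because 0x2000 - 0x1005 = 0xFFB.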
I've done it for cryptography code and cryptanalysis code. Specifically, optimizing code to take advantage of specific instructions available in certain processors or to make use of vector registers and instructions. I wrote my programs in C and then went back and wrote assembly for parts of the code that could deliver a significant overall speedup with hand optimization.<p>One place I did this was in various RSA Challenge attack clients.
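Multi-precision arithmetic is a typical hot spot in RSA-style code. Below is a hedged portable-C sketch of the core "multiply a bignum by one word and add" loop; the name `mul_add_words` and the 32-bit limb layout are illustrative assumptions. In C the widening multiply and the carry have to be spelled out through a 64-bit intermediate, whereas assembly versions can use the CPU's native widening multiply and add-with-carry instructions on full-width registers, the kind of processor-specific instruction use described above.

```c
#include <stddef.h>
#include <stdint.h>

/* r[0..n-1] += a[0..n-1] * b, with limbs stored least significant first.
   Returns the carry-out word. Each step fits in 64 bits because
   (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
uint32_t mul_add_words(uint32_t *r, const uint32_t *a, size_t n, uint32_t b) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] * b + r[i] + carry;
        r[i] = (uint32_t)t;  /* low half stays in this limb */
        carry = t >> 32;     /* high half moves to the next limb */
    }
    return (uint32_t)carry;
}
```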
Thanks to all for the many informed and detailed replies!<p>I am now assistant-teaching a college course in low-level computer programming. It's an excellent course: the students reprogram a children's toy robot that uses the ARM processor. <a href="http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp/B000096QMU" rel="nofollow">http://www.amazon.com/Little-Tikes-Giggles-Remote-Control/dp...</a> They're getting up to speed very quickly on how to get hardware to actually do stuff.<p>Yes, I actually left Silicon Valley to do grad school. I haven't given up the principle of "do real stuff, see real results", though. I'm looking to design a couple of fairly small homework assignments consisting of optimizing some ARM code. I want the examples to be real. Now mulling over which to do...
A lot of the time, small embedded programs, especially on underpowered micros, get this sort of attention.<p>Additionally, low-level hardware interfacing is often done in hand-coded assembly, because with some of the crappy compiler toolchains you face, it is easier to "get right" than C.
I've rewritten many Perl things in C to speed up processing time; for me, it's better than buying more/newer hardware. I also rewrite smallish scripts that I invoke millions of times, or small Perl scripts that do regexps, in C to boost speed. I don't write any ASM anymore, but C is a really good optimization step for me and my projects.
Not a marketplace use, but the most recent use I've had for Assembly was in an Atari 2600 programming class.<p><a href="http://nirmalpatel.com/hacks/atari.html" rel="nofollow">http://nirmalpatel.com/hacks/atari.html</a>
Some weirdos just plain like it more than high-level languages. One of those weirdos is developing the Linoleum language: <a href="http://anywherebb.com/bb/index.php?l=D4JeGEdhacS6Srr6QLQgZpWesA4&r=lYUhcug3l3hhX4s6lfk5" rel="nofollow">http://anywherebb.com/bb/index.php?l=D4JeGEdhacS6Srr6QLQgZpW...</a>