Last time I did APL programming I was a bit bothered by the performance implications of some of the standard loopless programming styles. In particular, it's hard to nest loops.<p>As an example of the implications, consider computing the Mandelbrot set. I'll be using Numpy here to ensure people can follow what I'm doing, but for the point I wish to make, it's similar to how you'd write it in APL. The Mandelbrot set is computed by applying a function like this to each of a bunch of complex numbers:<p><pre><code> def divergence(c, d):
     i = 0
     z = c
     # iterate while the squared magnitude of z is below 4 (i.e. |z| < 2)
     while i < d and (z.real*z.real + z.imag*z.imag) < 4.0:
         z = c + z * z
         i = i + 1
     return i
</code></pre>
To apply this to many points simultaneously in a vectorised "loopless" style, we'd write it like this:<p><pre><code> import numpy as np

 def mandelbrot_numpy(c, d):
     output = np.zeros(c.shape)
     z = np.zeros(c.shape, np.complex64)
     for it in range(d):
         # boolean mask of the points that have not yet diverged
         notdone = np.less(z.real*z.real + z.imag*z.imag, 4.0)
         output[notdone] = it
         z[notdone] = z[notdone]**2 + c[notdone]
     return output
</code></pre>
There is just one `for` loop, which is pretty easy to do in APL. The `while` loop has been subsumed into control flow encoded in boolean arrays. This is not <i>exactly</i> how you'd do it in APL, but it has a similar feel. It's also pretty slow, because we are manifesting the entire `z` array in memory for every iteration of the outer loop. In contrast, an old-school loop over every point, with an inner `while` loop for every point, would involve only two memory accesses per point. On a GPU, I have measured the vectorised style to be about 30x slower than one with a conventional `while` loop.
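<p>For comparison, here is a minimal sketch of the per-point structure described above. The name `mandelbrot_loop` and the use of `np.ndindex` are my own choices for illustration. In plain Python this will not be fast; the point is only the shape of the computation, where each point is read once, iterated in local variables, and written once, which is what a compiled loop or a GPU kernel would do.

```python
import numpy as np

def divergence(c, d):
    # Scalar escape-time count for a single complex point c.
    i = 0
    z = c
    while i < d and (z.real*z.real + z.imag*z.imag) < 4.0:
        z = c + z * z
        i = i + 1
    return i

def mandelbrot_loop(c, d):
    # Hypothetical per-point version: one read of c and one write of
    # output per point; z lives only in a local variable.
    output = np.zeros(c.shape, dtype=np.int32)
    for idx in np.ndindex(c.shape):
        output[idx] = divergence(complex(c[idx]), d)
    return output
```

Note that this sketch starts from `z = c` as in `divergence`, while the vectorised version starts from `z = 0`, so the two can differ by one in their iteration counts.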