Get a benchmark for converting a single image. Use "strace -t" (or the equivalent on your chosen OS) to see where the bottlenecks occur at each stage of the program's execution.<p>This is a linear-time (O(n)) problem over a large set, so it's worth the effort to shave a few milliseconds wherever you can: each millisecond saved is multiplied 2-million-fold (about 33 minutes). Once you have an optimal configuration for single images, test a small set, then let it loose on the whole thing. If you can shave 2 seconds off a single image, that's 46 machine-days right there.<p>Can you buffer the images onto a ramdisk during conversion? I'd guess HDD I/O will be a large bottleneck.<p>Be sure to run your single-image test on different images, so you don't get false optimization positives due to various I/O caches.<p>What's the maximum number of images ImageMagick will take in a batch list? (Guessing it's somewhat short of 2M.) Whatever it is, run as large a list as possible. There's a suggestion at <a href="http://www.imagemagick.org/Usage/files/#image_streams" rel="nofollow">http://www.imagemagick.org/Usage/files/#image_streams</a>, but it re-initializes IM each time, which sounds slow (still, could you put the binaries on a ramdisk?).<p>You want a stream / "tape head" type setup where files are processed with minimal need to re-initialize the conversion program. It looks like IM6 doesn't support this, so with a sample set of this size you may even want to look into writing a simple C program using libtiff/libjpeg whose sole job is to run the conversion as a stream, if you have access to such skills. It may well beat a large general-purpose tool.<p>Simple parallelism: create the list, split it into N (ImagickMaxNum) input list files, and run them on N workstations to cut the total wall-clock time by roughly a factor of N. True parallelism (network queue-based) may be worth exploring using a queue system (RabbitMQ?)
but don't try to write it yourself.<p>There may be situations where it makes sense to access the files via a filename mask, if you can rename them (img_0 -> img_2000000), so you don't have to store and parse a file list at all and can use a simple incrementing counter instead.<p>Hope this helps! I'm no optimization guru and the above is very top-of-mind, but I enjoy these large problem sets. I'm also in NYC, would love to help out the NYPL, and would volunteer some free time to do so (I need a useful side project). PM me if you'd like to talk further!<p>EDIT: Is this the wrong problem to tackle entirely? Could you convert the set on demand and cache as you go? I.e., if it's book covers that people are browsing, the first user may wait 5 seconds for an image, but that's not <i>awful</i>... Is there a particular reason to want to create pre-cached derivatives?
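For the split-the-list parallelism idea above, a rough sketch of the mechanics (GNU coreutils assumed; the filenames, chunk count, and the commented-out `convert` flags are all stand-ins — on the real set you'd generate the ~2M names and ship each chunk list to its own machine or worker):

```shell
# Stand-in file list: 100 names produced by a simple counter
# (this is also how the rename-to-img_N scheme would avoid storing a list).
seq -f 'img_%g.tif' 0 99 > filelist.txt

# Split into N sub-lists without breaking lines mid-name; one per worker.
N=4
split -n l/$N filelist.txt chunk_

# Dry run of one worker walking its chunk. Swap 'echo' for the real
# ImageMagick call, e.g.: convert "$f" -quality 85 "out/${f%.tif}.jpg"
while read -r f; do
  echo "convert $f"
done < chunk_aa

# Sanity checks: every file lands in exactly one chunk.
ls chunk_* | wc -l    # should print the number of chunks, N
cat chunk_* | wc -l   # should print 100, the original list length
```

The `split -n l/$N` form divides by line count rather than bytes, so no filename ever straddles two chunks.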
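The convert-on-demand idea from the EDIT could be as small as a cache-check wrapper. A sketch, with the directory layout assumed and the actual ImageMagick call faked with `cp` so the flow is visible without IM installed (the real version is in the comment):

```shell
# On-demand cache: convert a source image only the first time it's requested;
# every later request is served straight from the cache directory.
mkdir -p src cache
printf 'tiff-bytes' > src/img_42.tif   # stand-in source image

get_jpeg() {
  name=$1
  dst="cache/$name.jpg"
  if [ ! -f "$dst" ]; then
    # real version: convert "src/$name.tif" -quality 85 "$dst"
    cp "src/$name.tif" "$dst"          # fake "conversion" for the demo
  fi
  echo "$dst"
}

get_jpeg img_42   # first request: converts, then serves the path
get_jpeg img_42   # second request: cache hit, serves immediately
```

The first browser to hit a cover eats the conversion latency; everyone after gets a plain file read, and only images people actually view are ever converted.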