This was quite interesting to look through!

Perl 5.8.0 is over 20 years old (https://dev.perl.org/perl5/news/2002/07/18/580ann/), while CentOS 3.9 was released in 2007! Somehow it manages to feel both ancient and not that old at the same time.

My personal anecdote with GNU parallel is from running into it while working in academia. It worked well and saved me some time, but it seemed unreasonable for a tool to ask for a citation just for parallelising a script; by that logic matplotlib, jupyter and co would deserve one as well.
On the other hand, authors can ask for whatever they want, so rather than ignore the request I simply decided not to use it.
Wait, what: `parallel` is a Perl script!? [1]

I would have thought it was black magic with assembler optimisations for MIPS and special considerations for HP-UX...

This is such a lovely and interesting writeup; it's wonderful that people take the time to share so generously!

[1] An 11k-LOC Perl script; you can read along here: https://github.com/gitGNU/gnu_parallel/blob/master/src/parallel
I found GNU parallel useful when I wanted to queue up transcoding of flac files to mp3 on my Raspberry Pi. A few ffmpeg flags plus a list of files meant I could saturate one job per core with a one-line bash command.
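A sketch of that kind of one-liner (the exact ffmpeg flags and quality setting here are my guesses, not necessarily what I ran):

    # one ffmpeg job per core (parallel's default), .flac -> .mp3
    parallel 'ffmpeg -i {} -codec:a libmp3lame -qscale:a 2 {.}.mp3' ::: *.flac

{} is the input file and {.} is the same name with the extension stripped, so each foo.flac becomes foo.mp3.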
parallel is a tool I've reached for many times; the citation notice it prints is odd (it seems to assume the typical use case is academic research), but it's easily squelched.

A sample use case: you have a file with words in it, one per line, and you want to run a program that operates on each word (device name, dollar amount, whatever). Sure, you can use a loop, but if the words and actions are independent, parallel is one way to spin up N copies of your program and pass each a single word from the file, something like the sketch below. As a more concrete example, it's a way to get around Python's GIL without reaching for multiprocessing or threads.

Didn't realise that it busy waits, but I'm typically running it on a not-very-busy server with tens of cores.
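The sketch (words.txt and ./process-word are made-up stand-ins):

    # run up to 8 copies of ./process-word at once,
    # each getting one line of words.txt as its argument
    parallel -j8 ./process-word {} < words.txt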
Before GNU Parallel I used Ruby workers and a job queue to keep ${N} cores busy with work. It sorta worked like GNU parallel but was quite basic. I've since switched to GNU Parallel: stable code I didn't have to write is code I don't have to maintain... not to mention it has more features than I ever supported.
I couldn’t make heads or tails of what this would be useful for from the OP (maybe it’s something I should already have known), but this from the official site was pretty helpful: https://www.gnu.org/software/parallel/parallel_cheat.pdf
I once replaced a 10-machine Hadoop cluster job with a Python script and parallel on my laptop, because I didn't want to wait hours for it to finish.

The i7 in my laptop, with its fair number of CPUs/threads and a few optimisations, got the job done in 10 minutes.

(I later put the Hadoop use on my resume, not the GNU parallel. That's the joke of modern job hunting: there is no interest in what you did, just buzzwords and leetcode. Luckily there are still a few places that value real work, or I'd be too old to get a job. :) )
If anyone needs a pretty basic alternative with Windows support, there's rush:

https://github.com/shenwei356/rush

I use it pretty extensively with ffmpeg, imagemagick and the like (see the sketch below).

I had been using mmstick/parallel for a while, but it moved to the RedoxOS repos and then stopped being updated, while still having some issues that were never ironed out.
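rush's invocation is very close to parallel's; roughly like this (from memory, so treat the exact flags as approximate):

    # shrink every PNG with imagemagick, 4 jobs at a time
    ls *.png | rush -j 4 'convert {} -resize 50% small-{}'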
Parallel is a fun tool. I use it as a sort of simple Slurm to distribute work over many VMs to process tens to hundreds of TB of data, sometimes across 2400+ cores.
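A sketch of what that looks like (vm1..vm3 and the process-chunk script are placeholders; process-chunk would need to exist on the workers):

    # spread jobs over three VMs via SSH, transferring each input file out,
    # fetching back {}.out, and cleaning up (--trc = --transfer --return --cleanup)
    parallel -S vm1,vm2,vm3 --trc {}.out './process-chunk {} > {}.out' ::: chunks/*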
I've never been sure if it's too much of a hack, but I've used GNU parallel in Docker containers as a quick and easy way of running multiple processes for web applications.

And with the `--halt now,done=1` option (which I think is relatively recent?), if any of the parallel processes exits, parallel exits itself, the whole container shuts down, and external orchestration can start another one if needed.
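In a Dockerfile it ends up looking roughly like this (the two commands are placeholders):

    # if either process exits, parallel kills the other and exits,
    # taking the container down with it
    CMD parallel --halt now,done=1 ::: "./web-server" "./worker"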
I wrote down a small usage example here: https://savannah.gnu.org/forum/forum.php?forum_id=9197

No need for massive distributed clusters when you have a simple Perl one-liner.
I recently used parallel to write a 1 TB data file for testing, using all cores:

    # conv=notrunc keeps the concurrent dd jobs from truncating each other's writes
    seq 0 10000 | parallel dd if=/dev/urandom of=/mnt/foo/input bs=10M count=10 seek={}0 conv=notrunc