科技回声

GNU Parallel, where have you been all my life?

448 points by alexpls, over 1 year ago

41 comments

BoppreH, over 1 year ago
It's a nice tool, but it also shows the shortcomings of shell commands.

In a proper programming language, we'd have something like

    parallel [1..5], i => { sleep random()*10+5; possibly_flaky i } // [{"Seq": 4, "Host": ":", "Starttime": 1692491267...

And `parallel` would only have to worry about parallelization.

Instead, the shell environment forces programs to invent their own parameter separator (:::), a templating format ({1}), and a way to output a list of structures (CSV-like). You can see the same issues in `find`, where the exec separator is `\;`, the template is `{}`, and the output is delimited by \n or \0. And `xargs` does it in yet another way.

It's very hard to acquire and retain mastery over a toolbox where every tool reinvents the basics. If you ever found yourself searching "find exec syntax" multiple times in a week, it's not your fault.

As for alternatives, I'm a fan of YSH[1] (JavaScript-like), Nushell[2] (reinvented from first principles for simplicity and safety) and Fish[3] (bash-like but without the footguns). Nushell is probably my favorite of the bunch; here's a parallel example:

    ls | where type == dir | par-each { |it| { name: $it.name, len: (ls $it.name | length) } }

[1] https://www.oilshell.org/release/latest/doc/ysh-tour.html

[2] https://github.com/nushell/nushell

[3] https://fishshell.com/
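To make the comparison concrete, here is a minimal sketch (not from the comment) of one task, counting the lines in every `.txt` file under a directory, written with `find -exec` and with `xargs`. The scratch directory, the file names, and the helper names `with_find` and `with_xargs` are made up for illustration; the GNU Parallel form is left as a comment because it assumes GNU Parallel is installed.

```shell
#!/bin/sh
# Hypothetical scratch directory with three small files (names are illustrative).
demo=$(mktemp -d)
printf 'a\n'       > "$demo/one.txt"
printf 'a\nb\n'    > "$demo/two.txt"
printf 'a\nb\nc\n' > "$demo/three.txt"

# find: the template is {} and the command is terminated by \; (one wc per file).
with_find() {
  find "$1" -type f -name '*.txt' -exec wc -l {} \;
}

# xargs: NUL-delimited input (-0), one argument per command (-n 1), no template.
with_xargs() {
  find "$1" -type f -name '*.txt' -print0 | xargs -0 -n 1 wc -l
}

# GNU Parallel (only if installed): arguments follow :::, and {} is the template.
#   parallel wc -l {} ::: "$demo"/*.txt

# Both forms should report the same counts; sort because find order is unspecified.
with_find "$demo"  | sort > "$demo/find.out"
with_xargs "$demo" | sort > "$demo/xargs.out"
cmp -s "$demo/find.out" "$demo/xargs.out" && echo "same result"
```

The three tools do the same job with three different conventions for "where does the argument go" and "where does the command end", which is the point being made above.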
zackmorris, over 1 year ago
Since nobody asked, I'm reiterating my position that computers that can effectively utilize parallelism simply aren't available today. I've always wanted a computer with at least 256 cores and local content-addressable memories beside each core to send data where it's needed. By Moore's Law, we could have had MIPS machines with 1000 cores around 2010, and 100,000 to 1 million cores today, for under $1000.

Contrast that with GPU shaders, where one C-style loop operates on buffers separate from system memory and can't access system services like network sockets or files. GPUs have around 32 or 64 physical cores, so theoretically that many shaders could run simultaneously, although we rarely see that in practice. And we'd need bare-metal drivers to access the GPU cores directly; does anyone know of any?

The closest thing now is Apple's M1 line, but it has specialized NN and GPU cores, so it missed out on the potential of true symmetric multiprocessing.

The reason I care about this so much is that with this amount of computing power, kids could run genetic algorithms and other "embarrassingly parallel" code that solves problems about as well as NNs in many cases. Instead we're going to end up with yet another billion-dollar bubble that locks us into whatever AI status quo the tech industry manages to come up with. And everyone seems to love it. It reminds me of the scene in Star Wars III when Padme notes how liberty dies with thunderous applause.
ketanmaheshwari, over 1 year ago
GNU Parallel has been one of my go-to tools for accomplishing more on the terminal. Generating test data, transferring data from one node to another using rsync, running many-task, embarrassingly parallel jobs on HPC, and running pipelines with simple data dependencies over hundreds of files are some of the places where I use GNU Parallel.

Many thanks to Ole Tange for developing the wonderful tool and helping users on the Stack Overflow sites to this day.

Shameless plug: I am developing a tutorial on GNU Parallel to be presented at the eScience conference in Cyprus this year: https://www.escience-conference.org/2023/tutorials/gnu_parallel
Aissen, over 1 year ago
GNU parallel is great for the kind of tasks highlighted in the post. Note that, being written in Perl, it's slower than its simpler C counterpart, moreutils parallel. And in many use cases xargs --max-procs=$(nproc) can replace it.
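A minimal sketch of the `xargs` approach mentioned above, assuming an `xargs` that supports `-P` (GNU and BSD do; it is the short form of GNU's `--max-procs`). The helper name `square_all` and the squaring job are placeholders for real work, and the `getconf` fallback is an assumption for systems without `nproc`.

```shell
#!/bin/sh
# CPU count: nproc is GNU coreutils; getconf is a fallback, 2 is a last resort.
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# Run one small job per input line, at most $cpus jobs at a time.
# Each line is passed to the inline script as $1 (the "_" fills $0).
square_all() {
  xargs -P "$cpus" -n 1 sh -c 'echo $(( $1 * $1 ))' _
}

# Completion order is not guaranteed with -P, so sort before displaying.
seq 1 5 | square_all | sort -n
```

Passing the input as a positional argument rather than splicing it into the script text keeps odd input from being interpreted as shell code.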
green-orca, over 1 year ago
I'm using task spooler a lot for parallel background processing. What I like the most is the ability to add further tasks to the queue after processing has already started.

https://manpages.ubuntu.com/manpages/xenial/man1/tsp.1.html
throwaway277432, over 1 year ago
Is the author still adding the "cite me or pay 10000€" notice to the output? And calling that GPL?

And still answering every xargs Stack Overflow question with "you should use GNU Parallel" instead of answering the question? That really gets old quickly when googling for xargs answers.

These are just some of the reasons I'll never use parallel. xargs is perfectly fine for most use cases, and it can do everything I need it to.
ssddanbrown, over 1 year ago
Love finding a good use case for parallel as an easy way to gain massive time savings, especially on today's high-thread-count CPUs. Most recently I found it useful when batch-compressing large JPEG images to smaller WebP files, together with find and ImageMagick:

    find ./ -type f -iname '*.jpg' -size +1M -print0 | parallel -0 mogrify -format webp -quality 80 {}
titzer, over 1 year ago
I didn't know about this, and reading through the comments, I found out that xargs can also do batching and parallelism (nice!). However, it appears that if you pipe the output of an xargs-parallel command into another utility, it jumbles the output of the multiple subprocesses, whereas GNU parallel does not.

I was a little put off by the annoying/scary citation issue mentioned by another commenter, so I am not sure I will use parallel.

I want to pipe the output of parallel processes into a utility that I wrote for progress printing (https://github.com/titzer/progress), but I think that neither of these solutions works; my progress utility will have to do this on its own.
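One possible workaround for the jumbled output, sketched under stated assumptions: let each `xargs` job write to its own file and print the files only after all jobs finish. `grouped_xargs` is a hypothetical helper, the limit of 4 jobs is arbitrary, and the sketch assumes every input item is usable as a file name (no slashes). Unlike GNU Parallel's own grouping, this prints results in sorted item order and only at the very end, so it would not help with live progress printing.

```shell
#!/bin/sh
# grouped_xargs CMD...: run CMD once per input line (appended as the last
# argument), up to 4 at a time, keeping each job's output contiguous.
grouped_xargs() {
  outdir=$(mktemp -d) || return 1
  xargs -P 4 -n 1 sh -c '
    outdir=$1; shift
    # After the loop, item holds the last argument: the one xargs appended.
    for item; do :; done
    # Buffer stdout and stderr of this job in its own file.
    "$@" > "$outdir/$item.out" 2>&1
  ' _ "$outdir" "$@"
  # All jobs are done here; emit the buffered outputs one job at a time.
  cat "$outdir"/*.out
  rm -rf "$outdir"
}

# Each job prints two lines; with plain "xargs -P" they could interleave.
printf '%s\n' a b c | grouped_xargs sh -c 'echo "$1 start"; echo "$1 end"' _
```

With the three items above, the `start` and `end` lines of each item always come out as adjacent pairs.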
bloopernova, over 1 year ago
There's a shell script version of GNU parallel that's great for CI/CD pipeline tasks. You just keep it in your repo and source it as needed. It's incredibly useful; we use it in one build to batch process a few thousand things in groups of 25.

Edited to add: finally got signed in to work. You create the script via:

    parallel --embed > scriptname.sh

It's about 14,000 lines of awesome and works on "ash, bash, dash, ksh, sh, and zsh".
rhysrhaven, over 1 year ago
I much prefer rush over parallel, mainly because everything is executed in a bash shell.

https://github.com/shenwei356/rush
Decabytes, over 1 year ago
I've been writing a lot of PowerShell recently and discovered the ForEach-Object cmdlet with its -Parallel parameter. It has been addictive to parallelize my scripts, so I totally understand why parallelizing with a command-line tool is attractive.
asicsp, over 1 year ago
Didn't know about the book: https://zenodo.org/record/1146014 (discussed 4 years back: https://news.ycombinator.com/item?id=20726631)

See also https://hn.algolia.com/?q=gnu+parallel for other related discussions.
SPBS, over 1 year ago
xargs is more useful because it's POSIX, so you can always count on it being there (whereas with GNU Parallel you probably have to reach for a package manager to install it first). The ergonomics are worse though, as usual.
TZubiri, over 1 year ago
First paragraph: I want to test my tests.

Second paragraph: I want to test my test-tester.

OP 100% fell down a rabbit hole.
AvImd, over 1 year ago
If none of the examples from the article work, make sure you are running GNU Parallel and not an identically named utility from moreutils.
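A rough way to check which `parallel` is on the PATH, sketched on the assumption that GNU Parallel's `--version` output begins with the words "GNU parallel" and that other implementations do not print that. `which_parallel` is a hypothetical helper name, not part of either tool.

```shell
#!/bin/sh
# Report which "parallel" (if any) the shell would run.
which_parallel() {
  if ! command -v parallel >/dev/null 2>&1; then
    echo "not installed"
  # stdin is redirected so the probe can never wait on terminal input.
  elif parallel --version </dev/null 2>/dev/null | head -n 1 | grep -q '^GNU parallel'; then
    echo "GNU Parallel"
  else
    echo "other (possibly moreutils)"
  fi
}

which_parallel
```

If this prints anything other than "GNU Parallel", syntax such as `:::` from the article is unlikely to work as shown.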
ranting-moth, over 1 year ago
Learning Parallel pays high dividends for the rest of your life.
pimpl, over 1 year ago
Having a layer of parallelisation on top of good old sequential code seems like a very neat idea. It resolves the headache of learning how to run code in parallel in languages that aren’t necessarily my primary language (e.g. short, one-off scripts). Thanks for sharing!!
ogou, over 1 year ago
Someone gifted an old blade server to me a few years ago. Very slow, but 16 cores and 24 gig of RAM. At the time I was making a lot of video art with ffmpeg, without a GPU. That version of ffmpeg wasn't optimized for multiple cores, so rendering was really slow and sequential. I discovered Parallel and set the server to process large videos with most of the cores in parallel. Voila, it chewed through a massive amount of media fairly quickly. Faster than the hard drives, actually.
bcjordan, over 1 year ago
Folks who are here and interested in parallelization for CI/CD may also be interested in Dagger.io. I had heard about it on HN over the years but had not played with it. It's basically a more fine-grained Docker-like executor with better caching and utilities for spinning up services and running tests.

Curious if anyone else has experience with it; honestly I've been surprised at how little I've heard about it.
jamietanna, over 1 year ago
One thing I've used parallel for before is adding a straightforward retry mechanism, and it was great! https://www.jvt.me/posts/2022/04/28/shell-queue/
figomore, over 1 year ago
I use GNU Parallel to render Blender videos distributed across a bunch of nodes: https://github.com/tfmoraes/blender_gnu_parallel_render
rubicks, over 1 year ago
I can appreciate that GNU parallel exists. I always use `xargs -P0` in my own work, though.
sneak, over 1 year ago
See also: ppss (parallel processing shell script) https://github.com/louwrentius/PPSS#
nateb2022, over 1 year ago
There's also PaSh: https://github.com/binpash/pash
jooz, over 1 year ago
I tried to use it last week to run 10 instances of curl against a web server.

I was expecting something as simple as 'parallel -j10 curl https://whatever' but couldn't find the right syntax in less time than it took me to prepare a dirty shell script that did the same.
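For what it's worth, a minimal sketch of "run the same command N times at once" using `xargs`, assuming an `xargs` with `-P` support. `repeat_parallel` is a hypothetical helper name. The trick is that `-I {}` makes `xargs` run the command once per input line without appending the line, as long as the command itself contains no `{}`. With GNU Parallel, the man page describes `-N0` for the same purpose (read an input but insert no argument); that form is shown only as an untested comment.

```shell
#!/bin/sh
# repeat_parallel N CMD...: run CMD N times, all N concurrently.
repeat_parallel() {
  n=$1; shift
  # seq emits N lines; -I {} runs CMD once per line. Since CMD contains
  # no {} placeholder, nothing from the input is inserted into it.
  seq "$n" | xargs -P "$n" -I {} "$@"
}

# Harmless demo; the curl call from the comment would look like:
#   repeat_parallel 10 curl -s -o /dev/null https://whatever
# Possible GNU Parallel equivalent (assumption, based on its -N0 option):
#   seq 10 | parallel -j10 -N0 curl -s -o /dev/null https://whatever
repeat_parallel 3 echo ok
```

A command that really does need a literal `{}` in its arguments would have to pick a different placeholder string for `-I`.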
grepfru_it, over 1 year ago
The same can be implemented with just bash using jobs and wait. Useful if parallel is not available in your pipeline.
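A minimal sketch of that approach in plain POSIX shell: start each job with `&`, remember its PID from `$!`, then `wait` on each PID to collect its exit status. `run_all` is a hypothetical helper name and each job is given as a command string. Unlike `parallel -j`, this starts every job at once with no concurrency cap; capping it takes extra bookkeeping (in bash 4.3 and later, `wait -n` is one building block for that).

```shell
#!/bin/sh
# run_all JOB...: run every JOB (a shell command string) in the background,
# wait for all of them, and print how many exited with a non-zero status.
run_all() {
  pids=""
  for job in "$@"; do
    sh -c "$job" &          # & puts the job in the background
    pids="$pids $!"         # $! is the PID of the job just started
  done
  failed=0
  for pid in $pids; do
    # wait PID returns that job's exit status
    wait "$pid" || failed=$((failed + 1))
  done
  echo "$failed"
}

# Three jobs, one of which fails: prints 1.
run_all 'true' 'false' 'true'
```

Waiting on each PID individually, rather than calling a bare `wait`, is what makes the per-job exit statuses available.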
heinrichhartman, over 1 year ago
As the answer to the question was not actually given in the post:

    /usr/bin/parallel
aquir, over 1 year ago
"Do one thing and do it well"
nullc, over 1 year ago
parallel is great but its default behaviors never quite seem to match my needs, so every time I use it I have to spend some time consulting the man page. Fortunately, the man page is more than up to the task.

But because of the mini learning curve on each use and because I find I need a little more boilerplate to use parallel, I use xargs -P more often, only using parallel when I need its special features (e.g. multiple hosts or collating the output streams).

Oh also, parallel itself can be a bit of a resource hog. (Obviously that depends a lot on how you're using it, but I mean in cases where xargs' usage is unnoticeable I sometimes have to change the size of my jobs to get parallel out of the way.)
herrkanin, over 1 year ago
I have wanted to parallelize my .zshrc file for a while. All those environment setup scripts for nvm, pyenv, starship, etc. really make the startup time noticeably slow. Does anyone know how to do this?
jp57, over 1 year ago
Seems like you could accomplish the same thing more cleanly (IMO) with make. You can create a target for each test, which can be done with patterns, and then use `make -j` to run them in parallel.
morbidious, over 1 year ago
Looks like a great tool!

Thanks for the link to the book: https://zenodo.org/record/1146014
michaelcampbell, over 1 year ago
parallel is one of those tools like jq, to me. It's great, but by the time I've grokked the syntax, AGAIN, I'd've been quicker to write a quick shell/ruby/python script to do it that's almost readable.
b0afc375b5, over 1 year ago
What about & and wait? Could it have been an adequate alternative?
toastal, over 1 year ago
I use this with Nix all the time. Great utility.
timtom39, over 1 year ago
Love the tool. One of my favorite snippets adds parallel processing to jq:

    #!/bin/bash
    cat - | parallel --line-buffer --pipe --roundrobin jq "$@"
pmarreck, over 1 year ago
HIPS (Hiding In Plain Sight)!
lfconsult, over 1 year ago
Wonderful... Thanks for sharing.
amelius, over 1 year ago
Another reminder that you shouldn't use Bash to write scripts.

E.g. in Python this would all be very easy to do. Just start a bunch of threads and e.g. invoke subprocess.run() from them.
cusspvz, over 1 year ago
You guys know that in bash you can use `&` to run a command in the background and then use `wait` to wait for all of the session's background processes to end, right?
quickthrower2, over 1 year ago
It is sort of a shame that tools can’t figure out how to parallelize things without being herded like cattle to do so.

It might be a culture thing. In .NET code I see people running things in parallel a lot within code, but maybe this is less so for Linux tools.

Maybe a functional programming style could lend itself to a parallel-first programming style, with heuristics to decide when it isn’t worth it.