
Parallelizing Jobs with xargs

41 points by r11t, about 16 years ago

5 comments

yason · about 16 years ago
I'm afraid I'm going against the Unix idiom of combining simple tools to do more advanced stuff, but I can't resist here ;-)

While it is idiomatic in Unix to use xargs for parallelising batch runs, I found it pretty cumbersome: you have to be really careful with quotes, file names, and command lines containing spaces to make sure the assembled command line is well formed and doesn't break something serious.

Moreover, xargs does have its uses, but I mostly use it for trivial things where I can be sure it works. The typical xargs idiom is feeding it a list of files, most often from the find command, as in "find . -name _out | xargs rm -r". That's the reason xargs has -0 and find has the matching -print0.

I wrote a small utility myself (http://code.google.com/p/spawntool/) that reads from stdin, treats each line as a complete command line passed directly to system(), and manages the parallelisation up to N processes.

This is pretty useful for feeding in _any_ batch of commands, even unrelated ones (not derived from a list of files). You could also feed the same input stream or file straight to 'sh' (as a compatibility cross-check), or verify the command lines in plain text before daring either sh or spawntool. This would be like ... | xargs sh without the whitespace and expansion headaches.

It's pretty easy to generate complete command lines yourself, and much safer than letting xargs join stuff together.
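A minimal sketch of the two approaches contrasted above — the NUL-delimited find/xargs idiom versus generating full command lines yourself. The _out pattern comes from the comment; the cmds.txt file and its paths are made-up illustrations:

    # NUL-delimited pipeline: safe even when file names contain
    # spaces, quotes, or newlines
    find . -name _out -print0 | xargs -0 rm -r

    # Alternative: generate complete command lines yourself, eyeball
    # them in plain text, then hand them to sh (or spawntool)
    printf 'rm -r -- "%s"\n' ./build/_out ./tmp/_out > cmds.txt
    cat cmds.txt    # verify before running anything
    sh cmds.txt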
mattj · about 16 years ago
Running these two commands in series likely vastly overstates the performance gains: almost all the time is going to be spent in I/O, and the second time around a good chunk of it (if not all) will be served from the disk cache. Try running both a few repeated times and see if you still enjoy the same gains (I'm on my iPhone right now, so I can't test this myself).
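A sketch of the fairer benchmark being suggested, assuming Linux — the drop_caches knob is the standard procfs way to flush the page cache and needs root, and the two script names are hypothetical stand-ins for the serial and parallel runs:

    # Alternate the two variants a few times, flushing the page cache
    # between runs so neither one benefits from freshly cached files
    for i in 1 2 3; do
        sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
        time ./serial_version.sh
        sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
        time ./parallel_version.sh
    done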
IsaacSchlueter · about 16 years ago
dtach is great for long-running jobs, too. If you pipe the output to a file, you can even log out and check back later.

I use this function to pass stuff off to a detached process:

    # usage:
    #   headless "some_long_job" "long_job"   # start the job detached
    #   # go get some tea, and come back
    #   headless "" "long_job"                # join that session
    #   # still not done, so ^\ to detach from it again
    # Usually I pipe the output of some_long_job to a file,
    # so I can peek in on it easily.
    headless () {
        if [ "$2" == "" ]; then
            hash=`md5 -qs "$1"`    # derive a session name from the command string (macOS md5)
        else
            hash="$2"
        fi
        if [ "$1" != "" ]; then
            # start a new detached session running the command
            dtach -n /tmp/headless-$hash bash -l -c "$1"
        else
            # attach to (or create) the named session
            dtach -A /tmp/headless-$hash bash -l
        fi
    }
mblakele · about 16 years ago
When using this sort of trick, I also find it useful to throw in GNU screen, nohup, or the bash 'disown' command.
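Hedged one-liners for each of the three, where long_job is a placeholder for whatever you're running:

    # nohup: survive terminal hangup, keep the output in a file
    nohup long_job > job.log 2>&1 &

    # disown: background an already-started job, then drop it from
    # the shell's job table so it isn't killed on exit (bash)
    long_job > job.log 2>&1 &
    disown %+    # %+ is the most recent background job

    # GNU screen: start detached, reattach later with `screen -r batch`
    screen -dmS batch long_job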
aolnerd · about 16 years ago
I find that xargs is the most convenient way to achieve parallelism for quick and easy batch work. Just write your script to receive its unit of work as a command argument (or as multiple args if starting a process is a heavy operation). Use any language. Utilize all your cores.
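A sketch of that pattern with GNU/BSD xargs, where -P sets the worker count and process-one.sh is a hypothetical script that takes its unit of work as arguments:

    # One file per invocation, up to 4 running at a time
    find . -name '*.jpg' -print0 | xargs -0 -n 1 -P 4 ./process-one.sh

    # If process startup is the heavy part, batch arguments instead:
    # up to 64 files per invocation, still 4 workers
    find . -name '*.jpg' -print0 | xargs -0 -n 64 -P 4 ./process-one.sh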