
Parallelizing Jobs with xargs

41 points by r11t, about 16 years ago

5 comments

yason · about 16 years ago
I'm afraid I'm going against the Unix idiom of combining simple tools to do more advanced stuff, but I can't resist here ;-)

While it is idiomatic in Unix to use xargs for parallelising batch runs, I found it pretty cumbersome: you have to be really careful with quotes, file names, and command lines containing spaces to make sure the assembled command line is well formed and doesn't break something serious.

Moreover, xargs does have its uses, but I mostly use it for trivial things where I can be sure it works. The typical xargs idiom is feeding it a list of files, most often from the find command, as in "find . -name _out | xargs rm -r". That's the reason xargs has -0 and find has the matching -print0.

I wrote a small utility myself (http://code.google.com/p/spawntool/) that reads from stdin, treats each line as a complete command line passed directly to system(), and manages the parallelisation up to N processes.

This is pretty useful for feeding in _any_ batch of commands, even unrelated ones (not derived from a list of files). You could also feed the same input stream or file straight to 'sh' (as a compatibility cross-check), or verify the command lines in plain text before daring either sh or spawntool. This would be like ... | xargs sh without the whitespace and expansion headaches.

It's pretty easy to generate complete command lines yourself, and much safer than letting xargs join stuff together.
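A minimal sketch of the two approaches contrasted above — the NUL-delimited find/xargs idiom versus generating full command lines yourself. The _out pattern comes from the comment; the cmds.txt file and its paths are made-up illustrations:

    # NUL-delimited pipeline: safe even when file names contain
    # spaces, quotes, or newlines
    find . -name _out -print0 | xargs -0 rm -r

    # Alternative: generate complete command lines yourself, eyeball
    # them in plain text, then hand them to sh (or spawntool)
    printf 'rm -r -- "%s"\n' ./build/_out ./tmp/_out > cmds.txt
    cat cmds.txt    # verify before running anything
    sh cmds.txt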
mattj · about 16 years ago
Running these two commands in series likely vastly overstates the performance gains: almost all the time is going to be spent in I/O, and the second time around a good chunk of it (if not all) will be served from the disk cache. Try running both a few repeated times and see if you still enjoy the same gains (I'm on my iPhone right now, so I can't test this myself).
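A sketch of the fairer benchmark being suggested, assuming Linux — the drop_caches knob is the standard procfs way to flush the page cache and needs root, and the two script names are hypothetical stand-ins for the serial and parallel runs:

    # Alternate the two variants a few times, flushing the page cache
    # between runs so neither one benefits from freshly cached files
    for i in 1 2 3; do
        sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
        time ./serial_version.sh
        sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
        time ./parallel_version.sh
    done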
IsaacSchlueter · about 16 years ago
dtach is great for long-running jobs, too. If you pipe the output to a file, you can even log out and check back later.

I use this function to pass stuff off to a detached process:

    # usage:
    #   headless "some_long_job" "long_job"   # start the job detached
    #   # go get some tea, and come back
    #   headless "" "long_job"                # join that session
    #   # still not done, so ^\ to detach from it again
    # Usually I pipe the output of some_long_job to a file,
    # so I can peek in on it easily.
    headless () {
        if [ "$2" == "" ]; then
            hash=`md5 -qs "$1"`    # derive a session name from the command string (macOS md5)
        else
            hash="$2"
        fi
        if [ "$1" != "" ]; then
            # start a new detached session running the command
            dtach -n /tmp/headless-$hash bash -l -c "$1"
        else
            # attach to (or create) the named session
            dtach -A /tmp/headless-$hash bash -l
        fi
    }
mblakele · about 16 years ago
When using this sort of trick, I also find it useful to throw in GNU screen, nohup, or the bash 'disown' command.
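Hedged one-liners for each of the three, where long_job is a placeholder for whatever you're running:

    # nohup: survive terminal hangup, keep the output in a file
    nohup long_job > job.log 2>&1 &

    # disown: background an already-started job, then drop it from
    # the shell's job table so it isn't killed on exit (bash)
    long_job > job.log 2>&1 &
    disown %+    # %+ is the most recent background job

    # GNU screen: start detached, reattach later with `screen -r batch`
    screen -dmS batch long_job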
aolnerd · about 16 years ago
I find that xargs is the most convenient way to achieve parallelism for quick and easy batch work. Just write your script to receive its unit of work as a command argument (or as multiple args if starting a process is a heavy operation). Use any language. Utilize all your cores.
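A sketch of that pattern with GNU/BSD xargs, where -P sets the worker count and process-one.sh is a hypothetical script that takes its unit of work as arguments:

    # One file per invocation, up to 4 running at a time
    find . -name '*.jpg' -print0 | xargs -0 -n 1 -P 4 ./process-one.sh

    # If process startup is the heavy part, batch arguments instead:
    # up to 64 files per invocation, still 4 workers
    find . -name '*.jpg' -print0 | xargs -0 -n 64 -P 4 ./process-one.sh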