Nice article. Really easy-to-follow introduction.<p>I only discovered process substitution a few months ago, but it's already become a frequently used tool in my kit.<p>One thing I find a little annoying about unix commands is how hard they can be to google for. '<()', nope. "command as file argument to other command unix," nope. The first couple of times I tried to use it, I knew it existed but struggled to find any documentation. "Damnit, I know it's something like that, how does it work again?"<p>Unless you know to look for "Process Substitution", it can be hard to find information on these things. And that's once you even know they exist.<p>Anyone know a good resource I should be using when I find myself in a situation like that?
Once you discover <() it's hard not to (ab)use it everywhere :-)<p><pre><code> # avoid temporary files when some program needs two inputs:
join -e0 -o0,1.1,2.1 -a1 -a2 -j2 -t$'\t' \
<(sort -k2,2 -t$'\t' freq/forms.${lang}) \
<(sort -k2,2 -t$'\t' freq/lms.${lang})
# gawk doesn't care if it's given a regular file or the output fd of some process:
gawk -v dict=<(munge_dict) -f compound_translate.awk <in.txt
# prepend a header:
cat <(echo -e "${word}\t% ${lang}\tsum" | tr '[:lower:]' '[:upper:]') \
<(coverage ${lang})</code></pre>
Pipes are probably the original instantiation of dataflow processing (dating back to the 1960s). I gave a tech talk on some of the frameworks:
<a href="https://www.youtube.com/watch?v=3oaelUXh7sE" rel="nofollow">https://www.youtube.com/watch?v=3oaelUXh7sE</a><p>And my company creates a cool dataflow platform - <a href="https://composableanalytics.com" rel="nofollow">https://composableanalytics.com</a>
Vince Buffalo is the author of the best book on bioinformatics: Bioinformatics Data Skills (O'Reilly). It's worth a read for learning unix/bash-style data science of any flavour.<p>Even if you think you already know unix/bash and data, there are new and unexpected snippets every few pages that surprise you.
In zsh, =(cmd) will create a temporary file, <(cmd) will create a named pipe, and $(cmd) runs cmd in a subshell and substitutes its output (ordinary command substitution). There are also fancy options that use MULTIOS. For example:<p><pre><code> paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) >(process2) >/dev/null
</code></pre>
can be re-written as:<p><pre><code> paste <(cut -f1 file1) <(cut -f3 file2) > >(process1) > >(process2)
</code></pre>
<a href="http://zsh.sourceforge.net/Doc/Release/Expansion.html#Process-Substitution" rel="nofollow">http://zsh.sourceforge.net/Doc/Release/Expansion.html#Proces...</a><p><a href="http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redirection" rel="nofollow">http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redi...</a>
If you like pipes, then you will love lazy evaluation. It is unfortunate, though, that Unix doesn't quite support it: a writer only blocks once the pipe buffer is full, not as soon as nobody is reading, so producers run eagerly rather than on demand.
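A minimal bash sketch of that "almost lazy" behaviour:<p><pre><code> # yes writes an infinite stream, yet the pipeline terminates:
 # backpressure (the full pipe buffer) throttles yes until head
 # exits after five lines, and the next write then kills yes with
 # SIGPIPE -- close to laziness, but never actually demand-driven.
 yes | head -n 5
</code></pre>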
AFAIK process substitution is a bash-ism (not part of the POSIX spec for /bin/sh, though ksh and zsh support it too). I recently had to fall back on the slightly less wieldy named pipes in a dash environment, and put the pipe setup, command execution and teardown in a script.
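For reference, a minimal sketch of that dance in plain sh (assuming mktemp is available; a.txt and b.txt are placeholder inputs):<p><pre><code> tmp=$(mktemp -u)        # pick an unused path for the FIFO
 mkfifo "$tmp"           # setup: create the named pipe
 sort a.txt > "$tmp" &   # feed it in the background
 diff "$tmp" b.txt       # consume it as if it were a file
 rm "$tmp"               # teardown
</code></pre>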
In fish shell the canonical example is this:<p><pre><code> diff (sort a.txt|psub) (sort b.txt|psub)
</code></pre>
The psub command performs the process substitution.
How does the >(...) process substitution differ from simply piping the output with | ?<p>For example (from Wikipedia):<p><pre><code> tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
</code></pre>
vs<p><pre><code> tee < bigfile | wc -l | gzip > bigfile.gz
</code></pre>
Anybody know of a way to increase the buffer size of pipes? I've experienced cases where piping a really fast program into a slow one made both go slower, because the OS suspends the first program's writes whenever the pipe buffer is full. That seemed to ruin the first program's caching, so both ended up slower, even though pipes are normally faster since you're not touching disk.
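One workaround sketch, assuming the pv utility is installed (fast_producer and slow_consumer are hypothetical stand-ins): interpose a user-space buffer between the two. On Linux the kernel pipe capacity itself can also be grown with fcntl(F_SETPIPE_SZ).<p><pre><code> # pv -q stays quiet; -B 100m gives it a 100 MB internal buffer,
 # letting the fast writer run far ahead of the slow reader
 fast_producer | pv -qB 100m | slow_consumer
</code></pre>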
Is this guy a bioinformatician? I think he's a bioinformatician.<p>Can't be sure if he is a bioinformatician because he never really mentions that he is a bioinformatician.
moreutils [1] has some really cool programs for pipe handling.<p>pee: tee standard input to pipes
sponge: soak up standard input and write to a file
ts: timestamp standard input
vipe: insert a text editor into a pipe<p>[1] <a href="https://joeyh.name/code/moreutils/" rel="nofollow">https://joeyh.name/code/moreutils/</a>
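For a quick taste (filenames are hypothetical), sponge solves the classic "can't redirect a file onto itself" problem and ts is handy on long-running pipelines:<p><pre><code> # rewrite a file in place: sponge soaks up all input before writing
 sort -u names.txt | sponge names.txt
 # prefix each arriving line with a timestamp
 tail -f server.log | ts
</code></pre>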
Pipes are very cool and useful, but it's hard for me to understand this common <i>worship</i> of something like that. Yes, it's useful and elegant, but is it really the best thing since Jesus Christ?