Related to this, pyp is worth a look if you're interested in doing text manipulation with Python's libraries, but on the command line:<p><a href="http://code.google.com/p/pyp/" rel="nofollow">http://code.google.com/p/pyp/</a>
The Unix Programming Environment by Kernighan and Pike and The AWK Programming Language are still the best books one can read about Unix text manipulation, and about Unix, period. (Part of the point is that in Unix, text is supposed to be the universal language.)
I like how it's laid out: it starts with the most specific tools, which are easy to understand, and eventually leads to the pocketknives of sed and awk, which beginners might not need until they've exhausted the potential of the previous commands.
Unix for Poets is a great set of exercises for someone wanting to learn more about text manipulation with Unix tools.<p><a href="http://www.iro.umontreal.ca/~felipe/IFT6010-Automne2011/resources/Articles/UnixforPoets" rel="nofollow">http://www.iro.umontreal.ca/~felipe/IFT6010-Automne2011/reso...</a>
Thanks for this! I really like these kinds of summaries, because while I love grep and cut and wc and perl, there are commands in here I really haven't heard of.<p>Plus I enjoy stringing together one-off filters longer than my arm.
If you like this, then check out Unix Power Tools. It's full of exactly this kind of stuff, with broader and deeper coverage. I highly recommend it -- I consider it one of the top ten or so books for a new programmer to spend some time with.
One useful addition to the section on streams would have been process substitution:<p><a href="http://tldp.org/LDP/abs/html/process-sub.html" rel="nofollow">http://tldp.org/LDP/abs/html/process-sub.html</a><p>This lets you work with more than just the standard streams.
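For anyone who hasn't met process substitution, here is a minimal sketch of the idea. It assumes bash (plain POSIX sh doesn't support the `<(...)` syntax), and the two word lists are invented purely for illustration. Each `<(...)` gives `comm` a file-like path backed by the output of a command, so both inputs can be sorted on the fly without temporary files.

```bash
#!/usr/bin/env bash
# Process substitution needs bash, zsh or ksh; it is not part of POSIX sh.

# Two hypothetical word lists, made up for this example.
list_a='pear
apple
fig'
list_b='fig
kiwi
apple'

# comm expects sorted input. Each <(...) runs a command and hands comm
# a path it can read that command's output from, so comm sees two "files".
# -12 suppresses lines unique to either input, leaving only the common ones.
common=$(comm -12 <(printf '%s\n' "$list_a" | sort) <(printf '%s\n' "$list_b" | sort))

printf '%s\n' "$common"
```

With an ordinary pipe you could only feed one of the two inputs through stdin; here both come from commands.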
I once wrote this introduction to UNIX, which also provides an introduction to text manipulation. Unfortunately it is not complete, because I lost the DocBook sources.<p><a href="http://danieldk.eu/Writings/unixsystems.pdf" rel="nofollow">http://danieldk.eu/Writings/unixsystems.pdf</a>
This used to be a great site (ignore its very un-PC site name):<p><a href="http://bashcurescancer.com/" rel="nofollow">http://bashcurescancer.com/</a><p>It seems the site is down.
Sort of related: rpl[1] is an often overlooked tool for replacing text across multiple files. It's terser than "perl pie" and has a few nice features, such as a simulation mode.<p>[1] <a href="http://www.laffeycomputer.com/rpl.html" rel="nofollow">http://www.laffeycomputer.com/rpl.html</a>