For me, the most surprising one was paste.<p>paste allowed me to interleave two streams or to split a single stream into two columns. I'd been writing custom scripting monstrosities before I discovered paste:<p><pre><code> $ paste <( echo -e 'foo\nbar' ) <( echo -e 'baz\nqux' )
foo baz
bar qux
$ echo -e 'foo\nbar\nbaz\nqux' | paste - -
foo bar
baz qux
</code></pre>
I wonder what other unix gems I've been missing...
> Hidden inside WWB (writer's workbench), Lorinda Cherry's Parts annotated English text with parts of speech, based on only a smidgen of English vocabulary, orthography, and grammar.<p>Writer's Workbench was indeed a marvel of 1970's limited-space engineering. You can see it for yourself [1]: the generic part-of-speech rules are in end.l, the exceptions in edict.c and ydict.c, and the part-of-speech disambiguator in pscan.c. Such compact, rule-based NLP has fallen out of favor these days but (shameless plug alert!) Writer's Workbench inspired my 2018 IOCCC entry that highlights passive constructions in English texts [2].<p>[1] <a href="https://github.com/dspinellis/unix-history-repo/tree/BSD-4_1_snap-Snapshot-Development/.ref-BSD-4/usr/src/cmd/diction" rel="nofollow">https://github.com/dspinellis/unix-history-repo/tree/BSD-4_1...</a><p>[2] <a href="https://ioccc.org/2018/ciura/hint.html" rel="nofollow">https://ioccc.org/2018/ciura/hint.html</a>
One of the useful applications of trigram-based analysis I have done is the following: for a large web-based application form where about 200,000 online applications were made, we had to filter out the dummy applications - often, people would try out the interface using "aaa" as a name, for example.<p>Since the names were mostly Indian, we did not even have a standard database of names to test against.<p>What we did was the following: go through the entire database of all applications, and build a trigram frequency table. Then, using that trigram table, do a second pass over the database of names to find names with anomalous trigrams - if the percentage of anomalous trigrams in a name was too high (if the name was long enough), or the absolute number of anomalous trigrams in the name was too high (if the name was short), we flagged the application and examined it manually. Using this alone, we were able to filter out a large number of dummy application forms.<p>Of course, it is not a comprehensive tool since what forms a valid name is very vague, but I think this kind of tool is useful and culture-neutral.
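A rough sketch of that kind of filter, assuming one name per line in a hypothetical names.txt; the rarity threshold and the flagging cutoffs are purely illustrative, not the ones actually used:<p><pre><code># pass 1: build a trigram frequency table over every submitted name
awk '{ s = tolower($0); gsub(/[^a-z]/, "", s)
       for (i = 1; i <= length(s) - 2; i++) freq[substr(s, i, 3)]++ }
     END { for (t in freq) print t, freq[t] }' names.txt > trigrams.txt

# pass 2: flag names with too many rare ("anomalous") trigrams
awk 'NR == FNR { freq[$1] = $2; next }            # load the trigram table
     { s = tolower($0); gsub(/[^a-z]/, "", s); n = 0; rare = 0
       for (i = 1; i <= length(s) - 2; i++) {
           n++
           if (freq[substr(s, i, 3)] < 5) rare++  # trigram seen fewer than 5 times overall
       }
       if (n > 0 && (rare / n > 0.5 || rare >= 3)) print "SUSPECT:", $0
     }' trigrams.txt names.txt
</code></pre>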
I didn't know about typo. One surprising unix program I discovered this year is cal (or ncal). Having a calendar in your terminal is sometimes useful, and I wish I had known earlier that I could type things like <i>ncal -w 2020</i>
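A few more variants worth knowing (flags as found in the BSD/util-linux versions):<p><pre><code>$ cal              # the current month
$ cal -3           # previous, current, and next month
$ ncal -w 2020     # all of 2020, with week numbers
</code></pre>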
And people say theoretical computer science isn’t useful in “the real world”…<p>I am curious about this one, though: has anyone used it?<p>> The syntax diagnostics from the compiler made by Sue Graham's group at Berkeley were the most helpful I have ever seen--and they were generated automatically. At a syntax error the compiler would suggest a token that could be inserted that would allow parsing to proceed further. No attempt was made to explain what was wrong.<p>On the surface it sounds a lot like it would produce error messages like “expected ‘;’” that most beginner programmers come to hate: was it any better than this, or was that the extent of its intelligence, with everything else at the time being even worse?
The author is THE Doug McIlroy. It's wonderful to learn that he's still around and spreading the good word.<p><a href="https://en.wikipedia.org/wiki/Douglas_McIlroy" rel="nofollow">https://en.wikipedia.org/wiki/Douglas_McIlroy</a>
« Typo was as surprising inside as it was outside. Its similarity measure was based on trigram frequencies, which it counted in a 26x26x26 array. The small memory, which had barely room enough for 1-byte counters, spurred a scheme for squeezing large numbers into small counters. To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count. »<p>This sounds like something from the same family as hyperloglog.<p>Wikipedia traces that back to the Flajolet–Martin algorithm in 1984. When would typo have been written?
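The counter trick described there is the same idea Robert Morris later published as approximate counting ("Counting Large Numbers of Events in Small Registers", CACM 1978), which predates Flajolet–Martin. A toy illustration in awk, keeping c ≈ log2(n) by incrementing with probability 1/2^c; with a single counter the estimate is quite noisy:<p><pre><code>$ seq 100000 |
  awk 'BEGIN { srand() }
       { if (rand() < 1 / 2^c) c++ }    # bump the counter with probability 1/2^c
       END { printf "lines: %d  counter: %d  estimate: %d\n", NR, c, 2^c - 1 }'
</code></pre>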
How about GNU parallel? <a href="https://www.gnu.org/software/parallel/" rel="nofollow">https://www.gnu.org/software/parallel/</a>
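A couple of typical invocations, for anyone who hasn't tried it (the file names are hypothetical):<p><pre><code>$ parallel gzip ::: *.log                     # compress each log file on its own core
$ cat urls.txt | parallel -j8 'curl -sO {}'   # fetch URLs, at most 8 downloads at a time
</code></pre>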
What about "comm" - compare two sorted files line by line.
You can easily get the lines that appear only in file 1, in both files, or only in file 2.<p>Super powerful, and it has saved me hours of work.
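For anyone who hasn't used it: comm expects sorted input, and -1/-2/-3 suppress the corresponding columns, so you combine them to keep just the column you want (a.txt and b.txt are placeholders):<p><pre><code>$ sort a.txt -o a.txt; sort b.txt -o b.txt   # comm requires sorted input
$ comm -23 a.txt b.txt                       # lines only in a.txt
$ comm -13 a.txt b.txt                       # lines only in b.txt
$ comm -12 a.txt b.txt                       # lines common to both
</code></pre>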
The fact that dc does (or at least tries to) guarantee error bounds on the <i>result</i> is news to me.<p>And if that does indeed work, that's pretty cool.
> <i>struct - Brenda Baker undertook her Fortran-to-Ratfor converter against the advice of her department head--me. I thought it would likely produce an ad hoc reordering of the original, freed of statement numbers, but otherwise no more readable than a properly indented Fortran program. Brenda proved me wrong. She discovered that every Fortran program has a canonically structured form. Programmers preferred the canonicalized form to what they had originally written.</i><p>We could've had prettier et al instead of style linters 40(+?) years ago. :(
I've written a few useful scripts that everyone should have.<p>histogram - simply counts each occurrence of a line and then outputs from highest to lowest. I've implemented this program in several different languages for learning purposes. There are practical tricks that one can apply, such as hashing any line longer than the hash itself.<p>unique - like uniq but doesn't need to have sorted input! Again, one can simply hash very long lines to save memory.<p>datetimes - looks for numbers that might be dates (seconds or milliseconds in certain reasonable ranges) and adds the human-readable version of each date as a comment at the end of the line it appears in. This is probably my most used script (I work with protocol buffers that often store dates as int64s).<p>human - reformats numbers into either powers of 2 or powers of 10. Inspired, obviously, by the -h and -H flags from df.<p>I'm sure I have a few more, but if I can't remember them off the top of my head, then they clearly aren't quite as generally useful.<p>Anyone else have some useful scripts like these?
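For the first two, stock tools get you most of the way there (minus the long-line hashing trick); a minimal sketch with a hypothetical input.txt:<p><pre><code># histogram: count each distinct line, most frequent first
$ sort input.txt | uniq -c | sort -rn

# unique: drop duplicate lines without sorting, preserving first-seen order
$ awk '!seen[$0]++' input.txt
</code></pre>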
<i>“To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count.”</i><p>Stuff like this really makes me love what the pioneers of CS did. Back then, they were counting every byte and every register, while nowadays programmers make things without considering the impact they will have on the hardware.
> The math library for Bob Morris's variable-precision desk calculator
used backward error analysis to determine the precision necessary at
each step to attain the user-specified precision of the result.<p>I wonder if compilers could do this today? If you can bound the values of floating-point operations, you might be able to replace them with fixed-point equivalents and get a big speedup. You might also be able to replace them with ints or smaller floats if you can detect that the result is rounded to an int.<p>CPUs could also do this, since they know (some of) the actual values at runtime, and could take shortcuts with floating-point calculations where full precision is not needed for the result.
What's surprising about eqn, dc, and egrep? I use the latter two all the time, and I used eqn (+troff/groff and even tbl and pic) in the 1990s for manuals and as late as the (early) 2000s to typeset math-heavy course material. Not nearly as feature-rich as TeX/LaTeX, but much more approachable for casual math, with DSLs for typesetting equations, tables, and diagrams/graphs. I was delighted to see that GNU had a full suite of roff/troff drop-in replacements (which I later learned were implemented by James Clark, of SGML and, recently, Ballerina fame).
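For anyone who has never seen it, eqn input reads almost like dictated math; a tiny end-to-end example (the file name is just for illustration):<p><pre><code>$ cat > quadratic.ms <<'EOF'
.EQ
x = {- b +- sqrt { b sup 2 - 4ac }} over { 2a }
.EN
EOF
$ groff -e -ms -Tpdf quadratic.ms > quadratic.pdf
</code></pre>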
I have found it useful to survey the existing unix utilities (maybe every few years). I'm no genius, but I find things I will use. One way, of course, is simply to review the names wherever your system stores manual pages, and read (or skim) the ones you don't recognize, trying out some things, or at least trying to remember where to look them up later when you're ready to use them. Another is to browse to <a href="https://man.openbsd.org/" rel="nofollow">https://man.openbsd.org/</a>, put a single period (".") in the search field, optionally choose a section (and/or another system; I'm not sure how far the coverage goes), and click the apropos button.
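The same survey works offline with man -k (apropos), since a period matches every page name:<p><pre><code>$ man -k . | less                 # every installed page with its one-line description
$ man -k . | grep '(1)' | less    # rough filter for section 1 (user commands)
</code></pre>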
Doug McIlroy is regularly active in the groff mailing list <a href="https://lists.gnu.org/archive/html/groff/" rel="nofollow">https://lists.gnu.org/archive/html/groff/</a>
Crabs seems like a really cool program.<p>Here is a paper from Bell Labs:<p><a href="http://lucacardelli.name/Papers/Crabs.pdf" rel="nofollow">http://lucacardelli.name/Papers/Crabs.pdf</a>
I didn't find egrep surprising - I use it quite often. The thing I didn't know about it was that it was Al Aho's creation. I only knew about him from awk.