The most surprising Unix programs

539 points by vitplister about 5 years ago

32 comments

abetusk about 5 years ago
For me, the most surprising one was paste.

paste allowed me to interleave two streams or to split a single stream into two columns. I'd been writing custom scripting monstrosities before I discovered paste:

    $ paste <( echo -e 'foo\nbar' ) <( echo -e 'baz\nqux' )
    foo baz
    bar qux
    $ echo -e 'foo\nbar\nbaz\nqux' | paste - -
    foo bar
    baz qux

I wonder what other Unix gems I've been missing...
mci about 5 years ago
> Hidden inside WWB (writer's workbench), Lorinda Cherry's Parts annotated English text with parts of speech, based on only a smidgen of English vocabulary, orthography, and grammar.

Writer's Workbench was indeed a marvel of 1970s limited-space engineering. You can see it for yourself [1]: the generic part-of-speech rules are in end.l, the exceptions in edict.c and ydict.c, and the part-of-speech disambiguator in pscan.c. Such compact, rule-based NLP has fallen out of favor these days but (shameless plug alert!) Writer's Workbench inspired my 2018 IOCCC entry that highlights passive constructions in English texts [2].

[1] https://github.com/dspinellis/unix-history-repo/tree/BSD-4_1_snap-Snapshot-Development/.ref-BSD-4/usr/src/cmd/diction

[2] https://ioccc.org/2018/ciura/hint.html
sn41 about 5 years ago
One useful application of trigram-based analysis I have done is the following: for a large web-based application form where about 200,000 online applications were made, we had to filter out the dummy applications. Often, people would try out the interface using "aaa" as a name, for example.

Since the names were mostly Indian, we did not even have a standard database of names to test against.

What we did was the following: go through the entire database of all applications and build a trigram frequency table. Then, using that trigram table, do a second pass over the database of names to find names with anomalous trigrams. If the percentage of anomalous trigrams in a name was too high (for names that were long enough), or the absolute number of anomalous trigrams in the name was too high (for short names), we flagged the application and examined it manually. Using this alone, we were able to filter out a large number of dummy application forms.

Of course, it is not a comprehensive tool, since what forms a valid name is very vague, but I think this kind of tool is useful and culture-neutral.
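The two-pass idea described above can be sketched in a few lines of shell and awk. This is only an illustration, not the commenter's actual tool: the function name `flag_odd_names`, the "seen fewer than 2 times" rarity threshold, and the "more than half of the trigrams are rare" cutoff are all assumptions chosen to keep the example small, and the separate rule for short names is reduced to a single fraction test.

```shell
# Sketch: flag names whose letter trigrams are rare across the whole list.
# Thresholds are illustrative assumptions, not values from the original tool.
flag_odd_names() {
    # $1: file with one name per line; awk reads it twice (two passes)
    awk '
        function norm(s) { s = tolower(s); gsub(/[^a-z]/, "", s); return s }
        NR == FNR {                       # pass 1: build the trigram frequency table
            s = norm($0)
            for (i = 1; i + 2 <= length(s); i++) freq[substr(s, i, 3)]++
            next
        }
        {                                 # pass 2: score each name against the table
            s = norm($0); n = length(s) - 2; rare = 0
            if (n < 1) { print; next }    # no trigram at all: flag for manual review
            for (i = 1; i <= n; i++) if (freq[substr(s, i, 3)] < 2) rare++
            if (rare / n > 0.5) print     # mostly rare trigrams: flag the name
        }
    ' "$1" "$1"
}

names=$(mktemp)
printf '%s\n' 'Priya Sharma' 'Priya Verma' 'Anita Sharma' 'Anita Verma' 'aaa' 'qwerty' > "$names"
flag_odd_names "$names"    # prints "aaa" and "qwerty", one per line
rm -f "$names"
```

On a real data set the thresholds would need tuning against names that are known to be genuine, and flagged entries would still go to manual review as the comment describes.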
znpy about 5 years ago
It's surprising that Doug McIlroy still reads and writes about UNIX.

For those who don't know, Doug is the guy who invented pipes.
tangue about 5 years ago
I didn't know about typo. One surprising Unix program I discovered this year is cal (or ncal). Having a calendar in your terminal is sometimes useful, and I wish I had known earlier that I could type things like ncal -w 2020
saagarjha about 5 years ago
And people say theoretical computer science isn't useful in "the real world"…

I am curious about this one, though. Has anyone used it?

> The syntax diagnostics from the compiler made by Sue Graham's group at Berkeley were the most helpful I have ever seen--and they were generated automatically. At a syntax error the compiler would suggest a token that could be inserted that would allow parsing to proceed further. No attempt was made to explain what was wrong.

On the surface it sounds a lot like it would produce error messages like "expected ';'" that most beginner programmers come to hate. Was it any better than this, or was that the extent of its intelligence and everything else at the time was even worse?
chmaynard about 5 years ago
The author is THE Doug McIlroy. It's wonderful to learn that he's still around and spreading the good word.

https://en.wikipedia.org/wiki/Douglas_McIlroy
mjw1007 about 5 years ago
« Typo was as surprising inside as it was outside. Its similarity measure was based on trigram frequencies, which it counted in a 26x26x26 array. The small memory, which had barely room enough for 1-byte counters, spurred a scheme for squeezing large numbers into small counters. To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count. »

This sounds like something from the same family as HyperLogLog.

Wikipedia traces that back to the Flajolet–Martin algorithm in 1984. When would typo have been written?
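To make the quoted idea concrete, here is a minimal sketch of a probabilistic counter that stores roughly the base-2 logarithm of the count. It follows the well-known Morris-style approximate counting scheme and is not taken from typo's source; the function name `approx_count` and the choice of base 2 are assumptions made for the example.

```shell
# Sketch of a Morris-style approximate counter (not typo's actual code).
# The stored value c stays tiny: it grows like log2 of the number of events.
approx_count() {
    # $1: number of events to simulate
    awk -v n="$1" 'BEGIN {
        srand()                                 # seed from the current time
        c = 0
        for (i = 0; i < n; i++)
            if (rand() < 1 / (2 ^ c)) c++       # increment with probability 2^-c
        printf "stored=%d estimate=%d\n", c, 2 ^ c - 1
    }'
}

approx_count 100000    # stored is usually around 16 or 17; the estimate is noisy
```

A counter like this never gets near the limit of a single byte for any realistic input, at the cost of an estimate that is only right to within a small factor. For ranking trigrams by rarity, that coarse accuracy may well be enough.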
adben about 5 years ago
How about GNU parallel? https://www.gnu.org/software/parallel/
nunoferreira about 5 years ago
What about "comm"? It compares two sorted files line by line. You can easily get the lines that occur only in file 1, in both files, or only in file 2.

Super powerful, and it has saved me hours of work.
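A small illustration of the three cases mentioned above, using two throwaway files (the file contents are made up for the example). The numeric flags tell comm which of its three output columns to suppress.

```shell
a=$(mktemp); b=$(mktemp)
printf '%s\n' apple banana cherry > "$a"    # both inputs must already be sorted
printf '%s\n' banana cherry date  > "$b"

comm -23 "$a" "$b"    # only in the first file:  apple
comm -12 "$a" "$b"    # in both files:           banana, cherry
comm -13 "$a" "$b"    # only in the second file: date

rm -f "$a" "$b"
```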
beefbroccoli about 5 years ago
There's a very simple system tool that clicked on about 50 simultaneous lightbulbs in my brain after only 10 minutes of playing with it: mkfifo
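For anyone who has not tried it, a minimal sketch of what mkfifo gives you: a named pipe that looks like a file but connects a writer to a reader. The directory and file names here are arbitrary choices for the example.

```shell
dir=$(mktemp -d)
mkfifo "$dir/pipe"

# Writer: opening a FIFO blocks until a reader shows up, so run it in the background.
printf '%s\n' cherry apple banana > "$dir/pipe" &

# Reader: any program that reads a file can consume the stream.
sort "$dir/pipe"    # prints apple, banana, cherry

wait
rm -r "$dir"
```

Because the FIFO has a path, the two ends can be unrelated processes started from different terminals, which an anonymous `|` pipe cannot do.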
ur-whale about 5 years ago
The fact that dc does (or at least tries to) guarantee error bounds on the *result* is news to me.

And if that does indeed work, that's pretty cool.
kmstout about 5 years ago
sl

[ASCII-art steam locomotive, as drawn by the sl command; the original layout was lost in extraction]
morelisp about 5 years ago
> struct - Brenda Baker undertook her Fortran-to-Ratfor converter against the advice of her department head--me. I thought it would likely produce an ad hoc reordering of the original, freed of statement numbers, but otherwise no more readable than a properly indented Fortran program. Brenda proved me wrong. She discovered that every Fortran program has a canonically structured form. Programmers preferred the canonicalized form to what they had originally written.

We could've had prettier et al. instead of style linters 40(+?) years ago. :(
jawilson about 5 years ago
I've written a few useful scripts that everyone should have.

histogram - simply counts each occurrence of a line and then outputs the lines from most to least frequent. I've implemented this program in several different languages for learning purposes. There are practical tricks that one can apply, such as hashing any line longer than the hash itself.

unique - like uniq, but doesn't need sorted input! Again, one can simply hash very long lines to save memory.

datetimes - looks for numbers that might be dates (seconds or milliseconds in certain reasonable ranges) and adds the human-readable version of the date as a comment at the end of the line they appear in. This is probably my most used script (I work with protocol buffers that often store dates as int64s).

human - reformats numbers into either powers of 2 or powers of 10. Inspired, obviously, by the -h and -H flags from df.

I'm sure I have a few more, but if I can't remember them off the top of my head, then they clearly aren't quite as generally useful.

Anyone else have some useful scripts like these?
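The first two scripts have short shell approximations built from standard tools. These are sketches of the described behavior, not the commenter's implementations, and they skip the trick of hashing long lines; the function names simply mirror the script names above.

```shell
# histogram: count identical lines, most frequent first
histogram() { sort | uniq -c | sort -rn; }

# unique: drop repeated lines without sorting, keeping the first occurrence of each
unique() { awk '!seen[$0]++'; }

printf '%s\n' b a b c b a | histogram    # counts: 3 b, 2 a, 1 c
printf '%s\n' b a b c b a | unique       # prints b, a, c in input order
```

The awk version of unique holds every distinct line in memory, which is exactly the cost that hashing long lines would reduce.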
mkchoi212 about 5 years ago
"To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count."

Stuff like this really makes me love what the pioneers of CS did. They counted every byte and every register, while nowadays programmers build things without considering the impact on the hardware.
londons_explore about 5 years ago
> The math library for Bob Morris's variable-precision desk calculator used backward error analysis to determine the precision necessary at each step to attain the user-specified precision of the result.

I wonder if compilers could do this today? If you can bound the values in floating-point operations, you might be able to replace them with fixed-point equivalents and get a big speedup. You might also be able to replace them with ints or smaller floats if you can detect that the result is rounded to an int.

CPUs could also do this, since they know (some of) the actual values at runtime and could take shortcuts in floating-point calculations where full precision is not needed for the result.
tannhaeuser about 5 years ago
What's surprising about eqn, dc, and egrep? I'm using the latter two all the time, and used eqn (plus troff/groff and even tbl and pic) in the 1990s for manuals and as late as the (early) 2000s to typeset math-heavy course material. Not nearly as feature-rich as TeX/LaTeX, but much more approachable for casual math, with DSLs for typesetting equations, tables, and diagrams/graphs. I was delighted to see that GNU had a full suite of roff/troff drop-in replacements (which I later learned were implemented by James Clark, of SGML and, recently, Ballerina fame).
ur-whale about 5 years ago
This is the first time I've heard of typo. It's not on my standard Linux install. Where can I find the source code?
ruslan about 5 years ago
I would add bc to the list. It's very useful for occasional calculations from the command line using "human readable" syntax.
lcall about 5 years ago
I have found it useful to survey the existing Unix utilities (maybe every several years). I'm no genius, but I find things I will use. One way, of course, is simply to review the names in wherever your system stores manual pages and read (or skim) the pages for commands you don't know, trying out some things, or at least trying to remember where to look them up later when you're ready to use them. Another is to browse to https://man.openbsd.org/ , put a single period (".") in the search field, optionally choose a section (and/or another system; I'm not sure how far the coverage goes), and click the apropos button.
jhoechtl about 5 years ago
Doug McIlroy is regularly active on the groff mailing list: https://lists.gnu.org/archive/html/groff/
Torwald about 5 years ago
What does he mean by "record structure in the file system" with regard to Multics?
vladdoster about 5 years ago
Crabs seems like a really cool program.

Here is a paper from Bell Labs:

http://lucacardelli.name/Papers/Crabs.pdf
noisy_boy about 5 years ago
I didn't find egrep surprising - I use it quite often. The thing I didn't know about it was that it was Al Aho's creation. I only knew about him from awk.
yegle about 5 years ago
killall5 is the most bizarre command that I learned about recently.

Read the manpage before trying it.
smitty1e about 5 years ago
Hadn't heard of most of these.

The people's names were more recognizable.
winrid about 5 years ago
I found GNU parallel to be very useful/cool.
katharine7 about 5 years ago
sed, awk, tr, egrep: for processing, making special greetings lol, converting images

all are so exciting!!
TheDesolate0 about 5 years ago
sed & awk for life
pvaldes about 5 years ago
Both rename and mmv are pretty handy.
hyperpallium about 5 years ago
xargs parallelizes with -Pn, where n is the maximum number of processes to run at once.
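A minimal sketch of that flag in use, assuming an xargs that supports -P (the GNU and BSD versions do; it is not part of POSIX). The job itself, a one-second sleep followed by an echo, is a made-up stand-in for real work.

```shell
# Run up to 4 jobs at once; each job receives one input line as $1.
printf '%s\n' 1 2 3 4 |
    xargs -n 1 -P 4 sh -c 'sleep 1; echo "done $1"' _
```

With -P 4 the four jobs finish in about one second instead of four, and the output lines can arrive in any order.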