The format of strings in early (pre-C) Unix

102 points by fcambus over 9 years ago

9 comments

Thomas_Lord over 9 years ago
Slightly off topic. The article doesn't call it out but there's a lovely assembly hack here. In:

    bec 1f / branch if no error
    jsr r5,error / error in file name
        <Input not found\n\0>; .even
    sys exit

jsr calls a subroutine, passing the return address in register 5. The routine error interprets the return address as a pointer to the string.

r5 is incremented in a loop, outputting one character at a time. When the null is found, it's time to return.

The instructions used to return from "error:" aren't shown but there is a subtlety here, I think.

".even" after the string constant ensures that the next instruction, "sys exit", to which "error:" is supposed to return, is aligned on an even address.

By implication, the return sequence in "error:" must be sure to increment r5 if r5 is odd. I am guessing something like the pseudo-code:

    inc r5
    and r5, fffe
    ret r5
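
For readers who don't read PDP-11 assembly, the trick described above can be modelled in C. This is only a sketch under stated assumptions: the function name `print_inline_string`, the byte array standing in for memory, and the index standing in for r5 are all hypothetical, and the real "error:" routine is not shown in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical model of an "error"-style routine. `mem` stands in for
 * memory and `r5` for the return address left in register 5, which
 * points at the inline string rather than at the next instruction.
 * Returns the address where execution would resume.
 */
static size_t print_inline_string(const unsigned char *mem, size_t len,
                                  size_t r5, FILE *out)
{
    assert(r5 < len);

    /* Emit one character at a time until the NUL terminator. */
    while (r5 < len && mem[r5] != '\0') {
        fputc(mem[r5], out);
        r5++;
    }

    r5++;                        /* step past the NUL itself */
    r5 = (r5 + 1) & ~(size_t)1;  /* round up to an even address, matching .even */
    return r5;
}
```

If the string occupies addresses 2 to 4 with its NUL at 4 and a pad byte at 5, the function returns 6, which is where the next instruction would sit. The "step past the NUL, add one, clear the low bit" sequence is the same idea as the guessed inc/and pair, assuming r5 already points past the NUL when the loop exits.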
kazinator over 9 years ago
After skimming through this, I navigated around Chris Siebenmann's site with the forward and back links, discovering something way more interesting than Unix strings and refreshingly relevant:

"How I do per-address blocklists with Exim"

https://utcc.utoronto.ca/~cks/space/blog/sysadmin/EximPerUserBlocklists

I run Exim, and I'm also a huge believer in blocking spam at the SMTP level, and also do some things globally that should perhaps be per-user. I'm eagerly interested in everything this fellow has to say.
coupdejarnac over 9 years ago
I was hoping to read something juicy like null termination was created by a summer intern.
holmak over 9 years ago
I have seen it claimed that null-terminated strings were encouraged by the instruction sets of the time -- that some instruction sets make null-terminated sequences easier to handle than length-prefixed ones. The article's error-message-printing code snippet is a good example. Does anyone think there is any truth to this?
derefr over 9 years ago
I always felt like NUL-termination, newline-separation, and (eventually) UTF-8 were all sort of complementary ideas: they all take as an axiom that strings are fundamentally streams, not random-access buffers; and they all separate the space of single-byte coding units, by simple one-bitwise-instruction-differentiable means, into multiple lexical token types.

Taking all three together, you end up with the conception of a "string" as a serialized bitstring encoding a sequence of five *lexical* types: a NUL type (like the EOF "character" in an STL stream), an ASCII control-code type (or a set of individual control codes as types, if you like), a set of UTF-8 "beginning of rune" types for each possible rune length, a "byte continuing rune" type, and an ASCII-printable type. (You then feed this stream into another lexer to put the rune-segment-tokens together into rune-tokens.)

In the end, it's not a surprise that all of these components were effectively from a single coherent design, thought up by Ken Thompson. It's a bit annoying that each part ended up introduced as part of a separate project, though: NULs with Unix, gets() with C, and runes with Plan 9.

One of the pleasant things about Go's string support, I think, is that it was an opportunity for Ken to express the entirety of his string semantics as a single ADT. That part of the compiler is quite lovely.
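
The byte-level classification described above can be sketched in C with a handful of mask-and-compare tests. The enum and function names here are hypothetical, chosen only for illustration. The lead-byte tests look at the bit pattern alone, so bytes that strict UTF-8 forbids (such as 0xC0, 0xC1, and 0xF5 and above) are not singled out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the lexical byte classes described above. */
enum byte_class {
    BYTE_NUL,
    BYTE_ASCII_CONTROL,
    BYTE_ASCII_PRINTABLE,
    BYTE_UTF8_CONTINUATION,
    BYTE_UTF8_LEAD_2,
    BYTE_UTF8_LEAD_3,
    BYTE_UTF8_LEAD_4,
    BYTE_INVALID
};

/* Classify one byte using only comparisons and bit masks. */
static enum byte_class classify_byte(uint8_t b)
{
    if (b == 0x00)
        return BYTE_NUL;
    if ((b & 0x80) == 0x00)                 /* 0xxxxxxx: ASCII */
        return (b < 0x20 || b == 0x7F) ? BYTE_ASCII_CONTROL
                                       : BYTE_ASCII_PRINTABLE;
    if ((b & 0xC0) == 0x80)                 /* 10xxxxxx */
        return BYTE_UTF8_CONTINUATION;
    if ((b & 0xE0) == 0xC0)                 /* 110xxxxx */
        return BYTE_UTF8_LEAD_2;
    if ((b & 0xF0) == 0xE0)                 /* 1110xxxx */
        return BYTE_UTF8_LEAD_3;
    if ((b & 0xF8) == 0xF0)                 /* 11110xxx */
        return BYTE_UTF8_LEAD_4;
    return BYTE_INVALID;                    /* 11111xxx never appears in UTF-8 */
}

/*
 * Count runes in a NUL-terminated string by counting every byte that is
 * not a continuation byte. Assumes the input is well-formed UTF-8.
 */
static size_t count_runes(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (classify_byte((uint8_t)*s) != BYTE_UTF8_CONTINUATION)
            n++;
    }
    return n;
}
```

`count_runes` treats the string purely as a stream: it never needs the length up front and never looks backwards, which is the property the three conventions share.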
emmelaich over 9 years ago
How else would you implement them, seriously?

You have two choices, counted or terminated.

*Counted* places a complexity burden at the lowest level of coding.

With *terminated* you still have the option of implementing strings with structs or arrays with counts or anything.

And people did of course. Many, many different implementations of safe strings exist in C; the fact that none have won out *vindicates* the decision to use sentinel termination.
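
As one illustration of layering a counted string on top of a terminated one, here is a minimal C sketch. The `counted_str` type and its helpers are hypothetical and not taken from any particular safe-string library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted view over an ordinary NUL-terminated string. */
struct counted_str {
    const char *data;  /* still NUL-terminated, so it works with C APIs */
    size_t len;        /* length in bytes, excluding the terminator */
};

/* Pay the O(n) scan for the terminator once and remember the result. */
static struct counted_str counted_from_cstr(const char *s)
{
    assert(s != NULL);
    struct counted_str cs = { s, strlen(s) };
    return cs;
}

/* With lengths stored, unequal-length strings are rejected without a scan. */
static int counted_equal(struct counted_str a, struct counted_str b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

The underlying bytes stay terminated, so the counted view is an opt-in layer rather than a replacement, which is roughly the flexibility the comment is pointing at.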
bitwize over 9 years ago
One of the worst programming ideas ever dates back even earlier than we thought.

If only Dennis had had the foresight to nip that one in the bud...
castell over 9 years ago
Multics, the predecessor of Unix, was written in PL/1 and was very innovative (modern OSes still borrow "new ideas"): https://en.wikipedia.org/wiki/Multics
jamesfmilne over 9 years ago
That was anticlimactic.