Slightly off topic. The article doesn't call it out but there's a lovely assembly hack here. In:<p>bec 1f / branch if no error<p><pre><code> jsr r5,error / error in file name
<Input not found\n\0>; .even
sys exit
</code></pre>
jsr calls a subroutine, passing the return address in register 5. The routine error interprets the return address as a pointer to the string.<p>r5 is incremented in a loop, outputting one character at a time. When the NUL is found, it's time to return.<p>The instructions used to return from "error:" aren't shown, but there is a subtlety here, I think.<p>".even" after the string constant assures that the next instruction, "sys exit", to which "error:" is supposed to return, is aligned on an even address.<p>By implication, the return sequence in "error:" must just be sure to increment r5 if r5 is odd. I am guessing something like the pseudo-code (on a real PDP-11 the masking would be a bic, since there is no and instruction):<p>inc r5<p>bic $1, r5<p>rts r5
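The trick can be sketched in C (my own hypothetical analogue, not the actual V1 code, which is PDP-11 assembly): the "return address" actually points at an inline string, the routine prints it, and then resumes just past the string, rounded up to an even address.<p><pre><code>#include &lt;stdio.h&gt;
#include &lt;stdint.h&gt;

/* Hypothetical C analogue of the "error" routine: r5 points at an
 * inline NUL-terminated string; print it one character at a time,
 * then resume just past the string, aligned to an even address
 * (which .even guarantees is where "sys exit" lives). */
const char *error(const char *r5)
{
    while (*r5)                     /* emit one character at a time */
        putchar(*r5++);
    r5++;                           /* step past the NUL */
    /* round up to the next even address, as the guessed
     * "inc r5; bic $1,r5" sequence would: */
    uintptr_t p = (uintptr_t)r5;
    p = (p + 1) &amp; ~(uintptr_t)1;
    return (const char *)p;
}
</code></pre>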
After skimming through this, I navigated around Chris Siebenmann's site with the forward and back links, discovering something way more interesting than Unix strings and refreshingly relevant:<p>"How I do per-address blocklists with Exim"<p><a href="https://utcc.utoronto.ca/~cks/space/blog/sysadmin/EximPerUserBlocklists" rel="nofollow">https://utcc.utoronto.ca/~cks/space/blog/sysadmin/EximPerUse...</a><p>I run Exim, and I'm also a huge believer in blocking spam at the SMTP level, and I also do some things globally that should perhaps be per-user. I'm eagerly interested in everything this fellow has to say.
I have seen it claimed that null-terminated strings were encouraged by the instruction sets of the time -- that some instruction sets make null-terminated sequences easier to handle than length-prefixed ones. The article's error-message-printing code snippet is a good example. Does anyone think there is any truth to this?
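One concrete illustration (my own sketch, not from the article): the classic K&amp;R copy loop maps almost one-to-one onto PDP-11 autoincrement addressing, with the NUL doubling as the loop condition because the moved byte sets the condition codes.<p><pre><code>/* The canonical C copy loop: one byte moved, both pointers bumped,
 * and the copied byte itself tested for zero -- on a PDP-11 this is
 * essentially "1: movb (r1)+,(r2)+ ; bne 1b".  A counted copy would
 * instead need a separate length register and a decrement/test. */
static void copy(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;
}
</code></pre>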
I always felt like NUL-termination, newline-separation, and (eventually) UTF-8 were all sort of complementary ideas: they all take as an axiom that strings are fundamentally streams, not random-access buffers; and they all separate the space of single-byte coding units, by simple one-bitwise-instruction-differentiable means, into multiple lexical token types.<p>Taking all three together, you end up with the conception of a "string" as a serialized bitstring encoding a sequence of five <i>lexical</i> types: a NUL type (like the EOF "character" in an STL stream), an ASCII control-code type (or a set of individual control codes as types, if you like), a set of UTF-8 "beginning of rune" types for each possible rune length, a "byte continuing rune" type, and an ASCII-printable type. (You then feed this stream into another lexer to put the rune-segment-tokens together into rune-tokens.)<p>In the end, it's not a surprise that all of these components were effectively from a single coherent design, thought up by Ken Thompson. It's a bit annoying that each part ended up introduced as part of a separate project, though: NULs with Unix, gets() with C, and runes with Plan 9.<p>One of the pleasant things about Go's string support, I think, is that it was an opportunity for Ken to express the entirety of his string semantics as a single ADT type. That part of the compiler is quite lovely.
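The "one-bitwise-instruction" classification above can be sketched like this (my own illustration; the enum names are mine):<p><pre><code>/* Classify a single byte of a NUL-terminated UTF-8 stream into the
 * lexical types described above, using only masks and compares. */
enum byte_class {
    BC_NUL,        /* 0x00: end of string             */
    BC_CONTROL,    /* 0x01..0x1f, 0x7f: ASCII control */
    BC_PRINTABLE,  /* 0x20..0x7e: printable ASCII     */
    BC_CONT,       /* 10xxxxxx: continues a rune      */
    BC_LEAD2,      /* 110xxxxx: starts a 2-byte rune  */
    BC_LEAD3,      /* 1110xxxx: starts a 3-byte rune  */
    BC_LEAD4,      /* 11110xxx: starts a 4-byte rune  */
    BC_INVALID     /* 0xf8..0xff never appear in UTF-8 */
};

static enum byte_class classify(unsigned char b)
{
    if (b == 0x00)              return BC_NUL;
    if (b &lt; 0x20 || b == 0x7f) return BC_CONTROL;
    if (b &lt; 0x80)              return BC_PRINTABLE;
    if ((b &amp; 0xc0) == 0x80)    return BC_CONT;
    if ((b &amp; 0xe0) == 0xc0)    return BC_LEAD2;
    if ((b &amp; 0xf0) == 0xe0)    return BC_LEAD3;
    if ((b &amp; 0xf8) == 0xf0)    return BC_LEAD4;
    return BC_INVALID;
}
</code></pre>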
How else would you implement them, seriously?<p>You have two choices, counted or terminated.<p><i>Counted</i> places a complexity burden at the lowest level of coding.<p>With <i>terminated</i> you still have the option of implementing strings with structs or arrays with counts or anything.<p>And people did, of course. Many, many different implementations of safe strings exist in C; the fact that none has won out <i>vindicates</i> the decision to use sentinel termination.
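A counted layer on top of terminated strings really is easy to build; a minimal sketch (the struct and names are mine):<p><pre><code>#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* A minimal counted string on top of C: the length is stored
 * alongside the bytes, and a terminating NUL is kept so it still
 * interoperates with the NUL-terminated world. */
struct cstr {
    size_t len;
    char  *data;
};

static struct cstr cstr_from(const char *s)
{
    struct cstr r;
    r.len  = strlen(s);
    r.data = malloc(r.len + 1);
    memcpy(r.data, s, r.len + 1);   /* copy the NUL too */
    return r;
}
</code></pre>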
The predecessor of Unix, Multics, was written in PL/I and was very innovative (modern OSes still borrow its "new ideas"): <a href="https://en.wikipedia.org/wiki/Multics" rel="nofollow">https://en.wikipedia.org/wiki/Multics</a>