TechEcho
The Most Expensive One-byte Mistake (2011)

60 points by fleaflicker about 11 years ago

12 comments

yongjik about 11 years ago
Yeah, if we only used strings marked with 2-byte integers, everybody would have been happy, because a 64 KB string is enough for everyone. (And let's be realistic, nobody sane would have chosen a 4-byte string length back in the early 70s.)

So, if we had gone down that path, what would we have? All the fun of having "legacy" APIs that seem to work but internally only accept strings up to 64 KB in length and mysteriously chop off excess bytes when you least expect it. It's the Y2K problem all over again.

And just when you finally think you're done with it, memory gets cheaper again, size_t is 64-bit, and someone invariably wants to store a binary blob >4 GB as a string. Fun times again.

Have we forgotten how much trouble we went through in the 90s to handle memory in the x86 "640k is enough for everybody" architecture?
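
To make the truncation hazard in this comment concrete, here is a minimal sketch in C. The `str16` type and both constructors are hypothetical names invented for illustration, not part of any real library. One constructor clamps oversized input silently (the "legacy API" behaviour described above); the other refuses it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string with a fixed 2-byte length field. */
typedef struct {
    uint16_t len;   /* can only represent 0..65535 */
    char    *data;  /* not NUL-terminated */
} str16;

/* Legacy-style constructor: silently keeps only the first 65535 bytes.
 * Returns 0 on success, -1 if allocation fails. */
static int str16_make_truncating(str16 *out, const char *src, size_t n)
{
    assert(out != NULL && src != NULL);
    size_t kept = n > UINT16_MAX ? UINT16_MAX : n;
    out->data = malloc(kept ? kept : 1);
    if (out->data == NULL)
        return -1;
    memcpy(out->data, src, kept);
    out->len = (uint16_t)kept;
    return 0;
}

/* Checked constructor: rejects input that the length field cannot describe.
 * Returns 0 on success, -1 on oversized input or allocation failure. */
static int str16_make_checked(str16 *out, const char *src, size_t n)
{
    assert(out != NULL && src != NULL);
    if (n > UINT16_MAX)
        return -1;
    out->data = malloc(n ? n : 1);
    if (out->data == NULL)
        return -1;
    memcpy(out->data, src, n);
    out->len = (uint16_t)n;
    return 0;
}
```

With a 70000-byte input, the truncating version reports success with `len == 65535`, while the checked version returns -1 and leaves the decision to the caller.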

gcb0 about 11 years ago
Oh the irony of history.

On the week that str+len was abused left and right, someone surfaces to the front page an article about how str+NUL is wrong and everyone should use str+len.

millstone about 11 years ago
NUL-terminated strings were the right decision for C. They’re certainly much simpler than length fields.

Consider using a length field. How big should that field be? If it's fixed size, you introduce complications regarding how big a string you can represent, and differences in field sizes across architectures. If it's variable-sized (a la UTF-8), then you've added different complications: you would need library functions to read and write the length, to get access to the string contents, to calculate the amount of memory required to hold a string of a given size, etc. Very much not in the spirit of C.

Next, what endianness should that field have? NUL-terminated strings have no endianness issues: they can be trivially written to files, embedded in network packets, whatever. But with a length field, we either need to remember to marshal the string, or allow for the length field to not be in native byte order. Neither is a pleasant prospect, especially for a 1970s C programmer.

Also, consider C-style string parsing, e.g. strtok/strsep. These could not be implemented with length-field strings.

Explicit length is better when you have an enforced abstraction, like std::string, but at that point you’re not writing in C. If you have to pick an exposed representation, NUL termination is much better than Pascal-style length fields.

So what was the “one-byte mistake”? The article says that it was saving a byte by using NUL termination instead of a two-byte length field. Had K&R not made that “mistake,” we would be unable to make a string longer than 65k, a far more serious limitation than anything NUL termination imposes!

K&R got it right.
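
The endianness point can be sketched in C as follows. The function names and the choice of a big-endian 16-bit length are assumptions made for this illustration only. The counted form has to commit to a byte order and a field width when it is written out, whereas the NUL-terminated form is copied byte for byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 16-bit big-endian length followed by the bytes.
 * Returns bytes written, or 0 if the string or buffer is too large/small. */
static size_t put_counted(uint8_t *out, size_t cap, const char *s, size_t len)
{
    assert(out != NULL && s != NULL);
    if (len > UINT16_MAX || cap < len + 2)
        return 0;
    out[0] = (uint8_t)(len >> 8);    /* high byte first: an explicit choice */
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + 2, s, len);
    return len + 2;
}

/* Read the length back and check it against the bytes actually available.
 * Returns a pointer to the payload and sets *len, or NULL if truncated. */
static const uint8_t *get_counted(const uint8_t *in, size_t avail, size_t *len)
{
    assert(in != NULL && len != NULL);
    if (avail < 2)
        return NULL;
    size_t n = ((size_t)in[0] << 8) | in[1];
    if (avail - 2 < n)
        return NULL;
    *len = n;
    return in + 2;
}

/* NUL-terminated form: the bytes are written as they are, terminator included.
 * Returns bytes written, or 0 if the buffer is too small. */
static size_t put_cstr(uint8_t *out, size_t cap, const char *s)
{
    assert(out != NULL && s != NULL);
    size_t n = strlen(s) + 1;
    if (cap < n)
        return 0;
    memcpy(out, s, n);
    return n;
}
```

Writing `"abc"` takes 5 bytes in the counted form (`00 03 'a' 'b' 'c'`) and 4 bytes in the NUL-terminated form; only the former needs both sides to agree on width and byte order.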

TomMasz about 11 years ago
A *lot* of programming decisions were made to save a byte here and there. It's easy to point at them today and say they're "bad", but at the time they were the absolutely *correct* thing to do. It's hard to imagine now but not saving that byte could mean your program wouldn't fit into RAM. Try telling your management in the 1960s that your program won't load because it's "properly coded" and see how far you get.

What we've failed to do is ever revisit those decisions and change them where we've identified problems. Yes, you can probably compile (with warnings) files from UNIX v7, but we pay for that compatibility. But there's no question designing, building and maintaining a libc alternative is a colossal undertaking and not likely to happen on a whim. So here we are.

radiospiel about 11 years ago
Well, strings without an explicit length field allow for things like strstr(3) or prefix parsing without performance penalties due to reallocating memory.
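
A small C sketch of what this comment describes: with NUL-terminated strings, any pointer into the buffer is itself a valid string, so skipping a prefix or taking the tail after a `strstr` match needs no copy or allocation. The helper names `skip_prefix` and `find_after` are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the part of `s` that follows `prefix`, or NULL if `s` does not
 * start with it. The result points into `s`; nothing is copied. */
static const char *skip_prefix(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    size_t n = strlen(prefix);
    return strncmp(s, prefix, n) == 0 ? s + n : NULL;
}

/* Returns the part of `s` that follows the first occurrence of `needle`,
 * or NULL if there is none. Again a pointer into `s`, not a new string. */
static const char *find_after(const char *s, const char *needle)
{
    assert(s != NULL && needle != NULL);
    const char *hit = strstr(s, needle);
    return hit != NULL ? hit + strlen(needle) : NULL;
}
```

For the illustrative input `"http://example.com/path"`, `skip_prefix(url, "http://")` yields a pointer 7 bytes into the same buffer that reads as `"example.com/path"`. A length-prefixed layout with the count stored in front of the bytes would need a separate (pointer, length) slice type or a copy to express the same thing.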

gumby about 11 years ago
When I was at PARC the Mesa guys (who had counted strings) did some analysis and (at least in those days) the counted strings ended up being, in aggregate, faster. I suspect the advantage would be even greater these days since memory allocation was a bigger deal back then.

I wonder if you could do this compatibly in the compiler by adding another primitive type (counted string) which had the length in the bytes before the start of the null-terminated string. You'd need a new type because various routines in the standard library would have to invisibly have two versions for counted and non-counted strings (since if you incremented a string pointer, or used a function like strchr, you'd have to treat it as a regular char*). "Safe" code would use a different call (say, cstrchr) that returned an index instead of a char*. The compiler could optionally warn on unsafe "legacy" calls as it can with strcpy instead of strncpy.
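
The layout proposed in this comment can be approximated in plain C as a library rather than a compiler feature. The sketch below is one possible reading, not the commenter's design: the length sits in a header just before the NUL-terminated bytes, callers hold an ordinary `char *`, and a `cstrchr` (the name suggested in the comment) returns an index. The header type and the other function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: the length is stored immediately before the bytes. */
typedef struct {
    size_t len;
    char   bytes[];   /* C99 flexible array member, NUL-terminated */
} cstr_hdr;

/* Allocate a counted copy of `src`. The returned pointer is a normal
 * NUL-terminated string, so legacy functions such as strlen still work. */
static char *cstr_new(const char *src)
{
    assert(src != NULL);
    size_t n = strlen(src);
    cstr_hdr *h = malloc(sizeof *h + n + 1);
    if (h == NULL)
        return NULL;
    h->len = n;
    memcpy(h->bytes, src, n + 1);
    return h->bytes;
}

/* O(1) length: step back from the bytes to the header.
 * Only valid for pointers returned by cstr_new, not for s + 1 or literals. */
static size_t cstr_len(const char *s)
{
    assert(s != NULL);
    const cstr_hdr *h =
        (const cstr_hdr *)(const void *)(s - offsetof(cstr_hdr, bytes));
    return h->len;
}

/* "Safe" search: returns the index of the first `c`, or -1 if absent. */
static ptrdiff_t cstrchr(const char *s, char c)
{
    const char *hit = memchr(s, c, cstr_len(s));
    return hit != NULL ? hit - s : -1;
}

static void cstr_free(char *s)
{
    if (s != NULL)
        free(s - offsetof(cstr_hdr, bytes));
}
```

The weakness the comment anticipates is visible in `cstr_len`: once a caller increments the pointer, the header is no longer where the function expects it, which is why a distinct type checked by the compiler would be needed for this to be safe.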

cliveowen about 11 years ago
It's all true, but then again, everything would be better if we started from scratch today. Compromises made to tip-toe around technology limitations are what add complexity to most of today's software, but even tomorrow's software will be influenced by today's limitations. It's best not to dwell on the past.

crashandburn4 about 11 years ago
This page won't load for me and neither will Google's webcached version [1]. Does anyone have a version of this that I can see?

[1] http://webcache.googleusercontent.com/search?q=cache:http://queue.acm.org/detail.cfm?id=2010365&

orvado about 11 years ago
Does anyone understand what the author meant by the following statement:

${lang} is the language of the future

This looks like a macro for substitution, but maybe it's some hip new term I've never encountered. An actual language or just a placeholder for a language that hasn't been chosen yet?

bananas about 11 years ago
Yeah because strings with a length prefix/field are just as secure!

    200,"STR"

We know where that got us...

Programming 101, rules 1 & 2:

1 - never trust your inputs

2 - always check your invariants.
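
The `200,"STR"` example shows a length field that disagrees with the data actually supplied. A minimal C sketch of applying the two rules above to such input follows; `echo_checked` and its parameters are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `claimed` bytes of `payload` into `out`, but only if the sender
 * really supplied that many bytes (`actual`) and they fit in `out`.
 * Returns the number of bytes copied, or 0 if the request is rejected. */
static size_t echo_checked(char *out, size_t out_cap,
                           const char *payload, size_t actual, size_t claimed)
{
    assert(out != NULL && payload != NULL);
    if (claimed > actual)     /* rule 1: the claimed length is untrusted input */
        return 0;
    if (claimed > out_cap)    /* rule 2: never write past the destination */
        return 0;
    memcpy(out, payload, claimed);
    return claimed;
}
```

Called with a 3-byte payload and a claimed length of 200, the function returns 0 instead of reading 197 bytes beyond the payload. An explicit length only helps when it is checked against the buffer it describes.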

ithinkso about 11 years ago
With NUL-terminated strings it was also simpler to serialize them. If str+len were the standard now, we would have 13 more serialization standards.

rw_grim about 11 years ago
So to be "safe" and "secure" we can only have strings 256 characters long, or we need to waste a few bytes repeatedly for short strings. Sounds like the UTF-8 vs UTF-16/32 debate.