A case against text protocols

24 points by unmdplyr almost 4 years ago

6 comments

horsawlarway almost 4 years ago
I don't really agree... with any of this, actually.

It IS simpler to use text. He claims "Text was never as portable as it's believed to be.", but ASCII/Unicode are probably the most portable formats we've ever created. I can't think of a computer that won't be able to parse and display one of those two formats (from embedded hardware, to old f16 parts, to my modern laptop, to the Raspberry Pi, to the fucking computer I designed in my EE classes).

Being able to type out messages is hugely helpful while debugging and developing. I copy and paste things that look exactly like the code he claims no one would ever write. It's like he doesn't understand the value of a clipboard, or of a text editor I can dump a message into and change a single value in, something I can conveniently do on pretty much any system ANYWHERE without having to install any extra software, if the format is text.

His parsing example is hilarious. See that readable text above? Psh, folly! That's hard to read, so let's use specialized tools that depend entirely on system-specific details and configuration (int size, byte order, struct packing, etc.) and claim that's better!

Extensibility, meh. I find this one rarely matters as much as people believe it does, but to me the big benefit of text is that I can easily craft messages with new fields myself without having to write code to do it.

Error recovery... I can sort of agree (in transit over a noisy channel, use a format that supports ECCs), but he misses that there are two different types of error here: an unexpected field value/type, and a generally malformed payload. The first will break binary but not something like a JSON parser. The second will break both (he only talks about the second, since he assumed the failure happens at tokenization time...).

Basically, my whole point devolves into "It sure seems like he's arguing for premature optimization."

If you have a spot where text is particularly expensive or inefficient, suck it up and move to a binary protocol that requires more documentation, tooling, and work. Everywhere else... it seems like a bad move.
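The two kinds of error distinguished in the comment can be illustrated with a minimal Python sketch. The record layout, field names and helper functions here are hypothetical. The binary side is a fixed-layout record, like a packed C struct; self-describing binary formats (TLV, for instance) can skip fields they don't know, so this only shows the fixed-layout case.

```python
import json
import struct

# Hypothetical version-1 binary record: u32 id, i16 temp, big-endian, no padding.
V1_LAYOUT = struct.Struct("!Ih")


def decode_json(payload: bytes) -> dict:
    """Pick out the fields this reader knows about; ignore any others."""
    msg = json.loads(payload)
    return {"id": msg["id"], "temp": msg["temp"]}


def decode_binary(payload: bytes) -> dict:
    """Decode the fixed-layout record; raises struct.error if the size is wrong."""
    ident, temp = V1_LAYOUT.unpack(payload)
    return {"id": ident, "temp": temp}


# A newer sender adds a "humidity" field to both encodings.
new_json = b'{"id": 7, "temp": -3, "humidity": 40}'
new_binary = struct.pack("!IhB", 7, -3, 40)

print(decode_json(new_json))  # {'id': 7, 'temp': -3}
try:
    decode_binary(new_binary)
except struct.error as exc:
    print("binary decode failed:", exc)

# A truncated (malformed) payload breaks both decoders.
for decode, truncated in ((decode_json, new_json[:10]), (decode_binary, new_binary[:3])):
    try:
        decode(truncated)
    except (ValueError, struct.error) as exc:  # JSONDecodeError is a ValueError
        print(decode.__name__, "failed:", exc)
```

The unknown JSON key is simply never looked up, while the fixed binary layout has no way to tell "extra field" from "wrong message".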
pwdisswordfish8 almost 4 years ago
Another argument: text-based protocols often admit too many degrees of freedom in constructing messages, and the handling of those variations is left underspecified and completely overlooked during implementation. (What happens if you separate lines with LF instead of CRLF in HTTP headers? What if the opening and closing HTML tags don't match? I know this should not usually happen, but how should I handle it when it does anyway?)

It's not by any means exclusive to text-based protocols, but there's this tendency to assume everything about a text-based protocol is ‘obvious’ and ‘self-documenting’ and doesn't need specifying, and to think that just because the individual elements of the protocol are human-readable, this will somehow magically make the computers using the protocol follow the Gricean maxims (if it doesn't make sense, nobody will ever say it, therefore I don't need to think about it).
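For the LF-vs-CRLF question, HTTP/1.1 does give an answer: RFC 9112 makes CRLF the line terminator but says a recipient MAY recognize a single LF and ignore a preceding CR, and treats a bare CR as invalid. The point is that a parser has to pick a policy on purpose. A minimal Python sketch that makes the choice an explicit parameter (the function and its `accept_bare_lf` flag are hypothetical, not any library's API):

```python
def split_header_lines(block: bytes, accept_bare_lf: bool = True) -> list[bytes]:
    """Split a header block into lines under an explicit line-ending policy.

    CRLF is always accepted. A bare LF is accepted only if accept_bare_lf
    is set. A bare CR (not followed by LF) is always rejected.
    """
    lines = []
    start = 0
    i = 0
    while i < len(block):
        byte = block[i]
        if byte == 0x0D:  # CR: must be immediately followed by LF
            if i + 1 < len(block) and block[i + 1] == 0x0A:
                lines.append(block[start:i])
                i += 2
                start = i
                continue
            raise ValueError(f"bare CR at offset {i}")
        if byte == 0x0A:  # LF with no preceding CR
            if not accept_bare_lf:
                raise ValueError(f"bare LF at offset {i}")
            lines.append(block[start:i])
            start = i + 1
        i += 1
    if start != len(block):
        raise ValueError("header block does not end with a line terminator")
    return lines
```

Two implementations that differ only in the default of that one flag will disagree about the same bytes, which is exactly the kind of gap the comment describes.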
IshKebab almost 4 years ago
This is all 100% correct, but he missed probably the biggest reason!

It's really, really hard to write an unambiguous text protocol specification, and equally hard to write something that implements it properly. Think about all the extra ambiguities text adds: Where is whitespace allowed? How are newlines encoded? Are Windows line endings OK? Does case matter? Which bits are ASCII and which are UTF-8? How are values quoted?

It's just insanely more complicated, and there are many subtle differences that seem reasonable to different people. "Of course whitespace is allowed there!"

It makes everything way less robust and way more prone to quirks-mode-style degradation.
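Each of those questions needs an explicit answer in the code. As one example, here is a minimal Python sketch of parsing a single HTTP/1.1 field line using the answers RFC 9110/9112 give: field names are case-insensitive tokens, no whitespace is allowed between the name and the colon, optional spaces/tabs around the value are not part of it, and CR, LF and NUL are invalid in a value. The function name is hypothetical; obsolete line folding and value character encoding are deliberately not handled (the value is returned as raw bytes).

```python
import string

# RFC 9110 "tchar": the characters allowed in a field name (token).
TCHAR = set((string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~").encode())


def parse_field_line(line: bytes) -> tuple[str, bytes]:
    """Parse b"Name: value" into (lower-cased name, trimmed raw value)."""
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("missing colon")
    # Rejects empty names and any whitespace between the name and the colon.
    if not name or any(c not in TCHAR for c in name):
        raise ValueError(f"invalid field name: {name!r}")
    # Optional whitespace (SP / HTAB) around the value is not part of it.
    value = value.strip(b" \t")
    if any(c in (0x00, 0x0A, 0x0D) for c in value):
        raise ValueError("NUL, CR or LF inside field value")
    # Field names are case-insensitive; normalize to lower case.
    return name.decode("ascii").lower(), value
```

For instance, `parse_field_line(b"HOST:example.com")` yields `("host", b"example.com")`, while `b"Host : example.com"` is rejected because of the space before the colon.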
Joker_vD almost 4 years ago
> It is as simple as dumping struct ntp_packet on wire and reading it off it -- no parsing involved except for calling ntohX()/htonX() on all fields except li, vn and mode.

Nope, you may still need to call ntoh/hton, depending on how your compiler orders the bitfields inside an int. Plus you need "__attribute__((packed))" or whatever your compiler supports to make that C struct definition mean what it looks like it means, and even then I am not sure those three bitfields are required to occupy exactly 8 bits.
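C leaves the allocation order of bitfields within a unit implementation-defined, which is the root of the problem described above. One way around it is to write the header layout down explicitly instead of relying on a struct's in-memory image. A minimal Python sketch using the standard `struct` module, following the NTPv4 header layout from RFC 5905 (48 bytes, big-endian). `pack_ntp` and `unpack_ntp` are hypothetical helper names, and the timestamps are treated as opaque 64-bit integers.

```python
import struct

# RFC 5905 NTP header. "!" = network (big-endian) byte order, no padding.
# B: LI/VN/Mode byte, B: stratum, b: poll, b: precision,
# I: root delay, I: root dispersion, I: reference ID,
# Q x4: reference, origin, receive and transmit timestamps.
NTP_HEADER = struct.Struct("!BBbbIIIQQQQ")
assert NTP_HEADER.size == 48

FIELDS = ("stratum", "poll", "precision", "root_delay", "root_dispersion",
          "ref_id", "ref_ts", "orig_ts", "recv_ts", "xmit_ts")


def pack_ntp(li, vn, mode, stratum=0, poll=0, precision=0, root_delay=0,
             root_dispersion=0, ref_id=0, ref_ts=0, orig_ts=0, recv_ts=0,
             xmit_ts=0) -> bytes:
    if not (0 <= li <= 3 and 0 <= vn <= 7 and 0 <= mode <= 7):
        raise ValueError("li must fit in 2 bits, vn and mode in 3 bits")
    # Place the sub-byte fields with explicit shifts rather than relying
    # on a compiler's bitfield layout: LI in bits 7-6, VN 5-3, Mode 2-0.
    first = (li << 6) | (vn << 3) | mode
    return NTP_HEADER.pack(first, stratum, poll, precision, root_delay,
                           root_dispersion, ref_id, ref_ts, orig_ts,
                           recv_ts, xmit_ts)


def unpack_ntp(data: bytes) -> dict:
    first, *rest = NTP_HEADER.unpack(data)
    fields = {"li": first >> 6, "vn": (first >> 3) & 0x7, "mode": first & 0x7}
    fields.update(zip(FIELDS, rest))
    return fields
```

A version-4 client request (`li=0, vn=4, mode=3`) packs its first byte as `0x23` on any host, with no `ntohX()`/`htonX()` calls and no packing attribute involved.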
gary_0 almost 4 years ago
Back when HTTP and SMTP were designed, the Internet was mostly old Unix machines talking to each other. Everything was a file full of `char`, piped to an 80-column terminal. Text-based made sense. Decades later, when the computer world was bigger, faster, and more unified, other systems kind of cargo-culted off of those earlier successes. And isn't it neat that you can telnet into port 80?

I think another big reason text-based protocols are seductive is that they're an engineering path of least resistance. When you start off text-based, how to debug, analyze, and interoperate with other implementations can be put off for later, or be Someone Else's Problem. Whereas if you design the same protocol in binary, these tricky considerations are harder to ignore, even though text-based protocols will still run into the same problems eventually, because there's no way I'm typing in a Cookie header by hand or decoding Base64 in my head.
zajio1am almost 4 years ago
There is one thing that is often neglected in the text vs. binary protocol debate, and that is self-terminating vs. prior-length framing. Although it is not strictly tied to the text/binary choice, text protocols are usually self-terminating (e.g. closing tags), while binary protocols are usually prior-length (e.g. the type-length-value approach). The first approach leads to escaping and all its associated problems.
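The difference between the two framings shows up clearly in a minimal Python sketch. Everything here is illustrative: the 1-byte tag / 4-byte length TLV header and the backslash escape scheme for newline-terminated records are arbitrary choices, and the function names are hypothetical.

```python
import struct

TLV_HEADER = struct.Struct("!BI")  # 1-byte tag, 4-byte big-endian length


def encode_tlv(tag: int, value: bytes) -> bytes:
    # Prior-length: the value is copied verbatim, no escaping needed.
    return TLV_HEADER.pack(tag, len(value)) + value


def decode_tlv(buf: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Return (tag, value, offset of the next record)."""
    tag, length = TLV_HEADER.unpack_from(buf, offset)
    start = offset + TLV_HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated TLV value")
    return tag, buf[start:end], end


def encode_line(value: bytes) -> bytes:
    # Self-terminating: records end with b"\n", so the terminator and the
    # escape character itself must be escaped (backslash first!).
    return value.replace(b"\\", b"\\\\").replace(b"\n", b"\\n") + b"\n"


def decode_line(record: bytes) -> bytes:
    if not record.endswith(b"\n"):
        raise ValueError("unterminated record")
    body, out, i = record[:-1], bytearray(), 0
    while i < len(body):
        c = body[i]
        if c == 0x0A:
            raise ValueError(f"unescaped newline at offset {i}")
        if c != 0x5C:  # ordinary byte
            out.append(c)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling escape")
        nxt = body[i + 1]
        if nxt == 0x5C:
            out.append(0x5C)  # "\\" -> backslash
        elif nxt == 0x6E:
            out.append(0x0A)  # "\n" -> newline
        else:
            raise ValueError(f"unknown escape at offset {i}")
        i += 2
    return bytes(out)
```

The TLV decoder can jump straight past a value of any content, while the line decoder must inspect every byte and has several new error cases (dangling or unknown escapes) that exist only because of the escaping.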