The Byte Order Fallacy

45 points by codesuki 8 months ago

18 comments

IME, there's one big thing that often keeps my programs from being unaffected by byte order: wanting to quickly splat data structures into and out of files, pipes, and sockets, without having to encode or decode each element one-by-one. The only real way to make this endian-independent is to have byte-swapping accessors for everything when it's ultimately produced or consumed, but adding all the code for that is very tedious in most languages. One can argue that handling endianness is the responsible thing to do, but it just doesn't seem worthwhile when I practically know that no one will ever run my code on a big-endian processor.
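For illustration, here is roughly what those per-field accessors look like in C. It's a minimal sketch assuming a made-up 8-byte record whose on-disk format is fixed as little-endian; `struct header`, `header_encode`, `header_decode` and the `put_`/`get_` helpers are all hypothetical names. Every field needs its own line in each direction, which is exactly the tedium described.

```c
#include <stdint.h>

/* Hypothetical on-disk record; the wire format is fixed as little-endian. */
struct header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};

#define HEADER_WIRE_SIZE 8 /* 4 + 2 + 2 bytes, no padding on the wire */

static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint16_t get_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* One line per field in each direction. */
void header_encode(const struct header *h, uint8_t out[HEADER_WIRE_SIZE]) {
    put_le32(out + 0, h->magic);
    put_le16(out + 4, h->version);
    put_le16(out + 6, h->flags);
}

void header_decode(const uint8_t in[HEADER_WIRE_SIZE], struct header *h) {
    h->magic = get_le32(in + 0);
    h->version = get_le16(in + 4);
    h->flags = get_le16(in + 6);
}
```

Unlike `fwrite(&h, sizeof h, 1, f)`, this produces the same 8 bytes on any host and does not depend on the compiler's struct padding.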

iscoelho 8 months ago

If you are using C/C++ for any new app, there is a possibility you are writing code that has a performance requirement.
- mmap/io_uring/drivers and other "zero-copy" code implementations require consideration of byte order.
- Filesystems, databases, and network applications can be high-throughput and will certainly benefit from being zero-copy (with benefits anywhere from +1% to +2000% in performance).
This is absolutely not "premature optimization." If you're a C/C++ engineer, you should know off the top of your head how many cycles syscalls and memcpys cost. (Spoiler: they're slow.) You should evaluate your performance requirements and decide whether you need to eliminate that overhead. For certain applications, if you do not meet the performance requirements, you cannot ship.

chasil 8 months ago

TCP/IP is big-endian, which is likely the largest footprint for these concerns.
"htonl, htons, ntohl, ntohs - convert values between host and network byte order"
The cheapest big-endian modern device is a Raspberry Pi running a NetBSD "eb" release, for those who want to test their code.
https://wiki.netbsd.org/ports/evbarm/
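A minimal C sketch of those functions in use, assuming a POSIX system with `<arpa/inet.h>`. `put_length_prefix` and `get_length_prefix` are hypothetical names for writing and reading a 32-bit big-endian length prefix, as a network protocol might.

```c
#include <arpa/inet.h> /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Write a 32-bit length prefix in network (big-endian) byte order. */
void put_length_prefix(uint8_t out[4], uint32_t len) {
    uint32_t be = htonl(len);     /* host order -> network order */
    memcpy(out, &be, sizeof be);  /* copy the 4 bytes as they sit in memory */
}

/* Read it back; ntohl undoes htonl on every host. */
uint32_t get_length_prefix(const uint8_t in[4]) {
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);             /* network order -> host order */
}
```

For 0x01020304 the buffer holds 01 02 03 04 on any host: the conversion is a no-op on big-endian machines and a byte swap on little-endian ones.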

rwmj 8 months ago

Unless you're dealing with binary data, in which case byte order matters very much, and if you forget to convert it you're causing a world of pain for someone.
He even has an example where he just pushes the problem off to someone else: "if the people at Adobe wrote proper code to encode and decode their files". Yeah, hope they weren't ignoring byte order issues.

genpfault 8 months ago

(2012)
Original thread w/104 comments: https://news.ycombinator.com/item?id=3796378

AstralStorm 8 months ago

Really, except for networking (including, say, Bluetooth), nobody is big-endian anymore. So how about just not leaking that from the network layer?
And do not define any data format as big-endian anymore. Define it as little-endian (do not leave it undefined) and everyone will be happy.

Laremere 8 months ago

This is a reasonable way to do things, and I've used it before. However, I just used Zig's method here, and I like it a lot: https://ziglang.org/documentation/master/std/#std.io.Reader.readInt
Given a reader (files, network streams, and buffers can all be turned into readers), you can call readInt. It takes the type you want and the endianness of the encoding. It's easy to write, self-documenting, and highly efficient.
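Zig's own API is at the link; what follows is only a rough C approximation of the same idea (caller names the width and the byte order at the call site), not a port of Zig's code. `struct reader`, `read_int` and `enum byte_order` are hypothetical, and this reader wraps an in-memory buffer rather than a file or socket.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_BIG, ORDER_LITTLE };

/* A minimal reader over an in-memory buffer (stand-in for a file or socket). */
struct reader {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Read an unsigned integer of `nbytes` (1..8) encoded in `order`.
   Returns false if the width is invalid or the buffer runs out. */
bool read_int(struct reader *r, size_t nbytes, enum byte_order order, uint64_t *out) {
    if (nbytes == 0 || nbytes > 8 || r->len - r->pos < nbytes)
        return false;
    uint64_t v = 0;
    for (size_t i = 0; i < nbytes; i++) {
        /* Visit bytes from most to least significant for either encoding. */
        size_t idx = (order == ORDER_BIG) ? i : nbytes - 1 - i;
        v = (v << 8) | r->buf[r->pos + idx];
    }
    r->pos += nbytes;
    *out = v;
    return true;
}
```

Because the byte order is an argument rather than a property of the host, the same call reads the same value on any machine.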

ultrahax 8 months ago

As a games coder, I was glad when the Xbox 360 / PS3 era came to an end; getting big-endian clients talking to little-endian servers was an endless source of bugs.

benlivengood 8 months ago

The other case where it matters is SIMD instructions that serialize or deserialize multiple fields at once. But SIMD operations are usually architecture-specific to begin with, so if you shuffle bytes into and out of the native packed formats, those shuffles will be specific to the endianness of the native packed format, and you can forget about byte order outside of those shuffle transformations.

_nalply 8 months ago

What he said: if you read bytes in some byte order, you compose them yourself correctly. No byte swapping, just reading byte by byte and converting them to the numeric value you need. The architecture's byte order is implicit as long as you use the architecture's tools to convert the bytes.
Rust, for example, has from_be_bytes(), from_le_bytes() and from_ne_bytes() methods for the number primitives u16, i16, u32, and so on. They all take a byte array of the correct length, interpret it as big-, little- or native-endian, and convert it to the number.
The first two methods work fine on all architectures, and that's what this article is about.
The third method, however, is architecture-dependent and should not be used for network data, because it gives different results on different architectures, which is exactly what you don't want. In fact, let me cite this part of the documentation. It's very polite but true.
> As the target platform’s native endianness is used, portable code likely wants to use from_be_bytes or from_le_bytes, as appropriate instead.
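A C analogue of those three Rust methods for 16-bit values might look like this. The C function names are hypothetical and simply mirror the Rust ones; only the first two return the same value on every host.

```c
#include <stdint.h>
#include <string.h>

/* Portable: byte 0 is the most significant byte. */
uint16_t u16_from_be_bytes(const uint8_t b[2]) {
    return (uint16_t)(((unsigned)b[0] << 8) | b[1]);
}

/* Portable: byte 0 is the least significant byte. */
uint16_t u16_from_le_bytes(const uint8_t b[2]) {
    return (uint16_t)(((unsigned)b[1] << 8) | b[0]);
}

/* Host-dependent: copies the bytes as-is, so the result differs between
   little- and big-endian machines. Avoid for files and network data. */
uint16_t u16_from_ne_bytes(const uint8_t b[2]) {
    uint16_t v;
    memcpy(&v, b, sizeof v);
    return v;
}
```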

fracus 8 months ago

I don't like these ambiguous titles. From the title, I thought I was going to read that byte order doesn't matter, when in fact the title should be "a computer's byte order is irrelevant to high-level languages". At least state the fallacy in unambiguous terms in the first sentence. In any case, it was an interesting read.

nativeit 8 months ago

> If you wrote it on a PC and tried to read it on a Mac, though, it wouldn't work unless back on the PC you checked a button that said you wanted the file to be readable on a Mac. (Why wouldn't you? Seriously, why wouldn't you?)
As a non-SWE, whenever I see checkboxes to enable options that maximize compatibility, I often assume there’s an implicit trade-off, so if it isn’t checked by default, I don’t enable such things unless strictly necessary. I don’t have any solid reason for this, it’s just my intuition. After all, if there were no good reasons not to enable Mac compatibility, why wouldn’t it be the default?
Edit: spelling error with “implicit”

e4m2 8 months ago

Be aware that if you actually want to do as the article prescribes, don't just copy and paste -- you shan't take anything at face value in C: https://news.ycombinator.com/item?id=31718292.
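One pitfall that often comes up with the article's shift-and-or one-liner (this may or may not be what the linked comment covers): each `data[k]` is an `unsigned char` that is promoted to `int` before the shift, so with a 32-bit `int`, `data[3] << 24` overflows when `data[3]` is 0x80 or more, which is undefined behaviour. A sketch of the fix, with hypothetical function names, widens each byte to `uint32_t` before shifting.

```c
#include <stdint.h>

/* Risky form (shown only in this comment):
 *   i = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
 * Each data[k] is promoted to signed int, so data[3] >= 0x80 overflows
 * int in the << 24 shift. */

/* Safe: widen each byte to uint32_t first, so every shift is unsigned. */
uint32_t load_le32(const unsigned char *data) {
    return ((uint32_t)data[0] << 0) | ((uint32_t)data[1] << 8) |
           ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
}

uint32_t load_be32(const unsigned char *data) {
    return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) | ((uint32_t)data[3] << 0);
}
```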

wmf 8 months ago

He's right that you shouldn't use ifdefs, but I think a macro like le32toh() is far clearer and more concise than a bunch of shifts and ORs.
Also, a lot of comments in this thread have nothing to do with the article and appear to be responses to some invisible strawman.
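A minimal sketch of the macro approach, assuming glibc, whose `<endian.h>` provides `le32toh` and `htole32` (they are not part of ISO C, and the header's name and location differ on other systems). `read_le32` and `write_le32` are hypothetical names; the calling code needs no `#ifdef`, since the header resolves the host's byte order.

```c
#define _DEFAULT_SOURCE  /* glibc: make le32toh/htole32 visible */
#include <endian.h>      /* glibc; not ISO C */
#include <stdint.h>
#include <string.h>

/* Decode a little-endian 32-bit field from a byte buffer. */
uint32_t read_le32(const uint8_t *p) {
    uint32_t raw;
    memcpy(&raw, p, sizeof raw); /* memcpy sidesteps alignment and aliasing issues */
    return le32toh(raw);         /* no-op on little-endian hosts, swap on big-endian */
}

/* Encode: convert to little-endian, then store the bytes. */
void write_le32(uint8_t *p, uint32_t v) {
    uint32_t le = htole32(v);
    memcpy(p, &le, sizeof le);
}
```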

nuancebydefault 8 months ago

Byte order matters in all cases where there is I/O, be it files, network streams, inter-chip communication, and so on. For data that stays on the same processor, or for files that are only accessed by processors of the same endianness, there really is no issue, even when doing bit manipulation.

eternityforest 8 months ago

If Network Byte Order wasn't a thing, we could all just pretend big endian doesn't exist outside of mainframes.

wakawaka28 8 months ago

Characters are not necessarily 8 bits. So you need to do a bit more to have true portability.
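If you do care about hosts where `CHAR_BIT` is larger than 8, one option is a sketch like the following. It assumes, as a convention rather than a rule of C, that each `unsigned char` carries one octet in its low 8 bits, so it masks before shifting and returns `uint_least32_t`, which exists even where exact-width types like `uint8_t` don't. The other option is to refuse to build there with `_Static_assert(CHAR_BIT == 8, "...")`. The function name is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Decode a big-endian 32-bit value from four chars, each assumed to hold
   one octet in its low 8 bits; the mask discards any higher bits when
   CHAR_BIT > 8 and is a no-op when CHAR_BIT == 8. */
uint_least32_t octets_to_be32(const unsigned char *p) {
    return ((uint_least32_t)(p[0] & 0xFFu) << 24) |
           ((uint_least32_t)(p[1] & 0xFFu) << 16) |
           ((uint_least32_t)(p[2] & 0xFFu) << 8) |
           (uint_least32_t)(p[3] & 0xFFu);
}
```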

wiredfool 8 months ago

Unless you’re writing code to decode image file formats.
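Some image formats carry their byte order inside the file. As one example, a classic TIFF file begins with "II" (little-endian) or "MM" (big-endian), followed by the number 42 as a 16-bit value in that order and a 32-bit offset to the first image file directory (IFD); later fields follow the declared order. A minimal header check in C, with hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tiff_header {
    bool little_endian;
    uint32_t first_ifd_offset;
};

/* Read 16/32-bit values in the byte order the file declared. */
static uint16_t rd16(const uint8_t *p, bool le) {
    return le ? (uint16_t)(p[0] | (p[1] << 8))
              : (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p, bool le) {
    return le ? ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24))
              : (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Returns false if the buffer is too short or not a classic TIFF header. */
bool parse_tiff_header(const uint8_t *buf, size_t len, struct tiff_header *out) {
    if (len < 8)
        return false;
    bool le;
    if (buf[0] == 'I' && buf[1] == 'I')
        le = true;
    else if (buf[0] == 'M' && buf[1] == 'M')
        le = false;
    else
        return false;
    if (rd16(buf + 2, le) != 42) /* magic number, in the declared order */
        return false;
    out->little_endian = le;
    out->first_ifd_offset = rd32(buf + 4, le);
    return true;
}
```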
