On Endianness (2021)

59 points by bshanks almost 3 years ago

13 comments

Const-me almost 3 years ago

> Variable length encoding with a prepended length field

The OP's method only works under the following assumptions: (1) the length in bytes is a power of 2, so the reinterpret trick works, and (2) the complete source data is in memory.

In many popular formats (MKV, WebM, etc.) the header is variable-length as well, e.g. 1-8 bits, so the complete integer is [ 1 .. 8 ] bytes, and because these files can be very large they don't fit in memory and need to be streamed from disk.

Under these conditions, the big-endian format (a) eliminates the need for any bit shifts, since you only need to mask away the header in the higher bits, which is slightly faster on most CPUs, and (b) preserves the original bytes in the number except the MSB, which contains the header, which helps with debugging.

I think that's the only use case where big endian has a substantial advantage. Fortunately, all modern CPUs have fast instructions to flip the endianness, e.g. on Intel/AMD they are bswap for 4 or 8 bytes and ror/rol for 2 bytes, available as intrinsics or standard library functions in most programming languages.

For all other cases, including binary formats I design, I only use the little-endian convention. Little-endian CPUs have won, and I don't like doing unnecessary work flipping these bytes. I'm lazy, and boilerplate like that is a common source of bugs.
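
A minimal sketch of the streaming case described in this comment, in Python for illustration. It assumes an EBML-style layout (the scheme used by MKV/WebM) where the position of the first set bit in the leading byte gives the total length in bytes; `read_vint` is a hypothetical helper name, not a function from any library.

```python
import io


def read_vint(stream):
    """Read one EBML-style variable-length integer from a binary stream.

    Assumed layout: the first byte starts with (length - 1) zero bits
    followed by a single 1 marker bit; the remaining bits of that byte
    and all following bytes hold the value, most significant byte first.
    """
    first = stream.read(1)
    if not first:
        raise EOFError("no data")
    b0 = first[0]
    if b0 == 0:
        raise ValueError("no length marker in the first byte")
    # Number of leading zero bits in the first byte, plus one: 1..8 bytes.
    length = 8 - b0.bit_length() + 1
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated integer")
    # Big-endian load: the bytes keep their on-disk order in the number.
    raw = int.from_bytes(first + rest, "big")
    # Mask away the header; the marker sits at bit 7 * length of the value.
    return raw & ((1 << (7 * length)) - 1)


# Two integers back to back in one stream: a 1-byte and a 2-byte encoding.
stream = io.BytesIO(bytes([0x81, 0x40, 0x02]))
first_value = read_vint(stream)   # 0x81 -> 1
second_value = read_vint(stream)  # 0x40 0x02 -> 2
```

In C the equivalent would typically be a byte-swapped load followed by a single AND; here `int.from_bytes(..., "big")` stands in for the load.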

anonymousiam almost 3 years ago

In the old days of satellite development (40 years ago at Hughes Space & Communications), I observed that the bit order in digital subsystems was always Big Endian, but all the other subsystems were usually Little Endian. Sometimes this caused issues, which fortunately were discovered during the integration & test phase. Usually problems like this were "corrected" by altering the documentation instead of the hardware, so for a while the majority of satellites in Earth orbit had a strange mix of endianness throughout the subsystems. The digital culture eventually won at Hughes/Boeing, but the standards committee punted and allows the endianness to be arbitrary (as long as it's documented) in the CCSDS standards.

https://public.ccsds.org/default.aspx

cryptonector almost 3 years ago

TFA's claims about advantages are nonsense. E.g., for "detecting odd/even" it gives the advantage to little-endian but without considering at all how that might be implemented in hardware, or even without explaining at all why there's an advantage to be had.

Or when talking about bignums:

> Although this scheme can be realized with either byte order, there is an extra advantage to little endian byte ordering: If the CPU is little endian, you wouldn’t even need to care about the element size in the array because the bytes would naturally arrange themselves smoothly in little endian order across the entire array.

But the same is true for big-endian! The array indices will run the opposite way, but so what?

This seems aimed at forcing the conclusion that little-endian is better. There's no real advantage to one or the other. The world just has both, and we have to deal with it.
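
One way to check the bignum point is to build the memory image of the same number with different element sizes. The sketch below is illustrative only: `memory_image` is a hypothetical helper, and it assumes the elements are stored least significant first on a little-endian layout and most significant first on a big-endian one.

```python
def memory_image(value, total_bytes, limb_bytes, byteorder):
    """Return the bytes of `value` stored as an array of fixed-size limbs.

    Little-endian: least significant limb at index 0, each limb little-endian.
    Big-endian: most significant limb at index 0, each limb big-endian.
    """
    count = total_bytes // limb_bytes
    mask = (1 << (8 * limb_bytes)) - 1
    # Limbs in order of increasing significance.
    limbs = [(value >> (8 * limb_bytes * i)) & mask for i in range(count)]
    if byteorder == "big":
        limbs.reverse()  # the array indices run the opposite way
    return b"".join(limb.to_bytes(limb_bytes, byteorder) for limb in limbs)


value = 0x0102030405060708
little_images = {memory_image(value, 8, size, "little") for size in (1, 2, 4, 8)}
big_images = {memory_image(value, 8, size, "big") for size in (1, 2, 4, 8)}
```

Under these assumptions each set contains exactly one byte string, so the layout is independent of the element size for either byte order, as long as the limb order matches the byte order.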

thedanbob almost 3 years ago

Little-endian makes more sense for computers where calculation is most important. Big-endian makes more sense for humans where, I would argue, comparison is most important. If I want to know e.g. if I can afford something, I'd prefer hearing the price as "four hundred and ..." to instantly get a ballpark rather than "five and ninety and four hundred".

gumby almost 3 years ago

The article has some bugs. Alphabets in India (for as far back as we have written records, e.g. Pali) were all LTR until Muslim rule was established, at which point some people began to write Hindi using the Persian/Arabic script (giving rise to Urdu, a very close sister language to Hindi).

Also, network order is big endian because most of the machines back then were big endian, most notably the PDP-10, which was the common research machine in that era. Big endian offered simplicity in implementation (remember the earlier models of these machines were mostly hand made, even if made in a factory, with only a few semiconductors; the ALUs and instruction decoding were all done with wires, not traces). Bytes weren't necessarily 8 bits wide, and while it was trivial (easier than in C actually) to do pointer arithmetic, a "pointer cast" is a weird way to think of it.

So the article is full of the assumption that the world is basically a PDP-11. I think the obsession with such machines has held computing back as much as it has sped it up.

drfuchs almost 3 years ago

OK, temporarily-victorious little-endians: Explain how you find it perfectly natural that the bits in a byte are big-endian?

[Various replies (and votes) indicate I had better edit to clarify:] How come the single byte value written as 0x12 means "eighteen" and not "thirty-three"? Shouldn't the two four-bit nibbles also be considered as little-endian? Or it could even mean "seventy-two", if the bits are little-endian as a whole?
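
The three readings of 0x12 mentioned here can be reproduced with a short sketch. The helper names `swap_nibbles` and `reverse_bits` are made up for this illustration.

```python
def swap_nibbles(byte):
    """Exchange the high and low four-bit halves of an 8-bit value."""
    return ((byte & 0x0F) << 4) | (byte >> 4)


def reverse_bits(byte):
    """Reverse the order of all eight bits of an 8-bit value."""
    return int(f"{byte:08b}"[::-1], 2)


as_written = 0x12                          # 18: digits read most significant first
nibbles_swapped = swap_nibbles(0x12)       # 0x21 = 33: nibbles read least significant first
bits_reversed = reverse_bits(0x12)         # 0b01001000 = 72: every bit read least significant first
```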

guidoism almost 3 years ago

A few weeks ago I read the original paper where the names big endian and little endian came from and was really surprised that it was published in 1980!

So, before then we didn't have a common way of describing this, I guess. It's crazy to me if true.

Good paper BTW. Worth reading.

jfim almost 3 years ago

Some of these advantages are pretty dubious. Detecting whether a number is odd or even or getting the sign bit aren't impacted by endianness, as the CPU isn't working with single bytes at a time.
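
A small sketch of this point, using Python's `struct` module to stand in for a CPU load. The helper names are hypothetical, and the 32-bit width is an assumption made for the example.

```python
import struct


def load_u32(raw, byteorder):
    """Decode four bytes as an unsigned 32-bit word in the given byte order."""
    fmt = "<I" if byteorder == "little" else ">I"
    return struct.unpack(fmt, raw)[0]


def is_odd(word):
    return (word & 1) == 1


def sign_bit(word):
    return (word >> 31) & 1


word = 0x80000003
le_bytes = struct.pack("<I", word)  # bytes in memory: 03 00 00 80
be_bytes = struct.pack(">I", word)  # bytes in memory: 80 00 00 03

# Once the whole word is loaded, both layouts give the same value,
# so the parity and sign tests are identical.
from_le = load_u32(le_bytes, "little")
from_be = load_u32(be_bytes, "big")
```

The byte order only shows up if the code inspects a single byte in memory: the parity bit is in `le_bytes[0]` for the little-endian layout and in `be_bytes[-1]` for the big-endian one.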

TazeTSchnitzel almost 3 years ago

A nice property of little endian is that you can index bits with

    (arr[x/8] >> (x%8)) & 1
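
A quick check of that expression, transliterated to Python (where integer division is `//`). `get_bit` is a name chosen for this sketch, and the byte array is assumed to be the little-endian encoding of one large integer.

```python
def get_bit(arr, x):
    """Return bit number x of a little-endian byte array (bit 0 is the LSB)."""
    return (arr[x // 8] >> (x % 8)) & 1


number = 0xDEADBEEF
arr = number.to_bytes(4, "little")

# Bit x of the array is bit x of the integer, for every position.
array_bits = [get_bit(arr, x) for x in range(32)]
integer_bits = [(number >> x) & 1 for x in range(32)]
```

With a big-endian array the byte index would have to be counted from the end (`arr[len(arr) - 1 - x // 8]`) to get the same numbering.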

CalChris almost 3 years ago

1. Network Byte Order is Big Endian.
2. x86, ARMv8 and RISC-V are Little Endian.
3. IBM 360 through s390x are Big Endian.
4. MIPS, PowerPC and ARM can be bi-endian.

ffwszgf almost 3 years ago

Was Hindi (I assume that's the original language for which Indo-Arabic numerals were made) written right to left? I know Arabic was, but I don't think Indian languages ever have been.

IshKebab almost 3 years ago

This is a decent attempt. I like that they actually at least try to find pros/cons.

But it does seem odd that they give "convention" to big endian despite the fact that basically all modern processors are little endian and there's absolutely no advantage to following "network byte order" if you don't have to.

LargoLasskhyfv almost 3 years ago

What do you think about the fact that the linked RFC (https://www.rfc-editor.org/ien/ien137.txt) dates from APRIL the FIRST, 42 years ago?

Could/would/should it matter?